I’ve been working on a project to visualize differences between assessed value and levied taxes in New Jersey and I discovered a bug in some of New Jersey’s parcel data.
New Jersey uses “PAMS_PIN” as a key between the GIS data and the tax assessors’ rolls. The key is a simple concatenation: Municipal Code (a four digit value for each municipality), Block, and Lot joined by underscores. If the Qualifier Code (“QCODE”) field is populated, then that is also joined, again with an underscore. In Somerset County, many of these PAMS_PIN keys are incorrect, resulting in duplicates.
To get an understanding of how many are wrong in Somerset, I initially attempted the following in ArcMap’s Select by Attributes to get a count of the incorrectly-formed keys.
[QCODE] IS NULL AND NOT [PAMS_PIN] = [MUN] + '_' + [BLOCK] + '_' + [LOT]
Which returned zero results. QCODE is apparently not null, it’s either empty string or spaces.
[QCODE] = '' AND NOT [PAMS_PIN] = [MUN] + '_' + [BLOCK] + '_' + [LOT] -- yields 11137 results
I also discovered that ArcGIS’s Select By Attributes apparently does not distinguish between ” (empty string), ‘ ‘ (one space), or ‘ ‘ (five spaces) when searching through the Somerset Personal GDB or Shapefile. Maybe it’s just an issue with those file formats, but it’s definitely frustrating when you know there’s a variable amount of spaces in the QCODE field.
I keep a copy of the GIS and assessors’ rolls locally, getting the GIS from NJGIN and the assessment data from the certified tax lists. Having worked with this data statewide using PostgreSQL, I’ve found that the QCODE field contains a mix of empty strings, multiple spaces, NULLs and in one case, “<Null>”. In the absence of an explicitly defined “No Value” value, I feel NULL1 is what should be in the QCODE field if there is no value (though the argument could be made for an empty string, too). Empty string is by far the most numerous value in the column. Either way, consistency is important and the Qualifier Code field is far from consistent.
SELECT 'NULL' as label, count(*) from parcels where qcode is null UNION ALL SELECT 'Empty String' as label, count(*) from parcels where qcode = '' UNION ALL SELECT '1 space' as label, count(*) from parcels where qcode = ' ' UNION ALL SELECT '2 spaces' as label, count(*) from parcels where qcode = ' ' UNION ALL SELECT '3 spaces' as label, count(*) from parcels where qcode = ' ' UNION ALL SELECT '4 spaces' as label, count(*) from parcels where qcode = ' ' UNION ALL SELECT '5 spaces' as label, count(*) from parcels where qcode = ' ';
I want NULLs in the QCODE field so that I can use the concat_ws function in PostgreSQL to fix the PAMS_PIN values. concat_ws takes a separator as the first argument and concatenates the remaining arguments, ignoring those with NULL values. I coupled concat_ws with the nullif and regexp_replace functions to calculate new PAMS_PIN values.
Here’s what I’ve used to clean up my local copy of the GIS data:
UPDATE parcels SET pams_pin = concat_ws('_', mun, block, lot, nullif(regexp_replace(qcode, ' +', ''), '')) WHERE pams_pin IN ( SELECT q.pams_pin FROM ( SELECT pams_pin, concat_ws('_', mun, block, lot, nullif(regexp_replace(qcode, ' +', ''), '')) as recalc_pin FROM parcels ) q WHERE NOT q.pams_pin = q.recalc_pin );
Many of the 27,945 are in Somerset County, concentrated in Hillsborough Township, but there are errors in 158 other municipalities. Looking through the list, some of these errors are likely not errors and the PAMS_PIN is correct, but the QCODE field has not be properly updated. This appears to be the case in Morris County, as many of the PAMS_PIN values contain values that could not have been derived from the Block, Lot and QCODE values in the same record.
Here’s the complete list of mismatches, as a 166kb zipped CSV file. I’ve sent an email to OIT OGIS staff asking them to look into Somerset County, but it’s unlikely that they’ll be able to address all of the issues I identified, as the problems with the assessment data go far beyond the associated GIS data.
Also, I’ve posted some of the tools I use to work with this data to a Github repository, if you want to work with this data yourself. While the code is somewhat messy and I should go back and refactor things, it does strive to massage the flat-file into a usable form. For instance, there’s no “X number of spaces” ambiguity in the resulting data.
End note on Nulls: If you’re interested why and when you should use NULLs, you should check out “Nulls Make Things Easier?” by Bruce Momjian.