Yesterday, I mentioned that I discovered ArcGIS’s inconsistent handling of spaces within text fields. Today, I tested to see how ArcGIS handles NULLs, Empty Strings and Spaces within its native file formats.
To begin, I needed a reference data set. I created a “test_spaces” table in PostgreSQL to serve that purpose. The table has seven records in it; the “str_test” field contains a NULL, Empty String and then 1 through 5 spaces.
DROP TABLE IF EXISTS test_spaces; CREATE TABLE test_spaces ( Â objectid integer, Â label character varying(12), Â str_test character varying(5), Â shape geometry(Point, 4326), Â CONSTRAINT test_spaces_pkey PRIMARY KEY (objectid) ) WITH (OIDS=FALSE); INSERT INTO test_spaces VALUES (1, 'Null Value', NULL, ST_SetSRID(ST_MakePoint(-1,0),4326)); INSERT INTO test_spaces VALUES (2, 'Empty String', '', ST_SetSRID(ST_MakePoint(0,0),4326)); INSERT INTO test_spaces VALUES (3, '1 Space', ' ', ST_SetSRID(ST_MakePoint(1,0),4326)); INSERT INTO test_spaces VALUES (4, '2 Spaces', 'Â ', ST_SetSRID(ST_MakePoint(2,0),4326)); INSERT INTO test_spaces VALUES (5, '3 Spaces', 'Â Â ', ST_SetSRID(ST_MakePoint(3,0),4326)); INSERT INTO test_spaces VALUES (6, '4 Spaces', 'Â Â Â ', ST_SetSRID(ST_MakePoint(4,0),4326)); INSERT INTO test_spaces VALUES (7, '5 Spaces', 'Â Â Â Â ', ST_SetSRID(ST_MakePoint(5,0),4326));
ArcGIS 10.2 is able to read this PostgreSQL table without issue and the correct values are selected when queried with a WHERE statement via Select By Attributes. This is likely because ArcGIS passes the query to the database and uses the Object ID to identify which features are in the selection result.
I then used ArcGIS to export the table to a shapefile, personal geodatabase, and file geodatabase.
Somewhat unsurprisingly, the shapefile had its issues. First, shapefiles do not support NULL values. dBase came late to the NULL party, so software support for NULLs in DBFs varies. ESRI’s documentation states that NULLs are not supported in shapefiles and will be converted. In the case of a string, a NULL will be changed to an empty string.
Using Select By Attributes and selecting on a string of 3 spaces results in all of the records becoming selected. There’s no distinguishing between a given number of spaces and an empty string.
I started an editing session and looked into the fields to see if the multiple spaces were retained. Multiple spaces were reduced to a single space. So, in Shapefile (or DBFs) spaces alone are not significant and will be reduced to a single space.
Moving on to Personal Geodatabases. Based on the 2003 version of Access databases, the Personal Geodatabase is an aging format. No longer supported by ArcGIS Server, I see them eventually slipping away into semi-obscurity on the Desktop as well.
While PGDBs support NULL values, they continue to exhibit the same issue regarding selecting a specific number of spaces. Selecting for 3 spaces results in the entire set of non-NULL records. What is interesting though is that when editing the PGDB-based data, the spaces are actually there in the attribute table.
On to File Geodatabases, ESRI’s preferred format for Desktop GIS.
I had high hopes for the File GDB. Exhibiting the same deficiencies as the Personal GDB, selecting spaces results in Empty Strings and a variable number of spaces being matched.
Why is this so frustrating to me? Let’s return to Yesterday’s post and assume this “str_test” field is actually something important and when concatenated with other fields, it becomes our unique identifier, or a foreign key to another set of tables.
I added a new text field, “GISKEY” that will be used to store our foreign key. I concatenated “34-“, the “str_test” field, and “-000” together. The spaces were not significant to Select By Attributes, but they are actually in the data.
Now, if this was just with normally empty fields, I could see the logic in conflating empty strings and spaces together. Most users are unaware of NULLs and likely don’t care if there’s a NULL or several spaces in a field, as long as the software can show them “emptiness.” But those spaces still remain and become a problem when calculating the value of new attribute fields.
In the File GDB, I erased the GISKEY field and replaced it with the word “Spaces.” After each “Spaces” I added 1 through 5 spaces for each record, respectively. I then attempted to select just the word “Spaces” with no trailing spaces after the word.
Above, at right, I have the five trailing spaces selected to confirm that they are actually there. Again, ArcGIS ignores those spaces, selecting all values in GISKEY starting with “Spaces” regardless of the trailing spaces.
Lesson learned: be mindful of your space characters, because it will be very difficult to identify them using ArcGIS and they will likely pop back up when performing field calculations. Data validation and enforcement of checks and constraints are always a good thing, as they can prevent these gremlins from appearing in your data. Use an actual DBMS – and the data validation controls they provide – for your GIS data whenever possible. PostgreSQL is free, incredibly robust and powerful, and can work with ArcGIS natively.