Insignificant Spaces

Yesterday, I mentioned that I discovered ArcGIS’s inconsistent handling of spaces within text fields. Today, I tested to see how ArcGIS handles NULLs, Empty Strings and Spaces within its native file formats.

Selecting 3 spaces selects all the strings.

Selecting 3 spaces selects all the strings.

To begin, I needed a reference data set. I created a “test_spaces” table in PostgreSQL to serve that purpose. The table has seven records in it; the “str_test” field contains a NULL, Empty String and then 1 through 5 spaces.

DROP TABLE IF EXISTS test_spaces;
CREATE TABLE test_spaces (
  objectid integer,
  label character varying(12),
  str_test character varying(5),
  shape geometry(Point, 4326),
  CONSTRAINT test_spaces_pkey PRIMARY KEY (objectid)
) WITH (OIDS=FALSE);

INSERT INTO test_spaces VALUES (1, 'Null Value', NULL, 
  ST_SetSRID(ST_MakePoint(-1,0),4326));
INSERT INTO test_spaces VALUES (2, 'Empty String', '', 
  ST_SetSRID(ST_MakePoint(0,0),4326));
INSERT INTO test_spaces VALUES (3, '1 Space', ' ', 
  ST_SetSRID(ST_MakePoint(1,0),4326));
INSERT INTO test_spaces VALUES (4, '2 Spaces', '  ', 
  ST_SetSRID(ST_MakePoint(2,0),4326));
INSERT INTO test_spaces VALUES (5, '3 Spaces', '   ', 
  ST_SetSRID(ST_MakePoint(3,0),4326));
INSERT INTO test_spaces VALUES (6, '4 Spaces', '    ', 
  ST_SetSRID(ST_MakePoint(4,0),4326));
INSERT INTO test_spaces VALUES (7, '5 Spaces', '     ', 
  ST_SetSRID(ST_MakePoint(5,0),4326));

ArcGIS 10.2 is able to read this PostgreSQL table without issue and the correct values are selected when queried with a WHERE statement via Select By Attributes. This is likely because ArcGIS passes the query to the database and uses the Object ID to identify which features are in the selection result.

A query for 3 spaces on the PostgreSQL table results in the "3 spaces" feature selected.

A query for 3 spaces on the PostgreSQL table results in the “3 spaces” feature selected.

I then used ArcGIS to export the table to a shapefile, personal geodatabase, and file geodatabase.

Somewhat unsurprisingly, the shapefile had its issues. First, shapefiles do not support NULL values. dBase came late to the NULL party, so software support for NULLs in DBFs varies. ESRI’s documentation states that NULLs are not supported in shapefiles and will be converted. In the case of a string, a NULL will be changed to an empty string.

Selecting for 3 spaces selects all features.

Selecting for 3 spaces selects all features.

Using Select By Attributes and selecting on a string of 3 spaces results in all of the records becoming selected. There’s no distinguishing between a given number of spaces and an empty string.

I started an editing session and looked into the fields to see if the multiple spaces were retained. Multiple spaces were reduced to a single space. So, in Shapefile (or DBFs) spaces alone are not significant and will be reduced to a single space.

Moving on to Personal Geodatabases. Based on the 2003 version of Access databases, the Personal Geodatabase is an aging format. No longer supported by ArcGIS Server, I see them eventually slipping away into semi-obscurity on the Desktop as well.

Personal geodatabases also ignore the significance of spaces.

Personal geodatabases also ignore the significance of spaces.

While PGDBs support NULL values, they continue to exhibit the same issue regarding selecting a specific number of spaces. Selecting for 3 spaces results in the entire set of non-NULL records. What is interesting though is that when editing the PGDB-based data, the spaces are actually there in the attribute table.

Five spaces selected.

Five spaces selected.

On to File Geodatabases, ESRI’s preferred format for Desktop GIS.

... and all the strings are selected again.

… and all the strings are selected again.

I had high hopes for the File GDB. Exhibiting the same deficiencies as the Personal GDB, selecting spaces results in Empty Strings and a variable number of spaces being matched.

Why is this so frustrating to me? Let’s return to Yesterday’s post and assume this “str_test” field is actually something important and when concatenated with other fields, it becomes our unique identifier, or a foreign key to another set of tables.

Concatenating strings together.

Concatenating strings together.

I added a new text field, “GISKEY” that will be used to store our foreign key. I concatenated “34-“, the “str_test” field, and “-000” together. The spaces were not significant to Select By Attributes, but they are actually in the data.

Now the spaces show up.

Now the spaces show up.

Now, if this was just with normally empty fields, I could see the logic in conflating empty strings and spaces together. Most users are unaware of NULLs and likely don’t care if there’s a NULL or several spaces in a field, as long as the software can show them “emptiness.” But those spaces still remain and become a problem when calculating the value of new attribute fields.

In the File GDB, I erased the GISKEY field and replaced it with the word “Spaces.” After each “Spaces” I added 1 through 5 spaces for each record, respectively. I then attempted to select just the word “Spaces” with no trailing spaces after the word.

Trailing spaces are also insignificant.

Trailing spaces are also insignificant.

Above, at right, I have the five trailing spaces selected to confirm that they are actually there. Again, ArcGIS ignores those spaces, selecting all values in GISKEY starting with “Spaces” regardless of the trailing spaces.

Lesson learned: be mindful of your space characters, because it will be very difficult to identify them using ArcGIS and they will likely pop back up when performing field calculations. Data validation and enforcement of checks and constraints are always a good thing, as they can prevent these gremlins from appearing in your data. Use an actual DBMS – and the data validation controls they provide – for your GIS data whenever possible. PostgreSQL is free, incredibly robust and powerful, and can work with ArcGIS natively.

This entry was posted in Data, PostgreSQL, Technology, Tools and Scripts. Bookmark the permalink.