Broken libraries? Docker to the rescue!

I’m working on some spatial data, loading shapefiles into PostgreSQL on an older server of mine. Revisiting my previous blog post, I wanted to compare more recent NJ Land Use/Land Cover data against the 2002 LULC data I imported into OpenStreetMap. Looking for changes between 2002 and 2012 can highlight areas of new construction, farmland loss, or reforestation that could then be updated in OSM. To do this analysis, I wanted the LULC data for both years as tables in PostgreSQL so I could perform some intersections and identify areas that have changed.

When I tried running ogr2ogr, I received an odd error.

ogr2ogr: /usr/pgsql-10/lib/libpq.so.5: no version information available (required by /lib64/libgdal.so.29)
ogr2ogr: symbol lookup error: /lib64/libgdal.so.29: undefined symbol: GEOSMakeValid_r

Having recently made some changes to this server, I assumed I broke some linked libraries. Instead of going down a rabbit hole trying to fix that, I decided to use Docker to run ogr2ogr (and ogrinfo) to review and export the shapefiles to PostgreSQL. On Docker Hub, there’s an osgeo/gdal image that provides the GDAL/OGR tools with the necessary PostgreSQL support.

Normally, running ogr2ogr to export a shapefile to PostGIS looks like this:

ogr2ogr -f PostgreSQL PG:"host=postgresql dbname=my_database user=dba password=$PGPASS" -nln my_new_table origin_data.shp

Here, the quoted string starting with PG: is the connection string, -nln sets the “new layer name” (in this case, the table name), and the final argument is the input shapefile.
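As a sketch, the same invocation can be parameterized with shell variables; the host, database, user, and table names below are placeholders, not values from any real setup:

```shell
#!/bin/sh
# Hypothetical connection parameters -- substitute your own.
PGHOST=postgresql
PGDATABASE=landuse
PGUSER=dba
TABLE=lu2012

# Assemble the libpq connection string that ogr2ogr expects after the PG: prefix.
# $PGPASS is assumed to already hold the database password.
CONN="host=$PGHOST dbname=$PGDATABASE user=$PGUSER password=$PGPASS"

# Print the command rather than running it, so the expansion is visible.
echo ogr2ogr -f PostgreSQL "PG:$CONN" -nln "$TABLE" origin_data.shp
```

Keeping the pieces in variables like this makes it easier to reuse the same command for the 2002 and 2012 tables by changing only $TABLE and the input file.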

We will need to pass all of that information in the call to Docker so that the command will run inside the container. Additionally, because our files are not inside the container – it’s just the GDAL/OGR environment – we will need to map a volume that’s accessible within the container.

Here’s the complete command and we’ll break down each step:

sudo docker run --rm -v `pwd`:/storage/:ro osgeo/gdal:alpine-normal-latest ogr2ogr -f PostgreSQL PG:"host=postgresql dbname=landuse user=dba password=$PGPASS" -nln lu2012 -lco GEOMETRY_NAME=shape -select LU12,LABEL12,TYPE12 -nlt MULTIPOLYGON /storage/Land_lu_2012.shp

We run Docker with elevated privileges: sudo docker run --rm. The --rm flag tells Docker not to keep the container around after it finishes.

We need a way to get to the shapefiles on the host server from within the container. The shapefiles are in my current working directory when I run Docker, so I map the contents of that directory to a storage directory at the container’s root.

-v `pwd`:/storage/:ro mounts the current working directory as a volume (pwd in backticks is replaced with the output of running pwd), with /storage specified as the mount location within the container. The optional third parameter, ro, makes the mount read-only.
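To see the substitution in isolation (a sketch of just the -v argument, not the full docker invocation), note that the shell expands the path before Docker ever sees it:

```shell
#!/bin/sh
# $(pwd) is the modern equivalent of `pwd` in backticks: both expand to
# the absolute path of the current working directory.
MOUNT_ARG="$(pwd):/storage/:ro"

# Docker receives a fully expanded host-path:container-path:options triple.
echo "docker will receive: -v $MOUNT_ARG"
```

Because the expansion happens on the host, the same command works from any directory containing your shapefiles.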

osgeo/gdal:alpine-normal-latest specifies the image and tag for the GDAL image. In this case, I used a build based on Alpine Linux. I probably could have used alpine-small-latest, as it looks like that slimmed-down version also includes PostgreSQL support.

Everything after the image name, ogr2ogr along with its arguments, is run within the container. Note that if you’re typing this all out, you might feel compelled to auto-complete file paths, but remember that you will need to specify the in-container path to your data, not your host environment’s path to the data. For more information on using ogr2ogr, check the documentation and the PostgreSQL specifics.
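One way to sidestep the auto-complete trap is to compute the in-container path from the host path yourself. This sketch assumes the /storage mount from the command above, with the current directory mapped into the container:

```shell
#!/bin/sh
# Host-side path -- what tab completion would give you.
HOST_PATH="$PWD/Land_lu_2012.shp"

# In-container path: the current directory is mounted at /storage,
# so only the file name carries over into the container.
CONTAINER_PATH="/storage/$(basename "$HOST_PATH")"

echo "pass to ogr2ogr inside the container: $CONTAINER_PATH"
```

Passing $CONTAINER_PATH as the input argument keeps the docker invocation correct even if you tab-completed the host path first.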

Hopefully this helps if you find you can’t run the GDAL/OGR tools normally in an environment, but at least have Docker available.
