Importing OSM Geofabrik data into Oracle database with GDAL/OGR

This post is the third in line of posts about Maps generally and Open Street Maps in particular. The previous one concentrates on building ogr2org on Linux platform, and this post is about using this tool. As this tool is not something completely new and unknown, there is a lot of information on different aspects of using it. So my task isn't to write a comprehensive guide on this topic, but only to highlight a few points and give a general understanding of this tool. And in this post, I will load OSM data into Oracle Database.

Source data

For my experiments I took a few .osm.pbf files from Geofabrik. My choice was based purely on the size of the files. I started with a file sized around 100 MB, then tested 0.5 GB and 1 GB files, and finished with the biggest one sized almost 3 GB.

Belarus131 MB
Spain546 MB
Italy1.1 GB
Germany2.7 GB

Using ogr2ogr

I started with the smallest file and minimum of options. And got an error.

I have no explanation for this error yet. While it looks like self-explanatory, it isn't obvious to me yet. I should step aside to explain what is going on.

Take a look at the POINTS table which ogr2ogr creates with the default configuration file:

"REF" VARCHAR2(2047)
"IS_IN" VARCHAR2(2047)

OGR_FID is a surrogate primary key. ORA_GEOMETRY is a column for storing geometry data. Fields from OSM_ID to MAN_MADE are a source objects' tags, which are configured to be shown as separate fields. And OTHER_TAGS is the field for all other source tags. These tags are stored in the form of comma-separated pairs of "tag name"=>"tag value". For some objects, OTHER_TAGS field could be very big.

For example here is the OTHER_TAGS field for the post office #15 in Minsk.

"name:be"=>"Аддзяленне паштовай сувязі №15",
"name:ru"=>"Отделение почтовой связи №15",
"alt_name"=>"Почта 220015; Почтовое отделение #15; Отделение связи 220015",
"operator"=>"РУП \"Белпочта\"",
"official_name"=>"Отделение почтовой связи Минск-15",
"opening_hours"=>"Mo-Fr 09:00-20:00; Sa 09:00-18:00",

All these values came from OTHER_TAG field. So why did I say that have no explanation for that error? The error message says that object with id=119298123 has OTHER_TAGS field longer than 2047 bytes. The problem is that object with osm_id=119298123 has a very short list of tags:


I can't imagine how it can be more than 2000 bytes long. So right now I don't know why this object is rejected by the driver.

The good news is that OTHER_TAGS field is not mandatory. We can switch it off very easily. That could be achieved by editing the osmconf.ini file. It is located in the /usr/local/share/gdal directory.

This configuration file has a section for every object type.

And there is the other_tag=no configuration option in every section. It's commented out by default, but we may uncomment it.

One more thing I want to change right now is that by default ogr2ogr creates all columns in lower case.

   (    "OGR_FID" NUMBER(*,0), 
    "osm_id" VARCHAR2(2047 CHAR), 
    "name" VARCHAR2(2047 CHAR), 
    "name_ru" VARCHAR2(2047 CHAR), 
    "barrier" VARCHAR2(2047 CHAR), 
    "highway" VARCHAR2(2047 CHAR), 
    "ref" VARCHAR2(2047 CHAR), 
    "address" VARCHAR2(2047 CHAR), 
    "is_in" VARCHAR2(2047 CHAR), 
    "place" VARCHAR2(2047 CHAR), 
    "man_made" VARCHAR2(2047 CHAR), 
    "other_tags" VARCHAR2(2047 CHAR)

To control this behaviour, we can use LAUNDER option. Its syntax is not obvious: -lco LAUNDER=YES

This option makes our tables more convenient to use

   (    "OGR_FID" NUMBER(*,0), 
    "OSM_ID" VARCHAR2(2047 CHAR), 
    "NAME" VARCHAR2(2047 CHAR), 
    "NAME_RU" VARCHAR2(2047 CHAR), 

I already mentioned the next thing in my previous post, but I think it's OK to repeat it once again. If NLS_LANG environment variable is not set, all non-English characters are loaded incorrectly.

It's very easy to correct this issue. All we need is to set NLS_LANG variable to <language>_<territory>.UTF8. It looks like language and territory don't matter, the only character set is important. So I used AMERICAN_AMERICA.UTF8 value.

With NLS_LANG set, our data is loaded correctly.

It takes about 3:40-4:00 minutes to import Belarus file on my machine. And it looks like that the limiting factor is a disk speed. CPU utilization is very low and free memory is available.

Tables row counts and segment sizes for Belarus are the following:

TableRow countSegment size (MB)
POINTS 35972822
LINES 579899160

So space for tables is about 575 MB and total space including spatial indexes is over 650 MB.

After this relatively small file. I started experimenting with bigger ones. The second file was 550 MB Spain. And I faced one more issue. Sometimes import finishes normally, but sometimes I get an error

I was unable to figure out the real cause of this error, but I look like OSM_MAX_TMPFILE_SIZE configuration option helps. Syntax for this option is --config OSM_MAX_TMPFILE_SIZE <size in mb>. Please note: that there is no equal sign between the option name and a value. My experiments show, that this parameter should be about 4-8 times bigger than the source file. But often import finishes without this parameter at all.

Import of the Italy and Germany files gave me no new information. It takes about 1 hour 40 minutes to import Germany, 30 minutes to import Italy and 12 minutes for Spain.

In my next post, I show how to create a simple (very simple) map using this data.

Useful links:
GDAL configuration options
Oracle Spatial driver documentation
OSM driver documentation
ogr2ogr documentation

maps OSM GDAL oracle

Andrew Fomin Profile Picture
Andrew Fomin

OBIEE specialist since 2007 and Oracle Discoverer before. DWH architect, BI enthusiast, blogger. Lazy cats owner. All opinions are my own and not the views of my employer.

Additional Posts
Share This Post
Twitter Google+ Pinterest Facebook
comments powered by HyperComments