Easy methods to learn OSM information with DuckDB

0
36


Easy methods to Learn OSM Information with DuckDB

A deep dive into OpenStreetMap information construction and methods to put it to use in a scalable manner

1*uyXB2hQETSJ3wJaZVXPVyw
Dall-E 3 picture: Cute and cute 3D render duck learning a paper map, vibrant sky, with blur background, top quality, 8k

This text will present an in-depth take a look at methods to learn OpenStreetMap information utilizing the DuckDB database.

The steps described on this information will permit the reader to load the OSM information utilizing the Monaco instance divided into nodes, methods, and relations.

1*WVVublrE3NtZfADjDEW4sw
1*rh6F3fRMS4cTIFuCkeyfeQ
The ultimate results of OSM components learn utilizing the DuckDB engine. From the left: nodes, methods and relations. Generated by the creator utilizing GeoPandas library.

A fundamental data of the SQL language is anticipated to completely perceive the steps described on this tutorial. Many of the GIS-related operations and particular joins are described additional within the article.

Define of the article

  • What’s OSM? — introduction to the OSM service.
  • OpenStreetMap information mannequin — definition of objects used within the OSM.
  • Studying OSM information — fundamental operations on the information utilizing DuckDB.
  • Establishing level geometries from the nodes
  • Establishing linestring and polygon geometries from the methods
  • Establishing polygon and multi-polygon geometries from the relations
  • Examples of badly outlined relation objects
  • QuackOSM — a hassle-free software for studying OSM information

What’s OSM?

OpenStreetMap (OSM) is the preferred free map of the world and it's stored alive by a rising base of volunteers and contributors.

The information collected and constructed by the neighborhood is offered publicly without spending a dime and business functions, so many firms, tutorial researchers and particular person builders use this useful resource of their tasks. All information is offered beneath the Open Information Commons Open Database License (ODbL).

The information will be accessed in a number of methods:

Probably the most space-efficient file kind during which the information is saved is Protocolbuffer Binary Format with an extension *.osm.pbf . You may learn extra about it right here.

You may also learn this brief article about OpenStreetMap from Eugenia Anello:

A complete information for getting began with OpenStreetMap

OpenStreetMap information mannequin

This part is predicated on OSM Wiki web page about Components

Conceptually the information in OpenStreetMap is cut up into 3 parts:

Nodes characterize factors in house. They’re represented by a pair of coordinates in a WGS84 Coordinate Reference System — longitude and latitude. Nodes can be utilized to outline a single function on a map (eg. bench, lamp publish, tree) or be used with different nodes to characterize a form of a manner.

1*JT1fhwAeylTGM7YfdhDvZQ
Instance of a node — a tree within the park. Screenshot from the OpenStreetMap by the creator.

Methods are shapes representing a polyline through the use of an ordered checklist of nodes. These polylines will be open and characterize roads, canals, and partitions, or they are often closed to type a polygon and characterize buildings, forests, lakes or different easy shapes.

1*bAFNcGDhn lJ1ZwgGKl1kg
Instance of a manner — part of a freeway street. Screenshot from the OpenStreetMap by the creator.

Relations characterize relationships between a number of objects and information components outlined within the OSM. This could possibly be for instance a bus route with methods displaying roads on which the bus travels and nodes displaying stops of a route, or a multi polygon with holes represented by not less than 2 methods: these will be an outer polygon and an internal polygon.

1*O06Tp6aZaugf9zkJvUeUfA
An instance of a relation— a resort constructing define with holes. Screenshot from the OpenStreetMap by the creator.

Every factor can, however doesn’t need to, have tags connected. Tags describe the which means of the factor. They’re composed of a key and a worth. There is no such thing as a fastened dictionary of these values, however customers ought to keep inside conventions documented within the OSM Wiki.

Moreover, every factor has an ID that’s distinctive in a given factor kind house (so there could be a node with ID = 100, a manner with ID = 100 and a relation with ID = 100).

Studying OSM information

Many instruments permit customers to remodel the OSM information mannequin to file codecs generally used within the GIS area, comparable to GDAL. These instruments are routinely reconstructing geometries from the uncooked information. We’ll attempt to learn it and reconstruct geometries manually.

Examples beneath present methods to entry uncooked information and are written in SQL utilizing DuckDB engine with Spatial extension. All queries with a full Jupyer pocket book will be accessed within the GitHub repository.
You may run the pocket book within the parallel or you possibly can set up the DuckDB engine and open the CLI to execute the queries there.

medium-articles/articles/osm-duckdb/code.ipynb at foremost · RaczeQ/medium-articles

Getting the information

For simplicity and quick access, examples are centered totally on the Monaco area. You may obtain the present extract from the Geofabrik obtain server: https://obtain.geofabrik.de/europe/monaco.html (and click on monaco-latest.osm.pbf obtain hyperlink)

Familiarisation with the information construction

To start out, we’ll use the DESCRIBEfunction to get details about the columns:

DESCRIBE SELECT * FROM ST_READOSM('monaco-latest.osm.pbf');

┌─────────────┬──────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ additional │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├─────────────┼──────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ form │ ENUM('node', 'manner', 'relation', 'changeset') │ YES │ NULL │ NULL │ NULL │
│ id │ BIGINT │ YES │ NULL │ NULL │ NULL │
│ tags │ MAP(VARCHAR, VARCHAR) │ YES │ NULL │ NULL │ NULL │
│ refs │ BIGINT[] │ YES │ NULL │ NULL │ NULL │
│ lat │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ lon │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ ref_roles │ VARCHAR[] │ YES │ NULL │ NULL │ NULL │
│ ref_types │ ENUM('node', 'manner', 'relation')[] │ YES │ NULL │ NULL │ NULL │
└─────────────┴──────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┘

There are 8 columns returned by the ST_READOSMfunction:

  • form — that is the kind of a component. It will possibly even have a worth of changesetrepresenting the modifications after enhancing the present factor within the OSM. Extracts from the Geofabrik obtain server don’t comprise changesets, so we don’t have to consider them.
  • id — the factor identifier.
  • tags — map (or a dictionary) of two strings: a tag key and a tag worth.
  • refs — checklist of member IDs associated to the factor. Nodes ought to have this checklist empty and methods and relations can’t have it empty.
  • lat and lon — latitude and longitude of a node. Methods and relations ought to have these fields empty since solely nodes can have coordinates within the OSM.
  • ref_roles and ref_types — an inventory of extra details about members: what position is assigned to the member and what kind is it (node, manner or relation).

Counting the components

Let’s see what number of components there are in complete and per factor kind.

SELECT COUNT(*) as total_elements
FROM ST_READOSM('monaco-latest.osm.pbf');

┌────────────────┐
│ total_elements │
│ int64 │
├────────────────┤
│ 35782 │
└────────────────┘

SELECT form, COUNT(*) as total_elements
FROM ST_READOSM('monaco-latest.osm.pbf')
GROUP BY 1;

┌──────────────────────────────────────────────┬────────────────┐
│ form │ total_features │
│ enum('node', 'manner', 'relation', 'changeset') │ int64 │
├──────────────────────────────────────────────┼────────────────┤
│ node │ 30643 │
│ manner │ 4849 │
│ relation │ 290 │
└──────────────────────────────────────────────┴────────────────┘

Wanting on the components

Let’s examine the examples of the information for every factor kind.

SELECT *
FROM ST_READOSM('monaco-latest.osm.pbf')
WHERE form = 'node'
LIMIT 5;

┌──────────────────────┬──────────┬─────────────────────────────┬─────────┬────────────────────┬────────────────────┬───────────┬───────────────────────────────────┐
│ form │ id │ tags │ refs │ lat │ lon │ ref_roles │ ref_types │
│ enum('node', 'manner'… │ int64 │ map(varchar, varchar) │ int64[] │ double │ double │ varchar[] │ enum('node', 'manner', 'relation')[] │
├──────────────────────┼──────────┼─────────────────────────────┼─────────┼────────────────────┼────────────────────┼───────────┼───────────────────────────────────┤
│ node │ 21911883 │ │ │ 43.737117500000004 │ 7.422909300000001 │ │ │
│ node │ 21911886 │ {crossing=zebra, crossing… │ │ 43.737239900000006 │ 7.423498500000001 │ │ │
│ node │ 21911894 │ │ │ 43.737773100000005 │ 7.4259193 │ │ │
│ node │ 21911901 │ │ │ 43.737762100000005 │ 7.4267099000000005 │ │ │
│ node │ 21911908 │ │ │ 43.7381906 │ 7.426961 │ │ │
└──────────────────────┴──────────┴─────────────────────────────┴─────────┴────────────────────┴────────────────────┴───────────┴───────────────────────────────────┘

SELECT *
FROM ST_READOSM('monaco-latest.osm.pbf')
WHERE form = 'manner'
LIMIT 5;

┌──────────────────────┬─────────┬──────────────────────┬─────────────────────────────────────────┬────────┬────────┬───────────┬───────────────────────────────────┐
│ form │ id │ tags │ refs │ lat │ lon │ ref_roles │ ref_types │
│ enum('node', 'manner'… │ int64 │ map(varchar, varch… │ int64[] │ double │ double │ varchar[] │ enum('node', 'manner', 'relation')[] │
├──────────────────────┼─────────┼──────────────────────┼─────────────────────────────────────────┼────────┼────────┼───────────┼───────────────────────────────────┤
│ manner │ 4097656 │ {freeway=secondary… │ [21912089, 7265761724, 1079750744, 21… │ │ │ │ │
│ way │ 4098197 │ {highway=tertiary,… │ [21918500, 10723812919, 1204288480, 2… │ │ │ │ │
│ way │ 4224972 │ {highway=residenti… │ [25177418, 4939779378, 4939779381, 49… │ │ │ │ │
│ way │ 4224978 │ {access=no, addr:c… │ [25178088, 25178091, 25178045, 251780… │ │ │ │ │
│ way │ 4226740 │ {highway=secondary… │ [25192130, 6444966483, 1737389127, 64… │ │ │ │ │
└──────────────────────┴─────────┴──────────────────────┴─────────────────────────────────────────┴────────┴────────┴───────────┴───────────────────────────────────┘

SELECT *
FROM ST_READOSM('monaco-latest.osm.pbf')
WHERE kind = 'relation'
LIMIT 5;

┌──────────────────────┬───────┬──────────────────────┬──────────────────────┬────────┬────────┬──────────────────────┬─────────────────────────────────────────────┐
│ kind │ id │ tags │ refs │ lat │ lon │ ref_roles │ ref_types │
│ enum('node', 'way'… │ int64 │ map(varchar, varch… │ int64[] │ double │ double │ varchar[] │ enum('node', 'manner', 'relation')[] │
├──────────────────────┼───────┼──────────────────────┼──────────────────────┼────────┼────────┼──────────────────────┼─────────────────────────────────────────────┤
│ relation │ 7385 │ {ISO3166-2=FR-06, … │ [1701090139, 32665… │ │ │ [admin_centre, lab… │ [node, node, way, way, way, way, way, way… │
│ relation │ 8654 │ {ISO3166-2=FR-PAC,… │ [26761400, 1251610… │ │ │ [admin_centre, lab… │ [node, node, way, way, way, way, way, way… │
│ relation │ 11980 │ {ISO3166-1=FR, ISO… │ [17807753, 1363947… │ │ │ [admin_centre, lab… │ [node, node, relation, relation, relation… │
│ relation │ 36990 │ {ISO3166-1=MC, ISO… │ [1790048269, 77077… │ │ │ [admin_centre, out… │ [node, way, way, way, way, way, way, way,… │
│ relation │ 90352 │ {admin_level=2, bo… │ [770774507, 377944… │ │ │ [outer, outer, out… │ [way, way, way, way, way, way, way, way, … │
└──────────────────────┴───────┴──────────────────────┴──────────────────────┴────────┴────────┴──────────────────────┴─────────────────────────────────────────────┘

Now we can see how the elements are defined: nodes have coordinates, ways have refs lists filled with node IDs and relations have the most complicated structure with refs lists filled with IDs and ref_types lists showing which ID correspond to which element type. Additionally, ref_roles have information about the role of the member (admin_centre, label, inner, outer).

Constructing point geometries from the nodes

Now that we know what the structure looks like, we can start building some geometries. Starting with nodes should be the easiest since it’s just a pair of latitudes and longitudes in the WGS84 Coordinate Reference System.

We should only extract nodes with at least one tag attached since those have any semantical meaning for analytical purposes. Nodes without any tags could be used to construct ways in the later stages.

SELECT
id,
tags,
ST_POINT(lon, lat) geometry
FROM ST_READOSM('monaco-latest.osm.pbf')
WHERE kind = 'node'
AND tags IS NOT NULL
AND cardinality(tags) > 0;

┌─────────────┬─────────────────────────────────────────────────────────┬──────────────────────────────────────────────┐
│ id │ tags │ geometry │
│ int64 │ map(varchar, varchar) │ geometry │
├─────────────┼─────────────────────────────────────────────────────────┼──────────────────────────────────────────────┤
│ 21911886 │ {crossing=zebra, crossing:island=no, crossing_ref=zeb… │ POINT (7.423498500000001 43.737239900000006) │
│ 21912962 │ {crossing=zebra, crossing_ref=zebra, highway=crossing} │ POINT (7.426912100000001 43.737912800000004) │
│ 21914341 │ {crossing=uncontrolled, crossing_ref=zebra, highway=c… │ POINT (7.4233732 43.737010000000005) │
│ 21915639 │ {highway=traffic_signals} │ POINT (7.4256003 43.7404449) │
│ 21917308 │ {bus=yes, name=Monte-Carlo (Casino), public_transport… │ POINT (7.4259854 43.740984700000006) │
│ 21918329 │ {crossing=marked, crossing:markings=yes, highway=cros… │ POINT (7.427889100000001 43.7423616) │
│ 21918589 │ {crossing=marked, crossing:island=yes, crossing:marki… │ POINT (7.4317478 43.7472774) │
│ 21918697 │ {crossing=marked, crossing:island=no, crossing:markin… │ POINT (7.432645000000001 43.747892900000004) │
│ 21918939 │ {bus=yes, name=Portier, public_transport=stop_position} │ POINT (7.430429800000001 43.741472800000004) │
│ 21919093 │ {crossing=marked, crossing:markings=yes, crossing_ref… │ POINT (7.4352171 43.748160000000006) │
│ · │ · │ · │
│ · │ · │ · │
│ · │ · │ · │
│ 11450012980 │ {direction=forward, highway=give_way} │ POINT (7.416853100000001 43.735809700000004) │
│ 11450012981 │ {direction=forward, highway=give_way} │ POINT (7.416783000000001 43.735872900000004) │
│ 11450012982 │ {direction=forward, highway=give_way} │ POINT (7.416664900000001 43.735887000000005) │
│ 11450012983 │ {direction=forward, highway=give_way} │ POINT (7.4166968 43.7356945) │
│ 11451922579 │ {bus=yes, highway=bus_stop, name=Larvotto, public_tra… │ POINT (7.435032400000001 43.7481639) │
│ 11451922580 │ {bus=yes, highway=bus_stop, name=Grimaldi Forum, publ… │ POINT (7.4311343 43.7442067) │
│ 11451922581 │ {bench=yes, bus=yes, highway=bus_stop, name=Portier, … │ POINT (7.430357300000001 43.742209100000004) │
│ 11451922582 │ {bench=no, bin=no, bus=yes, highway=bus_stop, lit=yes… │ POINT (7.4107674 43.7296193) │
│ 11451922600 │ {direction=backward, highway=give_way} │ POINT (7.4105622 43.7291648) │
│ 11452057060 │ {direction=backward, highway=give_way} │ POINT (7.419164 43.737116) │
├─────────────┴─────────────────────────────────────────────────────────┴──────────────────────────────────────────────┤
│ 3167 rows (20 shown) 3 columns │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
1*WVVublrE3NtZfADjDEW4sw
Nodes plotted on a map. Generated by the author using GeoPandas library.

After filtering the nodes based on tags, we are left with 3167 points.
It’s around 10% of the total number of nodes in the source file.

Constructing linestring and polygon geometries from the ways

With nodes out of the way 😉, let’s focus now on ways. Ways can take the form of linestrings or polygons. Let’s focus on linestrings first and then we will assign proper geometry types.

To construct the ways we have to do multiple operations:

  • Select matching ways with tags.
  • Unnest all nodes refs for each way element and keep them in the proper order (remember – nodes refs order matter!).
  • Select required nodes with geometries.
  • Group nodes per way ID and construct a linestring geometry using ST_MakeLinefunction.
  • Join constructed geometries with tags.

Let’s start with selecting ways with tags.

CREATE TEMP TABLE matching_ways AS 
SELECT id, tags
FROM ST_READOSM('monaco-latest.osm.pbf')
WHERE kind = 'way' AND tags IS NOT NULL AND cardinality(tags) > 0;

SELECT * FROM matching_ways;

┌────────────┬─────────────────────────────────────────────────────────────────┐
│ id │ tags │
│ int64 │ map(varchar, varchar) │
├────────────┼─────────────────────────────────────────────────────────────────┤
│ 4097656 │ {highway=secondary, lanes=2, lit=yes, maxspeed=30, name=Avenu… │
│ 4098197 │ {highway=tertiary, lanes=2, lit=yes, name=Boulevard d`Italie,… │
│ 4224972 │ {highway=residential, lit=yes, name=Avenue des Papalins, onew… │
│ 4224978 │ {access=no, addr:country=MC, leisure=pitch, lit=yes, sport=so… │
│ 4226740 │ {highway=secondary, lanes=2, lit=yes, maxspeed=50, name=Boule… │
│ 4227155 │ {area=yes, highway=pedestrian, lit=yes, name=Place du Palais,… │
│ 4227156 │ {access=no, bus=yes, demolished:highway=service, oneway=yes, … │
│ 4227157 │ {highway=residential, name=Rue des Remparts, oneway=yes, surf… │
│ 4227158 │ {highway=residential, name=Rue Philibert Florence, oneway=yes… │
│ 4227164 │ {highway=secondary, lanes=2, lit=yes, name=Virage Antony Nogu… │
│ · │ · │
│ · │ · │
│ · │ · │
│ 1230247124 │ {bench=no, bin=yes, bus=yes, check_date=2023-12-10, covered=y… │
│ 1230273018 │ {access=yes, leisure=playground, wheelchair=yes} │
│ 1230273019 │ {highway=footway, surface=paving_stones} │
│ 1230398878 │ {abandoned:railway=rail, highway=primary, lanes=2, lit=yes, m… │
│ 1233828725 │ {highway=service, lit=yes, name=Place Moulins, oneway=yes} │
│ 1233828726 │ {highway=tertiary, junction=roundabout, lanes=2, lit=yes, non… │
│ 1233853668 │ {highway=secondary, lanes=2, lit=yes, maxspeed=50, name=Boule… │
│ 1238339144 │ {cycleway:left=lane, highway=tertiary, lanes=2, lit=yes, maxs… │
│ 1238339145 │ {cycleway:left=lane, highway=tertiary, lanes=2, lit=yes, maxs… │
│ 1239415633 │ {footway=sidewalk, highway=footway} │
├────────────┴─────────────────────────────────────────────────────────────────┤
│ 4786 rows (20 shown) 2 columns │
└──────────────────────────────────────────────────────────────────────────────┘

Now we will unnest refs lists and split them into individual rows. We will also utilize DuckDB’s indexing functions to remember the order of elements in the original list.

CREATE TEMP TABLE matching_ways_with_nodes_refs AS
SELECT id, UNNEST(refs) as ref, UNNEST(range(length(refs))) as ref_idx
FROM ST_READOSM('monaco-latest.osm.pbf')
SEMI JOIN matching_ways USING (id)
WHERE kind = 'way';

SELECT * FROM matching_ways_with_nodes_refs;

┌───────────┬────────────┬─────────┐
│ id │ ref │ ref_idx │
│ int64 │ int64 │ int64 │
├───────────┼────────────┼─────────┤
│ 4097656 │ 21912089 │ 0 │
│ 4097656 │ 7265761724 │ 1 │
│ 4097656 │ 1079750744 │ 2 │
│ 4097656 │ 2104793864 │ 3 │
│ 4097656 │ 6340961560 │ 4 │
│ 4097656 │ 1110560507 │ 5 │
│ 4097656 │ 21912093 │ 6 │
│ 4097656 │ 6340961559 │ 7 │
│ 4097656 │ 21912095 │ 8 │
│ 4097656 │ 7265762803 │ 9 │
│ · │ · │ · │
│ · │ · │ · │
│ · │ · │ · │
│ 151154357 │ 7927855820 │ 2 │
│ 151154358 │ 1868015912 │ 0 │
│ 151154358 │ 1868015910 │ 1 │
│ 151154358 │ 1868015900 │ 2 │
│ 151154358 │ 4437858221 │ 3 │
│ 151154358 │ 1868015889 │ 4 │
│ 151154358 │ 1790048390 │ 5 │
│ 151154358 │ 1868015881 │ 6 │
│ 151154358 │ 1868015894 │ 7 │
│ 151154358 │ 1868015896 │ 8 │
├───────────┴────────────┴─────────┤
│ ? rows (>9999 rows, 20 shown) │
└──────────────────────────────────┘

As you can see, there are now multiple rows per way element and multiple ref values. Ref_idx represents the original order in the refs list.

You can also see the SEMI JOINclause in the query. This is a special join available in the DuckDB, that just filters rows without actually joining the second table. You can read more about it in the official documentation:

FROM & JOIN Clauses

Now we can select the required nodes based on the refs from the previous step:

CREATE TEMP TABLE required_nodes_with_geometries AS
SELECT id, ST_POINT(lon, lat) geometry
FROM ST_READOSM('monaco-latest.osm.pbf') nodes
SEMI JOIN matching_ways_with_nodes_refs
ON nodes.id = matching_ways_with_nodes_refs.ref
WHERE kind = 'node';

SELECT * FROM required_nodes_with_geometries;

┌────────────┬──────────────────────────────────────────────┐
│ id │ geometry │
│ int64 │ geometry │
├────────────┼──────────────────────────────────────────────┤
│ 4547239708 │ POINT (7.4219302 43.7307441) │
│ 4547239709 │ POINT (7.421943700000001 43.730763) │
│ 4547239710 │ POINT (7.421885100000001 43.7308797) │
│ 4547239711 │ POINT (7.421869 43.730828200000005) │
│ 4547239712 │ POINT (7.4218596 43.7307865) │
│ 4547239713 │ POINT (7.4219166 43.7307701) │
│ 4547239714 │ POINT (7.4220242 43.730742) │
│ 4547239715 │ POINT (7.4220765 43.730725500000005) │
│ 4547239716 │ POINT (7.4220677 43.730680400000004) │
│ 4547239717 │ POINT (7.422105200000001 43.730662) │
│ · │ · │
│ · │ · │
│ · │ · │
│ 9983243582 │ POINT (7.427689600000001 43.732439400000004) │
│ 9983243583 │ POINT (7.4238021 43.7298837) │
│ 9983243584 │ POINT (7.4272774 43.7321503) │
│ 9983243585 │ POINT (7.4287133 43.7403807) │
│ 9983243586 │ POINT (7.429299500000001 43.740351100000005) │
│ 9983243587 │ POINT (7.4265819 43.7399094) │
│ 9983243588 │ POINT (7.4275342 43.7323072) │
│ 9983243589 │ POINT (7.427783700000001 43.732368) │
│ 9983243590 │ POINT (7.428839600000001 43.7402718) │
│ 9983243591 │ POINT (7.4269439 43.739707700000004) │
├────────────┴──────────────────────────────────────────────┤
│ ? rows (>9999 rows, 20 shown) 2 columns │
└───────────────────────────────────────────────────────────┘

Now we can construct the full linestrings:

CREATE TEMP TABLE matching_ways_linestrings AS
SELECT
matching_ways.id,
matching_ways.tags,
ST_MakeLine(list(nodes.geometry ORDER BY ref_idx ASC)) linestring
FROM matching_ways
JOIN matching_ways_with_nodes_refs
ON matching_ways.id = matching_ways_with_nodes_refs.id
JOIN required_nodes_with_geometries nodes
ON matching_ways_with_nodes_refs.ref = nodes.id
GROUP BY 1, 2;

SELECT * FROM matching_ways_linestrings;

┌────────────┬──────────────────────┬──────────────────────────────────────────┐
│ id │ tags │ linestring │
│ int64 │ map(varchar, varch… │ geometry │
├────────────┼──────────────────────┼──────────────────────────────────────────┤
│ 4227212 │ {highway=residenti… │ LINESTRING (7.424979 43.73150620000000… │
│ 4227230 │ {access=yes, admin… │ LINESTRING (7.4220576 43.7321018, 7.42… │
│ 4227273 │ {highway=secondary… │ LINESTRING (7.4214168 43.7365583000000… │
│ 4229325 │ {highway=secondary… │ LINESTRING (7.4201113 43.7373194000000… │
│ 4229900 │ {highway=secondary… │ LINESTRING (7.4167819 43.7299324, 7.41… │
│ 4230116 │ {highway=footway, … │ LINESTRING (7.4302666 43.7467818, 7.43… │
│ 24672725 │ {highway=residenti… │ LINESTRING (7.4356746 43.7516752, 7.43… │
│ 25082701 │ {area=yes, highway… │ LINESTRING (7.421602 43.7370201, 7.421… │
│ 31900562 │ {area=yes, floatin… │ LINESTRING (7.422541900000001 43.73529… │
│ 37794470 │ {admin_level=2, bo… │ LINESTRING (7.438411400000001 43.74942… │
│ · │ · │ · │
│ · │ · │ · │
│ · │ · │ · │
│ 1203859766 │ {landuse=grass} │ LINESTRING (7.4172849 43.7324023000000… │
│ 1203859775 │ {landuse=grass} │ LINESTRING (7.417039300000001 43.73209… │
│ 1218054879 │ {highway=footway, … │ LINESTRING (7.4257143 43.7307631000000… │
│ 1229465608 │ {amenity=shelter, … │ LINESTRING (7.417716100000001 43.73075… │
│ 1230102086 │ {footway=crossing,… │ LINESTRING (7.414346 43.72708160000000… │
│ 1230123519 │ {footway=link, hig… │ LINESTRING (7.4238967 43.7370229, 7.42… │
│ 1202747620 │ {crossing=unmarked… │ LINESTRING (7.420526100000001 43.72671… │
│ 1203858289 │ {crossing=uncontro… │ LINESTRING (7.4179021 43.7344211000000… │
│ 1203861575 │ {amenity=parking} │ LINESTRING (7.430036100000001 43.74015… │
│ 1203861579 │ {landuse=construct… │ LINESTRING (7.4323636 43.7423150000000… │
├────────────┴──────────────────────┴──────────────────────────────────────────┤
│ 4786 rows (20 shown) 3 columns │
└──────────────────────────────────────────────────────────────────────────────┘
1*dBlz3 FcEoOfuOgBInKKng
Ways linestrings plotted on a map. Generated by the author using GeoPandas library.

After doing all of those operations, we have ways in the linestring form. Now we have to select the ways that are supposed to be polygons. We can do this based on tag values. Unfortunately, there is no single source of truth for this operation. We can look at the page from the OSM wiki to see one of the definitions created by the community.

Overpass turbo/Polygon Features

1*STTaGhcJ0vKjiSHN0lTe6A
A screenshot from the Overpass turbo/Polygon Features wiki page. Taken on 2024-01-17.

As you can see, this list is quite long, so for brevity we will only check if the linestring forms a closed loop and if the area tag value is not ‘no’.

WITH way_polygon_feature AS (
SELECT
id,
(
-- if first and last nodes are the same
ST_Equals(ST_StartPoint(linestring), ST_EndPoint(linestring))
-- if the element doesn't have any tags leave it as a Linestring
AND tags IS NOT NULL
-- if the element is specifically tagged 'area':'no' -> LineString
AND NOT (
list_contains(map_keys(tags), 'area')
AND list_extract(map_extract(tags, 'area'), 1) = 'no'
)
) AS is_polygon
FROM matching_ways_linestrings
)
SELECT
matching_ways_linestrings.id,
matching_ways_linestrings.tags,
(CASE
WHEN is_polygon
THEN ST_MakePolygon(linestring)
ELSE linestring
END)::GEOMETRY AS geometry
FROM matching_ways_linestrings
JOIN way_polygon_feature
ON matching_ways_linestrings.id = way_polygon_feature.id;

┌────────────┬──────────────────────┬──────────────────────────────────────────┐
│ id │ tags │ geometry │
│ int64 │ map(varchar, varch… │ geometry │
├────────────┼──────────────────────┼──────────────────────────────────────────┤
│ 4226740 │ {highway=secondary… │ LINESTRING (7.422170500000001 43.73286… │
│ 4227198 │ {highway=residenti… │ LINESTRING (7.420338900000001 43.73223… │
│ 4227216 │ {highway=service, … │ LINESTRING (7.423448400000001 43.73268… │
│ 4227242 │ {highway=secondary… │ LINESTRING (7.4221786 43.7322418000000… │
│ 4227272 │ {highway=secondary… │ LINESTRING (7.422117500000001 43.73702… │
│ 4229292 │ {foot=no, highway=… │ LINESTRING (7.41643 43.7311373, 7.4168… │
│ 4229577 │ {highway=residenti… │ LINESTRING (7.4251499 43.7397216, 7.42… │
│ 4230122 │ {alt_name=rue R. P… │ LINESTRING (7.428823400000001 43.74602… │
│ 25082741 │ {highway=secondary… │ LINESTRING (7.4231414 43.7369886000000… │
│ 25722909 │ {highway=footway, … │ LINESTRING (7.423659300000001 43.73134… │
│ · │ · │ · │
│ · │ · │ · │
│ · │ · │ · │
│ 1089844256 │ {handrail=no, high… │ LINESTRING (7.4229453 43.7326080000000… │
│ 1089844296 │ {highway=footway} │ LINESTRING (7.427017200000001 43.74044… │
│ 1202570722 │ {access=private, l… │ POLYGON ((7.4263241 43.7390763, 7.4263… │
│ 946167565 │ {disused:leisure=p… │ POLYGON ((7.4276993 43.739335100000005… │
│ 1089844271 │ {highway=footway} │ LINESTRING (7.4277667 43.732816, 7.427… │
│ 335731716 │ {highway=footway} │ LINESTRING (7.417764200000001 43.73439… │
│ 779454018 │ {building=resident… │ POLYGON ((7.4217522 43.7277202, 7.4218… │
│ 943330855 │ {landuse=grass} │ POLYGON ((7.4151844 43.7332252, 7.4152… │
│ 1089844229 │ {highway=steps, in… │ LINESTRING (7.4274525 43.7328052, 7.42… │
│ 49209608 │ {addr:city=Monaco,… │ POLYGON ((7.4223448 43.7304076, 7.4221… │
├────────────┴──────────────────────┴──────────────────────────────────────────┤
│ 4786 rows (20 shown) 3 columns │
└──────────────────────────────────────────────────────────────────────────────┘

Now we can see the mix of linestring and polygon geometries in the final result. Of course, the predicate for the is_polygon column could (or even should) be extended by the logic mentioned above regarding the values of the tags.

Compared to the image above, many more filled polygons are now visible on the map.

1*K8v8NjrTliLmBY3QqDnFhg
Ways geometries plotted on a map. Generated by the author using GeoPandas library.

Constructing polygon and multi-polygon geometries from the relations

Relations are utilized in OSM to group multiple other elements into a single object. Here we will focus solely on the (multi-) polygons. These specific elements have a type tag with one of two values: boundary, and multipolygon.

This kind of object is the most complex to reconstruct the geometry for and here is the list of steps we have to do:

  • Select relations with proper type value.
  • Unnest all refs related to the relation and keep only way refs — we only need related way refs to reconstruct a polygon.
  • Select required ways with linestring geometries — here we can utilize steps from constructing ways.
  • Assign an ‘outer’ role to the way ref if it’s null and check if any ref from the relation has the role ‘outer’ — if a relation has no ‘outer’ refs then treat all of them as ‘outer’.
  • Group all linestrings per ‘outer’ and ‘inner’ role and merge them into a single multilinestring — many relations are defined with multiple single linestrings that only together create a closed polygon.
  • Split multilinestrings into single closed-loop linestrings and save them as polygons.
  • Split geometries into ‘outer’ and ‘inner’ polygons. These can be extracted from the ref_role column of the relation object.
  • For each ‘outer’ polygon, select all ‘inner’ polygons that are fully within it and make holes in this polygon.
  • Make a union of all ‘outer’ polygons with holes.

Let’s start with selecting relations with matching tag values.

CREATE TEMP TABLE matching_relations AS 
SELECT id, tags
FROM ST_READOSM('monaco-latest.osm.pbf')
WHERE kind = 'relation' AND len(refs) > 0
AND tags IS NOT NULL AND cardinality(tags) > 0
AND list_contains(map_keys(tags), 'type')
AND list_has_any(map_extract(tags, 'type'), ['boundary', 'multipolygon']);

SELECT * FROM matching_relations;

┌──────────┬───────────────────────────────────────────────────────────────────┐
│ id │ tags │
│ int64 │ map(varchar, varchar) │
├──────────┼───────────────────────────────────────────────────────────────────┤
│ 7385 │ {ISO3166-2=FR-06, admin_level=6, alt_name:de=Meeralpen, border_… │
│ 8654 │ {ISO3166-2=FR-PAC, admin_level=4, border_type=area, boundary=… │
│ 36990 │ {ISO3166-1=MC, ISO3166-1:alpha2=MC, ISO3166-1:alpha3=MCO, admin… │
│ 174558 │ {admin_level=8, boundary=administrative, title=Roquebrune-Cap-Ma… │
│ 174562 │ {admin_level=8, boundary=administrative, title=Beausoleil, title:… │
│ 174956 │ {admin_level=8, alt_name:it=Capo d`Aglio, boundary=administrati… │
│ 174958 │ {admin_level=8, boundary=administrative, title=La Turbie, title:i… │
│ 393226 │ {constructing=citadel, castle_type=palace, cost=10€, electronic mail=visites… │
│ 393481 │ {constructing=faculty, title=Pensionnat des Dames de Saint-Maur, kind… │
│ 1124039 │ {ISO3166-1=MC, ISO3166-1:alpha2=MC, ISO3166-1:alpha3=MCO, ISO31… │
│ · │ · │
│ · │ · │
│ · │ · │
│ 14399505 │ {space=sure, floating=sure, man_made=pier, kind=multipolygon} │
│ 16248281 │ {constructing=flats, title=F, kind=multipolygon} │
│ 16248282 │ {constructing=flats, title=E, kind=multipolygon} │
│ 16248283 │ {constructing=flats, title=D, kind=multipolygon} │
│ 16248284 │ {constructing=flats, title=C, kind=multipolygon} │
│ 16248285 │ {constructing=flats, title=B, kind=multipolygon} │
│ 16248286 │ {constructing=flats, title=A, kind=multipolygon} │
│ 16250182 │ {addr:nation=MC, alt_name=Parc de la Roseraie, leisure=park, n… │
│ 16261416 │ {pure=water, kind=multipolygon, water=pond} │
│ 16467322 │ {admin_level=2, boundary=land_area, land_area=administrative, n… │
├──────────┴───────────────────────────────────────────────────────────────────┤
│ 78 rows (20 proven) 2 columns │
└──────────────────────────────────────────────────────────────────────────────┘

Now we’ll unnest refs lists and cut up them into particular person rows. Like with methods, right here we will even make the most of DuckDB’s indexing capabilities to recollect the order of components within the authentic checklist. Moreover, we’ll solely maintain the best way refs.

CREATE TEMP TABLE matching_relations_with_ways_refs AS
WITH unnested_relation_refs AS (
SELECT
r.id,
UNNEST(refs) as ref,
UNNEST(ref_types) as ref_type,
UNNEST(ref_roles) as ref_role,
UNNEST(vary(size(refs))) as ref_idx
FROM ST_READOSM('monaco-latest.osm.pbf') r
SEMI JOIN matching_relations USING (id)
WHERE form = 'relation'
)
SELECT id, ref, ref_role, ref_idx
FROM unnested_relation_refs
WHERE ref_type = 'manner';

SELECT * FROM matching_relations_with_ways_refs;

┌──────────┬───────────┬──────────┬─────────┐
│ id │ ref │ ref_role │ ref_idx │
│ int64 │ int64 │ varchar │ int64 │
├──────────┼───────────┼──────────┼─────────┤
│ 7385 │ 30836152 │ outer │ 2 │
│ 7385 │ 889278953 │ outer │ 3 │
│ 7385 │ 889278956 │ outer │ 4 │
│ 7385 │ 889278957 │ outer │ 5 │
│ 7385 │ 889278958 │ outer │ 6 │
│ 7385 │ 889278962 │ outer │ 7 │
│ 7385 │ 960087656 │ outer │ 8 │
│ 7385 │ 30836155 │ outer │ 9 │
│ 7385 │ 889278965 │ outer │ 10 │
│ 7385 │ 889278963 │ outer │ 11 │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ 11278320 │ 35150936 │ outer │ 197 │
│ 11278320 │ 466647567 │ outer │ 198 │
│ 11278320 │ 214008684 │ outer │ 199 │
│ 11278320 │ 466647569 │ outer │ 200 │
│ 11278320 │ 466647568 │ outer │ 201 │
│ 11278320 │ 214008689 │ outer │ 202 │
│ 11278320 │ 263776194 │ outer │ 203 │
│ 11278320 │ 31124893 │ outer │ 204 │
│ 11278320 │ 31124895 │ outer │ 205 │
│ 11278320 │ 31124899 │ outer │ 206 │
├──────────┴───────────┴──────────┴─────────┤
│ ? rows (>9999 rows, 20 proven) 4 columns │
└───────────────────────────────────────────┘

Within the subsequent step, we’ll assemble linestrings for the methods required by the relations. The question beneath compresses virtually full logic of studying methods in a single go (getting required nodes, establishing factors and grouping them into linestrings):

CREATE TEMP TABLE required_ways_linestrings AS
WITH ways_required_by_relations_with_nodes_refs AS (
SELECT id, UNNEST(refs) as ref, UNNEST(vary(size(refs))) as ref_idx
FROM ST_READOSM('monaco-latest.osm.pbf') methods
SEMI JOIN matching_relations_with_ways_refs
ON methods.id = matching_relations_with_ways_refs.ref
WHERE form = 'manner'
),
nodes_required_by_relations_with_geometries AS (
SELECT id, ST_POINT(lon, lat) geometry
FROM ST_READOSM('monaco-latest.osm.pbf') nodes
SEMI JOIN ways_required_by_relations_with_nodes_refs
ON nodes.id = ways_required_by_relations_with_nodes_refs.ref
WHERE form = 'node'
)
SELECT
methods.id,
ST_MakeLine(checklist(nodes.geometry ORDER BY ref_idx ASC)) linestring
FROM ways_required_by_relations_with_nodes_refs methods
JOIN nodes_required_by_relations_with_geometries nodes
ON methods.ref = nodes.id
GROUP BY 1;

SELECT * FROM required_ways_linestrings;

┌────────────┬─────────────────────────────────────────────────────────────────┐
│ id │ linestring │
│ int64 │ geometry │
├────────────┼─────────────────────────────────────────────────────────────────┤
│ 37794470 │ LINESTRING (7.438411400000001 43.749422300000006, 7.4384056 4… │
│ 87878917 │ LINESTRING (7.418041100000001 43.7256933, 7.4181547 43.725613… │
│ 94252430 │ LINESTRING (7.4141873 43.729494100000004, 7.414203400000001 4… │
│ 94399618 │ LINESTRING (7.4239717 43.740543100000004, 7.4238252 43.740372… │
│ 94399736 │ LINESTRING (7.423881400000001 43.7402971, 7.4239411 43.740265… │
│ 104863462 │ LINESTRING (7.424840000000001 43.731281800000005, 7.424917000… │
│ 120114110 │ LINESTRING (7.420872 43.7370979, 7.4207787 43.7370534, 7.4206… │
│ 156242249 │ LINESTRING (7.4294728 43.739895600000004, 7.429579500000001 4… │
│ 165636038 │ LINESTRING (7.419717100000001 43.732398700000005, 7.419772900… │
│ 169297863 │ LINESTRING (7.422815300000001 43.7321967, 7.4234109 43.732263… │
│ · │ · │
│ · │ · │
│ · │ · │
│ 24874398 │ LINESTRING (7.418523400000001 43.7247599, 7.419012800000001 4… │
│ 92627424 │ LINESTRING (7.4166521 43.7322122, 7.4162303 43.73173910000000… │
│ 94452829 │ LINESTRING (7.4317066 43.7469647, 7.4317998 43.7470371, 7.431… │
│ 335740502 │ LINESTRING (7.4185151 43.7353633, 7.418442000000001 43.735298… │
│ 398377193 │ LINESTRING (7.420872 43.7370979, 7.420939000000001 43.7371298… │
│ 405529436 │ LINESTRING (7.417534900000001 43.7258914, 7.4175347 43.725833… │
│ 423632259 │ LINESTRING (7.4212208 43.7320944, 7.421461300000001 43.732057… │
│ 572935479 │ LINESTRING (7.427508100000001 43.7394168, 7.427496700000001 4… │
│ 586494707 │ LINESTRING (7.4270357 43.739109000000006, 7.4269512 43.739155… │
│ 1202570715 │ LINESTRING (7.426448400000001 43.739491400000006, 7.4264682 4… │
├────────────┴─────────────────────────────────────────────────────────────────┤
│ 141 rows (20 proven) 2 columns │
└──────────────────────────────────────────────────────────────────────────────┘

After creating the required linestrings, now we will be part of them with relations information. We will even make it possible for the required ref_role is correctly parsed — fill within the empty values or substitute them if the relation has incorrectly outlined ref_roles within the OSM database.

CREATE TEMP TABLE matching_relations_with_ways_linestrings AS
WITH unnested_relations_with_way_linestrings AS (
SELECT
r.id,
COALESCE(r.ref_role, 'outer') as ref_role,
r.ref,
w.linestring::GEOMETRY as geometry
FROM matching_relations_with_ways_refs r
JOIN required_ways_linestrings w
ON w.id = r.ref
ORDER BY r.id, r.ref_idx
),
any_outer_refs AS (
-- examine if any manner connected to the relation has the `outer` position
SELECT id, bool_or(ref_role == 'outer') has_any_outer_refs
FROM unnested_relations_with_way_linestrings
GROUP BY id
)
SELECT
unnested_relations_with_way_linestrings.* EXCLUDE (ref_role),
-- if not one of the manner refs has `outer` position - deal with every ref as `outer`
CASE WHEN any_outer_refs.has_any_outer_refs
THEN unnested_relations_with_way_linestrings.ref_role
ELSE 'outer'
END as ref_role
FROM unnested_relations_with_way_linestrings
JOIN any_outer_refs
ON any_outer_refs.id = unnested_relations_with_way_linestrings.id;

SELECT * FROM matching_relations_with_ways_linestrings;

┌──────────┬───────────┬────────────────────────────────────────────┬──────────┐
│ id │ ref │ geometry │ ref_role │
│ int64 │ int64 │ geometry │ varchar │
├──────────┼───────────┼────────────────────────────────────────────┼──────────┤
│ 7385 │ 772081595 │ LINESTRING (7.415291600000001 43.7234393… │ outer │
│ 7385 │ 212810311 │ LINESTRING (7.4120416 43.7280955, 7.4123… │ outer │
│ 7385 │ 176533407 │ LINESTRING (7.412670800000001 43.7316348… │ outer │
│ 7385 │ 37811853 │ LINESTRING (7.4127007 43.7346954, 7.4126… │ outer │
│ 7385 │ 37794471 │ LINESTRING (7.428754700000001 43.7460174… │ outer │
│ 7385 │ 398372186 │ LINESTRING (7.436885 43.7519173, 7.43677… │ outer │
│ 7385 │ 37794470 │ LINESTRING (7.438411400000001 43.7494223… │ outer │
│ 7385 │ 770774507 │ LINESTRING (7.439171000000001 43.7490109… │ outer │
│ 8654 │ 772081595 │ LINESTRING (7.415291600000001 43.7234393… │ outer │
│ 8654 │ 212810311 │ LINESTRING (7.4120416 43.7280955, 7.4123… │ outer │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ 11546878 │ 840890844 │ LINESTRING (7.420754 43.735872900000004,… │ outer │
│ 11546878 │ 840890848 │ LINESTRING (7.4207413 43.7356673, 7.4207… │ outer │
│ 16467322 │ 772081595 │ LINESTRING (7.415291600000001 43.7234393… │ outer │
│ 16467322 │ 212810311 │ LINESTRING (7.4120416 43.7280955, 7.4123… │ outer │
│ 16467322 │ 176533407 │ LINESTRING (7.412670800000001 43.7316348… │ outer │
│ 16467322 │ 37811853 │ LINESTRING (7.4127007 43.7346954, 7.4126… │ outer │
│ 16467322 │ 37794471 │ LINESTRING (7.428754700000001 43.7460174… │ outer │
│ 16467322 │ 398372186 │ LINESTRING (7.436885 43.7519173, 7.43677… │ outer │
│ 16467322 │ 37794470 │ LINESTRING (7.438411400000001 43.7494223… │ outer │
│ 16467322 │ 770774507 │ LINESTRING (7.439171000000001 43.7490109… │ outer │
├──────────┴───────────┴────────────────────────────────────────────┴──────────┤
│ 410 rows (20 proven) 4 columns │
└──────────────────────────────────────────────────────────────────────────────┘

As you possibly can see, there are a number of methods with linestrings assigned to every relation. Let’s take a look at an instance to see how they appear on the map:

A single relation (5986437) with colour-coded methods which might be a part of it. Generated by the creator utilizing GeoPandas library.

To create full polygons, we now have to make the most of a ST_LineMerge perform, that may mix an inventory of linestrings (you possibly can examine it to tying items of string collectively). You may learn extra about this operation right here:

ST_LineMerge

As a further validation step, we’ll examine if the produced linestrings have not less than 4 factors and if the primary level equals the final level, earlier than changing them into polygons:

CREATE TEMP TABLE matching_relations_with_merged_polygons AS
WITH merged_linestrings AS (
SELECT
id,
ref_role,
UNNEST(
ST_Dump(ST_LineMerge(ST_Collect(checklist(geometry)))),
recursive := true
),
FROM matching_relations_with_ways_linestrings
GROUP BY id, ref_role
),
relations_with_linestrings AS (
SELECT
id,
ref_role,
-- ST_Dump returns column named `geom`
geom AS geometry,
row_number() OVER (PARTITION BY id) as geometry_id
FROM
merged_linestrings
-- discard linestrings with lower than 4 factors
WHERE ST_NPoints(geom) >= 4
),
valid_relations AS (
SELECT id, is_valid
FROM (
SELECT
id,
bool_and(
-- Verify if begin level equals the tip level
ST_Equals(ST_StartPoint(geometry), ST_EndPoint(geometry))
) is_valid
FROM relations_with_linestrings
GROUP BY id
)
WHERE is_valid = true
)
SELECT
id,
ref_role,
ST_MakePolygon(geometry) geometry,
geometry_id
FROM relations_with_linestrings
SEMI JOIN valid_relations
ON relations_with_linestrings.id = valid_relations.id;

SELECT * FROM matching_relations_with_merged_polygons;

┌──────────┬──────────┬──────────────────────────────────────────┬─────────────┐
│ id │ ref_role │ geometry │ geometry_id │
│ int64 │ varchar │ geometry │ int64 │
├──────────┼──────────┼──────────────────────────────────────────┼─────────────┤
│ 1369631 │ internal │ POLYGON ((7.438232200000001 43.7511057… │ 1 │
│ 1369631 │ outer │ POLYGON ((7.438008600000001 43.7502850… │ 2 │
│ 1484217 │ outer │ POLYGON ((7.423010700000001 43.7307028… │ 1 │
│ 1484217 │ internal │ POLYGON ((7.423114600000001 43.7305994… │ 2 │
│ 5986436 │ outer │ POLYGON ((7.428754700000001 43.7460174… │ 1 │
│ 8269572 │ outer │ POLYGON ((7.4287048 43.737701400000006… │ 1 │
│ 8269572 │ internal │ POLYGON ((7.4284492 43.7375604, 7.4286… │ 2 │
│ 16248286 │ outer │ POLYGON ((7.426306200000001 43.7391565… │ 1 │
│ 16248286 │ internal │ POLYGON ((7.4263241 43.7390763, 7.4263… │ 2 │
│ 16261416 │ internal │ POLYGON ((7.417829200000001 43.731382,… │ 1 │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ · │ · │ · │ · │
│ 8280869 │ internal │ POLYGON ((7.426753300000001 43.7388868… │ 2 │
│ 8280869 │ internal │ POLYGON ((7.4270357 43.739109000000006… │ 3 │
│ 8280869 │ internal │ POLYGON ((7.4271374 43.739011500000004… │ 4 │
│ 8280869 │ internal │ POLYGON ((7.4272666 43.7389165, 7.4273… │ 5 │
│ 11384697 │ outer │ POLYGON ((7.427075 43.7315167, 7.42749… │ 1 │
│ 11384697 │ internal │ POLYGON ((7.426917700000001 43.7313266… │ 2 │
│ 11384697 │ internal │ POLYGON ((7.427001400000001 43.7313937… │ 3 │
│ 11546878 │ outer │ POLYGON ((7.4207413 43.7356673, 7.4207… │ 1 │
│ 11538023 │ internal │ POLYGON ((7.426528200000001 43.7329359… │ 1 │
│ 11538023 │ outer │ POLYGON ((7.426554 43.7328276, 7.42654… │ 2 │
├──────────┴──────────┴──────────────────────────────────────────┴─────────────┤
│ 88 rows (20 proven) 4 columns │
└──────────────────────────────────────────────────────────────────────────────┘

Let’s see the earlier instance after this operation. We should always count on two separate polygons to be current:

A single relation (5986437) with merged methods as two separate polygons. Generated by the creator utilizing GeoPandas library.

I’ve talked about beforehand the outer and internal ‘ref_types’ of ways in which create a relation. Right here you possibly can see what it appears like:

1*E v 0h0lGtYnHdj9TmsD2A
A single relation (8280869) with merged methods grouped into outer and internal polygons. Generated by the creator utilizing GeoPandas library.

The roles imply that the internal methods are the ‘holes’ inside outer polygons and we now have to breed this step to make correct geometries.

Let’s focus now on splitting the polygons into teams with and with out holes, utilizing the ST_Within predicate, which checks if a geometry is totally inside one other.

CREATE TEMP TABLE matching_relations_with_outer_polygons_with_holes AS
WITH outer_polygons AS (
SELECT id, geometry_id, geometry
FROM matching_relations_with_merged_polygons
WHERE ref_role = 'outer'
), inner_polygons AS (
SELECT id, geometry_id, geometry
FROM matching_relations_with_merged_polygons
WHERE ref_role = 'internal'
)
SELECT
op.id,
op.geometry_id,
ST_Difference(any_value(op.geometry), ST_Union_Agg(ip.geometry)) geometry
FROM outer_polygons op
JOIN inner_polygons ip
ON op.id = ip.id AND ST_WITHIN(ip.geometry, op.geometry)
GROUP BY op.id, op.geometry_id;

CREATE TEMP TABLE matching_relations_with_outer_polygons_without_holes AS
WITH outer_polygons AS (
SELECT id, geometry_id, geometry
FROM matching_relations_with_merged_polygons
WHERE ref_role = 'outer'
)
SELECT
op.id,
op.geometry_id,
op.geometry
FROM outer_polygons op
ANTI JOIN matching_relations_with_outer_polygons_with_holes opwh
ON op.id = opwh.id AND op.geometry_id = opwh.geometry_id;

SELECT * FROM matching_relations_with_outer_polygons_with_holes
UNION ALL
SELECT * FROM matching_relations_with_outer_polygons_without_holes;

┌──────────┬─────────────┬─────────────────────────────────────────────────────┐
│ id │ geometry_id │ geometry │
│ int64 │ int64 │ geometry │
├──────────┼─────────────┼─────────────────────────────────────────────────────┤
│ 16248286 │ 1 │ POLYGON ((7.4263139 43.7391602, 7.426324900000001… │
│ 1369192 │ 1 │ POLYGON ((7.4226247 43.7402251, 7.4227848 43.7396… │
│ 8147763 │ 1 │ POLYGON ((7.438223000000001 43.7477296, 7.4382403… │
│ 393226 │ 1 │ POLYGON ((7.4204802 43.731016700000005, 7.4203979… │
│ 11484092 │ 4 │ POLYGON ((7.417896900000001 43.724827000000005, 7… │
│ 11384697 │ 1 │ POLYGON ((7.4274934 43.731250700000004, 7.4267196… │
│ 8269572 │ 1 │ POLYGON ((7.4287166 43.7376724, 7.4287948 43.7376… │
│ 16261416 │ 3 │ POLYGON ((7.4178309 43.731391, 7.417877000000001 … │
│ 16248284 │ 1 │ POLYGON ((7.426637500000001 43.7394678, 7.4266485… │
│ 16248285 │ 1 │ POLYGON ((7.426114600000001 43.739267600000005, 7… │
│ · │ · │ · │
│ · │ · │ · │
│ · │ · │ · │
│ 11546879 │ 1 │ POLYGON ((7.4209316 43.7356234, 7.421037 43.73562… │
│ 2220206 │ 1 │ POLYGON ((7.4120416 43.7280955, 7.412103900000001… │
│ 5986438 │ 1 │ POLYGON ((7.4191482 43.738887500000004, 7.4199833… │
│ 2221178 │ 1 │ POLYGON ((7.415860400000001 43.7313885, 7.416195 … │
│ 2221179 │ 1 │ POLYGON ((7.422280000000001 43.7367186, 7.4227584… │
│ 2220208 │ 1 │ POLYGON ((7.41849 43.730922400000004, 7.4185156 4… │
│ 2254506 │ 1 │ POLYGON ((7.432995900000001 43.7447102, 7.4329125… │
│ 5986437 │ 1 │ POLYGON ((7.4307593 43.745595, 7.4306621 43.74541… │
│ 5986437 │ 2 │ POLYGON ((7.4343358 43.7457085, 7.4341337 43.7455… │
│ 11546878 │ 1 │ POLYGON ((7.4207413 43.7356673, 7.4207413 43.7357… │
├──────────┴─────────────┴─────────────────────────────────────────────────────┤
│ 47 rows (20 proven) 3 columns │
└──────────────────────────────────────────────────────────────────────────────┘

On this question, there may be utilized one other particular be part of — ANTI JOIN. This one filters out all of the rows on the left-side desk which might be joined by the right-side desk.

The final step that’s wanted is to merge all of the polygons for a single relation utilizing the ST_Union_Agg operation. It should mix all polygons into multipolygons (if there may be a couple of) and produce a single geometry.

WITH unioned_outer_geometries AS (
SELECT id, geometry
FROM matching_relations_with_outer_polygons_with_holes
UNION ALL
SELECT id, geometry
FROM matching_relations_with_outer_polygons_without_holes
),
final_geometries AS (
SELECT id, ST_Union_Agg(geometry) geometry
FROM unioned_outer_geometries
GROUP BY id
)
SELECT r_g.id, r.tags, r_g.geometry
FROM final_geometries r_g
JOIN matching_relations r
ON r.id = r_g.id;

┌──────────┬──────────────────────┬────────────────────────────────────────────┐
│ id │ tags │ geometry │
│ int64 │ map(varchar, varch… │ geometry │
├──────────┼──────────────────────┼────────────────────────────────────────────┤
│ 393226 │ {constructing=citadel, … │ POLYGON ((7.4204802 43.731016700000005, … │
│ 393481 │ {constructing=faculty, … │ POLYGON ((7.423992800000001 43.731841800… │
│ 1369191 │ {constructing=apartmen… │ POLYGON ((7.4231004 43.7412894, 7.423314… │
│ 1369192 │ {constructing=apartmen… │ POLYGON ((7.4226247 43.7402251, 7.422784… │
│ 1369193 │ {constructing=apartmen… │ POLYGON ((7.424 43.7405491, 7.4240401 43… │
│ 1369195 │ {constructing=apartmen… │ POLYGON ((7.418679900000001 43.738187100… │
│ 1369631 │ {addr:housenumber=… │ POLYGON ((7.4379729 43.7502505, 7.437937… │
│ 1369632 │ {addr:metropolis=Monaco,… │ POLYGON ((7.4317121 43.74709, 7.4318277 … │
│ 1484190 │ {addr:metropolis=Monaco,… │ POLYGON ((7.425469 43.731494000000005, 7… │
│ 1484217 │ {constructing=retail, … │ POLYGON ((7.423202300000001 43.7307117, … │
│ · │ · │ · │
│ · │ · │ · │
│ · │ · │ · │
│ 14399505 │ {space=sure, floatin… │ POLYGON ((7.4195986 43.7265293, 7.419595… │
│ 16248281 │ {constructing=apartmen… │ POLYGON ((7.4260438 43.739789800000004, … │
│ 16248282 │ {constructing=apartmen… │ POLYGON ((7.426243100000001 43.7396824, … │
│ 16248283 │ {constructing=apartmen… │ POLYGON ((7.426438200000001 43.739575200… │
│ 16248284 │ {constructing=apartmen… │ POLYGON ((7.426637500000001 43.7394678, … │
│ 16248285 │ {constructing=apartmen… │ POLYGON ((7.426114600000001 43.739267600… │
│ 16248286 │ {constructing=apartmen… │ POLYGON ((7.4263139 43.7391602, 7.426324… │
│ 16250182 │ {addr:nation=MC, … │ POLYGON ((7.418348300000001 43.7256979, … │
│ 16261416 │ {pure=water, ty… │ POLYGON ((7.4178309 43.731391, 7.4178770… │
│ 11546878 │ {addr:metropolis=Monaco,… │ POLYGON ((7.4207413 43.7356673, 7.420741… │
├──────────┴──────────────────────┴────────────────────────────────────────────┤
│ 46 rows (20 proven) 3 columns │
└──────────────────────────────────────────────────────────────────────────────┘

Right here is the earlier relation instance, now with holes:

1*0GCxyrEb02fCNe7 XzITkw
A single relation (8280869) — a polygon with holes. Generated by the creator utilizing GeoPandas library.
1*rh6F3fRMS4cTIFuCkeyfeQ
Relations geometries plotted on a map. Generated by the creator utilizing GeoPandas library.

Examples of badly outlined relation objects

As OpenStreetMap information is especially added by the neighborhood, there are examples the place geometries are usually not correctly outlined. The OSM wiki describes guidelines for map makers on methods to add geometries to a map.

Relation:multipolygon/validity

This part will define some frequent errors with examples.

Two overlapping ‘outer’ methods within the relation

This constructing is outlined by two shapes: a rectangle and virtually a circle. Since these two overlap, you possibly can see how the rendering engine created a niche the place there in all probability shouldn’t be any.

1*C8S3pRWnVT1KoMFNAROmsA
Screenshot from the OpenStreetMap by the creator.

No ‘outer’ manner within the relation

Right here you possibly can see a paintball area with manner members outlined as ‘Foremost Constructing’ and 4 ‘Arenas’. These needs to be all outlined as outer methods.

1*pLh2 LXWp1vY2j1ayM0vTQ
Screenshot from the OpenStreetMap by the creator.

Two overlapping or touching ‘internal’ methods

Screenshot from the OpenStreetMap by the creator.

If you’re concerned about studying extra about how these will be fastened, look into this repository:

fixing-polygons-in-osm/doc/background.md at grasp · osmlab/fixing-polygons-in-osm

QuackOSM — a hassle-free software for studying OSM information

To finish this text I need to spotlight a library that may routinely obtain the OSM information, filter it by the geometry or utilizing OSM tags and reserve it as a GeoParquet file that may be simply built-in into extra scalable options. The library is written in Python and is open-source.

You may set up it with a single command: pip set up quackosm[cli].

This tutorial comprises a simplified model of the queries used within the QuackOSM 🦆, however these received’t scale very effectively for the larger areas. The library can simply parse entire nations comparable to France on a consumer-grade PC if it’s good to. You may in fact make the most of the DuckDB engine afterward the ready GeoParquet file utilizing SPATIAL extension.

The entire steps outlined right here will be changed with a single line of code:

>>> import quackosm as qosm
>>> qosm.get_features_gdf('monaco-latest.osm.pbf')
tags geometry
feature_id
manner/834616137 {'freeway': 'footway'} LINESTRING (7.42133 43.72711, 7.42134 43.72710...
manner/408817108 {'addr:nation': 'MC', 'constructing': 'sure', 'nam... POLYGON ((7.43533 43.74965, 7.43534 43.74955, ...
manner/686435440 {'freeway': 'footway'} LINESTRING (7.41381 43.73258, 7.41388 43.73266...
node/7799514898 {'title': 'Cino Bar', 'store': 'kiosk'} POINT (7.42702 43.73118)
manner/143214794 {'constructing': 'sure'} POLYGON ((7.42788 43.74437, 7.42786 43.74437, ...
... ... ...
manner/161882794 {'freeway': 'secondary', 'lanes': '2', 'lit': ... LINESTRING (7.42142 43.73656, 7.42147 43.73662...
manner/1082330089 {'freeway': 'steps', 'incline': 'down'} LINESTRING (7.42438 43.72990, 7.42440 43.72988...
manner/834313093 {'freeway': 'service', 'smoothness': 'intermed... LINESTRING (7.42709 43.73256, 7.42708 43.73262...
manner/94399451 {'addr:nation': 'MC', 'constructing': 'sure'} POLYGON ((7.41678 43.73699, 7.41661 43.73689, ...
node/2462515787 {'crossing': 'uncontrolled', 'freeway': 'cross... POINT (7.41656 43.73208)

[7940 rows x 2 columns]
>>> qosm.convert_pbf_to_gpq('monaco-latest.osm.pbf')
PosixPath('recordsdata/monaco-latest_nofilter_noclip_compact.geoparquet')

GitHub – kraina-ai/quackosm: QuackOSM: an open-source Python and CLI software for studying OpenStreetMap PBF recordsdata utilizing DuckDB

Disclaimer

I’m the creator of the QuackOSM library.

You may attain me right here:
https://www.linkedin.com/in/raczyckikamil/
https://github.com/raczeq

stat?event=post


Easy methods to learn OSM information with DuckDB was initially revealed in In direction of Information Science on Medium, the place persons are persevering with the dialog by highlighting and responding to this story.



Supply hyperlink

LEAVE A REPLY

Please enter your comment!
Please enter your name here