vibespatial.io.osm_gpu¶
GPU-accelerated reader for OpenStreetMap PBF files.
Extracts DenseNodes (Points), Ways (LineStrings/Polygons), and Relations (MultiPolygons) from OSM PBF files using a hybrid CPU/GPU pipeline:
Nodes pipeline:
Block index parsing (CPU) – parse BlobHeader/Blob pairs sequentially to build a block index with offsets and types.
Zlib decompression (CPU) – decompress each OSMData block using Python’s zlib module.
Protobuf field extraction (CPU) – locate DenseNodes fields (id, lat, lon deltas) within each PrimitiveBlock, producing varint byte ranges.
Varint decoding (GPU, Tier 1 NVRTC) – parallel decode of protobuf varints to int64 arrays. Each thread decodes one varint at a known position, handling the 7-bit-per-byte continuation encoding.
Delta decoding (GPU, Tier 2 CuPy) – cumulative sum over delta-encoded arrays to recover absolute node IDs, latitudes, and longitudes.
Coordinate scaling (GPU, Tier 2 CuPy) – convert from nanodegree integer representation to fp64 degrees.
Assembly – build device-resident Point OwnedGeometryArray.
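The decode steps above can be illustrated with a CPU-only reference sketch. It is not the module's NVRTC kernel or CuPy code: it runs sequentially in pure Python, and the helper names (decode_varints, zigzag_decode, decode_dense_column, to_degrees) are hypothetical. It assumes the DenseNodes columns are packed sint64 values (zigzag-encoded, per the OSM PBF format) and that degrees are recovered as 1e-9 * (offset + granularity * value).

```python
from itertools import accumulate


def decode_varints(buf: bytes) -> list[int]:
    """Decode a packed run of unsigned base-128 varints."""
    values, current, shift = [], 0, 0
    for byte in buf:
        current |= (byte & 0x7F) << shift  # low 7 bits carry payload
        if byte & 0x80:                    # high bit set: another byte follows
            shift += 7
        else:                              # high bit clear: value is complete
            values.append(current)
            current, shift = 0, 0
    return values


def zigzag_decode(n: int) -> int:
    """Map an unsigned zigzag value back to a signed integer."""
    return (n >> 1) ^ -(n & 1)


def decode_dense_column(buf: bytes) -> list[int]:
    """Varint decode, zigzag decode, then running sum to undo delta coding."""
    deltas = [zigzag_decode(v) for v in decode_varints(buf)]
    return list(accumulate(deltas))


def to_degrees(raw: list[int], granularity: int = 100, offset: int = 0) -> list[float]:
    """Scale integer coordinates (units of granularity nanodegrees) to degrees."""
    return [1e-9 * (offset + granularity * v) for v in raw]


# Deltas +10, -3, +5 are zigzag-encoded as 20, 5, 10 (one byte each).
ids = decode_dense_column(bytes([0x14, 0x05, 0x0A]))
```

On the GPU the same three stages map to one thread per varint, a cumulative sum, and an element-wise multiply, as the list above describes.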
Ways pipeline:
Way field extraction (CPU) – parse Way messages from each PrimitiveGroup to extract way IDs, delta-encoded node refs, and tag key/value pairs. Delta decoding of refs happens on CPU since each Way has a small number of refs.
Node lookup table (GPU, Tier 2 CuPy) – sort extracted node IDs by value to enable binary search.
Coordinate gathering (GPU, Tier 1 NVRTC) – for each node ref in each Way, binary-search the sorted node table to resolve lon/lat coordinates.
Way classification (GPU, Tier 2 CuPy) – classify closed Ways (first ref == last ref) as Polygon, open Ways as LineString.
Assembly – build separate device-resident LineString and Polygon OwnedGeometryArrays, then combine into a mixed OwnedGeometryArray.
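A minimal CPU sketch of the Ways steps, under stated assumptions: the standard-library bisect module stands in for the GPU binary-search kernel, plain lists stand in for device arrays, and all function names are hypothetical. The closed-way rule is the one given above (first ref == last ref); handling of degenerate ways is not covered.

```python
from bisect import bisect_left
from itertools import accumulate


def build_lookup(node_ids, lons, lats):
    """Sort node IDs and reorder coordinates to match (an argsort)."""
    order = sorted(range(len(node_ids)), key=node_ids.__getitem__)
    return (
        [node_ids[i] for i in order],
        [lons[i] for i in order],
        [lats[i] for i in order],
    )


def resolve(ref, sorted_ids, lons, lats):
    """Binary-search one node ref; return (lon, lat) or None if absent."""
    i = bisect_left(sorted_ids, ref)
    if i == len(sorted_ids) or sorted_ids[i] != ref:
        return None
    return (lons[i], lats[i])


def classify(refs):
    """Closed ways become Polygon, open ways LineString."""
    return "Polygon" if refs[0] == refs[-1] else "LineString"


# Way refs are delta-encoded; a running sum recovers absolute node IDs.
refs = list(accumulate([10, 10, 10, -20]))
table = build_lookup([30, 10, 20], [3.0, 1.0, 2.0], [33.0, 11.0, 22.0])
coords = [resolve(r, *table) for r in refs]
kind = classify(refs)
```

Sorting once and searching per ref keeps each lookup logarithmic in the number of nodes, which is why the table is built before any Way is resolved.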
Relations pipeline (MultiPolygon assembly):
Relation field extraction (CPU) – parse Relation messages from each PrimitiveGroup to extract relation IDs, member IDs/types/roles.
Way lookup (CPU) – build a Way ID -> node refs dict from parsed Way data so Relations can resolve their Way members.
Way chaining (CPU, per-relation) – for each MultiPolygon relation, chain outer and inner Way members into closed rings by endpoint matching.
Coordinate gathering (GPU, Tier 1 NVRTC) – reuse the existing binary-search kernel to resolve all ring node refs to coordinates.
MultiPolygon assembly – build device-resident MultiPolygon OwnedGeometryArray with geometry/part/ring offset hierarchy.
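Way chaining by endpoint matching can be sketched as a greedy loop. This is an illustrative version, not the module's implementation: chain_rings is a hypothetical name, each Way is a list of node refs, and the function is assumed to be called once per role group (outer members, then inner members). Chains that never close are dropped here, which is a simplification.

```python
def chain_rings(ways: list[list[int]]) -> list[list[int]]:
    """Join ways end to end until each chain closes into a ring."""
    remaining = [list(w) for w in ways if len(w) >= 2]
    rings = []
    while remaining:
        ring = remaining.pop(0)
        while ring[0] != ring[-1]:
            for i, cand in enumerate(remaining):
                if cand[0] == ring[-1]:
                    ring.extend(cand[1:])              # same direction
                elif cand[-1] == ring[-1]:
                    ring.extend(reversed(cand[:-1]))   # stored reversed
                else:
                    continue
                del remaining[i]
                break
            else:
                break  # nothing continues the chain; leave it open
        if ring[0] == ring[-1] and len(ring) >= 4:
            rings.append(ring)
    return rings


# Three ways that together form one closed ring; the second is reversed.
outer = chain_rings([[1, 2, 3], [5, 4, 3], [5, 1]])
```

The linear scan over the remaining ways is adequate when a ring is built from only a handful of ways, which matches the small-graph assumption in the tier classification below.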
- Tier classification (ADR-0033):
Block parsing: CPU (sequential metadata, not parallelizable)
Decompression: CPU (zlib)
Protobuf field location: CPU (sequential proto traversal, small data)
Varint decoding: Tier 1 (custom NVRTC – binary format-specific)
Way coordinate gathering: Tier 1 (custom NVRTC – binary search)
Delta decode (cumsum): Tier 2 (CuPy cumsum)
Coordinate scaling: Tier 2 (CuPy element-wise)
Node lookup sort: Tier 2 (CuPy argsort)
Way classification: Tier 2 (CuPy element-wise comparison)
Offset construction: Tier 2 (CuPy)
Way chaining: CPU (small graph traversal, <10 ways per ring)
Relation coordinate gathering: Tier 1 (reuse NVRTC binary search)
- Precision (ADR-0002):
The varint decode and coordinate gather kernels are integer-only or use fp64 storage directly – no floating-point computation that benefits from precision dispatch. Coordinate storage is always fp64 per ADR-0002 (same rationale as csv_gpu.py, kml_gpu.py, and all other IO readers).
Attributes¶
Classes¶
BlockInfo | Metadata for one BlobHeader + Blob pair in the PBF file.
DenseNodesBlock | Extracted varint byte ranges for one DenseNodes within a PrimitiveBlock.
WayBlock | Extracted data for Ways within one PrimitiveBlock.
RelationMember | A single member of an OSM Relation.
RelationBlock | Extracted data for Relations within one PrimitiveBlock.
OsmGpuResult | Result of GPU-accelerated OSM PBF reading.
Functions¶
read_osm_pbf_nodes | Extract all nodes from an OSM PBF file as Point geometries.
read_osm_pbf | Extract nodes, ways, and relations from an OSM PBF file.
Module Contents¶
- vibespatial.io.osm_gpu.cp = None¶
- vibespatial.io.osm_gpu.logger¶
- class vibespatial.io.osm_gpu.BlockInfo¶
Metadata for one BlobHeader + Blob pair in the PBF file.
- offset: int¶
- blob_header_size: int¶
- blob_size: int¶
- block_type: str¶
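A block index of this shape can be built by walking the file's framing: a 4-byte big-endian BlobHeader length, the BlobHeader message, then the Blob. The sketch below is a hedged illustration, not the module's parser. BlockEntry is a hypothetical stand-in for BlockInfo, and it assumes offset means the position of the length prefix, which the documentation above does not state. The BlobHeader field numbers (type = 1, datasize = 3) come from the OSM PBF format.

```python
import struct
from typing import NamedTuple


class BlockEntry(NamedTuple):  # hypothetical stand-in for BlockInfo
    offset: int            # assumed: position of the 4-byte length prefix
    blob_header_size: int
    blob_size: int
    block_type: str


def _read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read one base-128 varint; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_blob_header(header: bytes) -> tuple[str, int]:
    """Extract (type, datasize) from a serialized BlobHeader."""
    block_type, datasize, pos = "", 0, 0
    while pos < len(header):
        key, pos = _read_varint(header, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:                       # varint field
            value, pos = _read_varint(header, pos)
            if field == 3:
                datasize = value
        elif wire == 2:                     # length-delimited field
            length, pos = _read_varint(header, pos)
            if field == 1:
                block_type = header[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unexpected wire type {wire}")
    return block_type, datasize


def build_block_index(data: bytes) -> list[BlockEntry]:
    """Walk header/blob pairs sequentially without touching blob payloads."""
    index, pos = [], 0
    while pos + 4 <= len(data):
        (header_size,) = struct.unpack_from(">I", data, pos)
        header = data[pos + 4:pos + 4 + header_size]
        block_type, blob_size = parse_blob_header(header)
        index.append(BlockEntry(pos, header_size, blob_size, block_type))
        pos += 4 + header_size + blob_size
    return index
```

Each header must be read before the next block's position is known, which is why this stage is sequential.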
- class vibespatial.io.osm_gpu.DenseNodesBlock¶
Extracted varint byte ranges for one DenseNodes within a PrimitiveBlock.
- id_bytes: bytes¶
- lat_bytes: bytes¶
- lon_bytes: bytes¶
- granularity: int¶
- lat_offset: int¶
- lon_offset: int¶
- keys_vals_bytes: bytes = b''¶
- stringtable: list[bytes] | None = None¶
- class vibespatial.io.osm_gpu.WayBlock¶
Extracted data for Ways within one PrimitiveBlock.
- way_ids: list[int]¶
- refs_per_way: list[list[int]]¶
- tag_keys_per_way: list[list[int]]¶
- tag_vals_per_way: list[list[int]]¶
- stringtable: list[bytes]¶
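The tag fields hold indices into the block's stringtable rather than strings. A minimal sketch of turning them into per-way dicts follows; decode_tags is a hypothetical helper, and it assumes the stringtable entries are UTF-8, as the OSM PBF format specifies.

```python
def decode_tags(keys_per_elem, vals_per_elem, stringtable):
    """Resolve parallel key/value index lists into one dict per element."""
    return [
        {
            stringtable[k].decode("utf-8"): stringtable[v].decode("utf-8")
            for k, v in zip(keys, vals)
        }
        for keys, vals in zip(keys_per_elem, vals_per_elem)
    ]


stringtable = [b"", b"highway", b"residential", b"name", b"Main St"]
tags = decode_tags([[1, 3], []], [[2, 4], []], stringtable)
```

An element without tags yields an empty dict, so the result stays parallel to way_ids.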
- class vibespatial.io.osm_gpu.RelationMember¶
A single member of an OSM Relation.
- member_id: int¶
- member_type: int¶
- role: str¶
- class vibespatial.io.osm_gpu.RelationBlock¶
Extracted data for Relations within one PrimitiveBlock.
- relation_ids: list[int]¶
- members_per_relation: list[list[RelationMember]]¶
- stringtable: list[bytes]¶
- granularity: int¶
- lat_offset: int¶
- lon_offset: int¶
- tag_keys_per_relation: list[list[int]] | None = None¶
- tag_vals_per_relation: list[list[int]] | None = None¶
- class vibespatial.io.osm_gpu.OsmGpuResult¶
Result of GPU-accelerated OSM PBF reading.
Attributes¶
- nodes
Point OwnedGeometryArray containing all extracted nodes. None if no nodes were found.
- node_ids
Device-resident int64 array of OSM node IDs, parallel to the geometry array. None if no nodes were found.
- node_tags
Per-node tag dicts (host-resident strings). None if no nodes were found or no tags present.
- n_nodes
Total number of nodes extracted.
- ways
Mixed LineString/Polygon OwnedGeometryArray containing all extracted Ways. None if no ways were found or no nodes available for coordinate resolution.
- way_ids
Device-resident int64 array of OSM Way IDs, parallel to the ways geometry array. None if no ways were found.
- way_tags
Per-way tag dicts (host-resident strings). None if no ways were found or no tags present.
- n_ways
Total number of ways extracted.
- relations
MultiPolygon OwnedGeometryArray from Relation assembly. None if no multipolygon relations were found.
- relation_ids
Device-resident int64 array of OSM Relation IDs, parallel to the relations geometry array. None if no relations found.
- relation_tags
Per-relation tag dicts (host-resident strings). None if no relations found or no tags present.
- n_relations
Total number of multipolygon relations extracted.
- nodes: vibespatial.geometry.owned.OwnedGeometryArray | None¶
- node_ids: cupy.ndarray | None¶
- n_nodes: int¶
- node_tags: list[dict[str, str]] | None = None¶
- ways: vibespatial.geometry.owned.OwnedGeometryArray | None = None¶
- way_ids: cupy.ndarray | None = None¶
- way_tags: list[dict[str, str]] | None = None¶
- n_ways: int = 0¶
- relations: vibespatial.geometry.owned.OwnedGeometryArray | None = None¶
- relation_ids: cupy.ndarray | None = None¶
- relation_tags: list[dict[str, str]] | None = None¶
- n_relations: int = 0¶
- vibespatial.io.osm_gpu.read_osm_pbf_nodes(path: str | pathlib.Path) -> OsmGpuResult¶
Extract all nodes from an OSM PBF file as Point geometries.
Uses a hybrid CPU/GPU pipeline:
- CPU: parse block structure, decompress zlib, locate protobuf fields
- GPU: decode varints, delta-decode via cumsum, scale coordinates
Parameters¶
- path
Path to the .osm.pbf file.
Returns¶
- OsmGpuResult
Contains a device-resident Point OwnedGeometryArray (nodes), device-resident int64 node IDs (node_ids), and total count. Way fields are None / 0.
- vibespatial.io.osm_gpu.read_osm_pbf(path: str | pathlib.Path, *, tags: bool | str = 'ways', geometry_only: bool = False, layer=None) -> OsmGpuResult¶
Extract nodes, ways, and relations from an OSM PBF file.
Uses a hybrid CPU/GPU streaming pipeline to extract:
- Nodes as Point geometries (same pipeline as read_osm_pbf_nodes)
- Ways as LineString (open) or Polygon (closed ring) geometries
- Relations as MultiPolygon geometries (assembled from Way members)
Blocks are processed in batches of _STREAM_BATCH_SIZE to bound peak host memory. A 10 GB PBF file that would previously decompress to 40-80 GB in host RAM now peaks at ~128 MB per batch.
Way and relation coordinate resolution uses a GPU binary-search kernel against a sorted node lookup table built from the extracted DenseNodes.
Parameters¶
- path
Path to the .osm.pbf file.
- tags: bool or str, default "ways"
Controls tag/attribute extraction. Tags are host-resident Python dicts and can consume significant memory for large files.
True: extract tags for all elements (nodes, ways, relations). Warning: for planet-scale files, node tags alone can exceed 100 GB of host memory since most of the ~8 billion OSM nodes carry per-object Python dict overhead even when empty.
"ways" (default): extract tags for ways and relations only. Node tags are skipped. This is the recommended setting for most workflows since node tags are rarely needed (nodes are primarily coordinate waypoints for ways).
False: skip all tag extraction. Fastest and lowest memory.
- geometry_only: bool, default False
If True, skip all tag extraction AND OSM ID extraction. Returns only device-resident geometry with no host-side attributes. This is the fastest mode, ideal for visualization or spatial analysis where element metadata is not needed.
- layer: str, optional
Limit the returned surface to one OSM layer. Supported values are "points", "lines", "ways", "multipolygons", "relations", "multilinestrings", "other_relations", and "all". Unsupported relation layers currently return an empty result instead of triggering extra parsing work.
Returns¶
- OsmGpuResult
Contains device-resident Point, LineString, Polygon, and MultiPolygon OwnedGeometryArrays with corresponding OSM IDs (unless geometry_only=True).
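A usage sketch built only from the documented parameters and result fields. count_elements is a hypothetical helper, and the import is deferred so the function can be defined without vibespatial or a GPU present. It leaves layer at its default because the exact fields populated for each layer value are not spelled out above.

```python
def count_elements(path, *, with_tags: bool = False) -> dict:
    """Read a PBF file and return element counts per kind."""
    from vibespatial.io.osm_gpu import read_osm_pbf  # needs a GPU

    # tags="ways" is the documented default; False skips tag extraction.
    result = read_osm_pbf(path, tags="ways" if with_tags else False)
    return {
        "nodes": result.n_nodes,
        "ways": result.n_ways,
        "relations": result.n_relations,
    }
```

Passing tags=False keeps host memory low when only counts or geometry are needed; geometry_only=True goes further and also skips the OSM ID arrays.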