vibespatial.io.osm_gpu

GPU-accelerated reader for OpenStreetMap PBF files.

Extracts DenseNodes (Points), Ways (LineStrings/Polygons), and Relations (MultiPolygons) from OSM PBF files using a hybrid CPU/GPU pipeline:

Nodes pipeline:

  1. Block index parsing (CPU) – parse BlobHeader/Blob pairs sequentially to build a block index with offsets and types.

  2. Zlib decompression (CPU) – decompress each OSMData block using Python’s zlib module.

  3. Protobuf field extraction (CPU) – locate DenseNodes fields (id, lat, lon deltas) within each PrimitiveBlock, producing varint byte ranges.

  4. Varint decoding (GPU, Tier 1 NVRTC) – parallel decode of protobuf varints to int64 arrays. Each thread decodes one varint at a known position, handling the 7-bit-per-byte continuation encoding.

  5. Delta decoding (GPU, Tier 2 CuPy) – cumulative sum over delta-encoded arrays to recover absolute node IDs, latitudes, and longitudes.

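  6. Coordinate scaling (GPU, Tier 2 CuPy) – convert the raw granularity-scaled integers to fp64 degrees as 1e-9 * (offset + granularity * value), where granularity defaults to 100 nanodegrees and offset is the block's lat_offset/lon_offset.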

  7. Assembly – build a device-resident Point OwnedGeometryArray.
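A pure-Python CPU reference for steps 4-6 can help when checking GPU output on a small block. It is a sketch, not the module's kernels, and all function names are hypothetical. It assumes, per the OSM PBF format, that DenseNodes id/lat/lon are packed sint64 fields. Each decoded varint is therefore zigzag-decoded before the delta sum; the steps above do not say which stage performs that in the module.

```python
import itertools


def decode_packed_varints(buf: bytes) -> list[int]:
    """Decode every varint in a packed field (CPU stand-in for the Tier 1 kernel)."""
    values, result, shift = [], 0, 0
    for byte in buf:
        result |= (byte & 0x7F) << shift  # low 7 bits carry payload
        if byte & 0x80:                   # high bit set: more bytes follow
            shift += 7
        else:
            values.append(result)
            result, shift = 0, 0
    return values


def zigzag_decode(n: int) -> int:
    """Map protobuf sint64 zigzag encoding back to a signed integer."""
    return (n >> 1) ^ -(n & 1)


def delta_decode(deltas: list[int]) -> list[int]:
    """Inclusive prefix sum, the role CuPy cumsum plays on the GPU."""
    return list(itertools.accumulate(deltas))


def scale_coords(raw: list[int], granularity: int = 100, offset: int = 0) -> list[float]:
    """OSM PBF coordinate formula: degrees = 1e-9 * (offset + granularity * raw)."""
    return [1e-9 * (offset + granularity * v) for v in raw]


# Node IDs 100, 101, 102 stored as zigzag-encoded deltas 100, 1, 1.
raw_ids = decode_packed_varints(bytes([0xC8, 0x01, 0x02, 0x02]))
ids = delta_decode([zigzag_decode(v) for v in raw_ids])
print(ids)  # [100, 101, 102]
```

Deltas restart at the first entry of each DenseNodes message, so a scan like this applies per block rather than across the whole file.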

Ways pipeline:

  1. Way field extraction (CPU) – parse Way messages from each PrimitiveGroup to extract way IDs, delta-encoded node refs, and tag key/value pairs. Delta decoding of refs happens on CPU since each Way has a small number of refs.

  2. Node lookup table (GPU, Tier 2 CuPy) – sort extracted node IDs by value to enable binary search.

  3. Coordinate gathering (GPU, Tier 1 NVRTC) – for each node ref in each Way, binary-search the sorted node table to resolve lon/lat coordinates.

  4. Way classification (GPU, Tier 2 CuPy) – classify closed Ways (first ref == last ref) as Polygon, open Ways as LineString.

  5. Assembly – build separate device-resident LineString and Polygon OwnedGeometryArrays, then combine into a mixed OwnedGeometryArray.
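Steps 2-4 can be sketched on the CPU with Python's bisect module standing in for the GPU sort and binary-search kernel. The names build_node_table, gather_coords, and classify_way are hypothetical. Returning None for a ref missing from the node table is this sketch's own choice, since the steps above do not say how unresolved refs are handled.

```python
import bisect


def build_node_table(node_ids, lons, lats):
    """Sort nodes by ID (the role of CuPy argsort) so lookups can binary-search."""
    order = sorted(range(len(node_ids)), key=node_ids.__getitem__)
    return ([node_ids[i] for i in order],
            [lons[i] for i in order],
            [lats[i] for i in order])


def gather_coords(refs, table):
    """Resolve each node ref to (lon, lat) by lower-bound search; None if absent."""
    sorted_ids, sorted_lons, sorted_lats = table
    coords = []
    for ref in refs:
        i = bisect.bisect_left(sorted_ids, ref)
        if i < len(sorted_ids) and sorted_ids[i] == ref:
            coords.append((sorted_lons[i], sorted_lats[i]))
        else:
            coords.append(None)
    return coords


def classify_way(refs) -> str:
    """Closed ways (first ref == last ref) are Polygons, open ways LineStrings.

    Assumes refs is non-empty.
    """
    return "Polygon" if refs[0] == refs[-1] else "LineString"
```

Each iteration of the loop in gather_coords is independent, which is what makes the lookup a good fit for one GPU thread per ref.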

Relations pipeline (MultiPolygon assembly):

  1. Relation field extraction (CPU) – parse Relation messages from each PrimitiveGroup to extract relation IDs, member IDs/types/roles.

  2. Way lookup (CPU) – build a Way ID -> node refs dict from parsed Way data so Relations can resolve their Way members.

  3. Way chaining (CPU, per-relation) – for each MultiPolygon relation, chain outer and inner Way members into closed rings by endpoint matching.

  4. Coordinate gathering (GPU, Tier 1 NVRTC) – reuse the existing binary-search kernel to resolve all ring node refs to coordinates.

  5. MultiPolygon assembly – build a device-resident MultiPolygon OwnedGeometryArray with a geometry/part/ring offset hierarchy.
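Endpoint chaining (step 3) can be sketched as a simple greedy pass. The hypothetical chain_rings below would run once on a relation's outer ways and once on its inner ways. It extends a chain with any remaining way whose first or last node matches the chain's tail, reversing that way when needed, until the chain closes. Chains that cannot close are dropped in this sketch. The quadratic search is acceptable only because relations typically have few ways per ring; real data may need extra handling, such as self-touching rings.

```python
def chain_rings(ways: list[list[int]]) -> list[list[int]]:
    """Join way node-ref lists into closed rings by matching endpoints."""
    remaining = [list(w) for w in ways if len(w) >= 2]  # copy; never mutate input
    rings = []
    while remaining:
        ring = remaining.pop(0)
        while ring[0] != ring[-1]:
            tail = ring[-1]
            for i, way in enumerate(remaining):
                if way[0] == tail:            # way continues in stored order
                    ring.extend(way[1:])
                    break
                if way[-1] == tail:           # way must be reversed to connect
                    ring.extend(reversed(way[:-1]))
                    break
            else:
                ring = None                   # nothing continues the chain
                break
            remaining.pop(i)                  # consume the way just appended
        if ring is not None:
            rings.append(ring)
    return rings


print(chain_rings([[1, 2, 3], [5, 4, 3], [5, 6, 1]]))  # [[1, 2, 3, 4, 5, 6, 1]]
```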

Tier classification (ADR-0033):
  • Block parsing: CPU (sequential metadata, not parallelizable)

  • Decompression: CPU (zlib)

  • Protobuf field location: CPU (sequential proto traversal, small data)

  • Varint decoding: Tier 1 (custom NVRTC – binary format-specific)

  • Way coordinate gathering: Tier 1 (custom NVRTC – binary search)

  • Delta decode (cumsum): Tier 2 (CuPy cumsum)

  • Coordinate scaling: Tier 2 (CuPy element-wise)

  • Node lookup sort: Tier 2 (CuPy argsort)

  • Way classification: Tier 2 (CuPy element-wise comparison)

  • Offset construction: Tier 2 (CuPy)

  • Way chaining: CPU (small graph traversal, <10 ways per ring)

  • Relation coordinate gathering: Tier 1 (reuse NVRTC binary search)

Precision (ADR-0002):

The varint decode and coordinate gather kernels are integer-only or use fp64 storage directly – no floating-point computation that benefits from precision dispatch. Coordinate storage is always fp64 per ADR-0002 (same rationale as csv_gpu.py, kml_gpu.py, and all other IO readers).
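A quick arithmetic check, not taken from the module, shows why fp64 storage matters. At the default granularity of 100 nanodegrees, two neighbouring longitudes near 180 degrees are distinct in fp64. In fp32 they collapse to the same value, because fp32 spacing in [128, 256) is 2**-16, about 1.5e-5 degrees. The helper name to_float32 is hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Two neighbouring longitudes at the default granularity (100 nanodegrees).
lon_a = 1e-9 * (100 * 1799999998)  # about 179.9999998 degrees
lon_b = 1e-9 * (100 * 1799999999)  # about 179.9999999 degrees

print(lon_a != lon_b)                          # True: fp64 keeps them distinct
print(to_float32(lon_a) == to_float32(lon_b))  # True: fp32 rounds both to 180.0
```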

Attributes

Classes

BlockInfo

Metadata for one BlobHeader + Blob pair in the PBF file.

DenseNodesBlock

Extracted varint byte ranges for one DenseNodes within a PrimitiveBlock.

WayBlock

Extracted data for Ways within one PrimitiveBlock.

RelationMember

A single member of an OSM Relation.

RelationBlock

Extracted data for Relations within one PrimitiveBlock.

OsmGpuResult

Result of GPU-accelerated OSM PBF reading.

Functions

read_osm_pbf_nodes(path) → OsmGpuResult

Extract all nodes from an OSM PBF file as Point geometries.

read_osm_pbf(path, *, tags, geometry_only, layer) → OsmGpuResult

Extract nodes, ways, and relations from an OSM PBF file.

Module Contents

vibespatial.io.osm_gpu.cp = None
vibespatial.io.osm_gpu.logger
class vibespatial.io.osm_gpu.BlockInfo

Metadata for one BlobHeader + Blob pair in the PBF file.

offset: int
blob_header_size: int
blob_size: int
block_type: str
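The sequential block-index walk (pipeline step 1) can be sketched in pure Python using the OSM PBF framing. Each pair is a 4-byte big-endian BlobHeader length, the BlobHeader message (field 1 type, field 3 datasize), then datasize bytes of Blob. BlockInfoSketch and the helper functions are hypothetical stand-ins. Treating offset as the file position of the length prefix is an assumption.

```python
import struct
from dataclasses import dataclass


@dataclass
class BlockInfoSketch:
    # Mirrors BlockInfo; offset is assumed to point at the 4-byte length prefix.
    offset: int
    blob_header_size: int
    blob_size: int
    block_type: str


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 protobuf varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_blob_header(buf: bytes) -> tuple[str, int]:
    """Return (type, datasize) from a serialized BlobHeader message."""
    block_type, datasize, pos = "", 0, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 0x7
        if wire == 0:                      # varint field
            value, pos = read_varint(buf, pos)
            if field == 3:                 # datasize
                datasize = value
        elif wire == 2:                    # length-delimited field
            length, pos = read_varint(buf, pos)
            if field == 1:                 # type, e.g. "OSMHeader" or "OSMData"
                block_type = buf[pos:pos + length].decode("utf-8")
            pos += length                  # skips indexdata (field 2) too
        else:
            raise ValueError(f"unexpected wire type {wire}")
    return block_type, datasize


def build_block_index(data: bytes) -> list[BlockInfoSketch]:
    """Walk the file: length prefix, BlobHeader, Blob, repeat."""
    blocks, pos = [], 0
    while pos < len(data):
        (header_size,) = struct.unpack(">I", data[pos:pos + 4])
        header = data[pos + 4:pos + 4 + header_size]
        block_type, blob_size = parse_blob_header(header)
        blocks.append(BlockInfoSketch(pos, header_size, blob_size, block_type))
        # The next pair's position is known only after decoding this header,
        # which is why this step stays sequential on the CPU.
        pos += 4 + header_size + blob_size
    return blocks
```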
class vibespatial.io.osm_gpu.DenseNodesBlock

Extracted varint byte ranges for one DenseNodes within a PrimitiveBlock.

id_bytes: bytes
lat_bytes: bytes
lon_bytes: bytes
granularity: int
lat_offset: int
lon_offset: int
keys_vals_bytes: bytes = b''
stringtable: list[bytes] | None = None
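The keys_vals field follows the OSM PBF DenseNodes layout. For each node, key and value string-table indices alternate, and a 0 ends that node's tags; an empty keys_vals means no node in the block is tagged. The sketch below assumes the packed bytes have already been varint-decoded to a list of ints. The function name dense_tags is hypothetical.

```python
def dense_tags(keys_vals: list[int], stringtable: list[bytes],
               n_nodes: int) -> list[dict[str, str]]:
    """Split a decoded DenseNodes keys_vals list into per-node tag dicts."""
    tags: list[dict[str, str]] = [{} for _ in range(n_nodes)]
    if not keys_vals:                  # no tagged nodes in this block
        return tags
    node, pos = 0, 0
    while pos < len(keys_vals) and node < n_nodes:
        if keys_vals[pos] == 0:        # delimiter: advance to the next node
            node += 1
            pos += 1
            continue
        key = stringtable[keys_vals[pos]].decode("utf-8")
        val = stringtable[keys_vals[pos + 1]].decode("utf-8")
        tags[node][key] = val
        pos += 2
    return tags


table = [b"", b"amenity", b"cafe", b"name", b"Bean"]
print(dense_tags([1, 2, 3, 4, 0, 0, 1, 2, 0], table, 3))
# [{'amenity': 'cafe', 'name': 'Bean'}, {}, {'amenity': 'cafe'}]
```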
class vibespatial.io.osm_gpu.WayBlock

Extracted data for Ways within one PrimitiveBlock.

way_ids: list[int]
refs_per_way: list[list[int]]
tag_keys_per_way: list[list[int]]
tag_vals_per_way: list[list[int]]
stringtable: list[bytes]
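The sketch below shows how one Way's raw fields could become the values stored per way: absolute node refs via a CPU prefix sum over delta-coded refs (pipeline step 1), and a tag dict from the parallel key/value string-table indices. It assumes the ref deltas are already varint- and zigzag-decoded. decode_way is a hypothetical name.

```python
import itertools


def decode_way(ref_deltas: list[int], keys: list[int], vals: list[int],
               stringtable: list[bytes]) -> tuple[list[int], dict[str, str]]:
    """Turn one raw Way into absolute node refs and a tag dict."""
    # Refs are delta-coded: each entry is the difference from the previous ref.
    refs = list(itertools.accumulate(ref_deltas))
    # keys[i] and vals[i] are parallel indices into the block's string table.
    tags = {stringtable[k].decode("utf-8"): stringtable[v].decode("utf-8")
            for k, v in zip(keys, vals)}
    return refs, tags


print(decode_way([1000, 5, -2], [1], [2], [b"", b"highway", b"residential"]))
# ([1000, 1005, 1003], {'highway': 'residential'})
```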
class vibespatial.io.osm_gpu.RelationMember

A single member of an OSM Relation.

member_id: int
member_type: int
role: str
class vibespatial.io.osm_gpu.RelationBlock

Extracted data for Relations within one PrimitiveBlock.

relation_ids: list[int]
members_per_relation: list[list[RelationMember]]
stringtable: list[bytes]
granularity: int
lat_offset: int
lon_offset: int
tag_keys_per_relation: list[list[int]] | None = None
tag_vals_per_relation: list[list[int]] | None = None
class vibespatial.io.osm_gpu.OsmGpuResult

Result of GPU-accelerated OSM PBF reading.

Attributes

nodes

Point OwnedGeometryArray containing all extracted nodes. None if no nodes were found.

node_ids

Device-resident int64 array of OSM node IDs, parallel to the geometry array. None if no nodes were found.

node_tags

Per-node tag dicts (host-resident strings). None if no nodes were found or no tags present.

n_nodes

Total number of nodes extracted.

ways

Mixed LineString/Polygon OwnedGeometryArray containing all extracted Ways. None if no ways were found or no nodes available for coordinate resolution.

way_ids

Device-resident int64 array of OSM Way IDs, parallel to the ways geometry array. None if no ways were found.

way_tags

Per-way tag dicts (host-resident strings). None if no ways were found or no tags present.

n_ways

Total number of ways extracted.

relations

MultiPolygon OwnedGeometryArray from Relation assembly. None if no multipolygon relations were found.

relation_ids

Device-resident int64 array of OSM Relation IDs, parallel to the relations geometry array. None if no relations were found.

relation_tags

Per-relation tag dicts (host-resident strings). None if no relations were found or no tags are present.

n_relations

Total number of multipolygon relations extracted.

nodes: vibespatial.geometry.owned.OwnedGeometryArray | None
node_ids: cupy.ndarray | None
n_nodes: int
node_tags: list[dict[str, str]] | None = None
ways: vibespatial.geometry.owned.OwnedGeometryArray | None = None
way_ids: cupy.ndarray | None = None
way_tags: list[dict[str, str]] | None = None
n_ways: int = 0
relations: vibespatial.geometry.owned.OwnedGeometryArray | None = None
relation_ids: cupy.ndarray | None = None
relation_tags: list[dict[str, str]] | None = None
n_relations: int = 0
vibespatial.io.osm_gpu.read_osm_pbf_nodes(path: str | pathlib.Path) → OsmGpuResult

Extract all nodes from an OSM PBF file as Point geometries.

Uses a hybrid CPU/GPU pipeline:

  • CPU: parse block structure, decompress zlib, locate protobuf fields

  • GPU: decode varints, delta-decode via cumsum, scale coordinates

Parameters

path

Path to the .osm.pbf file.

Returns

OsmGpuResult

Contains a device-resident Point OwnedGeometryArray (nodes), device-resident int64 node IDs (node_ids), and the total count (n_nodes). Way and relation fields keep their defaults (None / 0).

vibespatial.io.osm_gpu.read_osm_pbf(path: str | pathlib.Path, *, tags: bool | str = 'ways', geometry_only: bool = False, layer=None) → OsmGpuResult

Extract nodes, ways, and relations from an OSM PBF file.

Uses a hybrid CPU/GPU streaming pipeline to extract:

  • Nodes as Point geometries (same pipeline as read_osm_pbf_nodes)

  • Ways as LineString (open) or Polygon (closed ring) geometries

  • Relations as MultiPolygon geometries (assembled from Way members)

Blocks are processed in batches of _STREAM_BATCH_SIZE to bound peak host memory. A 10 GB PBF file can decompress to 40-80 GB, but the streaming pipeline never holds more than one batch at a time, peaking at ~128 MB per batch instead of the full 40-80 GB in host RAM.
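Batching can be sketched with generators. Peak host memory then tracks one batch of decompressed blocks rather than the whole file, provided the consumer ships each batch to the GPU and drops it before pulling the next. The actual value of _STREAM_BATCH_SIZE is not given here, so batch_size is a plain parameter. iter_batches and stream_decompressed are hypothetical names.

```python
import itertools
import zlib
from collections.abc import Iterable, Iterator


def iter_batches(items: Iterable, batch_size: int) -> Iterator[list]:
    """Yield consecutive lists of at most batch_size items."""
    it = iter(items)
    while batch := list(itertools.islice(it, batch_size)):
        yield batch


def stream_decompressed(blobs: Iterable[bytes], batch_size: int) -> Iterator[list[bytes]]:
    """Decompress one batch at a time; earlier batches can be freed by the caller."""
    for batch in iter_batches(blobs, batch_size):
        yield [zlib.decompress(b) for b in batch]
```

If blobs is itself a lazy reader over the block index, no more than one batch of compressed and decompressed data needs to be resident at once.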

Way and relation coordinate resolution uses a GPU binary-search kernel against a sorted node lookup table built from the extracted DenseNodes.

Parameters

path

Path to the .osm.pbf file.

tags : bool or str, default "ways"

Controls tag/attribute extraction. Tags are host-resident Python dicts and can consume significant memory for large files.

  • True — extract tags for all elements (nodes, ways, relations). Warning: for planet-scale files, node tags alone can exceed 100 GB of host memory since most of the ~8 billion OSM nodes carry per-object Python dict overhead even when empty.

  • "ways" (default) — extract tags for ways and relations only. Node tags are skipped. This is the recommended setting for most workflows since node tags are rarely needed (nodes are primarily coordinate waypoints for ways).

  • False — skip all tag extraction. Fastest and lowest memory.

geometry_only : bool, default False

If True, skip all tag extraction AND OSM ID extraction. Returns only device-resident geometry with no host-side attributes. This is the fastest mode, ideal for visualization or spatial analysis where element metadata is not needed.

layer : str, optional

Limit the result to a single OSM layer. Supported values are "points", "lines", "ways", "multipolygons", "relations", "multilinestrings", "other_relations", and "all". Unsupported relation layers currently return an empty result instead of triggering extra parsing work.

Returns

OsmGpuResult

Contains device-resident OwnedGeometryArrays for nodes (Point), ways (mixed LineString/Polygon), and relations (MultiPolygon), with the corresponding OSM IDs unless geometry_only=True.