GeoPandas Reprojection
Reproject a GeoDataFrame on the GPU using vibeProj. Works on CPU too –
transform_chunked() falls back to NumPy transparently when CuPy is not
installed.
Quick start: reproject_geodataframe()
from vibeproj.compat import reproject_geodataframe
gdf_utm = reproject_geodataframe(gdf, "EPSG:32631")
This is the one-liner path. It handles CRS extraction from the source GeoDataFrame, both 2D and 3D geometries, and the bulk coordinate transform in a single call. The result is a new GeoDataFrame with the target CRS set.
You can pass any Transformer.from_crs() keyword through **kw:
# Cross-datum with explicit epoch for sub-decimeter accuracy
gdf_etrs = reproject_geodataframe(
    gdf,
    "EPSG:4258",
    datum_shift="accurate",
    epoch=2024.0,
    chunk_size=2_000_000,
)
To reuse a transformer across multiple calls, build it once and pass it in. This avoids repeated CRS resolution and kernel compilation:
from vibeproj import Transformer
t = Transformer.from_crs("EPSG:4326", "EPSG:32631")
gdf_a_utm = reproject_geodataframe(gdf_a, "EPSG:32631", transformer=t)
gdf_b_utm = reproject_geodataframe(gdf_b, "EPSG:32631", transformer=t)
Manual recipe
For full control over the coordinate pipeline, use shapely.get_coordinates()
and shapely.set_coordinates() directly. This is roughly what
reproject_geodataframe() does internally.
import numpy as np
import shapely
from vibeproj import Transformer
# 1. Build transformer once
t = Transformer.from_crs(str(gdf.crs), "EPSG:32631")
# 2. Extract all coordinates as a single (N, 2) or (N, 3) array
geom_arr = gdf.geometry.values
has_z = bool(shapely.has_z(geom_arr).any())
coords = shapely.get_coordinates(geom_arr, include_z=has_z)
# 3. Bulk transform
x, y = coords[:, 0], coords[:, 1]
if has_z:
    rx, ry, rz = t.transform_chunked(x, y, z=coords[:, 2])
    new_coords = np.column_stack([rx, ry, rz])
else:
    rx, ry = t.transform_chunked(x, y)
    new_coords = np.column_stack([rx, ry])
# 4. Reconstruct geometries and build result
new_geoms = shapely.set_coordinates(geom_arr.copy(), new_coords)
result = gdf.copy()
result[gdf.geometry.name] = new_geoms
result = result.set_geometry(gdf.geometry.name).set_crs("EPSG:32631", allow_override=True)
The key insight is that shapely.get_coordinates() flattens every vertex
from every geometry in the GeoDataFrame into one contiguous array. This
turns a GeoDataFrame with 1M polygons (50 vertices each) into a single
50M-row coordinate array – exactly the kind of large, embarrassingly
parallel workload that saturates the GPU.
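The flatten, transform, rebuild pattern can be pictured without any geometry library. The sketch below is a pure-Python stand-in, not what shapely or vibeProj do internally: geometries are plain lists of (x, y) tuples, and the helper names flatten, rebuild and shift are hypothetical. It only shows how an offsets list lets one flat vertex buffer be transformed in bulk and then split back into the original geometries.

```python
# Pure-Python stand-in for the flatten / transform / rebuild pattern.
# Geometries are plain lists of (x, y) tuples here, not shapely objects.

def flatten(geoms):
    """Concatenate all vertices and record where each geometry ends."""
    flat, offsets = [], [0]
    for verts in geoms:
        flat.extend(verts)
        offsets.append(len(flat))
    return flat, offsets


def rebuild(flat, offsets):
    """Split the flat vertex list back into per-geometry lists."""
    return [flat[a:b] for a, b in zip(offsets, offsets[1:])]


def shift(coords, dx, dy):
    """Placeholder bulk transform: one pass over every vertex."""
    return [(x + dx, y + dy) for x, y in coords]


geoms = [
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
    [(5.0, 5.0), (6.0, 5.0)],
]
flat, offsets = flatten(geoms)
moved = rebuild(shift(flat, 10.0, 20.0), offsets)

# Size of the flattened array for the example in the text.
total_vertices = 1_000_000 * 50
```

The transform never sees geometry boundaries, which is why the work parallelizes so cleanly: every vertex is independent, and the offsets are only needed again at rebuild time.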
Performance comparison
All numbers use Transverse Mercator (EPSG:4326 to EPSG:32631). GPU timings
include host-to-device and device-to-host transfer via transform_chunked().
| Scenario | GPU | CPU | Speedup |
|---|---|---|---|
| 1M-geom GeoDataFrame (50 verts/geom) | ~40 ms | ~7 s | 175x |
| 1M tmerc (optimized chunked pipeline) | ~1.3 ms | ~139 ms | 107x |
| 1M tmerc (kernel only, no transfer) | ~0.49 ms | ~139 ms | 284x |
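The Speedup column is the CPU time divided by the GPU time, rounded to the nearest integer. As a quick sanity check, the snippet below recomputes it from the timings in the table. The throughput figure at the end is derived from those same numbers and is not a separately measured result.

```python
# Timings copied from the table, in milliseconds: (GPU, CPU).
timings_ms = {
    "1M-geom GeoDataFrame (50 verts/geom)": (40.0, 7000.0),
    "1M tmerc (optimized chunked pipeline)": (1.3, 139.0),
    "1M tmerc (kernel only, no transfer)": (0.49, 139.0),
}
speedups = {name: round(cpu / gpu) for name, (gpu, cpu) in timings_ms.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio}x")

# Implied GPU throughput for the GeoDataFrame row:
# 1M geometries x 50 vertices each, processed in about 40 ms.
n_coords = 1_000_000 * 50
gpu_coords_per_s = n_coords / (40.0 / 1000.0)
```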
The gdf.to_crs() path goes through pyproj, which processes coordinates
serially on the CPU. vibeProj extracts all coordinates in one shot, transfers
them to the GPU, runs a single fused kernel, and writes the results back.
Transfer overhead dominates at small scales – the GPU breakeven for
Transverse Mercator is roughly 100 coordinate pairs on PCIe 4.0.
On CPU (no CuPy), vibeProj’s transform_chunked() falls back to vectorized
NumPy operations. This is faster than pyproj for large arrays but slower
than the GPU path. The chunk_size parameter becomes a no-op.
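To see why chunking matters only for the GPU path, consider this minimal sketch. It assumes that chunk_size caps how many coordinate pairs are processed per batch, which the text above does not spell out. The helpers transform_in_chunks and scale are hypothetical and are not part of vibeProj. The point is that chunking changes how the work is batched, not the result, so a CPU backend that processes the whole array at once can safely ignore it.

```python
def transform_in_chunks(xs, ys, fn, chunk_size):
    """Apply fn to (xs, ys) in slices of at most chunk_size items.

    Hypothetical helper for illustration; not part of vibeProj.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out_x, out_y = [], []
    for start in range(0, len(xs), chunk_size):
        stop = start + chunk_size
        cx, cy = fn(xs[start:stop], ys[start:stop])
        out_x.extend(cx)
        out_y.extend(cy)
    return out_x, out_y


def scale(xs, ys):
    """Placeholder per-chunk transform (stands in for a projection kernel)."""
    return [2.0 * x for x in xs], [2.0 * y for y in ys]


xs = [float(i) for i in range(10)]
ys = [float(-i) for i in range(10)]

chunked = transform_in_chunks(xs, ys, scale, chunk_size=4)
whole = scale(xs, ys)
```

On a GPU, a bounded batch size keeps device memory use predictable for very large inputs. On the CPU there is no transfer step to bound, so a single pass over the array is equivalent.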
GeoArrow future
GeoPandas is moving toward native GeoArrow storage for geometry columns
(geopandas >= 1.0 with use_pyarrow=True). When GeoArrow becomes the
default storage backend, the coordinate extraction step
(shapely.get_coordinates() / set_coordinates()) can be replaced with
direct access to the underlying Arrow coordinate buffers – eliminating the
host-side copy entirely.
Combined with transform_buffers() for zero-copy GPU dispatch, this will
remove the last remaining memory allocation from the reprojection path:
Arrow buffer on host, pinned transfer to device, in-place kernel, pinned
transfer back, Arrow buffer on host. No intermediate arrays.
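The idea of working directly on a coordinate buffer, with no intermediate arrays, can be illustrated with the standard library alone. This is only an analogy under stated assumptions. It uses an interleaved x0, y0, x1, y1 layout, which is one of the layouts GeoArrow defines, and a plain array/memoryview pair in place of a real Arrow buffer. It involves no pyarrow, vibeProj or GPU code.

```python
from array import array

# Interleaved layout: x0, y0, x1, y1, ... in one contiguous float64 buffer.
buf = array("d", [0.0, 10.0, 1.0, 11.0, 2.0, 12.0])
view = memoryview(buf)

# Strided views over the same memory; no coordinates are copied.
xs = view[0::2]
ys = view[1::2]

# In-place update through the views: the original buffer changes.
for i in range(len(xs)):
    xs[i] = xs[i] + 100.0
    ys[i] = ys[i] - 5.0

result = buf.tolist()
```

Because xs and ys are views rather than copies, the writes land in buf itself. That is the property the Arrow-based path relies on to avoid the extract and rebuild steps of the manual recipe.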