ADR-0035: Geodesic Distance Kernel and CRS-Aware Dispatch¶
Status Note¶
This ADR is deferred and not targeted for implementation. The current warn-and-compute-Euclidean behavior matches upstream GeoPandas and is sufficient for the project’s current scope. Users who need geodesic accuracy should reproject to a projected CRS before running distance operations. This ADR is preserved as a design reference if geodesic GPU distance becomes a priority in the future.
Context¶
All GPU distance kernels in vibeSpatial compute planar Euclidean
distance — sqrt(dx*dx + dy*dy) in coordinate space. This is
correct for projected CRS (UTM, State Plane, etc.) but produces
meaningless degree-based values for geographic CRS like EPSG:4326
(WGS 84), where one degree of longitude is ~111 km at the equator but
~0 km at the poles.
The current safeguard is check_geographic_crs() on
DeviceGeometryArray and the vendored GeometryArray, which emits a
UserWarning but does not block execution or switch distance
formulas. This matches upstream GeoPandas behavior: warn, then compute
planar distance anyway. The GPU query engine in spatial_query.py is
entirely CRS-unaware — no checks, no warnings.
What exists today¶
Layer |
CRS awareness |
Action on geographic CRS |
|---|---|---|
|
|
Warns on |
Vendored |
|
Warns on |
|
None |
No CRS metadata at all |
|
None |
No checks on |
NVRTC distance kernels |
None |
Pure Euclidean always |
|
Warns via |
Warns, then uses Euclidean |
This means a user who calls sjoin_nearest on WGS 84 data gets a
warning from the vendored layer but then the GPU distance refinement
silently computes Euclidean distances in degree-space — which can
produce incorrect orderings (nearer in degrees ≠ nearer on the
ellipsoid) and incorrect max_distance filtering.
Why this matters¶
Incorrect nearest-neighbor results. At mid-latitudes, 1° of longitude ≈ 79 km but 1° of latitude ≈ 111 km. Euclidean distance in degree-space distorts E-W vs N-S distances by up to 40%. This can change which geometry is “nearest.”
Incorrect dwithin filtering. A
max_distanceof 1000 (meters) applied in degree-space is nonsensical. Users who forget to reproject get silently wrong results with no GPU-level guard.GeoPandas compatibility. GeoPandas also only warns, so matching upstream behavior is technically correct. But vibeSpatial’s mission is to be better, not just compatible.
Vincenty vs Haversine vs reproject-first¶
Three approaches exist for geodesic distance:
Formula |
Accuracy |
Cost |
GPU-friendly? |
|---|---|---|---|
Haversine |
~0.3% error (spherical assumption) |
Cheap: 2 |
Yes — branchless, no iteration |
Vincenty |
~0.01 mm on WGS 84 ellipsoid |
Moderate: iterative (typically 3–6 iterations, max ~20) |
Yes — convergent loop, all fp64 math available on GPU |
Karney (GeographicLib) |
sub-nanometer |
Expensive: complex series expansion |
Possible but high register pressure |
Reproject-first |
Depends on chosen projection |
One-time cost + planar distance |
Avoids kernel changes; shifts work to CRS transform |
Relationship to ADR-0033¶
Per ADR-0033’s decision tree:
Is the inner loop geometry-specific? → Yes → Tier 1 (custom NVRTC kernel)
Geodesic distance is irreducibly geometry-specific: the formula operates on coordinate pairs with trigonometric functions on an ellipsoid model. There is no CCCL or CuPy equivalent. A Vincenty or Haversine kernel is squarely Tier 1: Custom NVRTC.
The iterative nature of Vincenty (variable loop count per pair) raises a warp divergence concern, but the convergence is fast and bounded (max ~20 iterations, typically 3–6). This is acceptable warp divergence — it’s far less than the ring/part traversal loops already present in existing Tier 1 polygon kernels.
The supporting pipeline stages (candidate generation, compaction, nearest-reduce) remain at their current tiers per ADR-0033: CCCL for scans/compaction, CuPy for element-wise, etc. Only the distance formula changes.
Precision implications¶
Per docs/architecture/precision.md, distance is a metric kernel
class. Vincenty requires fp64 — the iterative convergence and
trigonometric functions lose too much precision in fp32. This aligns
with the existing policy: metric kernels on consumer GPUs use staged
fp32 with compensation, but Vincenty should be fp64-only because:
The WGS 84 ellipsoid parameters (a = 6378137.0, f = 1/298.257223563) require fp64 to represent without loss.
The iterative loop’s convergence criterion (~1e-12 radians) is below fp32 epsilon.
Haversine could run in fp32 if the accuracy tradeoff (0.3% error) is acceptable for a coarse-distance fast path.
Decision¶
Option A: Reproject-first (recommended default, no new kernel)¶
Policy: For distance-dependent operations (dwithin, nearest,
sjoin_nearest, .distance()), if the CRS is geographic:
Auto-estimate a suitable projected CRS via
estimate_utm_crs()(already available onGeoSeries).Reproject both geometry columns to the estimated projection.
Compute planar Euclidean distance using existing GPU kernels.
Emit a diagnostic event recording the auto-reprojection.
Pros:
Zero new NVRTC kernel code.
Reuses the battle-tested Euclidean distance pipeline.
Consistent with “reproject before operating” best practice.
Works for all geometry types (points, lines, polygons) without per-type geodesic kernels.
Cons:
Reprojection is currently CPU-only (
pyproj). For large datasets this becomes a CPU bottleneck before the GPU distance kernel runs.UTM zone estimation can be wrong for datasets spanning multiple zones.
Adds a silent auto-reproject step that changes coordinates — some users may not expect this.
Memory: requires a temporary reprojected copy of both geometry arrays.
Option B: Haversine GPU kernel (fast geodesic for points)¶
Policy: Add a Haversine great-circle distance NVRTC kernel for point-to-point distance. Use it when:
Both geometry columns contain only points.
CRS is geographic (EPSG:4326 or similar).
The user has not already reprojected.
Fall back to Euclidean for non-point geometries (lines, polygons) with the existing warning.
Kernel (Tier 1 per ADR-0033):
extern "C" __global__ void haversine_distance(
const double* lat1, const double* lon1,
const double* lat2, const double* lon2,
double* out_distances,
int n,
double earth_radius // 6371008.8 for mean radius
) {
const int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i >= n) return;
const double d_lat = (lat2[i] - lat1[i]) * M_PI / 180.0;
const double d_lon = (lon2[i] - lon1[i]) * M_PI / 180.0;
const double la1 = lat1[i] * M_PI / 180.0;
const double la2 = lat2[i] * M_PI / 180.0;
const double a = sin(d_lat / 2.0) * sin(d_lat / 2.0)
+ cos(la1) * cos(la2)
* sin(d_lon / 2.0) * sin(d_lon / 2.0);
out_distances[i] = 2.0 * earth_radius * asin(sqrt(a));
}
Pros:
No host round-trip. Geodesic distance stays on device.
Cheap: ~10 FP64 ops per pair. No iteration.
Good enough for most spatial join / nearest-neighbor use cases.
fp32-feasible for coarse-filter stages (0.3% error acceptable).
Cons:
Points only. No point-to-line or point-to-polygon geodesic distance without significant additional work (project each segment to a great circle arc).
0.3% error near the poles and for very long distances.
Need to handle x/y vs lon/lat axis ordering per CRS metadata.
Option C: Vincenty GPU kernel (high-accuracy geodesic)¶
Policy: Add an iterative Vincenty inverse-formula NVRTC kernel. Use it as the distance function when CRS is geographic and the user requests meter-accurate results (or as the default geodesic path).
Kernel (Tier 1 per ADR-0033):
extern "C" __global__ void vincenty_distance(
const double* lat1, const double* lon1,
const double* lat2, const double* lon2,
double* out_distances,
int n,
double a, // semi-major axis (6378137.0 for WGS 84)
double f // flattening (1/298.257223563 for WGS 84)
) {
const int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i >= n) return;
const double b = a * (1.0 - f);
const double U1 = atan((1.0 - f) * tan(lat1[i] * M_PI / 180.0));
const double U2 = atan((1.0 - f) * tan(lat2[i] * M_PI / 180.0));
const double L = (lon2[i] - lon1[i]) * M_PI / 180.0;
const double sinU1 = sin(U1), cosU1 = cos(U1);
const double sinU2 = sin(U2), cosU2 = cos(U2);
double lambda = L, prev_lambda;
double sin_sigma, cos_sigma, sigma, sin_alpha, cos2_alpha, cos_2sigma_m;
for (int iter = 0; iter < 20; ++iter) {
const double sinL = sin(lambda), cosL = cos(lambda);
sin_sigma = sqrt(
(cosU2 * sinL) * (cosU2 * sinL) +
(cosU1 * sinU2 - sinU1 * cosU2 * cosL) *
(cosU1 * sinU2 - sinU1 * cosU2 * cosL));
if (sin_sigma < 1e-30) { out_distances[i] = 0.0; return; }
cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cosL;
sigma = atan2(sin_sigma, cos_sigma);
sin_alpha = cosU1 * cosU2 * sinL / sin_sigma;
cos2_alpha = 1.0 - sin_alpha * sin_alpha;
cos_2sigma_m = (cos2_alpha > 1e-30)
? cos_sigma - 2.0 * sinU1 * sinU2 / cos2_alpha
: 0.0;
const double C = f / 16.0 * cos2_alpha * (4.0 + f * (4.0 - 3.0 * cos2_alpha));
prev_lambda = lambda;
lambda = L + (1.0 - C) * f * sin_alpha *
(sigma + C * sin_sigma *
(cos_2sigma_m + C * cos_sigma *
(-1.0 + 2.0 * cos_2sigma_m * cos_2sigma_m)));
if (fabs(lambda - prev_lambda) < 1e-12) break;
}
const double u_sq = cos2_alpha * (a * a - b * b) / (b * b);
const double A_coeff = 1.0 + u_sq / 16384.0 *
(4096.0 + u_sq * (-768.0 + u_sq * (320.0 - 175.0 * u_sq)));
const double B_coeff = u_sq / 1024.0 *
(256.0 + u_sq * (-128.0 + u_sq * (74.0 - 47.0 * u_sq)));
const double delta_sigma = B_coeff * sin_sigma *
(cos_2sigma_m + B_coeff / 4.0 *
(cos_sigma * (-1.0 + 2.0 * cos_2sigma_m * cos_2sigma_m)
- B_coeff / 6.0 * cos_2sigma_m *
(-3.0 + 4.0 * sin_sigma * sin_sigma) *
(-3.0 + 4.0 * cos_2sigma_m * cos_2sigma_m)));
out_distances[i] = b * A_coeff * (sigma - delta_sigma);
}
Pros:
Sub-millimeter accuracy on WGS 84.
No host round-trip. All on device.
Convergence is fast (3–6 iterations typical). Warp divergence is bounded and modest — less than polygon ring traversal loops.
fp64-only is acceptable: metric kernel class, and existing distance kernels already use fp64 storage.
Cons:
Points only (same as Haversine). Extending to segment-to-segment geodesic distance is a research problem, not an engineering task.
~6× more FP64 ops per pair than Haversine (trig + iteration).
Antipodal points can cause non-convergence (need fallback to Haversine or special-case handling).
Register pressure: ~20+ fp64 registers per thread. Higher occupancy impact than Haversine.
fp64 throughput is 1/32 of fp32 on consumer GPUs. For point-heavy workloads this may still be memory-bound, but for small batches the compute cost is real.
Option D: Tiered geodesic (recommended — combine B + C)¶
Policy: Implement both Haversine (Option B) and Vincenty (Option C) as Tier 1 NVRTC kernels, selectable by a distance-mode parameter:
Mode |
Formula |
Accuracy |
Use case |
|---|---|---|---|
|
Existing kernels |
Exact in coordinate space |
Projected CRS |
|
Spherical great-circle |
~0.3% |
Fast nearest-neighbor screening, coarse dwithin |
|
Ellipsoidal iterative |
~0.01 mm |
Accurate distance reporting, tight dwithin |
|
CRS-aware selection |
Best available |
Picks |
When auto mode detects a geographic CRS:
Use Haversine for coarse-filter / candidate-generation stages where 0.3% error doesn’t change the result set.
Use Vincenty for final distance refinement and
distance_coloutput.Emit a diagnostic event showing which formula was used.
Recommendation¶
Option D (tiered geodesic) is the recommended path. It aligns with the project’s performance-first philosophy:
Haversine for throughput-sensitive stages (candidate MBR expansion, coarse nearest-neighbor filtering). Cheap enough that the GPU kernel isn’t the bottleneck.
Vincenty for accuracy-sensitive stages (final distance computation, distance column output, tight
max_distancefiltering). The iteration cost is acceptable because the final refinement stage operates on a much smaller candidate set.CRS flows into the kernel layer by threading CRS metadata through
OwnedGeometryArray(or as a dispatch-time parameter), not by modifying the kernel signatures to accept CRS objects.
Implementation Plan¶
Phase 1: CRS metadata threading (prerequisite)¶
Add an optional
crsfield toOwnedGeometryArray(or pass CRS as a parameter at dispatch call sites inspatial_query.py).Thread CRS from
DeviceGeometryArray._crsinto spatial query dispatch functions.No kernel changes yet — this only enables the dispatch layer to know whether the CRS is geographic.
Phase 2: Haversine point-to-point kernel¶
Add
haversine_distance.py(or extendpoint_distance.py) with the Haversine NVRTC kernel.Expose via the existing
compute_point_distance_gpuinterface with adistance_modeparameter.Integration test: point-to-point distance on EPSG:4326 data compared against
pyproj.Geod.inv()as the oracle.Wire into
spatial_query.pycandidate generation for geographic CRS: expand query bounds in meters (via Haversine) rather than degrees.
Phase 3: Vincenty point-to-point kernel¶
Add Vincenty kernel alongside Haversine in the same module.
Handle antipodal edge case (fall back to supplementary formula or Haversine for near-antipodal pairs).
Oracle test: compare against
pyproj.Geod.inv()for a corpus of point pairs including equatorial, polar, antipodal, and coincident cases.Wire into
spatial_query.pyfinal refinement fordwithinandnearestwhen CRS is geographic.
Phase 4: Auto-mode dispatch integration¶
In
spatial_query.py, detect geographic CRS at the top ofquery_spatial_index/nearest_spatial_index.Use Haversine for coarse stages, Vincenty for refine stages.
Convert
max_distancefrom meters to appropriate units for each stage (meters for geodesic kernels, coordinate units for bounds expansion heuristics).Emit runtime diagnostic events showing which distance formula was selected and why.
Phase 5: Non-point geodesic distance (future, deferred)¶
Geodesic distance for line-to-point or polygon-to-point on the ellipsoid is substantially harder:
“Distance to a geodesic arc” has no closed-form solution.
Would require iterative closest-point-on-arc computation per segment.
The performance shape is unclear — likely dominated by the iterative computation, not memory bandwidth.
Decision: Defer non-point geodesic distance. For non-point geometries on geographic CRS, the existing behavior (warn + Euclidean) or the Option A approach (auto-reproject to UTM, then Euclidean) should be offered as the fallback. This is an explicit, non-silent fallback per ADR-0013.
Risks¶
Warp divergence in Vincenty. Iteration counts vary per pair (3–20). Benchmarking is required to confirm the GPU kernel isn’t slower than CPU
pyproj.Geod.inv()for small batches (< 10K pairs). Mitigant: profile at ADR-0033 benchmark scales (10K, 100K, 1M).Axis ordering ambiguity. WGS 84 (EPSG:4326) has latitude-first axis order in the formal definition, but most spatial software stores coordinates as (longitude, latitude) = (x, y). The kernel must use the correct convention — read axis ordering from the CRS object or enforce (x=lon, y=lat) as a contract.
Antipodal non-convergence. Vincenty fails for exactly antipodal points. The kernel must detect this (sin_sigma ≈ 0 after max iterations) and fall back to a supplementary formula.
fp64 throughput on consumer GPUs. Vincenty is fp64-only. On RTX-class GPUs (1/32 fp64), the kernel may be compute-bound rather than memory-bound. If benchmarks show this, Haversine (which is fp32-feasible) becomes the default even for final refinement, with Vincenty available as an explicit opt-in.
CRS metadata plumbing scope creep. Adding CRS to
OwnedGeometryArraycould trigger wider refactoring. Mitigant: pass CRS as a dispatch-time parameter inspatial_query.pyrather than embedding it in the buffer schema.MBR expansion in geographic coordinates. The current MBR-based candidate generation expands bounds in coordinate units. For geographic CRS, expanding by
max_distancemeters requires a degree-to-meter conversion that varies with latitude. Haversine or a simpler approximation (meters / 111320) can handle this, but it adds complexity to the candidate generation stage.
Verification¶
# Phase 1: CRS threading
uv run pytest tests/test_spatial_query.py -k "crs" -q
# Phase 2: Haversine kernel
uv run pytest tests/test_geodesic_distance.py -q
uv run python scripts/benchmark_gpu_predicates.py # include geodesic cases
# Phase 3: Vincenty kernel
uv run pytest tests/test_geodesic_distance.py -k "vincenty" -q
# Phase 4: Auto-mode integration
env VIBESPATIAL_STRICT_NATIVE=1 uv run pytest tests/upstream/geopandas/tests/test_sindex.py -q
uv run pytest tests/upstream/geopandas/tools/tests/test_sjoin.py -k "nearest" -q
# End-to-end profile (mandatory per AGENTS.md)
uv run python scripts/benchmark_pipelines.py --suite full --repeat 1 --gpu-sparkline
References¶
ADR-0033: GPU Primitive Dispatch Rules (tier classification)
ADR-0002: Dual Precision Dispatch (fp64 requirement for metric kernels)
ADR-0013: Explicit CPU Fallback Events (non-silent fallback for non-point geodesic)
docs/architecture/precision.md: metric kernel class policydocs/implementation-order.mdPhase 6c: CRS policy (o17.6.3)GeoPandas
check_geographic_crspatternT. Vincenty, “Direct and Inverse Solutions of Geodesics on the Ellipsoid with Application of Nested Equations,” Survey Review 23(176), 1975.
C.F.F. Karney, “Algorithms for geodesics,” J. Geodesy 87(1), 2013.