Dual Precision Dispatch¶
Context¶
Consumer GPUs often have dramatically worse fp64 throughput than fp32, while GeoPandas and Shapely semantics expect double-precision-like accuracy for real world coordinates. The repo needs a precision strategy before kernel work expands because the choice affects buffer layout, kernel signatures, and later robustness contracts.
Decision¶
Store authoritative coordinates in fp64, then choose compute precision at dispatch time.
autochooses by device profile and kernel classfp32forces staged fp32 execution with centering and compensationfp64forces native fp64 execution
On consumer-style GPUs:
coarse and metric kernels may use staged fp32
predicate kernels may use staged fp32 only with selective fp64 refinement
constructive kernels stay on fp64 until later robustness work proves a safe alternative
On datacenter-style GPUs with favorable fp64 throughput:
default broadly to native fp64
Consequences¶
Owned buffers keep one authoritative coordinate representation.
Kernel APIs need a shared precision plan contract instead of ad hoc precision booleans.
Later kernel work can use consumer GPU throughput without forcing canonical fp32 storage.
Robustness work remains a separate decision for exact predicates and overlay.
Alternatives Considered¶
fp64 everywhere
fp32 everywhere
canonical dual fp32/fp64 buffer storage
a single global precision switch with no kernel-class distinctions
Acceptance Notes¶
The first landed policy exposes PrecisionPlan selection and documents staged
fp32 versus native fp64 defaults. Numerical corpus validation and exact
predicate policy remain follow-up work for later work.