Precision and GPU Behaviour
I/O precision
All input/output arrays use fp64 (double) storage. This is a hard
convention (ADR-0002): coordinate data is always stored at full double
precision regardless of the compute precision used internally.
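To see why storage stays at fp64, look at the gap between adjacent representable values at Earth-scale magnitudes. The sketch below uses NumPy's `np.spacing`. The helper name `storage_step` and the sample magnitudes (Earth's equatorial radius in metres and a Web Mercator-scale easting) are illustrative choices, not part of vibeProj.

```python
import numpy as np


def storage_step(x, dtype):
    """Gap between x and the next representable value of the given dtype."""
    return float(np.spacing(dtype(x)))


# Illustrative magnitudes in metres (chosen for this example, not from vibeProj).
for x in (6_378_137.0, 2.0e7):
    print(f"{x:>12,.0f} m  fp32 step = {storage_step(x, np.float32):g} m  "
          f"fp64 step = {storage_step(x, np.float64):.2e} m")
```

At these magnitudes a single fp32 rounding step is already 0.5 to 2 m, while the fp64 step is in the nanometre range.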
Compute precision modes
The transform_buffers() method accepts a precision parameter:
| Mode | Compute type | Accuracy | Use case |
|---|---|---|---|
| `fp64` | fp64 | Full | Default. Accurate to fp64 machine precision. |
| `fp32` | fp32 | ~1 m | Expert opt-in. Raw fp32 projection math. |
| `ds` | double-single | ~fp64 | Experimental. fp32 pair arithmetic. |
| `auto` | fp64 | Full | Same as `fp64` (trig-dominated). |
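```python
# `t` is a transformer object exposing transform_buffers();
# `lat` and `lon` are fp64 coordinate arrays.

# Full precision (default)
t.transform_buffers(lat, lon, precision="fp64")

# Experimental double-single arithmetic (Transverse Mercator only)
t.transform_buffers(lat, lon, precision="ds")
```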
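The mode table can be read as a small lookup from the requested mode to the arithmetic actually used. The following is a minimal sketch under stated assumptions: `resolve_precision`, the `"fp32"` and `"auto"` string values, the `"tmerc"` projection label and the error behaviour are all hypothetical illustrations, not vibeProj's API.

```python
# Hypothetical mapping from requested mode to compute arithmetic,
# mirroring the mode table (not vibeProj's implementation).
_COMPUTE_TYPE = {
    "fp64": "fp64",
    "fp32": "fp32",          # expert opt-in, ~1 m accuracy
    "ds": "double-single",   # experimental, Transverse Mercator only
    "auto": "fp64",          # trig-dominated kernels: always fp64
}


def resolve_precision(mode="fp64", projection="tmerc"):
    """Return the compute type a request would run with (hypothetical helper)."""
    try:
        compute = _COMPUTE_TYPE[mode]
    except KeyError:
        raise ValueError(f"unknown precision mode: {mode!r}") from None
    if mode == "ds" and projection != "tmerc":
        raise ValueError("double-single is only implemented for Transverse Mercator")
    return compute
```

Resolving `auto` to fp64 at a single point keeps the kernels themselves unaware of the auto policy.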
Why auto always uses fp64
Projection math is dominated by transcendental functions (sin, cos,
atan2, asinh), which run on the GPU’s Special Function Unit (SFU). The
SFU fp64:fp32 throughput ratio is ~1:4, not the 1:64 of ALU operations on consumer GPUs.
As a result, fp32 compute gives only ~4x speedup for projection kernels, not the theoretical 64x. Since the GPU is already 100–300x faster than the CPU at fp64, giving up accuracy (~1 m in fp32 mode) for at most ~4x is not worthwhile. Auto mode therefore always selects fp64.
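The ~4x ceiling follows from an Amdahl-style estimate. Suppose a fraction f of the fp64 runtime is transcendental (SFU-bound) work that speeds up ~4x in fp32, and the rest is ALU work that speeds up ~64x. The overall fp32 speedup is then 1 / (f/4 + (1 - f)/64). The sketch below is a back-of-the-envelope model under those assumptions. The function name and the sample fractions are illustrative, not measurements.

```python
def fp32_speedup(sfu_fraction, sfu_ratio=4.0, alu_ratio=64.0):
    """Amdahl-style estimate of the fp32-over-fp64 speedup for a kernel.

    sfu_fraction: share of fp64 runtime spent in transcendental (SFU-bound) work.
    sfu_ratio / alu_ratio: fp32:fp64 throughput ratios for each kind of work
    (defaults follow the ~1:4 SFU and 1:64 consumer ALU figures in the text).
    """
    if not 0.0 <= sfu_fraction <= 1.0:
        raise ValueError("sfu_fraction must be in [0, 1]")
    return 1.0 / (sfu_fraction / sfu_ratio + (1.0 - sfu_fraction) / alu_ratio)


for f in (0.0, 0.5, 0.9, 1.0):
    print(f"SFU share {f:.0%}: ~{fp32_speedup(f):.1f}x")
```

Even with only half the runtime in transcendentals, the estimate falls below 8x, far from the ALU ratio.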
Double-single arithmetic
The "ds" precision mode uses pairs of fp32 values to represent a ~48-bit
mantissa (~14 decimal digits). It is implemented for Transverse Mercator only
and gives near-fp64 accuracy (fp64 has a 53-bit mantissa) using fp32 FMA instructions.
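As a quick check on the digit counts: a p-bit mantissa carries about p·log₁₀2 decimal digits. A ~48-bit double-single value therefore falls a little short of fp64. Because both halves are fp32, double-single also keeps fp32's exponent range.

$$
48 \log_{10} 2 \approx 14.45 \ \text{digits (double-single)}, \qquad
53 \log_{10} 2 \approx 15.95 \ \text{digits (fp64)}
$$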
On consumer GPUs (RTX series, 1:64 fp64:fp32 ratio):
- ds_add: ~10x faster than fp64 add
- ds_mul: ~16x faster than fp64 mul
In practice, the SFU bottleneck means ds provides no speedup for trig-heavy projection kernels. The ds path exists for experimentation.
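The ds_add and ds_mul operations are built from error-free transformations. Below is a minimal NumPy sketch of the textbook double-single formulation (Knuth's two-sum and an FMA-based two-product), storing a value as an unevaluated (hi, lo) pair of fp32 numbers. It is not vibeProj's kernel code, and the function names are illustrative. NumPy has no fp32 FMA, so the two-product error term is emulated with an exact fp64 computation.

```python
import numpy as np

F32 = np.float32  # every ds component is an fp32 scalar


def two_sum(a, b):
    """Knuth's error-free sum: s + e == a + b exactly, with s = fl32(a + b)."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Renormalise a pair, assuming |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e


def two_prod(a, b):
    """Error-free product: p + e == a * b exactly, with p = fl32(a * b).

    On a GPU the error term is fma(a, b, -p). NumPy has no fp32 FMA, so it
    is emulated here: a 24-bit x 24-bit product is exact in fp64.
    """
    p = a * b
    e = F32(np.float64(a) * np.float64(b) - np.float64(p))
    return p, e


def ds_from_float(x):
    """Split a Python float (fp64) into an unevaluated (hi, lo) fp32 pair."""
    hi = F32(x)
    lo = F32(x - float(hi))
    return hi, lo


def ds_to_float(d):
    """Collapse a (hi, lo) pair back to a Python float."""
    return float(d[0]) + float(d[1])


def ds_add(a, b):
    s, e = two_sum(a[0], b[0])
    e = e + (a[1] + b[1])
    return quick_two_sum(s, e)


def ds_mul(a, b):
    p, e = two_prod(a[0], b[0])
    e = e + (a[0] * b[1] + a[1] * b[0])
    return quick_two_sum(p, e)


x = ds_add(ds_from_float(1.0), ds_from_float(1e-10))
print(ds_to_float(x))          # close to 1.0000000001
print(F32(1.0) + F32(1e-10))   # plain fp32 rounds this back to 1.0
```

Each ds operation costs several fp32 instructions. It wins on ALU-bound work only because fp32 throughput is so much higher than fp64 on consumer GPUs.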
Consumer vs datacenter GPUs
vibeProj queries SingleToDoublePrecisionPerfRatio to classify the GPU:
- Consumer (RTX 4090, etc.): ratio = 1:64 for ALU, but ~1:4 for SFU
- Datacenter (A100, H100): ratio = 1:2
Both types run fp64 by default. Datacenter GPUs will see higher absolute throughput due to their better fp64 hardware.
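Here is a minimal sketch of the classification step. It assumes the ratio has already been read from the CUDA runtime: the device attribute `cudaDevAttrSingleToDoublePrecisionPerfRatio` reports fp32 throughput as a multiple of fp64 throughput, for example 64 or 2. The function name and the cutoff of 4 are hypothetical. In either case the default compute precision stays fp64.

```python
def classify_gpu(single_to_double_ratio, consumer_cutoff=4):
    """Classify a GPU from its fp32:fp64 throughput ratio (hypothetical helper).

    single_to_double_ratio: e.g. 64 on RTX-class cards, 2 on A100/H100.
    consumer_cutoff: assumed threshold, not taken from vibeProj.
    """
    kind = "consumer" if single_to_double_ratio >= consumer_cutoff else "datacenter"
    # Both classes default to fp64; the classification does not change that.
    return {"class": kind, "default_precision": "fp64"}


print(classify_gpu(64))
print(classify_gpu(2))
```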