# Precision and GPU Behaviour

## I/O precision

All input/output arrays use fp64 (double) storage. This is a hard convention (ADR-0002): coordinate data is always stored at full double precision regardless of the compute precision used internally.
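The sketch below is plain Python and not part of vibeProj. It shows why storage precision matters on its own: it rounds coordinates to fp32 and compares the spacing between adjacent representable values with fp64. The factor of ~111,320 m per degree of longitude at the equator is an approximation used only for illustration.

```python
import math
import struct

def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fp32_ulp(x: float) -> float:
    """Gap between fp32(x) and the next larger fp32 value (x > 0 assumed)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0] - to_fp32(x)

M_PER_DEG = 111_320.0  # approx. metres per degree of longitude at the equator

# Geographic coordinates (degrees) near the antimeridian
print(f"fp32 spacing at lon=179: {fp32_ulp(179.0) * M_PER_DEG:.2f} m")
print(f"fp64 spacing at lon=179: {math.ulp(179.0) * M_PER_DEG:.1e} m")

# Projected coordinates (metres), e.g. a northing of ~5,000 km
print(f"fp32 spacing at 5e6 m:   {fp32_ulp(5.0e6):.2f} m")
print(f"fp64 spacing at 5e6 m:   {math.ulp(5.0e6):.1e} m")
```

At these magnitudes, fp32 can only represent positions on a grid roughly 0.5 to 2 m apart, while fp64 resolves nanometres. That is the same order of magnitude as the ~1 m accuracy quoted for `"fp32"` compute below, and it is consistent with keeping I/O at fp64 regardless of compute precision.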

## Compute precision modes

The `transform_buffers()` method accepts a `precision` parameter that selects the internal compute precision. I/O arrays stay fp64 in every mode.

| Mode | Compute type | Accuracy | Use case |
|---|---|---|---|
| `"fp64"` | `double` | Full | Default. Accurate to machine epsilon. |
| `"fp32"` | `float` | ~1 m | Expert opt-in. Raw fp32 projection math. |
| `"ds"` | double-single (fp32 pair) | ~fp64 | Experimental. fp32 pair arithmetic (Transverse Mercator only). |
| `"auto"` | `double` | Full | Same as `"fp64"` (projection math is trig-dominated). |

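```python
# `t` is an existing transformer object (construction not shown); lon and lat are fp64 arrays.

# Full precision (default)
t.transform_buffers(lon, lat, precision="fp64")

# Experimental double-single arithmetic (Transverse Mercator only)
t.transform_buffers(lon, lat, precision="ds")
```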
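The table can be read as a small lookup from mode to compute type. The helper below is hypothetical, not vibeProj's code, and the names `COMPUTE_TYPE` and `resolve_precision` are illustrative. It captures the one non-obvious rule from the table above: `"auto"` resolves to the same fp64 path as `"fp64"`.

```python
# Hypothetical mapping from precision mode to internal arithmetic,
# transcribed from the table above. Not vibeProj's actual implementation.
COMPUTE_TYPE = {
    "fp64": "double",
    "fp32": "float",
    "ds": "double-single",
}

def resolve_precision(mode: str = "fp64") -> str:
    """Return the effective mode; "auto" always resolves to "fp64"."""
    if mode == "auto":
        # Projection kernels are trig-dominated, so fp32 buys little speed.
        return "fp64"
    if mode not in COMPUTE_TYPE:
        raise ValueError(f"unknown precision mode: {mode!r}")
    return mode

for m in ("fp64", "fp32", "ds", "auto"):
    print(m, "->", COMPUTE_TYPE[resolve_precision(m)])
```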

## Why `auto` always uses fp64

Projection math is dominated by transcendental functions (`sin`, `cos`, `atan2`, `asinh`), which run on the GPU’s Special Function Unit (SFU). The SFU fp64:fp32 throughput ratio is ~1:4, not the 1:64 of plain ALU operations on consumer GPUs.

As a result, fp32 compute gives only ~4x speedup for projection kernels, not the 64x implied by the ALU ratio. Since the GPU is already 100–300x faster than the CPU at fp64, that speedup does not justify the loss of accuracy, so `"auto"` always selects fp64.
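A back-of-envelope model makes the ~4x figure concrete. Assume, purely for illustration, that a fraction `s` of a kernel's fp64 run time is bound by transcendental (SFU) throughput at a 1:4 fp64:fp32 ratio and the rest by ALU throughput at 1:64, the ratios stated above. An Amdahl-style estimate of the fp32 speedup is then `1 / (s/4 + (1 - s)/64)`. The model is deliberately simple: it ignores memory traffic and any overlap between SFU and ALU work, and `fp32_speedup` is a hypothetical name.

```python
def fp32_speedup(sfu_fraction: float, sfu_ratio: float = 4.0, alu_ratio: float = 64.0) -> float:
    """Estimated fp32-over-fp64 speedup for a kernel (illustrative model only).

    sfu_fraction: assumed share of fp64 run time bound by SFU throughput.
    The remaining time is treated as ALU-bound; each part shrinks by its own
    fp64:fp32 throughput ratio.
    """
    if not 0.0 <= sfu_fraction <= 1.0:
        raise ValueError("sfu_fraction must be in [0, 1]")
    fp32_time = sfu_fraction / sfu_ratio + (1.0 - sfu_fraction) / alu_ratio
    return 1.0 / fp32_time

for s in (0.0, 0.5, 0.9, 1.0):
    print(f"SFU-bound share {s:.0%}: ~{fp32_speedup(s):.1f}x")
```

Under these assumptions, a kernel that is only half SFU-bound already tops out near 7.5x, and a fully trig-bound kernel gets exactly the 4x SFU ratio.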

## Double-single arithmetic

The `"ds"` mode represents each value as an unevaluated sum of two fp32 values (a high and a low part), giving a ~48-bit significand (~14 decimal digits) compared with fp64's 53 bits (~16 digits). It is implemented only for Transverse Mercator and reaches near-fp64 accuracy using fp32 FMA instructions.
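Double-single arithmetic rests on two error-free transformations. `two_sum` recovers the exact rounding error of an fp32 addition, and `two_prod` recovers the exact rounding error of an fp32 multiplication; on a GPU that error is typically one fp32 FMA, `fma(a, b, -p)`. The sketch below follows the classic textbook formulation and is not vibeProj's Transverse Mercator kernel; all function names are illustrative. It emulates fp32 in Python by rounding every intermediate through `struct`. That emulation is exact for `+`, `-` and `*`, because fp64's 53-bit significand is more than twice fp32's 24 bits, so rounding to fp64 first and then to fp32 gives the correctly rounded fp32 result.

```python
import math
import struct

DS = tuple[float, float]  # (hi, lo) pair of fp32 values; value = hi + lo

def f32(x: float) -> float:
    """Round to the nearest fp32 value, emulating one rounded fp32 operation."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def two_sum(a: float, b: float) -> DS:
    """Knuth's TwoSum: s = fl32(a + b) plus the exact rounding error e."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def quick_two_sum(a: float, b: float) -> DS:
    """Renormalise into (hi, lo); assumes |a| >= |b|."""
    s = f32(a + b)
    return s, f32(b - f32(s - a))

def two_prod(a: float, b: float) -> DS:
    """p = fl32(a * b) plus its exact error (a GPU would compute fma(a, b, -p))."""
    p = f32(a * b)
    # a * b has at most 48 significant bits, so it is exact in fp64
    return p, f32(a * b - p)

def to_ds(x: float) -> DS:
    hi = f32(x)
    return hi, f32(x - hi)

def ds_add(a: DS, b: DS) -> DS:
    s, e = two_sum(a[0], b[0])
    e = f32(e + f32(a[1] + b[1]))  # fold in the low parts
    return quick_two_sum(s, e)

def ds_mul(a: DS, b: DS) -> DS:
    p, e = two_prod(a[0], b[0])
    # cross terms hi*lo + lo*hi; the tiny lo*lo term is dropped
    e = f32(e + f32(f32(a[0] * b[1]) + f32(a[1] * b[0])))
    return quick_two_sum(p, e)

x = ds_mul(to_ds(math.pi), to_ds(math.pi))
p32 = f32(math.pi)
print(f"double-single pi^2 error: {abs(sum(x) - math.pi ** 2):.1e}")
print(f"plain fp32 pi^2 error:    {abs(f32(p32 * p32) - math.pi ** 2):.1e}")
```

The double-single product should agree with the fp64 value to within roughly 1e-13, versus an error of roughly 7e-7 for the plain fp32 product.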

On consumer GPUs (RTX series, 1:64 fp64:fp32 ratio), the basic double-single operations are much faster than their fp64 counterparts:

- `ds_add`: ~10x faster than fp64 add
- `ds_mul`: ~16x faster than fp64 mul

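These gains apply to add and multiply only. Projection kernels are bottlenecked on SFU (transcendental) throughput, so in practice `"ds"` provides no speedup for trig-heavy projection kernels. The `"ds"` path exists for experimentation.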

## Consumer vs datacenter GPUs

vibeProj queries the CUDA device attribute `SingleToDoublePrecisionPerfRatio` to classify the GPU:

- Consumer (RTX 4090, etc.): ratio 1:64 for ALU operations, but ~1:4 for SFU operations
- Datacenter (A100, H100): ratio 1:2

Both types run fp64 by default. Datacenter GPUs will see higher absolute throughput due to their better fp64 hardware.
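The ratio mentioned above is a standard CUDA device attribute (`cudaDevAttrSingleToDoublePrecisionPerfRatio` in the runtime API). The sketch below assumes CuPy as the access path and reads the attribute through CuPy's `Device.attributes` dictionary under the key `"SingleToDoublePrecisionPerfRatio"`. The function names and the threshold of 8 are hypothetical, not vibeProj's actual logic. As noted above, the classification does not change the default: both classes run fp64.

```python
def query_perf_ratio(device_id: int = 0) -> int:
    """Read a CUDA device's fp32:fp64 throughput ratio via CuPy (assumed dependency)."""
    import cupy  # imported lazily so classify_gpu works without a GPU

    return int(cupy.cuda.Device(device_id).attributes["SingleToDoublePrecisionPerfRatio"])

def classify_gpu(perf_ratio: int, threshold: int = 8) -> str:
    """Hypothetical rule: a small ratio (2 on A100/H100) means strong fp64 hardware."""
    return "datacenter" if perf_ratio <= threshold else "consumer"

# Ratios as reported by the attribute (fp32 throughput / fp64 throughput)
for ratio in (2, 32, 64):
    print(f"ratio {ratio}: {classify_gpu(ratio)}")
```

On a machine with CuPy and a CUDA GPU, `classify_gpu(query_perf_ratio())` would classify device 0.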