vibeproj.gpu_detect

GPU type detection for automatic precision selection.

Queries the NVIDIA driver to determine the fp64:fp32 throughput ratio. Consumer GPUs (RTX series) have a 1:64 ratio and use double-single fp32 arithmetic; datacenter GPUs (A100, H100) have a 1:2 ratio and use native fp64.
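Double-single (ds) arithmetic represents one value as an unevaluated sum of two fp32 numbers, hi + lo, which gives roughly twice the fp32 significand while using only fp32 operations. The sketch below illustrates the idea on the CPU with NumPy float32 scalars and Knuth's TwoSum; it is a generic illustration of the technique, not vibeproj's kernel code, and the helper names (two_sum, ds_from_float, ds_add, ds_to_float) are hypothetical.

```python
import numpy as np

f32 = np.float32


def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and s + e == a + b exactly (all float32)."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def ds_from_float(x: float):
    """Split an fp64 Python float into a (hi, lo) float32 pair."""
    hi = f32(x)
    lo = f32(x - float(hi))  # residual that fp32 could not hold in hi
    return hi, lo


def ds_add(a, b):
    """Add two double-single values using only float32 operations."""
    s, e = two_sum(a[0], b[0])
    e = f32(e + f32(a[1] + b[1]))  # fold in the low parts
    hi = f32(s + e)                # renormalize (Fast2Sum) so |lo| << |hi|
    lo = f32(e - f32(hi - s))
    return hi, lo


def ds_to_float(x) -> float:
    """Collapse a (hi, lo) pair back to a single fp64 value."""
    return float(x[0]) + float(x[1])


x, y = 1.0, 1e-9
plain = f32(x) + f32(y)  # plain fp32: y is below half an ulp of 1.0 and is lost
compensated = ds_to_float(ds_add(ds_from_float(x), ds_from_float(y)))
print(float(plain), compensated)
```

Plain fp32 addition returns exactly 1.0 here, while the ds pair keeps the 1e-9 contribution. The cost is several fp32 operations per emulated addition, which is the trade-off the rest of this module weighs against native fp64.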

Functions

`get_fp64_ratio()` → float: Return the fp64:fp32 throughput ratio for the current GPU.
`favors_native_fp64()` → bool: Return True if fp64 is fast enough to use directly (datacenter GPU or CPU).
`select_compute_precision()` → str: Select compute precision based on GPU type.

Module Contents

vibeproj.gpu_detect.get_fp64_ratio() → float

Return the fp64:fp32 throughput ratio for the current GPU.

Returns 1.0 if no GPU is available (CPU mode: always use fp64), a ratio >= 0.25 for datacenter GPUs (use native fp64), and a ratio < 0.25 for consumer GPUs (use compensated fp32).
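How vibeproj extracts the ratio from the driver is not described here. As one possible approach, the ratio can be derived from the CUDA compute capability, since the per-SM fp32 and fp64 add/mul/FMA throughputs are documented per architecture in the CUDA C++ Programming Guide. The sketch below assumes that approach; the table, the function name fp64_ratio_from_cc, and the fallback for unlisted architectures are all hypothetical, not vibeproj's implementation.

```python
from typing import Optional, Tuple

# Hypothetical table: fp64:fp32 throughput ratio per CUDA compute capability,
# computed from per-SM add/mul/FMA results per clock (CUDA C++ Programming Guide).
_FP64_RATIO_BY_CC = {
    (6, 0): 1 / 2,   # Pascal datacenter (P100)
    (7, 0): 1 / 2,   # Volta (V100)
    (8, 0): 1 / 2,   # Ampere datacenter (A100)
    (9, 0): 1 / 2,   # Hopper (H100)
    (6, 1): 1 / 32,  # Pascal consumer (e.g. GTX 1080)
    (7, 5): 1 / 32,  # Turing (e.g. RTX 20-series)
    (8, 6): 1 / 64,  # Ampere consumer (e.g. RTX 30-series)
    (8, 9): 1 / 64,  # Ada (e.g. RTX 4090)
}

# Assumption: treat unknown architectures like consumer parts (slow fp64).
_UNKNOWN_ARCH_RATIO = 1 / 64


def fp64_ratio_from_cc(cc: Optional[Tuple[int, int]]) -> float:
    """Map a (major, minor) compute capability to an fp64:fp32 ratio.

    None means no GPU was found, so we are in CPU mode and fp64 is native.
    """
    if cc is None:
        return 1.0
    return _FP64_RATIO_BY_CC.get(cc, _UNKNOWN_ARCH_RATIO)


for cc in [None, (8, 0), (8, 9)]:
    print(cc, fp64_ratio_from_cc(cc))
```

The (major, minor) pair itself would come from the driver, for example through the CUDA runtime's cudaGetDeviceProperties; that query is omitted so the sketch runs without a GPU.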

vibeproj.gpu_detect.favors_native_fp64() → bool

Return True if fp64 is fast enough to use directly (datacenter GPU or CPU).
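Given the 0.25 cutoff documented for get_fp64_ratio, the decision plausibly reduces to one comparison. The sketch below assumes exactly that; the helper name favors_native_fp64_from_ratio and its explicit ratio parameter are hypothetical (the documented function takes no arguments and queries the GPU itself).

```python
# Cutoff taken from the get_fp64_ratio docstring: >= 0.25 means "use native fp64".
FP64_RATIO_THRESHOLD = 0.25


def favors_native_fp64_from_ratio(ratio: float) -> bool:
    """Hypothetical helper: True when fp64 runs at >= 1/4 of fp32 throughput.

    CPU mode reports 1.0 and datacenter GPUs about 1/2, so both pass;
    consumer GPUs at 1/64 fall below the cutoff.
    """
    return ratio >= FP64_RATIO_THRESHOLD


for label, ratio in [("CPU", 1.0), ("A100/H100", 1 / 2), ("RTX 4090", 1 / 64)]:
    print(f"{label}: ratio={ratio:.4f} native_fp64={favors_native_fp64_from_ratio(ratio)}")
```

A threshold rather than an exact match on 1:2 keeps the decision robust to architectures with intermediate ratios.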

vibeproj.gpu_detect.select_compute_precision() → str

Select compute precision based on GPU type.

Always returns "fp64". Projection math is dominated by transcendental functions (sin, cos, atan2, asinh), which use the special function units (SFUs). For SFU operations fp32 is only about 4x faster than fp64 (versus 64x for ALU operations), so the theoretical 64x fp32 throughput advantage does not materialize for projections.

On the RTX 4090 (1:64 ALU ratio), the fp64 Transverse Mercator (TM) projection still runs at 0.49 ms per 1M points, 183x faster than the CPU. The double-single (ds) path exists for experimentation but provides no speedup for trig-heavy projection kernels.
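The two benchmark figures above can be rearranged into a GPU throughput and an implied CPU baseline. This is only arithmetic on the quoted numbers: the CPU time is derived rather than separately measured, and it inherits whatever rounding went into the 183x figure.

```python
# Figures quoted for fp64 TM on the RTX 4090.
gpu_ms_per_million = 0.49
speedup_vs_cpu = 183  # rounded in the source, so derived values are approximate

gpu_seconds_per_million = gpu_ms_per_million / 1000.0
gpu_points_per_second = 1_000_000 / gpu_seconds_per_million
implied_cpu_ms_per_million = gpu_ms_per_million * speedup_vs_cpu

print(f"GPU throughput: {gpu_points_per_second / 1e9:.2f} billion points/s")
print(f"Implied CPU time: {implied_cpu_ms_per_million:.1f} ms per 1M points")
```

That works out to roughly 2.04 billion points per second on the GPU and an implied CPU time of about 90 ms per million points.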