vibeproj.fused_kernels¶

Fused NVRTC kernels for GPU-accelerated coordinate projection.

Each kernel runs the full transform pipeline (axis swap, degree/radian conversion, central meridian shift, projection math, scale, offset) in a single kernel launch, with one thread per coordinate. Compared to the CuPy element-wise path, this eliminates roughly 20 intermediate kernel launches and the associated array round-trips.
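As a rough illustration of the stages a single thread would run for one coordinate, the pure-Python sketch below chains them for a spherical Web Mercator forward transform. The function name, its parameters, and the choice of Web Mercator are assumptions made for illustration; the real kernels are compiled through NVRTC and their source is not shown on this page.

```python
import math


def fused_webmerc_forward(a, b, *, north_first=False, lon0_deg=0.0,
                          radius=6378137.0, x0=0.0, y0=0.0):
    """Run every pipeline stage for ONE coordinate, as one GPU thread might.

    Hypothetical helper: not part of vibeproj.
    """
    # 1. Axis swap: accept either (lat, lon) or (lon, lat) input order.
    lat_deg, lon_deg = (a, b) if north_first else (b, a)
    # 2. Degrees to radians.
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # 3. Central meridian shift.
    dlon = lon - math.radians(lon0_deg)
    # 4. Projection math on the unit sphere (spherical Mercator).
    x_unit = dlon
    y_unit = math.log(math.tan(math.pi / 4.0 + lat / 2.0))
    # 5. Scale and 6. offset.
    return x_unit * radius + x0, y_unit * radius + y0


# (lon, lat) = (180, 0): the eastern edge of the Web Mercator square.
x, y = fused_webmerc_forward(180.0, 0.0)
print(x, y)
```

In the element-wise path each numbered step would be its own array operation with an intermediate array; fusing them means the intermediates only ever live in per-thread registers.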

The module uses CuPy's RawKernel for NVRTC compilation and caching.

Functions¶

`can_fuse`(→ bool)	Check if a fused kernel is available for this projection + direction.
`compile_kernels`([projections, precision])	Pre-compile fused NVRTC kernels to eliminate first-call latency.
`fused_transform`(→ tuple | None)	Execute a fused GPU kernel for the full transform pipeline.
`compile_helmert_kernel`()	Pre-compile the Helmert datum shift kernel.
`fused_helmert_shift`(lat, lon, helmert_params, xp[, h, ...])	Execute the Helmert datum shift on GPU.
`compile_svd_kernel`()	Pre-compile the SVD datum correction kernel.
`fused_svd_correction`(lat, lon, correction_data, xp, *)	Execute SVD datum correction on GPU.

Module Contents¶

vibeproj.fused_kernels.can_fuse(projection_name: str, direction: str) → bool¶

Check if a fused kernel is available for this projection + direction.

vibeproj.fused_kernels.compile_kernels(projections=None, *, precision='auto')¶

Pre-compile fused NVRTC kernels to eliminate first-call latency.

Parameters¶

projections (list of str, optional): Projection names to compile (e.g. ["tmerc", "webmerc"]). If None, compiles all supported projections.
precision (str): Compute precision: "auto", "fp64", "fp32", or "ds".

vibeproj.fused_kernels.fused_transform(arg1, arg2, *, projection_name: str, direction: str, computed: dict, src_north_first: bool, dst_north_first: bool, xp, out_x=None, out_y=None, precision: str = 'auto', stream=None) → tuple | None¶

Execute a fused GPU kernel for the full transform pipeline.

Parameters¶

out_x, out_y (cupy.ndarray, optional): Pre-allocated output arrays. Pass these to avoid allocation.
precision (str): "fp64" = full double precision (default for fp64 input). "fp32" = fp32 compute with fp64 I/O (ADR-0002 mixed precision). "auto" = fp64 (projection math is trig-dominated / SFU-bound).

Mixed precision (fp32 compute, fp64 I/O) is ADR-0002 compliant:
- Input/output arrays are always fp64 (canonical storage precision)
- Projection math runs in fp32 for ~32x throughput on consumer GPUs
- Final scale/offset always in fp64 for sub-meter output precision
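The effect of keeping the final scale/offset in fp64 can be illustrated with plain Python by rounding intermediate values to fp32 by hand. This is only a numerical sketch of the rounding behaviour, not the kernel code; `to_f32`, the unit-sphere value, and the false northing are hypothetical.

```python
import struct


def to_f32(value: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


RADIUS = 6378137.0              # scale applied in the final step (metres)
FALSE_NORTHING = 10_000_000.0   # hypothetical offset (metres)
unit_y = -0.5873                # hypothetical result of the projection math

# Reference: everything in fp64.
exact = unit_y * RADIUS + FALSE_NORTHING

# Mixed precision: projection result rounded to fp32, scale/offset in fp64.
mixed = to_f32(unit_y) * RADIUS + FALSE_NORTHING

# All fp32: every intermediate and the final value rounded to fp32.
all_fp32 = to_f32(to_f32(to_f32(unit_y) * to_f32(RADIUS))
                  + to_f32(FALSE_NORTHING))

print("mixed error (m):   ", abs(mixed - exact))
print("all-fp32 error (m):", abs(all_fp32 - exact))
```

For unit-sphere values below 1 in magnitude, the fp32 rounding error is at most 2^-25, which this scale turns into roughly 0.19 m. An fp32 output between about 4.2 and 8.4 million metres can additionally only land on a 0.5 m grid, which is what the fp64 final step avoids.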

vibeproj.fused_kernels.compile_helmert_kernel()¶

Pre-compile the Helmert datum shift kernel.

vibeproj.fused_kernels.fused_helmert_shift(lat, lon, helmert_params, xp, h=None, out_lat=None, out_lon=None, out_h=None, stream=None)¶

Execute the Helmert datum shift on GPU.

Parameters¶

lat, lon (cupy.ndarray): Geographic coordinates in degrees on the source ellipsoid.
helmert_params (HelmertParams): Transformation parameters.
xp (module): Array module (must be cupy).
h (cupy.ndarray, optional): Ellipsoidal height in meters. When provided, height is transformed.
out_lat, out_lon (cupy.ndarray, optional): Pre-allocated output arrays.
out_h (cupy.ndarray, optional): Pre-allocated output height array (only used when h is not None).
stream (cupy.cuda.Stream, optional): CUDA stream for async execution.

Returns¶

(out_lat, out_lon), or (out_lat, out_lon, out_h) when h is provided, or None if not on GPU.
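Because the return value can take three shapes, calling code may want to normalise it before use. The helper below is a hypothetical sketch of one way to do that. It assumes the 3-tuple form corresponds to a call with h supplied, and it works on any tuple, so it can be tried without a GPU.

```python
def unpack_helmert_result(result):
    """Normalise a fused_helmert_shift return value to (lat, lon, h_or_None).

    Hypothetical helper, not part of vibeproj. Raises if the fused path
    declined to run (result is None) so the caller can pick a fallback.
    """
    if result is None:
        raise RuntimeError("fused Helmert shift unavailable (not on GPU)")
    if len(result) == 3:
        out_lat, out_lon, out_h = result
        return out_lat, out_lon, out_h
    out_lat, out_lon = result
    return out_lat, out_lon, None


# Plain lists stand in for cupy arrays in this demonstration.
print(unpack_helmert_result(([52.0], [13.0])))
print(unpack_helmert_result(([52.0], [13.0], [34.5])))
```

Raising on None is a design choice of this sketch; returning a sentinel and falling back silently would be equally valid.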

vibeproj.fused_kernels.compile_svd_kernel()¶

Pre-compile the SVD datum correction kernel.

Call during warm-up to avoid first-use compilation latency.

vibeproj.fused_kernels.fused_svd_correction(lat, lon, correction_data, xp, *, negate=False, out_lat=None, out_lon=None, stream=None)¶

Execute SVD datum correction on GPU.

Parameters¶

lat, lon (cupy.ndarray): Geographic coordinates in degrees.
correction_data (DatumCorrectionData): SVD-compressed datum correction coefficients.
xp (module): Array module (must be cupy).
negate (bool): If True, subtract the correction (inverse direction).
out_lat, out_lon (cupy.ndarray, optional): Pre-allocated output arrays (zero-copy support).
stream (cupy.cuda.Stream, optional): CUDA stream for async execution.

Returns¶

(out_lat, out_lon), or None if not on GPU or if compilation fails.
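Since None signals that the fused path did not run, a caller needs some fallback. The sketch below shows one possible pattern together with the `negate` flag. The wrapper `svd_correct`, the CPU fallback, the stubs, and the constant correction used here are all hypothetical; a real correction varies with position, so a forward/negated pair would generally only round-trip approximately.

```python
def svd_correct(lat, lon, correction_data, xp, *, gpu_fn, cpu_fn,
                negate=False):
    """Try the fused GPU path first; fall back if it returns None.

    Hypothetical helper: in real code gpu_fn would be
    vibeproj.fused_kernels.fused_svd_correction and cpu_fn a fallback
    supplied by the caller.
    """
    result = gpu_fn(lat, lon, correction_data, xp, negate=negate)
    if result is None:
        result = cpu_fn(lat, lon, correction_data, negate=negate)
    return result


def _stub_gpu(lat, lon, correction_data, xp, *, negate=False):
    return None  # pretend no GPU is available


def _stub_cpu(lat, lon, correction_data, *, negate=False):
    # Stand-in correction: a constant (dlat, dlon) shift in degrees.
    dlat, dlon = correction_data
    sign = -1.0 if negate else 1.0
    return ([v + sign * dlat for v in lat], [v + sign * dlon for v in lon])


shift = (0.25, -0.5)  # hypothetical correction, exactly representable
fwd = svd_correct([52.0], [13.0], shift, None,
                  gpu_fn=_stub_gpu, cpu_fn=_stub_cpu)
back = svd_correct(fwd[0], fwd[1], shift, None,
                   gpu_fn=_stub_gpu, cpu_fn=_stub_cpu, negate=True)
print(fwd, back)
```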