vibeproj.fused_kernels

Fused NVRTC kernels for GPU-accelerated coordinate projection.

Each kernel runs the full transform pipeline (axis swap, deg/rad, central meridian, projection math, scale, offset) in a single kernel launch — one thread per coordinate. This eliminates ~20 intermediate kernel launches and array round-trips compared to the CuPy element-wise path.
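The sketch below illustrates what "fused" means here. It runs every stage for one coordinate inside a single function, which is the work one GPU thread would do. It is plain Python rather than CUDA and uses spherical Web Mercator as the projection math. The names `fused_webmerc_forward`, `fused_batch`, `lon_0`, `k0`, `false_easting`, and `false_northing` are hypothetical, and vibeproj's actual kernel source is not shown in this reference.

```python
import math

WGS84_A = 6378137.0  # sphere radius used by Web Mercator (metres)


def fused_webmerc_forward(a, b, *, src_north_first=True, dst_north_first=False,
                          lon_0=0.0, k0=1.0, radius=WGS84_A,
                          false_easting=0.0, false_northing=0.0):
    """All pipeline stages for ONE coordinate pair (hypothetical sketch)."""
    # 1. Axis swap: accept (lat, lon) or (lon, lat) input order.
    lon_deg, lat_deg = (b, a) if src_north_first else (a, b)
    # 2. Degrees -> radians.
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    # 3. Central meridian (longitude wrapping omitted for brevity).
    lam -= math.radians(lon_0)
    # 4. Projection math on the unit sphere (valid for |lat| < 90).
    u = lam
    v = math.log(math.tan(math.pi / 4.0 + phi / 2.0))
    # 5-6. Scale to metres, then apply false easting/northing.
    x = radius * k0 * u + false_easting
    y = radius * k0 * v + false_northing
    # Output axis order.
    return (y, x) if dst_north_first else (x, y)


def fused_batch(firsts, seconds, **kwargs):
    """Stand-in for the kernel launch: on the GPU each index is its own thread."""
    pairs = [fused_webmerc_forward(a, b, **kwargs) for a, b in zip(firsts, seconds)]
    return [p[0] for p in pairs], [p[1] for p in pairs]
```

An element-wise array path would instead run each numbered step as a separate array operation, writing an intermediate array to device memory between steps.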

Uses CuPy RawKernel for NVRTC compilation and caching.

Functions

can_fuse(projection_name, direction) → bool

Check if a fused kernel is available for this projection + direction.

compile_kernels([projections, precision])

Pre-compile fused NVRTC kernels to eliminate first-call latency.

fused_transform(arg1, arg2, *, projection_name, direction, ...) → tuple | None

Execute a fused GPU kernel for the full transform pipeline.

compile_helmert_kernel()

Pre-compile the Helmert datum shift kernel.

fused_helmert_shift(lat, lon, helmert_params, xp[, h, ...])

Execute the Helmert datum shift on GPU.

Module Contents

vibeproj.fused_kernels.can_fuse(projection_name: str, direction: str) → bool

Check if a fused kernel is available for this projection + direction.
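A caller might combine `can_fuse` with the fused path and an element-wise fallback along these lines. This is a hedged sketch. `transform`, `fused`, and `fallback` are hypothetical names, and the callables are injected so the control flow can be exercised without a GPU. Treating a `None` result from `fused_transform` as "use the fallback" is an assumption based on its `tuple | None` return type.

```python
from typing import Callable, Optional, Sequence, Tuple

Coords = Tuple[Sequence[float], Sequence[float]]


def transform(x, y, *, projection_name: str, direction: str,
              can_fuse: Callable[[str, str], bool],
              fused: Callable[..., Optional[Coords]],
              fallback: Callable[..., Coords]) -> Coords:
    """Prefer the single-launch fused kernel; otherwise use the element-wise path."""
    if can_fuse(projection_name, direction):
        result = fused(x, y, projection_name=projection_name, direction=direction)
        if result is not None:  # None: the fused path declined, so fall through
            return result
    return fallback(x, y, projection_name=projection_name, direction=direction)
```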

vibeproj.fused_kernels.compile_kernels(projections=None, *, precision='auto')

Pre-compile fused NVRTC kernels to eliminate first-call latency.

Parameters

projections : list of str, optional

Projection names to compile (e.g. [“tmerc”, “webmerc”]). If None, compiles all supported projections.

precision : str

Compute precision: “auto”, “fp64”, “fp32”, or “ds”.
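Pre-compilation amounts to filling a kernel cache ahead of time, so the first real call does not pay the NVRTC compile cost. The sketch below shows that pattern with a dictionary keyed by (projection, direction, precision). `get_kernel`, `precompile`, `SUPPORTED`, `DIRECTIONS`, the direction strings, and `compile_fn` (which would wrap CuPy RawKernel compilation) are all hypothetical. This is not vibeproj's internal code.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical projection subset and direction labels, for illustration only.
SUPPORTED = ("tmerc", "webmerc")
DIRECTIONS = ("forward", "inverse")

CompileFn = Callable[[str, str, str], object]

# Compiled kernels keyed by (projection, direction, precision).
_KERNELS: Dict[Tuple[str, str, str], object] = {}


def get_kernel(projection: str, direction: str, precision: str,
               compile_fn: CompileFn) -> object:
    """Return a cached kernel, compiling it only on the first request."""
    key = (projection, direction, precision)
    kernel = _KERNELS.get(key)
    if kernel is None:
        # Slow path: this is where NVRTC compilation would happen.
        kernel = compile_fn(projection, direction, precision)
        _KERNELS[key] = kernel
    return kernel


def precompile(projections: Optional[Iterable[str]] = None, *,
               precision: str = "fp64", compile_fn: CompileFn) -> None:
    """Warm the cache so later calls take the fast path immediately."""
    for name in (SUPPORTED if projections is None else projections):
        for direction in DIRECTIONS:
            get_kernel(name, direction, precision, compile_fn)
```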

vibeproj.fused_kernels.fused_transform(arg1, arg2, *, projection_name: str, direction: str, computed: dict, src_north_first: bool, dst_north_first: bool, xp, out_x=None, out_y=None, precision: str = 'auto', stream=None) → tuple | None

Execute a fused GPU kernel for the full transform pipeline.

Parameters

out_x, out_y : cupy.ndarray, optional

Pre-allocated output arrays. Pass these to avoid allocation.

precision : str

- “fp64”: full double-precision compute (the default for fp64 input).
- “fp32”: fp32 compute with fp64 I/O (ADR-0002 mixed precision).
- “auto”: resolves to fp64, since the projection math is trig-dominated (SFU-bound).

Mixed precision (fp32 compute, fp64 I/O) is ADR-0002 compliant:

- Input/output arrays are always fp64 (the canonical storage precision).
- Projection math runs in fp32 for ~32x throughput on consumer GPUs.
- The final scale/offset is always applied in fp64 for sub-meter output precision.
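The benefit of keeping the final scale/offset in fp64 shows up with large false offsets. Between 2^23 and 2^24 (about 8.4e6 to 1.7e7) consecutive fp32 values are exactly 1 m apart. Adding a 10,000,000 m false northing (as used for southern-hemisphere UTM zones) in fp32 therefore snaps results to whole metres. The sketch emulates fp32 rounding with `struct`. The helper names are hypothetical, and it models only the scale/offset stage, not the projection math.

```python
import struct


def to_f32(v: float) -> float:
    """Round an fp64 Python float to the nearest fp32 value (emulates a float cast)."""
    return struct.unpack("f", struct.pack("f", v))[0]


def scale_offset_fp32(u: float, k: float, offset: float) -> float:
    """Scale + offset with every intermediate rounded to fp32 (pure-fp32 pipeline)."""
    return to_f32(to_f32(to_f32(u) * to_f32(k)) + to_f32(offset))


def scale_offset_fp64(u: float, k: float, offset: float) -> float:
    """Scale + offset in fp64; u is the fp32 projection-math result, promoted."""
    return to_f32(u) * k + offset


# A 0.25 m residual survives in fp64 but is lost next to a 1e7 m offset in fp32.
print(scale_offset_fp32(0.25, 1.0, 10_000_000.0))
print(scale_offset_fp64(0.25, 1.0, 10_000_000.0))
```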

vibeproj.fused_kernels.compile_helmert_kernel()

Pre-compile the Helmert datum shift kernel.

vibeproj.fused_kernels.fused_helmert_shift(lat, lon, helmert_params, xp, h=None, out_lat=None, out_lon=None, out_h=None, stream=None)

Execute the Helmert datum shift on GPU.

Parameters

lat, lon : cupy.ndarray

Geographic coordinates in degrees on the source ellipsoid.

helmert_params : HelmertParams

Transformation parameters.

xp : module

Array module (must be cupy).

h : cupy.ndarray, optional

Ellipsoidal height in meters. When provided, height is transformed.

out_lat, out_lon : cupy.ndarray, optional

Pre-allocated output arrays.

out_h : cupy.ndarray, optional

Pre-allocated output height array (only used when h is not None).

stream : cupy.cuda.Stream, optional

CUDA stream for async execution.

Returns

(out_lat, out_lon) when h is None, (out_lat, out_lon, out_h) when h is provided, or None if not running on GPU.
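For reference, a common way to apply a 7-parameter Helmert shift to geographic coordinates has three steps: convert to Earth-centred (ECEF) Cartesian coordinates on the source ellipsoid, apply translation, small-angle rotation, and scale, then convert back on the target ellipsoid. The sketch below does this for one point in plain Python. It assumes the position-vector rotation convention; the coordinate-frame convention flips the rotation signs. `HelmertParamsSketch`, with its field names and units, is a hypothetical stand-in for `HelmertParams`, whose fields are not documented here. When `h` is omitted it uses zero height and returns only latitude and longitude, mirroring the Returns entry above.

```python
import math
from dataclasses import dataclass

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)
WGS84_F = 1.0 / 298.257223563


@dataclass
class HelmertParamsSketch:
    """Hypothetical stand-in for HelmertParams (field names and units assumed)."""
    tx: float = 0.0  # translations (m)
    ty: float = 0.0
    tz: float = 0.0
    rx: float = 0.0  # rotations (arc-seconds, position-vector convention)
    ry: float = 0.0
    rz: float = 0.0
    ds: float = 0.0  # scale difference (ppm)
    src_a: float = 6378137.0  # source ellipsoid semi-major axis (m)
    src_f: float = WGS84_F    # source ellipsoid flattening
    dst_a: float = 6378137.0  # target ellipsoid semi-major axis (m)
    dst_f: float = WGS84_F    # target ellipsoid flattening


def geodetic_to_ecef(lat_deg, lon_deg, h, a, f):
    e2 = f * (2.0 - f)
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    n = a / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)  # prime-vertical radius
    x = (n + h) * math.cos(phi) * math.cos(lam)
    y = (n + h) * math.cos(phi) * math.sin(lam)
    z = (n * (1.0 - e2) + h) * math.sin(phi)
    return x, y, z


def ecef_to_geodetic(x, y, z, a, f, iterations=10):
    """Fixed-point iteration on latitude (not suited to points near the poles)."""
    e2 = f * (2.0 - f)
    p = math.hypot(x, y)
    lam = math.atan2(y, x)
    phi = math.atan2(z, p * (1.0 - e2))  # exact starting guess when h == 0
    for _ in range(iterations):
        n = a / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)
        h = p / math.cos(phi) - n
        phi = math.atan2(z, p * (1.0 - e2 * n / (n + h)))
    n = a / math.sqrt(1.0 - e2 * math.sin(phi) ** 2)
    h = p / math.cos(phi) - n
    return math.degrees(phi), math.degrees(lam), h


def helmert_shift_point(lat, lon, params, h=None):
    x, y, z = geodetic_to_ecef(lat, lon, 0.0 if h is None else h,
                               params.src_a, params.src_f)
    rx, ry, rz = (r * ARCSEC_TO_RAD for r in (params.rx, params.ry, params.rz))
    m = 1.0 + params.ds * 1e-6
    # Small-angle rotation, scale, then translation (position-vector convention).
    x2 = params.tx + m * (x - rz * y + ry * z)
    y2 = params.ty + m * (rz * x + y - rx * z)
    z2 = params.tz + m * (-ry * x + rx * y + z)
    out_lat, out_lon, out_h = ecef_to_geodetic(x2, y2, z2, params.dst_a, params.dst_f)
    return (out_lat, out_lon) if h is None else (out_lat, out_lon, out_h)
```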