vibeproj.fused_kernels
Fused NVRTC kernels for GPU-accelerated coordinate projection.
Each kernel runs the full transform pipeline (axis swap, degree/radian conversion, central meridian, projection math, scale, offset) in a single launch, with one thread per coordinate. This eliminates roughly 20 intermediate kernel launches and array round-trips compared to the CuPy element-wise path.
Uses CuPy RawKernel for NVRTC compilation and caching.
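To make the difference between the staged and fused paths concrete, here is a minimal CPU-only sketch in plain Python. It is not the module's kernel source: the spherical Web Mercator formula, the radius constant, and the names `staged_webmerc`, `fused_point`, and `fused_webmerc` are all assumptions chosen for illustration. The staged version builds a full intermediate list at every step, much as separate element-wise kernels would; the fused version does all steps for one coordinate before moving to the next, which is the work a single GPU thread would perform.

```python
import math

R = 6378137.0  # sphere radius in metres (WGS 84 semi-major axis), an assumption


def staged_webmerc(lats, lons, lon0=0.0, scale=1.0, x0=0.0, y0=0.0):
    """Element-wise style: every step materialises a full intermediate list."""
    lam = [math.radians(v) for v in lons]            # degrees -> radians
    phi = [math.radians(v) for v in lats]
    lam = [v - math.radians(lon0) for v in lam]      # central meridian
    x = [R * v for v in lam]                         # projection math
    y = [R * math.asinh(math.tan(v)) for v in phi]
    x = [scale * v + x0 for v in x]                  # scale and offset
    y = [scale * v + y0 for v in y]
    return x, y


def fused_point(a, b, north_first, lon0, scale, x0, y0):
    """Whole pipeline for one coordinate; on a GPU this is one thread's work."""
    lat, lon = (a, b) if north_first else (b, a)     # axis swap
    lam = math.radians(lon) - math.radians(lon0)
    phi = math.radians(lat)
    x = R * lam
    y = R * math.asinh(math.tan(phi))
    return scale * x + x0, scale * y + y0


def fused_webmerc(arg1, arg2, north_first=True, lon0=0.0,
                  scale=1.0, x0=0.0, y0=0.0):
    """Single pass over the data with no intermediate arrays."""
    n = len(arg1)
    out_x, out_y = [0.0] * n, [0.0] * n
    for i in range(n):                               # i plays the thread index
        out_x[i], out_y[i] = fused_point(
            arg1[i], arg2[i], north_first, lon0, scale, x0, y0)
    return out_x, out_y


lats = [0.0, 48.8584, -33.8568]
lons = [180.0, 2.2945, 151.2153]
sx, sy = staged_webmerc(lats, lons)
fx, fy = fused_webmerc(lats, lons)
```

Both functions compute the same values; the fused form simply avoids the intermediate lists, which on a GPU corresponds to avoiding extra launches and global-memory round-trips.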
Functions
| can_fuse | Check if a fused kernel is available for this projection + direction. |
| compile_kernels | Pre-compile fused NVRTC kernels to eliminate first-call latency. |
| fused_transform | Execute a fused GPU kernel for the full transform pipeline. |
| compile_helmert_kernel | Pre-compile the Helmert datum shift kernel. |
| fused_helmert_shift | Execute the Helmert datum shift on GPU. |
Module Contents
- vibeproj.fused_kernels.can_fuse(projection_name: str, direction: str) -> bool
Check if a fused kernel is available for this projection + direction.
- vibeproj.fused_kernels.compile_kernels(projections=None, *, precision='auto')
Pre-compile fused NVRTC kernels to eliminate first-call latency.
Parameters
- projections : list of str, optional
Projection names to compile (e.g. ["tmerc", "webmerc"]). If None, compiles all supported projections.
- precision : str
Compute precision: "auto", "fp64", "fp32", or "ds".
- vibeproj.fused_kernels.fused_transform(arg1, arg2, *, projection_name: str, direction: str, computed: dict, src_north_first: bool, dst_north_first: bool, xp, out_x=None, out_y=None, precision: str = 'auto', stream=None) -> tuple | None
Execute a fused GPU kernel for the full transform pipeline.
Parameters
- out_x, out_y : cupy.ndarray, optional
Pre-allocated output arrays. Pass these to avoid allocation.
- precision : str
"fp64" = full double precision (default for fp64 input). "fp32" = fp32 compute with fp64 I/O (ADR-0002 mixed precision). "auto" = fp64 (projection math is trig-dominated / SFU-bound).
Mixed precision (fp32 compute, fp64 I/O) is ADR-0002 compliant:
  - Input/output arrays are always fp64 (canonical storage precision).
  - Projection math runs in fp32 for ~32x throughput on consumer GPUs.
  - Final scale/offset is always in fp64 for sub-meter output precision.
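The mixed-precision split described above can be emulated on the CPU to get a feel for where the rounding happens. The sketch below is an illustration under stated assumptions, not the kernel's actual arithmetic: it uses spherical Mercator, rounds each intermediate to fp32 with `struct` (a real fp32 kernel also evaluates the trig functions in fp32), and treats the multiplication by the sphere radius as part of the fp64 scale step. The names `f32`, `mercator_y_fp64`, and `mercator_y_mixed` are hypothetical.

```python
import math
import struct

R = 6378137.0  # sphere radius in metres, an assumption for this sketch


def f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mercator_y_fp64(lat_deg, scale=1.0, y0=0.0):
    """Reference path: everything in fp64."""
    phi = math.radians(lat_deg)
    return scale * R * math.asinh(math.tan(phi)) + y0


def mercator_y_mixed(lat_deg, scale=1.0, y0=0.0):
    """fp64 in, fp32-rounded projection math, fp64 scale/offset out."""
    phi = f32(math.radians(lat_deg))      # fp64 input narrowed to fp32
    t = f32(math.tan(phi))                # intermediate rounded to fp32
    unit_y = f32(math.asinh(t))           # unit-sphere result in fp32
    return scale * R * unit_y + y0        # widened back: fp64 scale/offset


lat = 48.8584
reference = mercator_y_fp64(lat)
mixed = mercator_y_mixed(lat)
error_m = abs(mixed - reference)          # discrepancy in metres
```

Inspecting `error_m` for latitudes of interest gives a rough idea of the accuracy cost of the fp32 path; the emulation is only indicative, so any real accuracy budget should be checked against the GPU output itself.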
- vibeproj.fused_kernels.compile_helmert_kernel()
Pre-compile the Helmert datum shift kernel.
- vibeproj.fused_kernels.fused_helmert_shift(lat, lon, helmert_params, xp, h=None, out_lat=None, out_lon=None, out_h=None, stream=None)
Execute the Helmert datum shift on GPU.
Parameters
- lat, lon : cupy.ndarray
Geographic coordinates in degrees on the source ellipsoid.
- helmert_params : HelmertParams
Transformation parameters.
- xp : module
Array module (must be cupy).
- h : cupy.ndarray, optional
Ellipsoidal height in meters. When provided, height is transformed.
- out_lat, out_lon : cupy.ndarray, optional
Pre-allocated output arrays.
- out_h : cupy.ndarray, optional
Pre-allocated output height array (only used when h is not None).
- stream : cupy.cuda.Stream, optional
CUDA stream for async execution.
Returns
(out_lat, out_lon), or (out_lat, out_lon, out_h) when h is provided; None if not on GPU.
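For readers unfamiliar with the operation, the following CPU sketch shows the usual shape of a Helmert datum shift on geographic input: geodetic to geocentric (ECEF), a 7-parameter similarity transform, then back to geodetic. It is an illustration only. The fields of `HelmertParams` are not documented on this page, so the plain arguments `tx, ty, tz, rx, ry, rz, s_ppm` are hypothetical, as are the function names. The position-vector rotation convention with small angles in radians is an assumption, and the same WGS 84 ellipsoid is used for source and target for brevity.

```python
import math

A = 6378137.0                # WGS 84 semi-major axis (m)
F = 1.0 / 298.257223563      # WGS 84 flattening
E2 = F * (2.0 - F)           # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h=0.0):
    phi, lam = math.radians(lat_deg), math.radians(lon_deg)
    n = A / math.sqrt(1.0 - E2 * math.sin(phi) ** 2)
    return ((n + h) * math.cos(phi) * math.cos(lam),
            (n + h) * math.cos(phi) * math.sin(lam),
            (n * (1.0 - E2) + h) * math.sin(phi))


def ecef_to_geodetic(x, y, z, iterations=10):
    """Fixed-point iteration; not suitable exactly at the poles."""
    lam = math.atan2(y, x)
    p = math.hypot(x, y)
    phi = math.atan2(z, p * (1.0 - E2))
    h = 0.0
    for _ in range(iterations):
        n = A / math.sqrt(1.0 - E2 * math.sin(phi) ** 2)
        h = p / math.cos(phi) - n
        phi = math.atan2(z, p * (1.0 - E2 * n / (n + h)))
    return math.degrees(phi), math.degrees(lam), h


def helmert_ecef(x, y, z, tx, ty, tz, rx, ry, rz, s_ppm):
    """7-parameter transform, position-vector convention, small angles."""
    m = 1.0 + s_ppm * 1e-6
    return (tx + m * (x - rz * y + ry * z),
            ty + m * (rz * x + y - rx * z),
            tz + m * (-ry * x + rx * y + z))


def helmert_shift(lat_deg, lon_deg, h=0.0, *, tx=0.0, ty=0.0, tz=0.0,
                  rx=0.0, ry=0.0, rz=0.0, s_ppm=0.0):
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, h)
    x, y, z = helmert_ecef(x, y, z, tx, ty, tz, rx, ry, rz, s_ppm)
    return ecef_to_geodetic(x, y, z)


# With all parameters zero the shift should return the input unchanged.
identity = helmert_shift(48.8584, 2.2945, 35.0)
translated = helmert_ecef(1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 0.0, 0.0, 0.0, 0.0)
```

A fused GPU version would run the same three steps per coordinate inside one kernel, so no intermediate ECEF arrays are written to device memory.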