CUDA Kernels
Kernel inventory
vibeSpatial’s GPU kernels live in src/vibespatial/kernels/. Each kernel
is CUDA source that NVRTC compiles at runtime.
| Kernel | Module | Operations |
|---|---|---|
| Bounds | | Geometry bounds, total bounds, Morton keys |
| Predicates | | Point-in-polygon, point-within-bounds |
| Segment intersection | | Extraction (count-scatter), candidate generation (sort-sweep + scatter), classification (Shewchuk adaptive) |
| Overlay | | Half-edge graph, split events, face extraction |
| Stroke | | Point buffer, offset curve |
| Make valid | | GPU polygon repair |
| Binary constructive | | Intersection, union, difference, symmetric difference (PIP + overlay dispatch) |
| Envelope | | Bounding-box polygon (5-vertex closed ring) |
| Exterior ring | | Polygon exterior ring extraction (LineString output) |
| Normalize | | Ring rotation (lex-min vertex), linestring reversal, multi-part sorting |
| Simplify | | Visvalingam-Whyatt vertex elimination |
| Validity / Simplicity | | is_valid (OGC: ring closure, min coords, self-intersection, hole containment, ring crossing/overlap, multi-touch), is_simple (self-intersection) |
| Clip by rect | | Bounds-filtered rectangle clip (point, line, polygon families via Sutherland-Hodgman / Liang-Barsky GPU kernels) |
| Distance metrics | | Hausdorff (min-of-max brute force), discrete Frechet (DP coupling matrix) |
| Polygon intersection | | Element-wise Sutherland-Hodgman polygon clipping (count-scatter, device-resident output) |
| Polygon difference | | Element-wise polygon difference via overlay topology pipeline (face selection) |
| Segmented union | | Per-group polygon union via binary-tree reduction of overlay_union_owned |
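Several rows above mention a count-scatter pattern for variable-length output. As a rough illustration only, the following CPU-side sketch shows the general idea in plain Python: count how many outputs each input produces, turn the counts into write offsets with an exclusive prefix sum, then let each input write into its own slice. The names `count_scatter` and `segments` are hypothetical and are not taken from vibeSpatial; the real kernels run these passes on the GPU and may differ in detail.

```python
from itertools import accumulate


def segments(coords):
    """Hypothetical expander: split a coordinate list into consecutive segments."""
    return list(zip(coords, coords[1:]))


def count_scatter(items, expand):
    """Two-pass variable-length output: count, exclusive scan, scatter."""
    # Pass 1 (count): each input reports how many outputs it will produce.
    counts = [len(expand(item)) for item in items]
    # Exclusive prefix sum: offsets[i] is where input i starts writing,
    # and offsets[-1] is the total output size.
    offsets = list(accumulate(counts, initial=0))
    out = [None] * offsets[-1]
    # Pass 2 (scatter): each input fills its own disjoint slice, so on a GPU
    # the writes could proceed in parallel without synchronization.
    for item, start in zip(items, offsets):
        for k, value in enumerate(expand(item)):
            out[start + k] = value
    return out, offsets


lines = [[(0, 0), (1, 0), (1, 1)], [(2, 2), (3, 3)], [(5, 5)]]
flat_segments, offsets = count_scatter(lines, segments)
```

Here `offsets` doubles as an index from each input line to its range in the flat output, which is why the pattern suits device-resident results: the output buffer is allocated exactly once at its final size.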
Adding a new kernel
Use the scaffold script:
uv run python scripts/generate_kernel_scaffold.py my_kernel_name
This creates:
- src/vibespatial/kernels/my_kernel_name.py: kernel source + compilation
- tests/test_my_kernel_name.py: Shapely oracle test fixture
- Manifest entry for precompilation
Precision compliance
Every kernel must support dual-precision dispatch per ADR-0002. The
PrecisionPlan selects fp32 or fp64 based on device capability. Kernels
receive a precision_mode parameter and must use the appropriate types.
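One way a kernel module could honor a precision_mode parameter is to keep a single CUDA source body written against a `real_t` alias and prepend the matching typedef before compilation. The sketch below assumes that approach; `PrecisionMode`, `select_precision`, `render_source`, and the `within_bounds` kernel are all hypothetical names for illustration, and the actual PrecisionPlan API and ADR-0002 rules may look quite different.

```python
from enum import Enum


class PrecisionMode(Enum):
    """Hypothetical stand-in for the precision_mode values a kernel receives."""
    FP32 = "fp32"
    FP64 = "fp64"


# C type that backs the real_t alias for each mode.
_C_TYPE = {PrecisionMode.FP32: "float", PrecisionMode.FP64: "double"}

# Illustrative kernel body written once against real_t (not a vibeSpatial kernel).
KERNEL_BODY = r'''
extern "C" __global__
void within_bounds(const real_t* x, const real_t* y, int n,
                   real_t xmin, real_t ymin, real_t xmax, real_t ymax,
                   unsigned char* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = x[i] >= xmin && x[i] <= xmax && y[i] >= ymin && y[i] <= ymax;
    }
}
'''


def select_precision(fp64_is_fast):
    """Assumed policy: use fp64 only where the device handles it well."""
    return PrecisionMode.FP64 if fp64_is_fast else PrecisionMode.FP32


def render_source(precision_mode):
    """Prepend the typedef so the same body compiles as fp32 or fp64."""
    return f"typedef {_C_TYPE[precision_mode]} real_t;\n" + KERNEL_BODY


source_fp32 = render_source(select_precision(fp64_is_fast=False))
source_fp64 = render_source(select_precision(fp64_is_fast=True))
```

Keeping one body per kernel means the fp32 and fp64 variants cannot drift apart; the trade-off is that each mode is a separate compilation, so both variants would need to be compiled and cached independently.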
Precompilation
NVRTC kernels can be precompiled to reduce first-launch latency:
VIBESPATIAL_PRECOMPILE=1 uv run python -c "import vibespatial"