Runtime Model

vibeSpatial is GPU-first, not GPU-optional.

Intent

Define runtime-selection rules, fallback visibility requirements, and the first files to inspect when execution behavior changes.

Request Signals

  • runtime

  • gpu

  • cuda

  • fallback

  • execution mode

  • kernel

  • cccl

  • diagnostics

Open First

  • docs/architecture/runtime.md

  • src/vibespatial/runtime/_runtime.py

  • src/geopandas/__init__.py

  • src/vibespatial/api/__init__.py

Verify

  • uv run pytest

Risks

  • Silent CPU fallback hides unsupported GPU behavior.

  • Runtime-selection changes can desync the GeoPandas shim from the runtime layer.

  • Kernel-oriented changes can look correct locally while breaking the upstream contract.

Core Rules

  • Design APIs around bulk device execution and parallel kernels.

  • Prefer cuda-python for runtime control and kernel launch plumbing.

  • Prefer CCCL for reusable data-parallel building blocks.

  • Runtime availability means a real CUDA device is present, not just that the Python package imports successfully.

  • CPU execution exists to preserve correctness and debuggability, not to define the architecture.

  • Canonical geometry storage should stay fp64; compute precision may dispatch separately from storage precision.

  • Null and empty geometries are distinct states and must stay distinct through buffer layout and kernel outputs.

  • Predicate and constructive kernels must declare a robustness guarantee, not just a precision mode.

  • Deterministic reproducibility is opt-in; default mode stays performance-first.

  • auto dispatch must use per-kernel crossover thresholds, not one global size gate (see the dispatch sketch after this list).

  • Public API workflows should make one CPU/GPU dispatch decision at the boundary, not re-plan execution family at each internal step.

  • auto crossover thresholds apply at promotion time while inputs are host-resident; once a workload is already device-resident, auto stays on GPU and only re-plans among GPU variants.

  • Generic runtime probing must not claim GPU execution for auto by itself; the actual switch to GPU happens only inside kernel-specific dispatch planning.

  • Adaptive planning may re-evaluate at chunk boundaries, but not mid-kernel.

  • Repo-owned GeoSeries and GeoDataFrame methods must carry explicit dispatch registrations.

  • Repo-owned kernel modules must register at least one kernel variant before they are allowed to land.

  • Phase 9 bounds execution is the first live cuda-python kernel; it keeps family-specialized CPU and GPU variants side by side so dispatch stays performance-driven rather than one-size-fits-all.
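
A minimal sketch of how the dispatch rules above compose into one boundary decision: per-kernel crossover thresholds, promotion-time gating for host-resident inputs, and device residency keeping auto on GPU. The names here (plan_dispatch, _CROSSOVER_ROWS) are illustrative assumptions, not the repo's actual dispatch API.

```python
# Hypothetical sketch; plan_dispatch and _CROSSOVER_ROWS are illustrative
# names, not the repo's actual dispatch API.
_CROSSOVER_ROWS = {
    # Per-kernel promotion thresholds, not one global size gate.
    "bounds": 50_000,
    "buffer": 5_000,
}

def plan_dispatch(kernel: str, n_rows: int, device_resident: bool, mode: str) -> str:
    """One CPU/GPU decision at the public-API boundary."""
    if mode in ("cpu", "gpu"):
        return mode  # explicit modes bypass auto planning entirely
    # auto: once data is device-resident, stay on GPU and only
    # re-plan among GPU kernel variants.
    if device_resident:
        return "gpu"
    # Host-resident inputs: crossover thresholds apply at promotion time.
    threshold = _CROSSOVER_ROWS.get(kernel, 100_000)
    return "gpu" if n_rows >= threshold else "cpu"
```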

Fallback

  • auto mode may fall back to CPU when GPU execution is unavailable.

  • Explicit gpu mode must fail loudly if the required GPU path is unsupported.

  • Fallback events should be observable; silent host execution is not acceptable (see the sketch after this list).

  • New fallback surfaces should be paired with tests or diagnostics.

  • Non-user-initiated host-to-device and device-to-host transfers must remain visible.

  • Device-to-host transfers belong only in explicit materialization surfaces such as to_pandas, to_numpy, values, and __repr__.
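
A compact sketch of the loud-failure and observable-fallback contract above. GpuUnsupportedError and resolve_execution are hypothetical names used only for illustration; the repo's real diagnostics surface may differ.

```python
import warnings

class GpuUnsupportedError(RuntimeError):
    """Hypothetical error type for explicit gpu mode on an unsupported path."""

def resolve_execution(kernel: str, gpu_available: bool, mode: str) -> str:
    # Illustrative only; mirrors the fallback rules, not the repo's code.
    if mode == "cpu":
        return "cpu"
    if gpu_available:
        return "gpu"
    if mode == "gpu":
        # Explicit gpu mode fails loudly rather than silently running on host.
        raise GpuUnsupportedError(f"{kernel}: GPU path unavailable in explicit gpu mode")
    # auto mode may fall back, but the event stays observable.
    warnings.warn(f"{kernel}: falling back to CPU execution", stacklevel=2)
    return "cpu"
```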

Session Execution Mode Override

The session-wide execution mode follows the determinism.py pattern; a minimal sketch appears after this list:

  • VIBESPATIAL_EXECUTION_MODE env var (auto, cpu, gpu).

  • set_execution_mode() programmatic override (takes priority over env var).

  • get_requested_mode() reads: explicit override > env var > auto default.

  • CPU mode causes early returns in IO (_try_gpu_read_file, WKB decode/encode), DeviceGeometryArray operations (to_crs, dwithin, _binary_predicate, clip_by_rect), binary predicates, and geoseries_from_owned.

  • Setting the mode invalidates the adaptive runtime snapshot cache.

  • All entry points call get_requested_mode() to determine dispatch; internal GPU-only helpers are safe because their callers gate on mode first.
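
The bullets above translate into a small amount of module state. set_execution_mode() and get_requested_mode() are named in this section; the module-level override variable and the validation details below are assumptions about the shape of the determinism.py pattern, not a copy of the real module.

```python
import os

_VALID_MODES = {"auto", "cpu", "gpu"}
_mode_override: str | None = None  # assumed module state, set by set_execution_mode()

def set_execution_mode(mode: str | None) -> None:
    """Programmatic override; takes priority over the env var. None clears it."""
    global _mode_override
    if mode is not None and mode not in _VALID_MODES:
        raise ValueError(f"invalid execution mode: {mode!r}")
    _mode_override = mode
    # The real implementation also invalidates the adaptive runtime
    # snapshot cache at this point.

def get_requested_mode() -> str:
    """Resolution order: explicit override > env var > 'auto' default."""
    if _mode_override is not None:
        return _mode_override
    env = os.environ.get("VIBESPATIAL_EXECUTION_MODE", "").strip().lower()
    return env if env in _VALID_MODES else "auto"  # unknown values: assumed to fall back to auto
```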

Provenance Rewrite Override

The provenance rewrite system (ADR-0039) follows the same pattern; a sketch of its tri-state override appears after this list:

  • VIBESPATIAL_PROVENANCE_REWRITES env var (default: enabled; 0/false/no/off to disable).

  • set_provenance_rewrites(bool | None) programmatic override (takes priority over env var; None clears override back to default).

  • provenance_rewrites_enabled() reads: explicit override > env var > True.

  • Gated at five sites: attempt_provenance_rewrite() in provenance.py (covers R1 and all consumption-time binary predicate rules), the R5/R6 branches in geometry_array.py:buffer(), the R7 branch in geometry_array.py:simplify(), and the R2 branch in sjoin.py:_geom_predicate_query().
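
A sketch of the tri-state resolution described above; set_provenance_rewrites() and provenance_rewrites_enabled() are named in this section, while the internal _rewrites_override variable is an assumed detail.

```python
import os

_rewrites_override: bool | None = None  # None means "no programmatic override"

def set_provenance_rewrites(enabled: bool | None) -> None:
    """bool sets an override; None clears it back to env-var/default resolution."""
    global _rewrites_override
    _rewrites_override = enabled

def provenance_rewrites_enabled() -> bool:
    """Resolution order: explicit override > env var > True (default enabled)."""
    if _rewrites_override is not None:
        return _rewrites_override
    env = os.environ.get("VIBESPATIAL_PROVENANCE_REWRITES", "").strip().lower()
    return env not in {"0", "false", "no", "off"}
```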

Device-Native Result Boundary (ADR-0042)

GPU-selected workflows should remain device-native until an explicit compatibility or materialization surface is requested; a hypothetical sketch of that boundary appears after this list.

  • Low-level spatial query kernels may still return typed integer index arrays. SpatialJoinIndices and related dtype assertions remain useful for that narrow contract.

  • The architectural target for overlay, clip, dissolve, and other constructive/relational workflows is broader: device-resident geometry, provenance, and relation data should stay off host until an explicit export boundary such as to_geopandas(), to_pandas(), or to_shapely().

  • sjoin._frame_join and similar pandas assembly seams remain transitional compatibility layers, not the desired steady-state execution model.

  • Overlay’s current attribute assembly and keep-geometry-type handling remain migration surfaces. New work should move semantics handling toward typed device-side classification instead of host inspection.

  • I/O paths should keep Arrow or other columnar tables alive as long as possible and defer host conversion to explicit construction/materialization points.

  • Once auto has selected GPU for a workflow, internal steps must not silently pivot back to host execution just because a host-shaped helper exists.
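
A hypothetical shape for a result object that honors this boundary. Everything in this sketch (DeviceResult and its helpers) is illustrative, not the repo's types; the point is that the single device-to-host copy is confined to the explicit export method.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class DeviceResult:
    """Hypothetical device-native result: buffers stay on the GPU until export."""
    geometry: Any    # e.g. CuPy arrays in an SoA coordinate layout
    provenance: Any  # device-resident relation/provenance columns

    def to_geopandas(self) -> Any:
        # Explicit materialization boundary: the only sanctioned D2H transfer.
        return _assemble_frame(
            _device_to_host(self.geometry), _device_to_host(self.provenance)
        )

def _device_to_host(buffers: Any) -> Any:
    return buffers  # placeholder for the single, visible device-to-host copy

def _assemble_frame(geometry: Any, provenance: Any) -> dict:
    # Placeholder for pandas/GeoPandas assembly, confined to the export edge.
    return {"geometry": geometry, "provenance": provenance}
```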

Memory Pool Tiers (ADR-0040)

Device memory allocation uses a tiered strategy built on RAPIDS RMM when available, with CuPy’s built-in MemoryPool as the fallback.

| Tier | Env Var | Allocator Stack | Default? |
| --- | --- | --- | --- |
| A | (none) | PoolMemoryResource → CudaMemoryResource | Yes (when RMM installed) |
| B | VIBESPATIAL_GPU_OOM_SAFETY=1 | FailureCallbackResourceAdaptor → Pool → Cuda | No |
| C | VIBESPATIAL_GPU_MANAGED_MEMORY=1 | ManagedMemoryResource (bare) | No |
| Fallback | (RMM not installed) | CuPy MemoryPool | Yes (without RMM) |

  • Tier A provides a coalescing pool with ~5-15% peak VRAM reduction over CuPy’s power-of-2 binning, at zero overhead.

  • Tier B adds a GC-retry callback on OOM (bounded to 3 retries per event). Zero overhead on the happy path.

  • Tier C uses CUDA managed memory for datasets exceeding VRAM. Performance degrades 2-10× under oversubscription due to PCIe page migration; the SoA coordinate layout amplifies page faults.

  • Deferred initialization: RMM resources require a CUDA context, so _configure_rmm_pool() runs inside _ensure_context() after the primary context is retained. If RMM setup fails, the runtime falls back to the CuPy pool with a warning (a configuration sketch follows this list).

  • VIBESPATIAL_GPU_POOL_LIMIT maps to maximum_pool_size (Tiers A/B) and is ignored for Tier C (managed memory uses OS overcommit semantics).

  • _memory_backend discriminator values: "cupy", "rmm-pool", "rmm-safe", "rmm-managed", "none" (before context init).
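
A configuration sketch of the tier selection using real RMM entry points (rmm.mr.PoolMemoryResource, FailureCallbackResourceAdaptor, ManagedMemoryResource, set_current_device_resource). The surrounding function is illustrative and simplified, not the repo's _configure_rmm_pool(); its return values echo the _memory_backend discriminator, and in the real runtime it would run only after the primary CUDA context is retained.

```python
# Illustrative tier selection mirroring the ADR-0040 table; not the repo's
# literal implementation.
import gc
import os

def configure_memory_pool(pool_limit: int | None = None) -> str:
    try:
        import rmm
    except ImportError:
        return "cupy"  # Fallback tier: rely on CuPy's built-in MemoryPool

    if os.environ.get("VIBESPATIAL_GPU_MANAGED_MEMORY") == "1":
        # Tier C: bare managed memory; pool_limit is ignored here
        # (managed memory uses OS overcommit semantics).
        rmm.mr.set_current_device_resource(rmm.mr.ManagedMemoryResource())
        return "rmm-managed"

    # pool_limit maps to maximum_pool_size (VIBESPATIAL_GPU_POOL_LIMIT).
    pool = rmm.mr.PoolMemoryResource(
        rmm.mr.CudaMemoryResource(), maximum_pool_size=pool_limit
    )

    if os.environ.get("VIBESPATIAL_GPU_OOM_SAFETY") == "1":
        retries = {"left": 3}  # simplified: the real version bounds retries per OOM event

        def on_oom(nbytes: int) -> bool:
            if retries["left"] <= 0:
                return False  # give up and let the OOM propagate
            retries["left"] -= 1
            gc.collect()  # release unreferenced device arrays, then retry
            return True

        # Tier B: GC-retry adaptor wrapping the pool.
        rmm.mr.set_current_device_resource(
            rmm.mr.FailureCallbackResourceAdaptor(pool, on_oom)
        )
        return "rmm-safe"

    # Tier A (default when RMM is installed): coalescing pool over CUDA.
    rmm.mr.set_current_device_resource(pool)
    return "rmm-pool"
```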

Compatibility

  • GeoPandas behavior is measured with vendored upstream tests.

  • Upstream parity matters more than mirroring GeoPandas internals.

  • Rebuild abstractions only when the test contract or performance data demands them.