Make Valid Pipeline

Intent

Define the repo-owned make_valid pipeline so topology repair work only runs on invalid rows and can later map onto GPU compaction plus constructive repair stages.

Request Signals

  • make_valid

  • validity

  • topology repair

  • compaction

  • invalid rows

  • shapely.make_valid

  • geometryarray make_valid

Open First

  • docs/architecture/make-valid.md

  • src/vibespatial/constructive/make_valid_pipeline.py

  • src/vibespatial/api/_shapely_dispatch.py

  • tests/test_make_valid_pipeline.py

Verify

  • uv run pytest tests/test_make_valid_pipeline.py

  • uv run pytest tests/test_device_geometry_array.py -k shapely_make_valid_dispatches_device_geometry_array_directly

  • uv run python scripts/check_docs.py --check

Risks

  • Running repair on all rows instead of compacted invalids wastes compute on already-valid geometries.

  • Validity checking and repair becoming coupled prevents staging them as separate GPU stages.

  • Undocumented third-party adapter hooks make import-time behavior hard to discover.

Decision

  • Compute validity first.

  • Compact invalid rows into a dense repair batch.

  • Leave valid rows untouched.

  • Repair only the compacted invalid subset.

  • Scatter repaired rows back into original order.

  • When constructing a replacement GeoSeries from repaired geometry, always pass index=df.index (or index=gs.index) to preserve non-contiguous index alignment from upstream operations like clip() or iloc slicing. Omitting the index creates a default RangeIndex(0..N-1) which silently drops rows during pandas column assignment when the DataFrame index is non-contiguous.

  • When all rows pass validation and an OwnedGeometryArray was provided, MakeValidResult.owned carries the original device-resident array so downstream stages (e.g., dissolve) can stay on device without re-uploading (ADR-0005 zero-transfer chain).

Dispatch

  • make_valid_owned() owns runtime dispatch via plan_dispatch_selection() and records dispatch events internally; the API layer (GeometryArray, DeviceGeometryArray) does not record its own events.

  • Two kernel variants are registered (ADR-0033): make_valid/gpu-nvrtc (polygon/multipolygon GPU repair) and make_valid/cpu (all families, Shapely fallback).

  • The dispatch_mode parameter controls GPU/CPU/AUTO selection.

  • vibeSpatial installs a process-wide shapely.make_valid adapter at import time via src/vibespatial/api/_shapely_dispatch.py. For repo-owned GeometryArray and DeviceGeometryArray inputs, the wrapper dispatches to geometry.make_valid(...) so device-backed public workflows such as gdf.set_geometry(shapely.make_valid(gdf.geometry.values)) stay on the native path. All other input types continue to use Shapely’s original implementation.

  • This hook is limited to make_valid. New Shapely monkeypatches should not be added casually; if a public adapter hook is necessary, document it here or in a dedicated ADR before landing.

Performance Notes

  • Validity checking is much cheaper than topology repair, so compacting invalid rows is the right default for valid-heavy datasets.

  • This staging is directly compatible with CCCL-style DeviceSelect and scatter primitives.

  • The current host implementation already benefits from skipping repair work on valid rows.