Make Valid Pipeline

Intent

Define the repo-owned make_valid pipeline so that topology repair runs only on invalid rows and the pipeline can later map onto GPU compaction and constructive repair stages.

Request Signals

  • make_valid

  • validity

  • topology repair

  • compaction

  • invalid rows

Open First

  • docs/architecture/make-valid.md

  • src/vibespatial/constructive/make_valid_pipeline.py

  • tests/test_make_valid_pipeline.py

Verify

  • uv run pytest tests/test_make_valid_pipeline.py

  • uv run python scripts/check_docs.py --check

Risks

  • Running repair on all rows, instead of only on the compacted invalid rows, wastes compute on geometries that are already valid.

  • Coupling validity checking with repair prevents staging them as separate GPU stages.

Decision

  • Compute validity first.

  • Compact invalid rows into a dense repair batch.

  • Leave valid rows untouched.

  • Repair only the compacted invalid subset.

  • Scatter repaired rows back into original order.

  • When all rows pass validation and an OwnedGeometryArray was provided, MakeValidResult.owned carries the original device-resident array so downstream stages (e.g., dissolve) can stay on device without re-uploading (ADR-0005 zero-transfer chain).
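
The staging above can be sketched on the host in a few lines. This is a minimal illustration, not the code in make_valid_pipeline.py: repair_invalid_only, RepairOutcome, and the toy interval predicate and repair function are hypothetical names introduced here, standing in for the real validity check and topology repair. The all-valid fast path returns the input container itself, loosely mirroring how MakeValidResult.owned carries the original array.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class RepairOutcome(Generic[T]):
    """Hypothetical result type for this sketch (not MakeValidResult)."""
    rows: Sequence[T]          # output rows, in original order
    repaired_index: List[int]  # positions that went through repair
    reused_input: bool         # True when the input was returned untouched


def repair_invalid_only(
    rows: Sequence[T],
    is_valid: Callable[[T], bool],
    repair: Callable[[List[T]], List[T]],
) -> RepairOutcome[T]:
    # Stage 1: compute validity for every row.
    valid_mask = [is_valid(row) for row in rows]

    # Stage 2: compact the positions of invalid rows into a dense batch.
    invalid_index = [i for i, ok in enumerate(valid_mask) if not ok]

    # Fast path: nothing to repair, so hand back the original container.
    if not invalid_index:
        return RepairOutcome(rows, [], True)

    batch = [rows[i] for i in invalid_index]

    # Stage 3: repair only the compacted subset.
    repaired = repair(batch)
    if len(repaired) != len(batch):
        raise ValueError("repair must return one row per input row")

    # Stage 4: scatter repaired rows back; valid rows are copied unchanged.
    out = list(rows)
    for i, fixed in zip(invalid_index, repaired):
        out[i] = fixed
    return RepairOutcome(out, invalid_index, False)


# Toy stand-ins: an interval (lo, hi) is "valid" when lo <= hi.
def interval_is_valid(interval):
    return interval[0] <= interval[1]


def repair_intervals(batch):
    return [(min(a, b), max(a, b)) for a, b in batch]


rows = [(0, 1), (5, 2), (3, 3), (9, 4)]
outcome = repair_invalid_only(rows, interval_is_valid, repair_intervals)
all_valid = repair_invalid_only([(0, 1), (2, 2)], interval_is_valid, repair_intervals)
```

Passing the predicate and the repair function as parameters keeps the two stages decoupled, which is the property the Risks section asks for.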

Dispatch

  • make_valid_owned() owns runtime dispatch via plan_dispatch_selection() and records dispatch events internally; the API layer (GeometryArray, DeviceGeometryArray) does not record its own events.

  • Two kernel variants are registered (ADR-0033): make_valid/gpu-nvrtc (polygon/multipolygon GPU repair) and make_valid/cpu (all families, Shapely fallback).

  • The dispatch_mode parameter controls GPU/CPU/AUTO selection.
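
One way to picture the selection between the two registered variants is sketched below. It does not reproduce plan_dispatch_selection(), whose signature is not shown in this note: DispatchMode, select_variant, the family strings, and the gpu_available flag are hypothetical names for illustration. Raising when GPU is requested explicitly but cannot be used is also an assumption; the real dispatcher may fall back instead.

```python
from enum import Enum
from typing import Iterable


class DispatchMode(Enum):
    """Hypothetical enum mirroring the GPU/CPU/AUTO choices."""
    AUTO = "auto"
    GPU = "gpu"
    CPU = "cpu"


# Variant names as registered per this note (ADR-0033).
GPU_VARIANT = "make_valid/gpu-nvrtc"
CPU_VARIANT = "make_valid/cpu"

# Assumed family labels; the GPU variant covers polygon and multipolygon.
GPU_FAMILIES = frozenset({"polygon", "multipolygon"})


def select_variant(
    mode: DispatchMode,
    families: Iterable[str],
    gpu_available: bool,
) -> str:
    """Pick a kernel variant name for the geometry families in a batch."""
    # The GPU variant is usable only if every family in the batch is covered.
    gpu_ok = gpu_available and set(families) <= GPU_FAMILIES

    if mode is DispatchMode.CPU:
        return CPU_VARIANT
    if mode is DispatchMode.GPU:
        if not gpu_ok:
            # Assumption: an explicit GPU request fails loudly.
            raise RuntimeError("GPU variant unavailable for this batch")
        return GPU_VARIANT
    # AUTO: prefer the GPU variant when usable, else the CPU fallback.
    return GPU_VARIANT if gpu_ok else CPU_VARIANT
```

Keeping the decision in one function matches the note's point that make_valid_owned() owns dispatch, so callers in the API layer do not need their own selection or event logic.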

Performance Notes

  • Validity checking is much cheaper than topology repair, so compacting invalid rows before repair is the right default when most rows are already valid.

  • The validity, compaction, repair, and scatter staging maps directly onto CCCL-style DeviceSelect (for compaction) and scatter primitives.

  • The current host implementation already benefits from skipping repair work on valid rows.
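
To make the mapping onto select and scatter primitives concrete, the sketch below writes compaction as the flag, exclusive prefix sum, scatter pattern that stream compaction is commonly built on. It is a sequential host stand-in, not the CCCL API or its implementation; compact_by_flag and scatter are hypothetical helper names.

```python
from itertools import accumulate
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def compact_by_flag(flags: Sequence[bool]) -> List[int]:
    """Return the indices whose flag is set, preserving their order."""
    marks = [1 if flag else 0 for flag in flags]

    # Exclusive prefix sum: slots[i] is the output position of row i
    # if row i is selected. On a GPU this would be a parallel scan.
    inclusive = list(accumulate(marks))
    slots = [inc - mark for inc, mark in zip(inclusive, marks)]
    count = inclusive[-1] if inclusive else 0

    # Each selected row writes its own index to its slot. The writes are
    # independent, which is what makes this step parallel-friendly.
    selected = [0] * count
    for i, mark in enumerate(marks):
        if mark:
            selected[slots[i]] = i
    return selected


def scatter(dest: Sequence[T], index: Sequence[int], values: Sequence[T]) -> List[T]:
    """Copy dest, then write values[k] to position index[k]."""
    out = list(dest)
    for i, value in zip(index, values):
        out[i] = value
    return out


invalid_flags = [False, True, False, True, True]
invalid_index = compact_by_flag(invalid_flags)
restored = scatter(["a", "b", "c", "d", "e"], invalid_index, ["B", "D", "E"])
```

Here invalid_flags plays the role of the negated validity mask, and the indices from compact_by_flag are reused as the scatter destinations, so repaired rows land back in their original positions.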