Make Valid Pipeline¶
Intent¶
Define the repo-owned make_valid pipeline so topology repair work only runs on
invalid rows and can later map onto GPU compaction plus constructive repair
stages.
Request Signals¶
make_valid
validity
topology repair
compaction
invalid rows
shapely.make_valid
geometryarray make_valid
Open First¶
docs/architecture/make-valid.md
src/vibespatial/constructive/make_valid_pipeline.py
src/vibespatial/api/_shapely_dispatch.py
tests/test_make_valid_pipeline.py
Verify¶
uv run pytest tests/test_make_valid_pipeline.py
uv run pytest tests/test_device_geometry_array.py -k shapely_make_valid_dispatches_device_geometry_array_directly
uv run python scripts/check_docs.py --check
Risks¶
Running repair on all rows instead of compacted invalids wastes compute on already-valid geometries.
Coupling validity checking to repair prevents staging them as separate GPU stages.
Undocumented third-party adapter hooks make import-time behavior hard to discover.
Decision¶
Compute validity first.
Compact invalid rows into a dense repair batch.
Leave valid rows untouched.
Repair only the compacted invalid subset.
Scatter repaired rows back into original order.
When constructing a replacement GeoSeries from repaired geometry, always pass index=df.index (or index=gs.index) to preserve non-contiguous index alignment from upstream operations like clip() or iloc slicing. Omitting the index creates a default RangeIndex(0..N-1), which silently drops rows during pandas column assignment when the DataFrame index is non-contiguous.
When all rows pass validation and an OwnedGeometryArray was provided, MakeValidResult.owned carries the original device-resident array so downstream stages (e.g., dissolve) can stay on device without re-uploading (ADR-0005 zero-transfer chain).
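The index-alignment rule above can be reproduced with plain pandas. The following is a minimal sketch, not project code: it assumes strings as stand-ins for geometries and a plain pandas Series in place of a GeoSeries, and the names df, repaired, bad, and good are illustrative only.

```python
import pandas as pd

# A frame whose index is non-contiguous, as after clip() or iloc slicing.
df = pd.DataFrame({"geometry": ["a", "b", "c"]}, index=[2, 5, 9])

# Stand-in for repaired geometries, in the same row order as df.
repaired = ["A", "B", "C"]

# Wrong: the Series gets a default RangeIndex (0, 1, 2). Column assignment
# aligns on labels, so only label 2 matches (and receives the third value);
# labels 5 and 9 are filled with missing values.
bad = df.copy()
bad["geometry"] = pd.Series(repaired)

# Right: reuse the frame's index so every row lines up by label.
good = df.copy()
good["geometry"] = pd.Series(repaired, index=df.index)
```

In the wrong variant the one surviving label also receives a value from a different row, so the damage is not limited to missing entries.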
Dispatch¶
make_valid_owned() owns runtime dispatch via plan_dispatch_selection() and records dispatch events internally; the API layer (GeometryArray, DeviceGeometryArray) does not record its own events.
Two kernel variants are registered (ADR-0033): make_valid/gpu-nvrtc (polygon/multipolygon GPU repair) and make_valid/cpu (all families, Shapely fallback).
The dispatch_mode parameter controls GPU/CPU/AUTO selection.
vibeSpatial installs a process-wide shapely.make_valid adapter at import time via src/vibespatial/api/_shapely_dispatch.py. For repo-owned GeometryArray and DeviceGeometryArray inputs, the wrapper dispatches to geometry.make_valid(...) so device-backed public workflows such as gdf.set_geometry(shapely.make_valid(gdf.geometry.values)) stay on the native path. All other input types continue to use Shapely’s original implementation.
This hook is limited to make_valid. New Shapely monkeypatches should not be added casually; if a public adapter hook is necessary, document it here or in a dedicated ADR before landing.
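The adapter pattern described above (wrap the original function, route owned types to their own method, pass everything else through) might look roughly like the sketch below. Everything in it is hypothetical: OwnedArray, install_adapter, and fake_shapely are illustrative names, the idempotency guard is a design choice of this sketch, and the code patches a stand-in namespace rather than Shapely itself. It is not the contents of _shapely_dispatch.py.

```python
import functools
from types import SimpleNamespace


class OwnedArray:
    """Hypothetical stand-in for a repo-owned geometry array type."""

    def __init__(self, values):
        self.values = list(values)

    def make_valid(self, **kwargs):
        # A real implementation would run the native repair pipeline here.
        return ("native", self.values)


def install_adapter(module, owned_types):
    """Wrap module.make_valid once so owned inputs use their own method."""
    original = module.make_valid
    if getattr(original, "_is_adapter", False):
        return  # already installed; keep the hook idempotent

    @functools.wraps(original)
    def make_valid(geometry, **kwargs):
        if isinstance(geometry, owned_types):
            return geometry.make_valid(**kwargs)  # native path
        return original(geometry, **kwargs)       # untouched fallback

    make_valid._is_adapter = True
    module.make_valid = make_valid


# Stand-in for the third-party module; this sketch does not touch Shapely.
fake_shapely = SimpleNamespace(
    make_valid=lambda geometry, **kwargs: ("original", geometry)
)
install_adapter(fake_shapely, (OwnedArray,))
install_adapter(fake_shapely, (OwnedArray,))  # second call is a no-op
```

Keeping a reference to the original callable inside the wrapper is what lets all non-owned inputs behave exactly as before the hook was installed.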
Performance Notes¶
Validity checking is much cheaper than topology repair, so compacting invalid rows is the right default for valid-heavy datasets.
This staging is directly compatible with CCCL-style DeviceSelect and scatter primitives.
The current host implementation already benefits from skipping repair work on valid rows.
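The validity, compaction, repair, and scatter stages can be sketched on the host with NumPy and Shapely 2.x. This is an illustrative sketch under stated assumptions, not the code in make_valid_pipeline.py: make_valid_compacted is a hypothetical name, and np.flatnonzero plus fancy-index assignment stand in for the select and scatter primitives a GPU version would use.

```python
import numpy as np
import shapely
from shapely.geometry import Polygon


def make_valid_compacted(geoms):
    """Repair only invalid rows; return (result, number_of_repaired_rows)."""
    geoms = np.asarray(geoms, dtype=object)
    valid = shapely.is_valid(geoms)          # stage 1: validity mask
    invalid_rows = np.flatnonzero(~valid)    # stage 2: select invalid row ids
    if invalid_rows.size == 0:
        return geoms, 0                      # all valid: no repair work at all
    batch = geoms[invalid_rows]              # dense repair batch
    repaired = shapely.make_valid(batch)     # stage 3: repair the subset only
    out = geoms.copy()                       # valid rows are left untouched
    out[invalid_rows] = repaired             # stage 4: scatter to original order
    return out, int(invalid_rows.size)


square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])  # self-intersecting ring
result, n_repaired = make_valid_compacted([square, bowtie, square])
```

Because the repair call only ever sees the compacted batch, its cost scales with the number of invalid rows rather than with the size of the input.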