Dissolve Pipeline¶
Intent¶
Define the repo-owned dissolve pipeline so grouped constructive work can later map onto GPU sorting and segmented-union primitives instead of Python group iteration.
Request Signals¶
dissolve
grouped union
segmented union
GeoDataFrame.dissolve
attribute aggregation
Open First¶
docs/architecture/dissolve.md
src/vibespatial/overlay/dissolve.py
tests/test_dissolve_pipeline.py
tests/test_gpu_dissolve.py
Verify¶
uv run pytest tests/test_dissolve_pipeline.py tests/test_gpu_dissolve.pyuv run python scripts/check_docs.py --check
Risks¶
Python group iteration dominates wall time if the group-span stage is not bulk.
Stable in-group row order is required for deterministic output.
Replacing the full public method instead of just the grouped union stage overcomplicates the GPU transition.
Decision¶
Encode dissolve groups once and preserve stable per-group row order.
Keep attribute aggregation and geometry union as separate stages.
Use grouped union as the canonical geometry stage for
GeoDataFrame.dissolve.Treat pandas aggregation semantics as the public contract boundary.
Favor CCCL-style building blocks: stable sort, run-length encode, reduce-by-key, compaction, and scatter/gather.
Pipeline¶
Encode group keys into dense integer codes.
Stable-sort rows by group code.
Run-length encode sorted codes into group spans.
Reduce non-geometry columns per group.
Union each group independently.
Reassemble grouped geometries with aggregated attributes.
Performance Notes¶
Sorting and group-span discovery are reusable across attribute and geometry work, which keeps the eventual GPU path coherent.
Grouped union should be per-group work dispatch, not one global union followed by regrouping.
Stable in-group row order matters for deterministic output and debugability.
Host performance is acceptable enough to route
GeoDataFrame.dissolvethrough the grouped pipeline today; future GPU work should replace only the grouped union stage, not the full public method.o17.9.6.5is allowed to route axis-aligned rectangle coverages into a dedicated grouped GPU union fast path when that workload can be reduced to per-group bounds aggregation without reopening generic union topology work.