Dissolve Pipeline
Intent
Define the repo-owned dissolve pipeline so grouped constructive work can later map onto GPU sorting and segmented-union primitives instead of Python group iteration.
Request Signals
dissolve
grouped union
segmented union
GeoDataFrame.dissolve
attribute aggregation
Open First
docs/architecture/dissolve.md
src/vibespatial/overlay/dissolve.py
tests/test_dissolve_pipeline.py
tests/test_gpu_dissolve.py
Verify
uv run pytest tests/test_dissolve_pipeline.py tests/test_gpu_dissolve.py
uv run python scripts/check_docs.py --check
Risks
Python group iteration dominates wall time if the group-span stage is not done as a bulk operation.
Stable in-group row order is required for deterministic output.
Replacing the full public method instead of just the grouped union stage overcomplicates the GPU transition.
Decision
Encode dissolve groups once and preserve stable per-group row order.
Keep attribute aggregation and geometry union as separate stages.
Use grouped union as the canonical geometry stage for GeoDataFrame.dissolve.
Build a native grouped constructive result first, then export to GeoPandas at the explicit public boundary.
Favor CCCL-style building blocks: stable sort, run-length encode, reduce-by-key, compaction, and scatter/gather.
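The building blocks named above have simple sequential semantics. The following is a minimal pure-Python sketch of those semantics, not the repo's implementation: the function names (stable_argsort, gather, run_length_encode, reduce_by_key, compact) are hypothetical, and on the GPU each would map to a CCCL primitive rather than a Python loop. It shows how one stable sort plus run-length encoding yields dense group offsets that reducers and compaction can then reuse.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stable_argsort(codes: Sequence[int]) -> List[int]:
    """Row permutation that sorts by group code.

    Python's sort is stable, so rows with equal codes keep their input order.
    """
    return sorted(range(len(codes)), key=codes.__getitem__)


def gather(values: Sequence[T], indices: Sequence[int]) -> List[T]:
    """out[k] = values[indices[k]]."""
    return [values[i] for i in indices]


def run_length_encode(sorted_codes: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (unique codes, offsets); run g covers sorted rows offsets[g]:offsets[g + 1]."""
    uniques: List[int] = []
    offsets: List[int] = []
    for i, code in enumerate(sorted_codes):
        if i == 0 or code != sorted_codes[i - 1]:
            uniques.append(code)
            offsets.append(i)
    offsets.append(len(sorted_codes))
    return uniques, offsets


def reduce_by_key(offsets: Sequence[int], sorted_values: Sequence[T],
                  op: Callable[[T, T], T]) -> List[T]:
    """Fold each run with op, left to right, i.e. in stable input order."""
    out: List[T] = []
    for start, stop in zip(offsets[:-1], offsets[1:]):
        acc = sorted_values[start]
        for value in sorted_values[start + 1:stop]:
            acc = op(acc, value)
        out.append(acc)
    return out


def compact(values: Sequence[T], flags: Sequence[bool]) -> List[T]:
    """Stream compaction: keep the values whose flag is set."""
    return [v for v, keep in zip(values, flags) if keep]


# Usage: five rows in three groups.
codes = [2, 0, 2, 1, 0]
values = [10, 20, 30, 40, 50]
order = stable_argsort(codes)                                  # [1, 4, 3, 0, 2]
uniques, offsets = run_length_encode(gather(codes, order))     # [0, 1, 2], [0, 2, 3, 5]
sums = reduce_by_key(offsets, gather(values, order), lambda a, b: a + b)  # [70, 40, 40]
sizes = [b - a for a, b in zip(offsets[:-1], offsets[1:])]     # [2, 1, 2]
multi_row_groups = compact(uniques, [n > 1 for n in sizes])    # [0, 2]
```

Because order and offsets are computed once, every later reducer can reuse the same spans instead of rediscovering groups.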
Pipeline
Encode group keys into dense integer codes.
Stable-sort rows by group code.
Run-length encode sorted codes into group spans.
Reduce non-geometry columns per group.
Union each group independently.
Reassemble grouped geometries with aggregated attributes.
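The six steps can be traced end to end on toy data. This is a minimal sequential sketch under stated assumptions, not the repo's API: dissolve_sketch is a hypothetical name, keys are assumed to be sortable strings, each geometry is modeled as a frozenset of unit grid cells so that set union stands in for geometric union, and the attribute reducers are a sum and a first. What it illustrates is the stage boundary: attributes and geometries are reduced separately over the same spans and only joined in the last step.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Cell = Tuple[int, int]
Geom = FrozenSet[Cell]  # toy stand-in for a polygon: a set of unit grid cells


def dissolve_sketch(keys: Sequence[str], values: Sequence[int],
                    geoms: Sequence[Geom]) -> List[Tuple[str, int, int, Geom]]:
    # 1. Encode group keys into dense integer codes (sorted key order).
    uniques = sorted(set(keys))
    code_of: Dict[str, int] = {k: i for i, k in enumerate(uniques)}
    codes = [code_of[k] for k in keys]

    # 2. Stable-sort rows by group code; ties keep input row order.
    order = sorted(range(len(codes)), key=codes.__getitem__)
    sorted_codes = [codes[i] for i in order]

    # 3. Run-length encode sorted codes into [start, stop) group spans.
    starts = [i for i in range(len(sorted_codes))
              if i == 0 or sorted_codes[i] != sorted_codes[i - 1]]
    spans = list(zip(starts, starts[1:] + [len(sorted_codes)]))

    # 4. Reduce non-geometry columns per group (sum, and first in stable order).
    sums = [sum(values[r] for r in order[a:b]) for a, b in spans]
    firsts = [values[order[a]] for a, _ in spans]

    # 5. Union each group independently (set union in this toy model).
    unions = [frozenset().union(*(geoms[r] for r in order[a:b])) for a, b in spans]

    # 6. Reassemble grouped geometries with aggregated attributes.
    return [(uniques[sorted_codes[a]], s, f, g)
            for (a, _), s, f, g in zip(spans, sums, firsts, unions)]


rows = dissolve_sketch(
    keys=["b", "a", "b", "a"],
    values=[1, 2, 3, 4],
    geoms=[frozenset({(0, 0)}), frozenset({(5, 5)}),
           frozenset({(1, 0)}), frozenset({(5, 6)})],
)
# rows[0] == ("a", 6, 2, frozenset({(5, 5), (5, 6)}))
# rows[1] == ("b", 4, 1, frozenset({(0, 0), (1, 0)}))
```

Only step 5 depends on the geometry representation, which is why a GPU path can swap out that stage alone.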
Performance Notes
Sorting and group-span discovery are reusable across attribute and geometry work, which keeps the eventual GPU path coherent.
Numeric and bool sum/count/mean/min/max/first/last reducers may consume NativeGrouped directly for admitted single-key shapes, including categorical keys with explicit null-group handling.
Host metadata columns may use NativeGrouped first/last take-reducers with pandas-compatible skip-null semantics.
Assembly for as_index=False should stay a native export-boundary concern: reset-index columns may be represented by deferred NativeAttributeTable loader metadata until public materialization or terminal IO requires pandas.
Grouped union should be per-group work dispatch, not one global union followed by regrouping.
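A first/last take-reducer with null-group handling can be sketched without pandas. In this minimal sketch (all names hypothetical, not the NativeGrouped API), null keys are encoded as code -1 and either dropped or given a trailing group of their own, offsets come from a histogram plus prefix sum, and each reducer returns one row index per group, skipping null values the way pandas' GroupBy.first and GroupBy.last do by default. An index of -1 marks a group whose values are all null. Values are gathered only at the end, which is what makes the reducer a take rather than a copy.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def encode_with_nulls(keys: Sequence[Optional[str]],
                      dropna: bool = True) -> Tuple[List[Optional[str]], List[int]]:
    """Dense codes in sorted key order; null keys get -1 or a trailing null group."""
    uniques: List[Optional[str]] = sorted({k for k in keys if k is not None})
    code_of = {k: i for i, k in enumerate(uniques)}
    null_code = -1
    if not dropna and any(k is None for k in keys):
        null_code = len(uniques)
        uniques.append(None)
    return uniques, [null_code if k is None else code_of[k] for k in keys]


def group_spans(codes: Sequence[int], n_groups: int) -> Tuple[List[int], List[int]]:
    """Stable row order plus offsets from a histogram and prefix sum; -1 rows are dropped."""
    counts = [0] * n_groups
    for c in codes:
        if c >= 0:
            counts[c] += 1
    offsets = [0, *accumulate(counts)]
    order = sorted((i for i, c in enumerate(codes) if c >= 0), key=codes.__getitem__)
    return order, offsets


def take_indices(order: Sequence[int], offsets: Sequence[int],
                 values: Sequence[Optional[T]], last: bool = False) -> List[int]:
    """Per group, the row of the first (or last) non-null value, or -1 if none."""
    out: List[int] = []
    for start, stop in zip(offsets[:-1], offsets[1:]):
        rows = list(order[start:stop])
        if last:
            rows.reverse()
        out.append(next((r for r in rows if values[r] is not None), -1))
    return out


def take(values: Sequence[Optional[T]], indices: Sequence[int]) -> List[Optional[T]]:
    """Gather, filling index -1 with None."""
    return [None if i < 0 else values[i] for i in indices]


keys = ["x", None, "x", "y", "x"]
names = [None, "n1", "a", None, "b"]
uniques, codes = encode_with_nulls(keys)            # ["x", "y"], [0, -1, 0, 1, 0]
order, offsets = group_spans(codes, len(uniques))   # [0, 2, 4, 3], [0, 3, 4]
first = take(names, take_indices(order, offsets, names))             # ["a", None]
last = take(names, take_indices(order, offsets, names, last=True))   # ["b", None]
```

With dropna=False the null key becomes its own group, placed last in this sketch, and its first value here would be "n1".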
Many small polygon groups should still be batched when enough groups need real reduction.
The reusable shape is OwnedGeometryArray + dense group offsets -> grouped constructive result; public dissolve is only the first consumer.
o18.x is allowed to route polygon coverage dissolve groups into a shared-edge elimination fast path: cancel duplicate undirected edges inside each group, reconstruct grouped boundary linework in bulk, and build the final coverage areas without reopening generic overlay topology.
o18.x is also allowed to expose a lazy dissolve surface for predicate-heavy workflows: keep grouped members and per-group bounds, answer exact scalar intersects without materializing the dissolved geometry, answer exact point contains the same way, and materialize the true grouped union only when a geometry-producing surface is actually requested.
Stable in-group row order matters for deterministic output and debuggability.
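The edge-cancellation step of the coverage fast path can be sketched on plain coordinate rings. This sketch assumes a valid polygon coverage in which neighbouring polygons share identical vertex-to-vertex edges (no overlaps, no T-junctions) and rings without holes; the function names are hypothetical and the loop is sequential. Each ring edge is normalized to an undirected key and counted per group: an edge seen twice is interior to the dissolved group and cancels, so what survives is the group's boundary linework. Chaining the surviving edges back into rings is left out.

```python
from collections import Counter
from typing import Dict, Iterator, List, Sequence, Tuple

Point = Tuple[float, float]
Edge = Tuple[Point, Point]


def undirected_edges(ring: Sequence[Point]) -> Iterator[Edge]:
    """Edges of a ring given without its repeated closing vertex, endpoints ordered."""
    n = len(ring)
    for i in range(n):
        a, b = ring[i], ring[(i + 1) % n]
        yield (a, b) if a <= b else (b, a)


def boundary_edges_by_group(codes: Sequence[int],
                            rings: Sequence[Sequence[Point]]) -> Dict[int, List[Edge]]:
    """Per group, keep edges used by exactly one member; shared edges cancel."""
    counts: Dict[int, Counter] = {}
    for code, ring in zip(codes, rings):
        counts.setdefault(code, Counter()).update(undirected_edges(ring))
    return {code: sorted(e for e, n in c.items() if n == 1)
            for code, c in counts.items()}


left = [(0, 0), (1, 0), (1, 1), (0, 1)]
right = [(1, 0), (2, 0), (2, 1), (1, 1)]
apart = [(5, 5), (6, 5), (6, 6), (5, 6)]
edges = boundary_edges_by_group([0, 0, 1], [left, right, apart])
# Group 0 loses the shared edge ((1, 0), (1, 1)) and keeps its 6 outer unit edges;
# group 1 keeps all 4 edges of its single square.
```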
Host performance is good enough to route GeoDataFrame.dissolve through the grouped pipeline today; future GPU work should replace only the grouped union stage, not the full public method.
o17.9.6.5 is allowed to route axis-aligned rectangle coverages into a dedicated grouped GPU union fast path when that workload reduces to per-group bounds aggregation without reopening generic union topology work.
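The rectangle fast path comes down to per-group bounds aggregation plus an admission check. The check below is one sufficient condition, assumed here rather than taken from the repo: if a group's closed axis-aligned rectangles do not overlap and their areas sum to the area of the group's bounding box, their union is exactly that box, because any uncovered part of the box would have positive area. Groups that fail the check return None and would fall back to the general grouped union. The function name and tolerance are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def rectangle_coverage_union(codes: Sequence[int],
                             rects: Sequence[Rect]) -> Dict[int, Optional[Rect]]:
    """Per group: the bounding box if the rectangles tile it, else None.

    Assumes rectangles within a group do not overlap; overlap could mask a gap.
    """
    bounds: Dict[int, List[float]] = {}
    areas: Dict[int, float] = {}
    for code, (x0, y0, x1, y1) in zip(codes, rects):
        # Reduce-by-key: min/max for the bounds, + for the member area.
        b = bounds.setdefault(code, [x0, y0, x1, y1])
        b[0], b[1] = min(b[0], x0), min(b[1], y0)
        b[2], b[3] = max(b[2], x1), max(b[3], y1)
        areas[code] = areas.get(code, 0.0) + (x1 - x0) * (y1 - y0)

    out: Dict[int, Optional[Rect]] = {}
    for code, (x0, y0, x1, y1) in bounds.items():
        box_area = (x1 - x0) * (y1 - y0)
        tiled = math.isclose(areas[code], box_area, rel_tol=1e-12)
        out[code] = (x0, y0, x1, y1) if tiled else None  # None: use the general path
    return out


result = rectangle_coverage_union(
    [0, 0, 1, 1],
    [(0, 0, 1, 1), (1, 0, 2, 1), (0, 0, 1, 1), (2, 0, 3, 1)],
)
# result[0] == (0, 0, 2, 1): two tiles fill their bounds exactly
# result[1] is None: the gap between x=1 and x=2 needs a real union
```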