Family-Specialized GeoArrow Codecs
Context
o17.6.18 made aligned GeoArrow adoption nearly free and o17.6.20 added a
real GeoParquet scan engine, but the family codec layer still paid too much
overhead in the generic bridge. The next performance lever is to encode and
decode native geometry columns by family so that points, lines, polygons, and
multipart geometries do not all route through one mixed host reconstruction path.
Decision
Adopt family-specialized GeoArrow encode and decode paths as the default native codec layer behind owned geometry IO.
homogeneous families encode directly to GeoArrow extension arrays with family-local offsets and shared metadata
decode dispatches by GeoArrow extension type into family-specific owned-buffer builders
malformed or unsupported inputs remain isolated through explicit fallback to WKB or host bridge paths instead of polluting the native fast path
mixed-family GeoArrow export keeps using an explicit compatibility fallback until a partition-and-restore mixed codec lands
the public GeoPandas Arrow export surface should not record a fallback event when the repo-owned native family codec succeeds
Consequences
native point export is now a lightweight owned-buffer to Arrow assembly step instead of a generic host bridge
polygon and multipart decode stay on typed offset assembly paths that match the owned buffer schema
pylibcudf GeoParquet scans now have a cleaner seam for replacing the current Arrow bridge with device-local family kernels later
strict-native coverage can credit native GeoArrow export success instead of treating it as a fallback
Alternatives Considered
keep one generic mixed GeoArrow codec for all families
export all unsupported or mixed cases through WKB only
wait for full device-side CCCL kernels before landing any family-specialized codec structure
Acceptance Notes
This landing establishes the typed family codec structure, native homogeneous
export fast paths, explicit mixed fallback, and benchmark surface. The current
implementation still uses pyarrow for final array/table assembly on the host;
the next step is replacing the remaining bridge cost with device-local kernels
behind the same family-specialized contract.