Stroke Kernels

Intent

Define the repo-owned buffer and offset-curve kernel seam so stroke-style constructive work can later move to GPU-friendly prefix-sum and scatter pipelines.

Request Signals

  • buffer

  • offset curve

  • stroke kernel

  • constructive

  • point buffer

Open First

  • docs/architecture/stroke-kernels.md

  • src/vibespatial/kernels/

  • tests/test_stroke_kernels.py

Verify

  • uv run pytest tests/test_stroke_kernels.py

  • uv run python scripts/check_docs.py --check

Risks

  • Current host prototypes are 4-5x slower than Shapely; public dispatch must stay on Shapely fallback until GPU variants land.

  • Prefix-sum emission complexity grows with join and cap classification; land simple cases first.

Decision

  • Treat stroke construction as a bulk vertex-emission problem.

  • Expand distances once, derive segment frames in parallel, classify joins and caps, prefix-sum output counts, and scatter final vertices.

  • Land a real repo-owned point-buffer prototype now.

  • Land a deterministic LineString offset-curve prototype for simple mitre and bevel joins.

  • Keep the public GeoPandas surface on explicit Shapely fallback for now because current host benchmarks are still slower than direct Shapely execution.

Current Scope

  • buffer: owned prototype for positive-distance Point rows.

  • offset_curve: owned prototype for simple LineString rows with non-round joins.

  • GPU-dispatched buffer surfaces (point, linestring, polygon) return OwnedGeometryArray directly without Shapely materialization.

  • CPU/host fallback paths still defer to Shapely.

Performance Notes

  • Prefix-sum plus scatter is the right output strategy for future GPU stroke kernels because per-row vertex counts vary.

  • Point buffers are the first clean constructive case because they avoid segment topology and still exercise arc sampling.

  • Offset curves need bulk segment-frame generation and join classification; the current host prototype exists to validate shape and semantics, not to claim host-wide speed leadership yet.

  • Current host benchmarks at 1K rows are about 4-5x slower than Shapely for both prototypes; GPU variants have landed and bypass Shapely entirely, while the Shapely fallback applies only to CPU/host execution.