Fusion Strategy

Use a lightweight staged operator DAG for fusion, not a whole-program lazy graph.

Intent

Define how chained kernels should eliminate intermediate buffers without committing the repo to an overbuilt execution engine too early.

Request Signals

  • fusion

  • intermediate elimination

  • kernel chain

  • operator dag

  • materialization boundary

  • reusable index

Open First

  • docs/architecture/fusion.md

  • docs/architecture/residency.md

  • docs/architecture/adaptive-runtime.md

  • src/vibespatial/runtime/fusion.py

  • docs/decisions/0009-fusion-strategy.md

Verify

  • uv run pytest tests/test_fusion_policy.py

  • uv run python scripts/check_docs.py --check

Risks

  • A full lazy graph would add scheduling complexity before the repo has enough kernels to justify it.

  • Relying only on hand-written fused kernels would leave too much performance on the table and duplicate logic.

  • Fusing across the wrong boundaries can hide reusable structures or break residency and diagnostics guarantees.

Candidate Approaches

  • full lazy evaluation graph

  • explicit fused kernel variants only

  • lightweight staged operator DAG

Evaluation

Full lazy graph:

  • strongest long-term optimizer story

  • highest implementation and debugging cost

  • wrong first move for the current kernel inventory

Explicit fused kernels only:

  • easiest to reason about locally

  • too rigid for mixed workloads and adaptive-runtime integration

  • forces every valuable chain to be rewritten by hand

Lightweight staged DAG:

  • enough structure to batch launches and eliminate ephemeral intermediates

  • keeps explicit boundaries for materialization, reusable indexes, and diagnostics

  • fits the current probe-first runtime better than a continuous optimizer

Decision

Use a lightweight staged operator DAG.

  • fuse device-local ephemeral chains

  • persist reusable structures such as spatial indexes

  • stop at explicit host materialization boundaries

  • keep fusion transparent to users

  • allow later kernels to provide specialized fused variants within the same staged contract
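A minimal sketch of how a staged plan could split a linear operator chain, assuming each operator is tagged with the kind of output it produces. The names `OutputKind`, `Op`, `Stage`, and `plan_stages` are hypothetical illustrations, not the API of `src/vibespatial/runtime/fusion.py`. Real plans are DAGs; this sketch only handles a straight chain. Runs of ephemeral device-local operators share one stage. Any operator that persists a reusable structure or materializes to the host gets a stage of its own.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class OutputKind(Enum):
    # Hypothetical classification of what an operator produces.
    EPHEMERAL = "ephemeral"  # device-local intermediate, may be fused away
    REUSABLE = "reusable"    # e.g. a spatial index; must be persisted
    HOST = "host"            # explicit host materialization boundary


@dataclass(frozen=True)
class Op:
    name: str
    output: OutputKind = OutputKind.EPHEMERAL


@dataclass
class Stage:
    ops: list[Op] = field(default_factory=list)

    def names(self) -> list[str]:
        return [op.name for op in self.ops]


def plan_stages(chain: list[Op]) -> list[Stage]:
    """Group a linear chain into stages (hypothetical planner)."""
    stages: list[Stage] = []
    run: list[Op] = []  # current run of ephemeral ops, fused into one stage
    for op in chain:
        if op.output is OutputKind.EPHEMERAL:
            run.append(op)
            continue
        # A boundary op closes the current fused run, if any.
        if run:
            stages.append(Stage(run))
            run = []
        # The boundary op gets its own stage so its output is persisted explicitly.
        stages.append(Stage([op]))
    if run:
        stages.append(Stage(run))
    return stages


chain = [
    Op("bounds"),
    Op("sfc_key"),
    Op("sort"),
    Op("build_index", OutputKind.REUSABLE),
    Op("query"),
    Op("compact"),
    Op("to_host", OutputKind.HOST),
]
print([stage.names() for stage in plan_stages(chain)])
# [['bounds', 'sfc_key', 'sort'], ['build_index'], ['query', 'compact'], ['to_host']]
```

Giving boundary operators their own stage is a conservative choice. It keeps each persisted output and its diagnostics visible, at the cost of materializing the input that feeds the boundary operator.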

Default Fusible Chains

  • bounds -> SFC (space-filling curve) key -> sort

  • predicate -> filter -> compact

  • clip fast path -> predicate mask -> emit geometry slice

These chains may stay in one fused stage when they do not emit reusable structures and do not cross a host boundary.
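The eligibility rule above can be expressed as a predicate over a chain. This is a sketch with hypothetical names (`Step`, `can_fuse`, `DEFAULT_CHAINS`). The step names transcribe the default chains listed here, and the two flags are assumed stand-ins for "emits a reusable structure" and "crosses a host boundary".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    emits_reusable: bool = False  # e.g. builds a spatial index
    crosses_host: bool = False    # copies data to or from the host


def can_fuse(chain: list[Step]) -> bool:
    """True if the whole chain may stay in one fused stage."""
    return not any(step.emits_reusable or step.crosses_host for step in chain)


DEFAULT_CHAINS = {
    "sfc_sort": [Step("bounds"), Step("sfc_key"), Step("sort")],
    "filter": [Step("predicate"), Step("filter"), Step("compact")],
    "clip": [
        Step("clip_fast_path"),
        Step("predicate_mask"),
        Step("emit_geometry_slice"),
    ],
}

for name, chain in DEFAULT_CHAINS.items():
    print(name, can_fuse(chain))  # each default chain prints True

# Appending a host copy makes the chain ineligible for a single stage.
with_host = DEFAULT_CHAINS["filter"] + [Step("to_host", crosses_host=True)]
print(can_fuse(with_host))  # False
```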

Persisted Intermediates

Do not fuse away:

  • spatial indexes

  • reusable partition metadata

  • explicit materialized host outputs

  • buffers that are referenced by more than one downstream branch
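One way to derive the persisted set from a plan, sketched under the assumption that each node lists the names of its inputs and carries flags for reusable and host outputs. `Node` and `persisted_buffers` are hypothetical names. Spatial indexes and partition metadata are both folded into the single `reusable` flag. A buffer read by more than one downstream node is kept, as is any reusable or host-materialized output.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    inputs: tuple[str, ...] = ()
    reusable: bool = False     # spatial index, partition metadata, ...
    host_output: bool = False  # explicit materialized host result


def persisted_buffers(nodes: list[Node]) -> set[str]:
    """Names of node outputs that must not be fused away."""
    # Count how many downstream nodes read each buffer.
    fan_out = Counter(src for node in nodes for src in node.inputs)
    return {
        node.name
        for node in nodes
        if node.reusable or node.host_output or fan_out[node.name] > 1
    }


graph = [
    Node("bounds"),
    Node("sfc_key", ("bounds",)),
    Node("sort", ("sfc_key",)),
    Node("index", ("sort",), reusable=True),
    Node("mask", ("bounds",)),  # second consumer of bounds
    Node("result", ("index", "mask"), host_output=True),
]
print(sorted(persisted_buffers(graph)))  # ['bounds', 'index', 'result']
```

Here `sfc_key`, `sort`, and `mask` each have a single consumer and no persistence flag, so they remain candidates for fusion.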

Runtime Interaction

  • fusion planning happens before execution or at chunk boundaries

  • adaptive runtime may choose a fused stage shape, but not rewire the graph mid-kernel

  • residency and fallback diagnostics must stay visible at stage boundaries
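A sketch of this runtime contract, assuming a hypothetical `choose_stages` callback stands in for the adaptive runtime. The stage shape is chosen once per chunk and frozen before any kernel runs. Kernels inside a stage run back to back, and a diagnostics event is recorded only when a stage finishes. All names (`run_chunked`, `Diagnostics`, `square`, `keep_small`, `choose`) are illustrative. The kernels are plain Python functions rather than device launches, so the sketch models only the planning and diagnostics ordering, not buffer elimination.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable, Sequence

# Hypothetical kernel type: takes a chunk's rows, returns new rows.
Kernel = Callable[[list], list]


@dataclass
class Diagnostics:
    events: list[str] = field(default_factory=list)


def run_chunked(
    chunks: Iterable[list],
    choose_stages: Callable[[list], Sequence[Sequence[Kernel]]],
    diag: Diagnostics,
) -> list[list]:
    results = []
    for i, chunk in enumerate(chunks):
        # Planning happens only here, at a chunk boundary. Freezing the
        # plan into tuples means it cannot change while kernels run.
        stages = tuple(tuple(stage) for stage in choose_stages(chunk))
        data = chunk
        for j, stage in enumerate(stages):
            for kernel in stage:
                # No diagnostics are emitted between kernels in one stage.
                data = kernel(data)
            # Stage boundary: diagnostics stay visible here.
            diag.events.append(f"chunk {i} stage {j}: {len(data)} rows")
        results.append(data)
    return results


def square(xs: list) -> list:
    return [x * x for x in xs]


def keep_small(xs: list) -> list:
    return [x for x in xs if x < 10]


def choose(chunk: list) -> list[list[Kernel]]:
    # Hypothetical adaptive policy: fuse both kernels for small chunks,
    # split them into two stages otherwise.
    if len(chunk) <= 3:
        return [[square, keep_small]]
    return [[square], [keep_small]]


diag = Diagnostics()
results = run_chunked([[1, 2, 3], [1, 2, 3, 4]], choose, diag)
print(results)  # [[1, 4, 9], [1, 4, 9]]
print(diag.events)
```

The adaptive choice changes how many stage boundaries a chunk has (one for the first chunk, two for the second), but never the operators or their order.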