Fusion Strategy
Use a lightweight staged operator DAG for fusion, not a whole-program lazy graph.
Intent
Define how chained kernels should eliminate intermediate buffers without committing the repo to an overbuilt execution engine too early.
Request Signals
fusion
intermediate elimination
kernel chain
operator dag
materialization boundary
reusable index
Open First
docs/architecture/fusion.md
docs/architecture/residency.md
docs/architecture/adaptive-runtime.md
src/vibespatial/runtime/fusion.py
docs/decisions/0009-fusion-strategy.md
Verify
uv run pytest tests/test_fusion_policy.py
uv run python scripts/check_docs.py --check
Risks
A full lazy graph would add scheduling complexity before the repo has enough kernels to justify it.
Relying only on hand-written fused kernels would leave too much performance on the table and duplicate logic across kernels.
Fusing across the wrong boundaries can hide reusable structures or break residency and diagnostics guarantees.
Candidate Approaches
full lazy evaluation graph
explicit fused kernel variants only
lightweight staged operator DAG
Evaluation
Full lazy graph:
strongest long-term optimizer story
highest implementation and debugging cost
wrong first move for the current kernel inventory
Explicit fused kernels only:
easiest to reason about locally
too rigid for mixed workloads and adaptive-runtime integration
forces every valuable chain to be rewritten by hand
Lightweight staged DAG:
enough structure to batch launches and eliminate ephemeral intermediates
keeps explicit boundaries for materialization, reusable indexes, and diagnostics
fits the current probe-first runtime better than a continuous optimizer
Decision
Use a lightweight staged operator DAG.
fuse device-local ephemeral chains
persist reusable structures such as spatial indexes
stop at explicit host materialization boundaries
keep fusion transparent to users
allow later kernels to provide specialized fused variants within the same staged contract
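The decision above can be sketched as a small staged contract. This is a minimal sketch, assuming a stage is an ordered list of ops and that specialized fused variants are looked up by the exact op sequence they replace. `Op`, `Stage`, `FUSED_VARIANTS`, and `register_variant` are hypothetical names, not the API of `src/vibespatial/runtime/fusion.py`, and Python lists stand in for device buffers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical names throughout; Python lists stand in for device buffers.


@dataclass(frozen=True)
class Op:
    name: str
    fn: Callable[[list], list]


# Specialized fused variants, keyed by the exact op sequence they replace.
FUSED_VARIANTS: Dict[Tuple[str, ...], Callable[[list], list]] = {}


def register_variant(names: Sequence[str]):
    """Register a hand-written kernel that replaces a whole op sequence."""
    def decorator(fn: Callable[[list], list]) -> Callable[[list], list]:
        FUSED_VARIANTS[tuple(names)] = fn
        return fn
    return decorator


@dataclass
class Stage:
    ops: List[Op]

    def run(self, data: list) -> list:
        variant = FUSED_VARIANTS.get(tuple(op.name for op in self.ops))
        if variant is not None:
            # Specialized kernel: same input/output contract as the generic path.
            return variant(data)
        # Generic path: compose the ops. Intermediates exist only inside this
        # call and are never handed back to the caller.
        for op in self.ops:
            data = op.fn(data)
        return data


# Usage: predicate -> compact as one stage, before and after a variant exists.
predicate = Op("predicate", lambda xs: [(x, x > 0) for x in xs])
compact = Op("compact", lambda pairs: [x for x, keep in pairs if keep])
stage = Stage([predicate, compact])
generic_result = stage.run([3, -1, 4, -1, 5])


@register_variant(["predicate", "compact"])
def predicate_compact(xs: list) -> list:
    # Single pass: no (value, mask) pairs are built at all.
    return [x for x in xs if x > 0]


fused_result = stage.run([3, -1, 4, -1, 5])
```

Because callers only ever invoke `Stage.run`, swapping in a hand-written variant does not change what they observe, which is one way to keep fusion transparent while still allowing specialized kernels.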
Default Fusible Chains
bounds -> SFC key -> sort
predicate -> filter -> compact
clip fast path -> predicate mask -> emit geometry slice
These chains may stay in one fused stage when they do not emit reusable structures and do not cross a host boundary.
Persisted Intermediates
Do not fuse away:
spatial indexes
reusable partition metadata
explicit materialized host outputs
buffers that are referenced by more than one downstream branch
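These rules can be checked mechanically before a plan is formed. This is a minimal sketch, assuming each node carries a `kind` tag and the names of its inputs, and that nodes arrive in topological order; `Node`, `must_persist`, and the kind strings are hypothetical. Every node in the returned set ends a fused stage and keeps its buffer; any other node may be fused away.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical kind tags for outputs that must never be fused away.
PERSISTED_KINDS = {"spatial_index", "partition_metadata", "host_output"}


@dataclass
class Node:
    name: str
    kind: str
    inputs: List[str] = field(default_factory=list)


def must_persist(nodes: List[Node]) -> Set[str]:
    """Return names of nodes whose outputs must be kept as real buffers."""
    # Count downstream consumers of each node's output.
    consumers: Dict[str, int] = {n.name: 0 for n in nodes}
    for n in nodes:
        for src in n.inputs:
            consumers[src] += 1
    # Persist reusable or host-visible outputs, and anything read by 2+ branches.
    return {
        n.name
        for n in nodes
        if n.kind in PERSISTED_KINDS or consumers[n.name] > 1
    }
```

For example, in a graph where `bounds` feeds both an SFC-key branch and a clip mask, `bounds` is persisted even though it is an ordinary device buffer, while a spatial index is persisted even with a single consumer.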
Runtime Interaction
fusion planning happens before execution or at chunk boundaries
adaptive runtime may choose a fused stage shape but must not rewire the graph mid-kernel
residency and fallback diagnostics must stay visible at stage boundaries
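The three rules above can be sketched as a per-chunk loop. This is a minimal sketch, assuming planning happens once per chunk and using a hypothetical adaptive policy that fuses the whole chain for small chunks and splits it into one step per stage otherwise. `Step`, `Diagnostics`, `choose_shape`, `run_chunks`, and the `max_fused_len` threshold are illustrative names, not the repo's runtime API; the threshold is only a placeholder for whatever signal the runtime actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# A named step; pure Python lists stand in for device buffers.
Step = Tuple[str, Callable[[list], list]]


@dataclass
class Diagnostics:
    events: List[str] = field(default_factory=list)


def choose_shape(steps: Sequence[Step], chunk_len: int,
                 max_fused_len: int) -> List[List[Step]]:
    """Hypothetical policy: one fused stage for small chunks, else one step per stage."""
    if chunk_len <= max_fused_len:
        return [list(steps)]
    return [[step] for step in steps]


def run_chunks(chunks: Sequence[list], steps: Sequence[Step],
               max_fused_len: int, diag: Diagnostics) -> list:
    out: list = []
    for ci, chunk in enumerate(chunks):
        # Planning happens only here, at a chunk boundary; the stage list is
        # fixed for the whole chunk and never rewired while a stage runs.
        stages = choose_shape(steps, len(chunk), max_fused_len)
        data = chunk
        for si, stage in enumerate(stages):
            for _, fn in stage:
                data = fn(data)
            # Diagnostics are recorded at stage boundaries only.
            names = "+".join(name for name, _ in stage)
            diag.events.append(f"chunk {ci} stage {si} [{names}] -> {len(data)} rows")
        out.extend(data)
    return out
```

Each stage emits exactly one diagnostics event, so the intermediate between two fused steps is invisible in a small chunk but visible when the same chain is split; that is the trade-off the stage-boundary rule makes explicit.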