Staged Fusion Strategy

Context

The repo now has owned geometry arrays, coarse kernels, residency diagnostics, and a probe-first runtime planner. The remaining question is how to eliminate intermediate buffers in multi-step pipelines without overcommitting to a full graph runtime too early.

Decision

Use a lightweight staged operator DAG as the default fusion mechanism.

- fuse ephemeral device-local chains
- persist reusable structures such as indexes and partition metadata
- treat explicit host materialization as a hard boundary
- keep user-facing APIs unchanged
- allow specialized fused kernels as an optimization inside the staged model, not as the only strategy
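
The stage-splitting rule implied by these points can be illustrated with a minimal sketch. It is not the repo's policy module: `IntermediateKind`, `Step`, `plan_stages`, and the step names in `chain` are all hypothetical, and the sketch assumes a linear chain rather than a general DAG. Ephemeral device-local outputs keep the current stage open; a persisted structure or an explicit host materialization closes it.

```python
from dataclasses import dataclass
from enum import Enum


class IntermediateKind(Enum):
    """Hypothetical classification of a step's output."""
    EPHEMERAL = "ephemeral"   # device-local, not reused: may be fused away
    PERSISTED = "persisted"   # reusable structure, e.g. index or partition metadata
    HOST_BOUNDARY = "host"    # explicit host materialization: fusion must stop


@dataclass(frozen=True)
class Step:
    name: str
    output: IntermediateKind


def plan_stages(steps):
    """Group a linear chain of steps into fusible stages.

    A stage stays open while outputs are ephemeral and is closed right
    after any step whose output is persisted or materialized on the host.
    """
    stages, current = [], []
    for step in steps:
        current.append(step.name)
        if step.output is not IntermediateKind.EPHEMERAL:
            stages.append(current)
            current = []
    if current:  # trailing ephemeral steps form the final stage
        stages.append(current)
    return stages


# Illustrative chain; the step names are made up for this example.
chain = [
    Step("bounds", IntermediateKind.EPHEMERAL),
    Step("build_index", IntermediateKind.PERSISTED),
    Step("candidate_pairs", IntermediateKind.EPHEMERAL),
    Step("refine", IntermediateKind.EPHEMERAL),
    Step("to_host", IntermediateKind.HOST_BOUNDARY),
    Step("summarize", IntermediateKind.EPHEMERAL),
]

for stage in plan_stages(chain):
    print(stage)
```

Under these assumptions the chain splits into three stages: one ending at the persisted index, one ending at the host materialization, and one holding the trailing step. Because the boundaries are decided only by how each intermediate is classified, callers of the individual operations would not need to change.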

Consequences

The runtime planner has a clear place to choose stage shapes at dispatch time.
Diagnostics remain visible because stage boundaries are explicit.
Future kernels can register fusible chains without building a whole-program graph engine first.
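
One way kernels might register fusible chains, and one way a planner might pick a stage shape at dispatch time, is sketched below. Everything here is an assumption for illustration: `register_fusible_chain`, `choose_stage_shape`, the `_FUSED` registry, and the operation names are hypothetical and do not describe the landed code. The sketch uses a greedy longest-match over registered chains and falls back to unfused single operations.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical registry: operation-name sequence -> specialized fused kernel.
_FUSED: Dict[Tuple[str, ...], Callable] = {}


def register_fusible_chain(*op_names: str):
    """Decorator that registers a fused kernel for a sequence of operations."""
    def decorator(fn: Callable) -> Callable:
        _FUSED[tuple(op_names)] = fn
        return fn
    return decorator


def choose_stage_shape(
    op_names: List[str],
) -> List[Tuple[Tuple[str, ...], Optional[Callable]]]:
    """Split one stage into (ops, kernel) segments.

    Greedy longest-match: at each position, prefer the longest registered
    chain of two or more operations; otherwise emit the single operation
    with kernel=None, meaning it runs unfused.
    """
    ops = tuple(op_names)
    shape = []
    i = 0
    while i < len(ops):
        for j in range(len(ops), i + 1, -1):  # candidate chains of length >= 2
            if ops[i:j] in _FUSED:
                shape.append((ops[i:j], _FUSED[ops[i:j]]))
                i = j
                break
        else:  # no registered chain starts at position i
            shape.append((ops[i:i + 1], None))
            i += 1
    return shape


@register_fusible_chain("bounds", "intersects")
def fused_bounds_intersects(batch):
    # Placeholder body; a real kernel would avoid materializing the bounds buffer.
    return batch


for ops, kernel in choose_stage_shape(["clip", "bounds", "intersects"]):
    print(ops, "fused" if kernel else "unfused")
```

Each returned segment is an explicit unit, so a diagnostic hook could report which operations were fused and which ran individually without needing a whole-program graph.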

Alternatives Considered

- full lazy evaluation graph
- explicit fused kernels only
- no shared fusion contract until much later

Acceptance Notes

The landed implementation is a policy module plus tests. It defines how to classify ephemeral versus persisted intermediates and where fusion must stop. Actual fused execution and memory accounting remain follow-up implementation work.