What's New
Prefix cache correctness and reuse
- Add strict boundary snapshot restore handling for non-sliceable cache layers.
- Fix exact-hit kickoff behavior to avoid the N vs N-1 cache-state mismatch on the first decode step.
- Normalize rotating snapshot state for merge-safe restore behavior.
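A minimal sketch of the two fixes above, offered as one plausible reading rather than the project's code. `SimpleRecurrentCache`, `kickoff_exact_hit`, and `normalize_ring` are hypothetical names, and the "state" is a toy running sum. On an exact hit covering all N prompt tokens, the first decode step still needs output for the last prompt token. Restoring the N-token state and re-feeding that token would count it twice, and a non-sliceable cache cannot be trimmed back by one token. So the sketch restores a snapshot taken after N-1 tokens and re-feeds only token N. The second function assumes a full ring-buffer window and reorders it oldest-to-newest, so a restored snapshot no longer depends on a write offset.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SimpleRecurrentCache:
    """Toy non-sliceable cache (hypothetical): the state is a running sum.

    Like a recurrent/SSM layer, it cannot be trimmed back by one token;
    an earlier state is only recoverable from a stored snapshot.
    """
    state: int = 0
    offset: int = 0  # number of tokens folded into `state`

    def feed(self, token: int) -> int:
        self.state += token
        self.offset += 1
        return self.state  # stand-in for the logits of this step


def kickoff_exact_hit(
    tokens: List[int], snapshots: Dict[int, int]
) -> Tuple[SimpleRecurrentCache, int]:
    """Start decoding after an exact prefix hit covering all N prompt tokens.

    `snapshots` maps a token count to the cache state after that many tokens.
    Restoring the N-token state and re-feeding token N would fold it in twice,
    so restore the N-1 snapshot and re-feed only the last prompt token.
    """
    n = len(tokens)
    if n == 0 or (n - 1) not in snapshots:
        raise KeyError("no snapshot at N-1; fall back to a full prefill")
    cache = SimpleRecurrentCache(state=snapshots[n - 1], offset=n - 1)
    first_output = cache.feed(tokens[-1])  # cache now holds exactly N tokens
    return cache, first_output


def normalize_ring(buffer: List[str], write_idx: int) -> List[str]:
    """Reorder a full ring buffer oldest-to-newest.

    `write_idx` is the slot the next write would overwrite, i.e. the oldest entry.
    """
    return buffer[write_idx:] + buffer[:write_idx]


# Usage: prompt [3, 1, 4, 1] with a stored snapshot after 3 tokens (3+1+4 = 8).
cache, out = kickoff_exact_hit([3, 1, 4, 1], {3: 8})
print(out, cache.offset)  # 9 4
# A 4-slot window after writing a..f holds [e, f, c, d] with next write at slot 2.
print(normalize_ring(["e", "f", "c", "d"], 2))  # ['c', 'd', 'e', 'f']
```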
Walk-back truncation
- Add walk-back truncation for partial prefix matches to recover the latest valid non-sliceable state block.
- Extend walk-back support to both ArraysCache and RotatingKVCache.
- Fix dropped-block ref_count handling during partial reconstruction.
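The walk-back idea can be sketched as follows, assuming a block-based prefix cache in which only some block boundaries carry a snapshot of the non-sliceable state, and assuming whole-block matches. `Block` and `walk_back` are hypothetical illustrations, not the project's API. Given a partial match, the sketch walks backwards to the last matched block that ends on a snapshot. It then releases the reference that lookup took on each block it drops, which is where dropped-block ref_count handling matters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical prefix-cache block covering `size` tokens."""
    block_id: int
    size: int
    has_snapshot: bool  # non-sliceable state was captured at this block's end
    ref_count: int = 0


def walk_back(matched: List[Block]) -> List[Block]:
    """Truncate a partial match to the last block that ends on a snapshot.

    Assumes lookup already took one reference on every matched block; the
    blocks cut off here give that reference back.
    """
    cut = len(matched)
    # Walk backwards until a block with a usable snapshot is found.
    while cut > 0 and not matched[cut - 1].has_snapshot:
        cut -= 1
    for dropped in matched[cut:]:
        dropped.ref_count -= 1  # release the lookup reference on dropped blocks
    return matched[:cut]


# Usage: four matched 16-token blocks; only blocks 0 and 2 carry a snapshot.
blocks = [Block(i, 16, has_snapshot=i in (0, 2), ref_count=1) for i in range(4)]
kept = walk_back(blocks)
print([b.block_id for b in kept], sum(b.size for b in kept))  # [0, 1, 2] 48
print(blocks[3].ref_count)  # 0
```

The remaining 16 tokens of the match are then re-prefilled from the restored state after block 2.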
Prefill performance
- Optimize boundary snapshot prefill chunking so cache-enabled cold prefill avoids excessive boundary splits while preserving boundary-safe captures.
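One way to picture this change, as a hypothetical sketch rather than the project's implementation: if snapshots must be captured at block boundaries, splitting prefill at every boundary yields many small chunks. Rounding the chunk size down to a multiple of the block size lets each chunk end land on a boundary, and the sketch adds one extra split at the last full-block boundary so that capture is never lost inside the final chunk. `plan_chunks`, `block_size`, and `chunk_size` are illustrative names.

```python
from typing import List, Tuple


def plan_chunks(n_tokens: int, block_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) prefill chunks (hypothetical planner).

    Every chunk end except possibly the last falls on a block boundary, and
    the last full-block boundary before `n_tokens` is always a chunk end, so
    a snapshot can be captured there.
    """
    if block_size <= 0 or chunk_size <= 0:
        raise ValueError("block_size and chunk_size must be positive")
    if n_tokens <= 0:
        return []
    # Round the chunk size down to a multiple of block_size (at least one block).
    step = max(block_size, (chunk_size // block_size) * block_size)
    ends = set(range(step, n_tokens, step))
    last_boundary = (n_tokens // block_size) * block_size
    if last_boundary > 0:
        ends.add(last_boundary)  # keep the latest boundary-safe capture point
    ends.add(n_tokens)
    chunks = []
    start = 0
    for end in sorted(ends):
        chunks.append((start, end))
        start = end
    return chunks


print(plan_chunks(100, block_size=16, chunk_size=64))
# [(0, 64), (64, 96), (96, 100)]
```

Splitting at every 16-token boundary would instead produce seven chunks for the same 100-token prompt.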
Tests
- Expand scheduler, prefix cache, and hybrid cache tests for boundary snapshot, exact-hit kickoff, and walk-back scenarios.