What's Changed
- Refactor remote plugin to accept multiple connectors by @maobaolong in #2666
- [MP]feat: support different kv cache shape and dtype across layers by @liuyumoye in #2926
- [Chore][CI]: K3 base CI image 12.9 CUDA by @sammshen in #2975
- fix: use pin=False in _allocate_and_put to prevent pd_buffer leak by @ningziwen in #2847
- feat(disk): support multi-path local disk backend for multi-device I/O by @glimchb in #2801
- [Chore][CI] Upgrade CI base image to CUDA 13.0 by @sammshen in #2981
- [doc] document long-doc-permutator workload in cli bench by @deng451e in #2963
- [MP][Bugfix] Fix deadlock caused by cuda launch host func by @ApostaC in #2952
- [BugFix]: Fix typo bug by @princepride in #2980
- [CI] Pin cu128 nightly wheel for blend ci test by @deng451e in #2987
- [MP][optimize] optimize save when MLA is enabled by @chunxiaozheng in #2935
- [hotfix] fix prometheus version for UT failure by @ApostaC in #3000
- Update LMCache Office Hours to Wednesday by @nijaba in #2990
- [fix] Limit proxy in-flight requests to prevent PD buffer deadlock by @deng451e in #2957
- [MP] Lazily start heartbeat thread when the first request arrives by @maobaolong in #2943
- [Operator] Add L2 RESP (Redis/Valkey) adapter support by @royyhuang in #2967
- [Feat][RawBlock] Add TP>1 support and compact batched retrieval path by @DongDongJu in #2948
- [MP] Introduce a simple way to register_gauge metrics by @maobaolong in #2906
- [Build] Add lmcache-cli lightweight wheel by @deng451e in #2959
- Copy a snapshot of lmcache_mp_connector.py for vllm 0.18.0 by @maobaolong in #2887
- [MP] Add a new retain_in_l1 argument to control whether to retain in L1 by @maobaolong in #2813
- [Chore][CI] Skip k3 builds when only docs/trivial files changed by @sammshen in #2993
- [ops][refactor] Add full list of Python fallbacks to run without compiled CUDA extensions by @hlin99 in #2591
- [Feat] L0 Subscriber by @Oasis-Git in #2974
- refactor: extract PathSharder module for shared multi-path selection by @glimchb in #2982
- refactor(mp): replace job_id with request_id in query_prefetch_status by @yoo-kumaneko in #2996
- [MP] Support lazy import built-in l2 adapter by @maobaolong in #2905
- [MP][Optimize] Skip locked keys during LRU eviction to improve eviction efficiency by @chunxiaozheng in #2978
- fix: add controller config validation and clear error messages (#2907) by @ianliuy in #3003
- feat: add chunk hashes logger to MP server for offline data analysis by @yoo-kumaneko in #2928
- [Chore][CI]: K3 MP output token quantity tolerance by @sammshen in #3030
- feat(tools): add LRU cache simulator for lookup-hash JSONL logs by @yoo-kumaneko in #3021
- [Feat] L1 Subscriber by @Oasis-Git in #2986
- [Feat] Add cache_salt parameter to MP adapter interfaces by @royyhuang in #3029
- [Feat] Add is_user_level property and cache_salt param to EvictionPolicy by @royyhuang in #3032
- [Feat][DAX] Optimize staged batched restore path and document modification by @DongDongJu in #2904
- [Chore] Remove v0 code by @sammshen in #2968
- [Chore] add coding standard and PR review instructions by @ApostaC in #3039
- [Observability] Per-request root OTel span and SpanRegistry for MP server tracing by @deng451e in #3033
- feat(pd_backend): add pd_skip_proxy_notification to skip ZMQ proxy notification by @ningziwen in #2874
- [Bugfix] fix memory leaks in cache_engine and eic connector by @liubj77 in #2544
- [Hotfix][CI] Unblock CI: pandas auto-heal + CUDA 12 build toolchain by @sammshen in #3055
- [Hotfix][CI] Pin vLLM nightly to cu130 index to match CUDA 13 base image by @ApostaC in #3061
- [Docs] Mirror lmcache/ layout in docs/design/ for discoverability by @ApostaC in #3040
- Add scheduler instance_id and model_name to L0 KV lifecycle tracking by @Oasis-Git in #3043
- chore: expose package version via __init__.py by @hlin99 in #3034
- Fix: Safely handle layerwise cache shape dimensions in remote backend by @hlin99 in #2751
- [Core] Add persistence interfaces and nixl persistence by @YaoJiayi in #2938
- [Misc] Reduce the logs generated by lazy memory allocator by @ApostaC in #3068
- [MP][Feat] Add cache_salt to ObjectKey for cache isolation by @royyhuang in #3042
- [ROCm] Make bare-host ROCm install self-sufficient by @Shaoting-Feng in #3070
- [MP] Add tracing functionality for storage manager by @ApostaC in #3063
- [MP][optimize] unify touching all keys in end-session request by @chunxiaozheng in #3020
- [step3] remove unnecessary code in mp adapter by @chunxiaozheng in #2994
- fix(mp): correct store cached requests in lmcache_mp_connector by @maobaolong in #3012
- [refactor]: Replace use_cufile with use_gds/gds_backend config flags by @glimchb in #2858
- [CI] Add cu13.0 wheel + container builds and nightly wheel releases by @deng451e in #3069
- [CI] Run the same test set on AMD as on NVIDIA by @Shaoting-Feng in #3071
- [ROCm][MP] Fix HIP invalid-argument on lazy host buffer past 2 GB by @Shaoting-Feng in #3079
- [CLI] Refactor query command by @deng451e in #2995
- [CI] add missing egress endpoints to nightly Docker build by @deng451e in #3087
- [Hotfix][CI] Fail-fast when vLLM CLI import chain is broken post-install by @sammshen in #3093
- [CLI][fix] lazy torch import in __init__.py to unblock CLI-only installs by @deng451e in #3086
- [CLI] Introduce lmcache trace CLI by @ApostaC in #3075
- [Chore][Docs]: daily drift check — multi-process mode by @ApostaC in #3076
- [Fix][CI] fix nightly wheel versioning and build reliability by @deng451e in #3097
- [Hotfix][CI] Replace vllm main.py patch with sitecustomize.py by @sammshen in #3100
- [CI] fix blend-server venv by @deng451e in #3099
- [MP] Introduce MP runtime plugin framework by @maobaolong in #2956
New Contributors
Full Changelog: v0.4.3...v0.4.4