LMCache/LMCache v0.4.4 on GitHub

What's Changed

Refactor remote plugin to accept multiple connectors by @maobaolong in #2666
[MP]feat: support different kv cache shape and dtype across layers by @liuyumoye in #2926
[Chore][CI]: K3 base CI image 12.9 CUDA by @sammshen in #2975
fix: use pin=False in _allocate_and_put to prevent pd_buffer leak by @ningziwen in #2847
feat(disk): support multi-path local disk backend for multi-device I/O by @glimchb in #2801
[Chore][CI] Upgrade CI base image to CUDA 13.0 by @sammshen in #2981
[doc] document long-doc-permutator workload in cli bench by @deng451e in #2963
[MP][Bugfix] Fix deadlock caused by cuda launch host func by @ApostaC in #2952
[BugFix]: Fix typo bug by @princepride in #2980
[CI] Pin cu128 nightly wheel for blend ci test by @deng451e in #2987
[MP][optimize] optimize save when mla enabled by @chunxiaozheng in #2935
[hotfix] fix prometheus version for UT failure by @ApostaC in #3000
Update LMCache Office Hours to Wednesday by @nijaba in #2990
[fix] Limit proxy in-flight requests to prevent PD buffer deadlock by @deng451e in #2957
[MP] Lazy start heartbeat thread when first req coming by @maobaolong in #2943
[Operator] Add L2 RESP (Redis/Valkey) adapter support by @royyhuang in #2967
[Feat][RawBlock] Add TP>1 support and compact batched retrieval path by @DongDongJu in #2948
[MP] Introduce a simple way to register_gauge metrics. by @maobaolong in #2906
[Build] Add lmcache-cli lightweight wheel by @deng451e in #2959
Copy a snapshot of lmcache_mp_connector.py for vllm 0.18.0 by @maobaolong in #2887
[MP] Add a new argument to specify whether retain_in_l1 by @maobaolong in #2813
[Chore][CI] Skip k3 builds when only docs/trivial files changed by @sammshen in #2993
[ops][refactor] Add full list of Python fallbacks to run without compiled CUDA extensions by @hlin99 in #2591
[Feat] L0 Subscriber by @Oasis-Git in #2974
refactor: extract PathSharder module for shared multi-path selection by @glimchb in #2982
refactor(mp): replace job_id with request_id in query_prefetch_status by @yoo-kumaneko in #2996
[MP] Support lazy import built-in l2 adapter by @maobaolong in #2905
[MP][Optimize] Skip locked keys during LRU eviction to improve eviction efficiency by @chunxiaozheng in #2978
fix: add controller config validation and clear error messages (#2907) by @ianliuy in #3003
feat: add chunk hashes logger to MP server for offline data analysis by @yoo-kumaneko in #2928
[Chore][CI]: K3 MP output token quantity tolerance by @sammshen in #3030
feat(tools): add LRU cache simulator for lookup-hash JSONL logs by @yoo-kumaneko in #3021
[Feat] L1 Subscriber by @Oasis-Git in #2986
[Feat] Add cache_salt parameter to MP adapter interfaces by @royyhuang in #3029
[Feat] Add is_user_level property and cache_salt param to EvictionPolicy by @royyhuang in #3032
[Feat][DAX] Optimize staged batched restore path and document modification by @DongDongJu in #2904
[Chore] Remove v0 code by @sammshen in #2968
[Chore] add coding standard and PR review instructions by @ApostaC in #3039
[Observability] Per-request root OTel span and SpanRegistry for MP server tracing by @deng451e in #3033
feat(pd_backend): add pd_skip_proxy_notification to skip ZMQ proxy notification by @ningziwen in #2874
[Bugfix] fix some memory leak in cache_engine and eic connector by @liubj77 in #2544
[Hotfix][CI] Unblock CI: pandas auto-heal + CUDA 12 build toolchain by @sammshen in #3055
[Hotfix][CI] Pin vLLM nightly to cu130 index to match CUDA 13 base image by @ApostaC in #3061
[Docs] Mirror lmcache/ layout in docs/design/ for discoverability by @ApostaC in #3040
Add scheduler instance_id and model_name to L0 KV lifecycle tracking by @Oasis-Git in #3043
chore: expose package version via __init__.py by @hlin99 in #3034
Fix: Safely handle layerwise cache shape dimensions in remote backend by @hlin99 in #2751
[Core] Add persistence interfaces and nixl persistence by @YaoJiayi in #2938
[Misc] Reduce the logs generated by lazy memory allocator by @ApostaC in #3068
[MP][Feat] Add cache_salt to ObjectKey for cache isolation by @royyhuang in #3042
[ROCm] Make bare-host ROCm install self-sufficient by @Shaoting-Feng in #3070
[MP] Add tracing functionality for storage manager by @ApostaC in #3063
[MP][optimize] unified touch all keys in end session request by @chunxiaozheng in #3020
[step3] remove unnecessary code in mp adapter by @chunxiaozheng in #2994
fix(mp): correct store cached requests in lmcache_mp_connector by @maobaolong in #3012
[refactor]: Replace use_cufile with use_gds/gds_backend config flags by @glimchb in #2858
[CI] Add cu13.0 wheel + container builds and nightly wheel releases by @deng451e in #3069
[CI] Run the same test set on AMD as on NVIDIA by @Shaoting-Feng in #3071
[ROCm][MP] Fix HIP invalid-argument on lazy host buffer past 2 GB by @Shaoting-Feng in #3079
[CLI] Refactor query command by @deng451e in #2995
[CI] add missing egress endpoints to nightly Docker build by @deng451e in #3087
[Hotfix][CI] Fail-fast when vLLM CLI import chain is broken post-install by @sammshen in #3093
[CLI][fix] lazy torch import in __init__.py to unblock CLI-only installs by @deng451e in #3086
[CLI] Introduce lmcache trace CLI by @ApostaC in #3075
[Chore][Docs]: daily drift check — multi-process mode by @ApostaC in #3076
[Fix][CI] fix nightly wheel versioning and build reliability by @deng451e in #3097
[Hotfix][CI] Replace vllm __main__.py patch with sitecustomize.py by @sammshen in #3100
[CI] fix blend-server venv by @deng451e in #3099
[MP] Introduce MP runtime plugin framework by @maobaolong in #2956

New Contributors

@ianliuy made their first contribution in #3003

Full Changelog: v0.4.3...v0.4.4