LMCache v0.4.6 Release
Interface / Config / CLI / Build Changes
Breaking changes (action may be needed)
- Build env var NO_CUDA_EXT renamed to NO_NATIVE_EXT (legacy alias still works)
- Observability metric names and units refactored; existing dashboards may break
- L2 adapter contract changed: adapters must now measure real transferred bytes
- Internal module/API non_cuda_equivalents renamed to python_ops_fallback
New / additive (opt-in)
- New NO_GPU_EXT=1 flag for CPU-only builds
- MP adapter now accepts extra_config at construction
- run_script_api now supports MP-mode HTTP endpoint
- MP connector now reports cache hit stats in KVTransferParams
- Redis now supports dynamic plugin registration for custom lookup priorities
- MP KV transfer now supports HND formats
- Restored NO_CUDA_EXT=1 skip-all-extensions semantics
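The three build flags above interact, so a small sketch may help when updating build scripts. The helper below is hypothetical and is not taken from LMCache's setup.py: the function name, the mode labels ("python-only", "cpu-only", "full"), the rule that only the value "1" enables a flag, and the precedence of the skip-all flags over NO_GPU_EXT are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def _flag_set(env: Mapping[str, str], name: str) -> bool:
    """Assumption: a flag counts as enabled only when it equals "1"."""
    return env.get(name, "") == "1"


def resolve_build_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Map the build flags from these notes to an illustrative mode label.

    Hypothetical helper; the labels and precedence are assumptions.
    """
    if env is None:
        env = os.environ
    # NO_NATIVE_EXT is the new name; NO_CUDA_EXT is the legacy alias that
    # keeps its skip-all-extensions meaning.
    if _flag_set(env, "NO_NATIVE_EXT") or _flag_set(env, "NO_CUDA_EXT"):
        return "python-only"
    # NO_GPU_EXT requests a CPU-only build.
    if _flag_set(env, "NO_GPU_EXT"):
        return "cpu-only"
    return "full"
```

Under these assumptions, a script that still exports NO_CUDA_EXT=1 resolves to the same mode as one that exports NO_NATIVE_EXT=1, which is the behavior the legacy alias is described as preserving.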
MP (Multi-Process Mode)
- #3026 Add extra_config to construct lmcache mp adapter
- #3308 Lazy import cupy for gpu_cache_context
- #3259 Non-GPU Context by pickle
- #3243 Add l2 adapter benchmark cli
- #3336 Fix store/retrieve deadlock via C++ host callback
- #3363 Update L2 adapter interface to measure real transferred bytes
- #3391 Refactor MPCacheEngine for better extendability
- #3282 Support HND formats in MP KV transfer
- #3393 Force external LMCache MP connector path
- #3437 Restore MPCacheEngine HTTP-layer passthroughs
- #3398 run_script_api support mp mode http endpoint
- #3166 Add mp support for sglang
- #3402 Report cache hit stats in KVTransferParams
- #3365 Warn log while cannot reach lmcache server
Observability
- #3290 Refactor metric names and units
- #3320 Add L2 Usage to SM
- #3147 Fix PrometheusLogger duplicate metadata
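Because #3290 changes metric names and units, it may be worth checking which names an existing dashboard still expects. The sketch below assumes you have saved the text output of a Prometheus scrape from the upgraded deployment and a list of metric names your dashboards query. The metric names in the usage example are placeholders, not real LMCache metrics, and both function names are hypothetical.

```python
from typing import Iterable, List, Set


def exported_metric_names(exposition_text: str) -> Set[str]:
    """Collect sample names from Prometheus text exposition output."""
    names: Set[str] = set()
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value" or "name value".
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


def missing_metrics(dashboard_metrics: Iterable[str],
                    exposition_text: str) -> List[str]:
    """Return dashboard metric names that the scrape no longer exports."""
    available = exported_metric_names(exposition_text)
    return sorted(m for m in dashboard_metrics if m not in available)


# Placeholder scrape output and dashboard names, for illustration only.
sample_scrape = """\
# HELP example_new_hits_total Placeholder metric
# TYPE example_new_hits_total counter
example_new_hits_total{worker="0"} 12
example_new_latency_seconds_sum 3.5
"""
stale = missing_metrics(
    ["example_old_hits", "example_new_hits_total"], sample_scrape
)
```

Here `stale` contains only the name that is absent from the scrape. A name check like this does not detect unit changes, so panels that still resolve should be reviewed against the new units as well.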
CI/CD & Build
- #3297 Switch workflows to egress audit mode
- #3293 Date-based nightly tag and release-driven flow
- #3299 Add sm_120 to cu129 wheel build
- #3298 Pin vLLM CUDA wheel variant
- #3311 DCO sign-off commits from sync_torch_version
- #3312 Skip pipeline for operator-v* release tags
- #3315 Add signoff for doc translation workflow
- #3314 Fix incorrect version tagging
- #3349 Restore NO_CUDA_EXT=1 skip-all-extensions
- #3354 Rename NO_CUDA_EXT → NO_NATIVE_EXT
- #3357 CPU-only build (NO_GPU_EXT=1) + CI
- #3148 Add macos ci check
- #3366 Remove macos-13 from ci workflow
- #3358 Smoke-test container images before pushing
- #3405 Tighten threshold for k3 multiprocess test
- #3439 Flexible pod specs for k3 multiprocess test
- #3203 Add CI-safe raw-block temp-file tests
- #3415 Fix raw-block L2 store result assertions
- #3250 Move pytest.ini to project root
- #3348 Sync torch version with vLLM (2.11.0)
- #3400 Sync standalone image torch with vLLM (cu12.9 fix)
- #3302 Unconditionally compile common c++ extensions in setup.py
- #3390 Fix pytest lazy import issue from vLLM torch dynamo
Operator
Bugfixes
- #2999 Fix sha256_cbor + async_loading type mismatch in _hash_tokens
- #3252 Fix 0-hit async lookup when use_layerwise=true
- #3258 Prevent TypeError crash when streaming response has zero visible content
- #3191 Page-align pinned pointer allocations for O_DIRECT
- #3385 Skip unpin for non-pinned objects in cleanup_memory_objs
- #3416 Avoid vLLM import during blake3 token hasher startup
- #3355 Install CLI requirements in standalone image (fixes #3353)
- #3370 Use nixl meta-package on CUDA 13 so L2 adapters load
- #3294 Defer test_cache runtime imports so lmcache-cli loads without torch
- #3280 Add StubCPUDevice for CPU-only import/startup fallback
Features
- #2902 Add sycl implementation of memory_kernels for Intel XPU
- #2936 nixl_storage: naive support for files + dynamic
- #3351 Support dynamic plugin registration for Redis custom lookup priorities
- #3115 hipFile: cufile-python compatible shim for AMD's hipFile
- #2643 Huge pages support
Performance / Optimization
- #3271 Replace Condvar polling with eventfd + epoll in iouring worker
- #3338 Rename non_cuda_equivalents to python_ops_fallback
Refactor
- #3237 Abstract discover_subclasses util method
- #3341 Refactor bench engine cli
- #3411 Refactor bench kvcache cli
- #3379 Cleanup: remove duplicate import time in blend_server_v2
Benchmarking
- #3440 Fix bench close event loop
Docs
- #2282 Fix typo in p2p_init_ports parameter
- #3270 Add recipes for Mistral, Phi & Llama
- #3380 Add docstrings to public helpers in lmcache/utils.py
- #3395 Document gfx950 (MI350X/MI355X) in install example
- #3408 Add Qwen3-30B-A3B and Qwen3-235B-A22B to kv cache calculator
- #3450 Replace broken Mooncake GitHub links with docs site URLs
- #3432 Add top-level README as navigable examples catalog
- #3401 Combined PR from recent doc drift scannings
Chinese Translation
- #3265 Add Chinese translation pipeline
- #3310 Update Chinese documentation translations
- #3335 Update Chinese documentation translations
- #3334 Fix translation icon and screen layout adjustment
- #3436 Fix translate workflow reuse translations
nixl Storage
- #3200 Fix tests and remove from ignore list
Chore / Maintenance
- #3306 Fix typos and misspellings across codebase
- #3316 Add hlin99 to CODEOWNERS
- #3324 Add yoo-kumaneko as CODEOWNER for mp_observability and tools
- #3425 Correct spelling 'mignt' -> 'might' in cache policy comments
- #3426 Convert f-string log calls in cache_engine.py to %-format
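For readers wondering why #3426 is worth doing, a general Python illustration follows. It is not code from cache_engine.py; the logger name and the CountingValue class are hypothetical. With %-style arguments, the standard logging module formats the message only if the record is actually emitted, whereas an f-string is always built before the call.

```python
import logging

logger = logging.getLogger("lmcache_example")  # hypothetical logger name
logger.setLevel(logging.WARNING)  # DEBUG records are filtered out


class CountingValue:
    """Counts how many times it is rendered to a string."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "expensive-to-format"


def log_eager(value: CountingValue) -> None:
    # The f-string is evaluated before logger.debug() is even called.
    logger.debug(f"cache key: {value}")


def log_lazy(value: CountingValue) -> None:
    # Formatting is deferred and skipped because DEBUG is disabled.
    logger.debug("cache key: %s", value)


eager, lazy = CountingValue(), CountingValue()
log_eager(eager)
log_lazy(lazy)
```

After running this, `eager.renders` is 1 and `lazy.renders` is 0, even though neither message is written anywhere.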
New Contributors
- @ftian1 made their first contribution in #2902
- @abinggo made their first contribution in #3355
- @zhengchenyu made their first contribution in #3380
- @ynachiket made their first contribution in #3379
- @GentleCold made their first contribution in #2999
- @yeoshuheng made their first contribution in #3270
- @guy-ealey-morag made their first contribution in #3200
- @he-yufeng made their first contribution in #3282
- @riley-dixon made their first contribution in #3115
- @luceinaltis made their first contribution in #3252
- @pengxin99 made their first contribution in #3408
- @hyukjlee made their first contribution in #3395
- @Jah-yee made their first contribution in #3425
- @sahibpreetsingh12 made their first contribution in #3426
- @yangyonggit made their first contribution in #3432
- @annguy3n made their first contribution in #3351
- @Aionw made their first contribution in #3450
- @Javen-Ke made their first contribution in #3451