github LMCache/LMCache v0.4.6

LMCache v0.4.6 Release

Interface / Config / CLI / Build Changes

Breaking changes (action may be needed)

  • Build env var NO_CUDA_EXT renamed to NO_NATIVE_EXT (legacy alias still works)
  • Observability metric names and units refactored; existing dashboards may break
  • L2 adapter contract changed: adapters must now measure the real number of transferred bytes
  • Internal module/API non_cuda_equivalents renamed to python_ops_fallback
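Code that imported the renamed internal module and needs to run against versions on both sides of this release could use a small compatibility shim. The following is a minimal sketch, not part of LMCache: the helper name is made up, and the dotted module paths in the usage comment are hypothetical, since these notes give only the old and new module names and not their package location.

```python
import importlib
from types import ModuleType


def import_first_available(*module_names: str) -> ModuleType:
    """Return the first module in module_names that can be imported."""
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            last_error = exc
    raise ImportError(f"none of {module_names} could be imported") from last_error


# Hypothetical paths, adjust to wherever the module lives in your install:
# ops = import_first_available(
#     "lmcache.python_ops_fallback",   # assumed post-rename location
#     "lmcache.non_cuda_equivalents",  # assumed pre-rename location
# )
```

Listing the new name first means the shim keeps working unchanged if the old module is removed later.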

New / additive (opt-in)

  • New NO_GPU_EXT=1 flag for CPU-only builds
  • MP adapter now accepts extra_config at construction
  • run_script_api now supports MP-mode HTTP endpoint
  • MP connector now reports cache hit stats in KVTransferParams
  • Redis now supports dynamic plugin registration for custom lookup priorities
  • MP KV transfer now supports HND formats
  • Restored NO_CUDA_EXT=1 skip-all-extensions semantics
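One way to picture how the three build flags named in these notes relate is the sketch below. It is illustrative only and is not LMCache's setup.py: the function name, the rule that only the value "1" enables a flag, and the precedence order are all assumptions inferred from the bullet points above.

```python
import os
from typing import Mapping


def resolve_build_mode(env: Mapping[str, str] = os.environ) -> str:
    """Classify a build as 'no-native', 'cpu-only', or 'full'.

    Assumed semantics (inferred from the release notes, not from setup.py):
      NO_NATIVE_EXT=1 or legacy NO_CUDA_EXT=1 -> skip all native extensions
      NO_GPU_EXT=1                            -> build CPU-only extensions
    """
    def enabled(name: str) -> bool:
        return env.get(name, "0") == "1"

    # The legacy alias is treated exactly like the new name.
    if enabled("NO_NATIVE_EXT") or enabled("NO_CUDA_EXT"):
        return "no-native"
    if enabled("NO_GPU_EXT"):
        return "cpu-only"
    return "full"
```

Under these assumptions, skip-everything takes precedence over the CPU-only flag when both are set.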

MP (Multi-Process Mode)

  • #3026 Add extra_config to LMCache MP adapter construction
  • #3308 Lazy import cupy for gpu_cache_context
  • #3259 Non-GPU Context by pickle
  • #3243 Add L2 adapter benchmark CLI
  • #3336 Fix store/retrieve deadlock via C++ host callback
  • #3363 Update L2 adapter interface to measure real transferred bytes
  • #3391 Refactor MPCacheEngine for better extendability
  • #3282 Support HND formats in MP KV transfer
  • #3393 Force external LMCache MP connector path
  • #3437 Restore MPCacheEngine HTTP-layer passthroughs
  • #3398 run_script_api: support MP-mode HTTP endpoint
  • #3166 Add MP support for SGLang
  • #3402 Report cache hit stats in KVTransferParams
  • #3365 Log a warning when the LMCache server cannot be reached

Observability

  • #3290 Refactor metric names and units
  • #3320 Add L2 Usage to SM
  • #3147 Fix PrometheusLogger duplicate metadata

CI/CD & Build

  • #3297 Switch workflows to egress audit mode
  • #3293 Date-based nightly tag and release-driven flow
  • #3299 Add sm_120 to cu129 wheel build
  • #3298 Pin vLLM CUDA wheel variant
  • #3311 DCO sign-off commits from sync_torch_version
  • #3312 Skip pipeline for operator-v* release tags
  • #3315 Add signoff for doc translation workflow
  • #3314 Fix incorrect version tagging
  • #3349 Restore NO_CUDA_EXT=1 skip-all-extensions
  • #3354 Rename NO_CUDA_EXT → NO_NATIVE_EXT
  • #3357 CPU-only build (NO_GPU_EXT=1) + CI
  • #3148 Add macOS CI check
  • #3366 Remove macos-13 from CI workflow
  • #3358 Smoke-test container images before pushing
  • #3405 Tighten threshold for k3 multiprocess test
  • #3439 Flexible pod specs for k3 multiprocess test
  • #3203 Add CI-safe raw-block temp-file tests
  • #3415 Fix raw-block L2 store result assertions
  • #3250 Move pytest.ini to project root
  • #3348 Sync torch version with vLLM (2.11.0)
  • #3400 Sync standalone image torch with vLLM (cu12.9 fix)
  • #3302 Unconditionally compile common C++ extensions in setup.py
  • #3390 Fix pytest lazy import issue from vLLM torch dynamo

Operator

  • #3293 Date-based nightly tag and release-driven flow
  • #3289 Operator end-to-end smoke suite

Bugfixes

  • #2999 Fix sha256_cbor + async_loading type mismatch in _hash_tokens
  • #3252 Fix 0-hit async lookup when use_layerwise=true
  • #3258 Prevent TypeError crash when streaming response has zero visible content
  • #3191 Page-align pinned pointer allocations for O_DIRECT
  • #3385 Skip unpin for non-pinned objects in cleanup_memory_objs
  • #3416 Avoid vLLM import during blake3 token hasher startup
  • #3355 Install CLI requirements in standalone image (fixes #3353)
  • #3370 Use nixl meta-package on CUDA 13 so L2 adapters load
  • #3294 Defer test_cache runtime imports so lmcache-cli loads without torch
  • #3280 Add StubCPUDevice for CPU-only import/startup fallback
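For context on #3191: O_DIRECT I/O generally requires the buffer address and the transfer length to be suitably aligned, and page alignment satisfies the common cases. The sketch below shows the idea in plain Python and is not the pinned-allocator code from the PR; the helper names are illustrative. An anonymous mmap is page-aligned by construction, and the helper rounds a requested size up to a whole number of pages.

```python
import ctypes
import mmap


def align_up(size: int, alignment: int = mmap.PAGESIZE) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def alloc_page_aligned(size: int) -> mmap.mmap:
    """Allocate an anonymous buffer whose start and length are page-aligned."""
    return mmap.mmap(-1, align_up(size))


buf = alloc_page_aligned(5000)
# Inspect the buffer's start address to confirm the alignment.
address = ctypes.addressof(ctypes.c_char.from_buffer(buf))
```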

Features

  • #2902 Add sycl implementation of memory_kernels for Intel XPU
  • #2936 nixl_storage: naive support for files + dynamic
  • #3351 Support dynamic plugin registration for Redis custom lookup priorities
  • #3115 hipFile: cufile-python compatible shim for AMD's hipFile
  • #2643 Huge pages support

Performance / Optimization

  • #3271 Replace Condvar polling with eventfd + epoll in iouring worker
  • #3338 Rename non_cuda_equivalents to python_ops_fallback

Refactor

  • #3237 Abstract discover_subclasses util method
  • #3341 Refactor bench engine cli
  • #3411 Refactor bench kvcache cli
  • #3379 Cleanup: remove duplicate import time in blend_server_v2

Benchmarking

  • #3440 Fix bench close event loop

Docs

  • #2282 Fix typo in p2p_init_ports parameter
  • #3270 Add recipes for Mistral, Phi & Llama
  • #3380 Add docstrings to public helpers in lmcache/utils.py
  • #3395 Document gfx950 (MI350X/MI355X) in install example
  • #3408 Add Qwen3-30B-A3B and Qwen3-235B-A22B to kv cache calculator
  • #3450 Replace broken Mooncake GitHub links with docs site URLs
  • #3432 Add top-level README as navigable examples catalog
  • #3401 Combined PR from recent doc drift scans

Chinese Translation

  • #3265 Add Chinese translation pipeline
  • #3310 Update Chinese documentation translations
  • #3335 Update Chinese documentation translations
  • #3334 Fix translation icon and screen layout adjustment
  • #3436 Fix translate workflow to reuse translations

nixl Storage

  • #3200 Fix tests and remove from ignore list

Chore / Maintenance

  • #3306 Fix typos and misspellings across codebase
  • #3316 Add hlin99 to CODEOWNERS
  • #3324 Add yoo-kumaneko as CODEOWNER for mp_observability and tools
  • #3425 Correct spelling 'mignt' -> 'might' in cache policy comments
  • #3426 Convert f-string log calls in cache_engine.py to %-format
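The conversion in #3426 follows a general Python logging practice: passing arguments separately lets the logging module defer string formatting until a record is actually emitted. A generic illustration follows; the logger name, message, and helper class are made up and not taken from cache_engine.py.

```python
import logging

logger = logging.getLogger("example")
logger.setLevel(logging.WARNING)  # DEBUG records are discarded


class CountingStr:
    """Counts how many times it is rendered to a string."""

    calls = 0

    def __str__(self) -> str:
        CountingStr.calls += 1
        return "chunk-42"


key = CountingStr()

# Eager: the f-string is built even though DEBUG is disabled.
logger.debug(f"evicting {key}")
eager_calls = CountingStr.calls

# Lazy: %-style arguments are only formatted if the record is emitted.
logger.debug("evicting %s", key)
lazy_calls = CountingStr.calls - eager_calls
```

Here the f-string call renders the object once for a message that is then dropped, while the %-format call does no formatting work at all.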

