yvgude/lean-ctx v3.8.14 on GitHub

Added

Write-time memory admission — dedup-merge + salience floor (gitlab #969/#970).
A capped knowledge store used to fill with paraphrases of facts it already held,
forcing eviction to drop a good fact to make room for a near-duplicate. The
agent-facing ctx_knowledge remember path now runs a server-side admission gate
(ProjectKnowledge::remember_admitted) before committing: a new value whose
similarity to an existing same-category fact under a different key is at least
auto_merge_similarity (word-Jaccard, default 0.9) is merged into it (a confirmation bump, no
new row), and a value whose content salience falls below min_salience (default
0 = off, lossless) is rejected with a clear reason. Internal restorers (archive
rehydrate, cognition auto-promotion) keep using the ungated remember, so
admission only disciplines fresh agent writes. Same-key confirm/supersede
(contradictions) is untouched. Tunable via [memory.admission] /
LEAN_CTX_ADMISSION_{ENABLED,MERGE_SIMILARITY,MIN_SALIENCE}.
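
The admission gate described above can be sketched roughly as follows. This is an illustration only: the `Fact` struct, the `Admission` enum, the `admit` signature, and the caller-supplied `salience` score are assumptions rather than the real `ProjectKnowledge::remember_admitted` API, and the order of the two checks is a guess.

```rust
use std::collections::BTreeSet;

/// Word-level Jaccard similarity: |A ∩ B| / |A ∪ B| over lowercased word sets.
fn word_jaccard(a: &str, b: &str) -> f64 {
    let words = |s: &str| -> BTreeSet<String> {
        s.split_whitespace().map(|w| w.to_lowercase()).collect()
    };
    let (wa, wb) = (words(a), words(b));
    if wa.is_empty() && wb.is_empty() {
        return 1.0; // two empty values are treated as identical
    }
    let inter = wa.intersection(&wb).count() as f64;
    let union = wa.union(&wb).count() as f64;
    inter / union
}

/// Hypothetical stand-in for a stored fact.
struct Fact {
    key: String,
    category: String,
    value: String,
    confirmations: u32,
}

#[derive(Debug, PartialEq)]
enum Admission {
    /// Merged into the fact at this index (confirmation bump, no new row).
    Merged(usize),
    /// Rejected because salience fell below the floor.
    Rejected,
    /// Stored as a new row.
    Admitted,
}

/// Assumed gate: salience floor first, then near-duplicate merge.
/// Same-key confirm/supersede is out of scope for this sketch.
fn admit(
    store: &mut Vec<Fact>,
    new: Fact,
    salience: f64,
    auto_merge_similarity: f64,
    min_salience: f64,
) -> Admission {
    // With min_salience = 0.0 this check never fires, i.e. the floor is off.
    if salience < min_salience {
        return Admission::Rejected;
    }
    let hit = store.iter().position(|f| {
        f.category == new.category
            && f.key != new.key
            && word_jaccard(&f.value, &new.value) >= auto_merge_similarity
    });
    match hit {
        Some(i) => {
            store[i].confirmations += 1;
            Admission::Merged(i)
        }
        None => {
            store.push(new);
            Admission::Admitted
        }
    }
}
```

With the default threshold of 0.9, a value that differs only in letter case from an existing same-category fact merges, while an unrelated value is stored as a new row.
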
Cluster compaction — collapse low-value fact piles into recoverable digests
(gitlab #969/#971). Decay + the cap kept a busy store churning at 100% but
never actually shrank it. A new cognition-loop step (8c, hourly, lean-ctx-driven)
collapses a same-category cluster of at least min_cluster (default 4) facts that
are faded (< max_confidence), barely confirmed (<= max_confirmations), and neither
frequently nor recently retrieved into a single content-addressed digest fact,
archiving the originals so they rehydrate on recall. Digests and synthesized
summaries are never re-compacted. The digest key/value are byte-stable functions
of the cluster (#498). Surfaced as compacted= on the cognition-loop report.
Tunable via [memory.compaction] / LEAN_CTX_COMPACTION_*; runs only in the
background loop, never on the remember hot path.
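
A minimal sketch of how such a compaction plan could be derived, assuming a hypothetical `Fact` shape and a hand-rolled FNV-1a hash for the content-addressed key. The real digest format, the retrieval criteria, and the defaults for `max_confidence` and `max_confirmations` are not stated in this note, so those parts are illustrative.

```rust
use std::collections::BTreeMap;

/// Hypothetical fact shape; field names are illustrative.
#[derive(Clone, Debug)]
struct Fact {
    key: String,
    category: String,
    value: String,
    confidence: f64,
    confirmations: u32,
    retrieved_recently: bool,
    is_digest: bool,
}

/// Thresholds named after the config keys in the note.
struct CompactionCfg {
    max_confidence: f64,
    max_confirmations: u32,
    min_cluster: usize,
}

/// FNV-1a 64-bit: a fixed, dependency-free hash, so the key depends only on the input bytes.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Returns (digest_key, member_keys) for every category with enough low-value facts.
fn plan_compaction(facts: &[Fact], cfg: &CompactionCfg) -> Vec<(String, Vec<String>)> {
    let mut clusters: BTreeMap<&str, Vec<&Fact>> = BTreeMap::new();
    for f in facts {
        let low_value = f.confidence < cfg.max_confidence
            && f.confirmations <= cfg.max_confirmations
            && !f.retrieved_recently
            && !f.is_digest; // digests are never re-compacted
        if low_value {
            clusters.entry(f.category.as_str()).or_default().push(f);
        }
    }
    let mut plans = Vec::new();
    for (category, mut members) in clusters {
        if members.len() < cfg.min_cluster {
            continue;
        }
        // Sort by key so the digest does not depend on input order.
        members.sort_by(|a, b| a.key.cmp(&b.key));
        let mut buf = Vec::new();
        for m in &members {
            buf.extend_from_slice(m.key.as_bytes());
            buf.push(0);
            buf.extend_from_slice(m.value.as_bytes());
            buf.push(0);
        }
        let digest_key = format!("digest:{}:{:016x}", category, fnv1a64(&buf));
        plans.push((digest_key, members.iter().map(|m| m.key.clone()).collect()));
    }
    plans
}
```

Because members are sorted before hashing and categories are walked in `BTreeMap` order, the same cluster yields the same digest key regardless of the order in which facts are scanned.
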
Self-curating memory defaults + actionable capacity guidance (gitlab #969/#972).
prune_unretrieved_after_days now defaults to a conservative, recoverable
90 days (was off), so genuinely cold single-confirmation facts are archived
instead of accumulating. lean-ctx doctor capacity warnings are no longer a dead
end: a store at its cap now prints that this is healthy by design (eviction
holds it there) and which lever to pull, while an over-cap CRIT tells the
operator to run the cognition loop or raise the cap.
Read-cache re-delivery telemetry (gitlab #953). Turns the subjective
"re-reads feel unreliable" signal into data: every event that drops a
fully-delivered cache entry — forcing the next read to re-send the whole file
instead of the cheap [unchanged] stub — increments a process-global counter
grouped by cause (compaction, idle, eviction, conversation), surfaced
as a re-deliveries forced: line in ctx_cache status. The counters live only
in that diagnostic, never in a cacheable tool-output body, so output
determinism (#498) is preserved. Pure measurement — no behavioral change.
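
One way to picture the counters is a small set of process-global atomics indexed by cause. The sketch below is an assumption about shape, not the shipped code: the `Cause` enum, the function names, and the exact layout of the status line are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Causes named in the release note; the discriminants index the counter array.
#[derive(Clone, Copy, Debug)]
enum Cause {
    Compaction = 0,
    Idle = 1,
    Eviction = 2,
    Conversation = 3,
}

/// Process-global counters, one per cause.
static FORCED: [AtomicU64; 4] = [
    AtomicU64::new(0),
    AtomicU64::new(0),
    AtomicU64::new(0),
    AtomicU64::new(0),
];

/// Called whenever a fully delivered cache entry is dropped.
fn record_forced_redelivery(cause: Cause) {
    FORCED[cause as usize].fetch_add(1, Ordering::Relaxed);
}

/// Renders the diagnostic line; the format here is hypothetical.
fn status_line() -> String {
    let get = |c: Cause| FORCED[c as usize].load(Ordering::Relaxed);
    format!(
        "re-deliveries forced: compaction={} idle={} eviction={} conversation={}",
        get(Cause::Compaction),
        get(Cause::Idle),
        get(Cause::Eviction),
        get(Cause::Conversation)
    )
}
```

Keeping the counters out of any cacheable tool output, as the note describes, is what lets them change over time without affecting output determinism.
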
Persistent, conversation-scoped [unchanged] stub index — survives daemon
restarts and idle clears (gitlab #955). The in-memory read cache is wiped on
every daemon restart and emptied by the idle-TTL clear, so until now the first
unchanged re-read afterwards re-delivered the whole file — the single biggest
remaining source of the "re-reads aren't reliable" feeling. A new focused
module core::read_stub_index persists the minimal bookkeeping needed to emit
the ~13-token stub — {path, md5, mtime, line_count, file_ref, delivered_conversation}, never the content — to
{data_dir}/read_cache/stub_index.json (atomic tmp+rename, LRU-capped at 1024
records). It is write-through on every full delivery, flushed on the
batch/idle/shutdown save cadence, and rehydrated at startup, so a re-read of an
unchanged file in the same conversation now collapses to the stub even across
a restart. Correctness is gated harder than the warm path: a cold stub (no live
entry) is served only when the file's mtime and md5 still match disk and
the current conversation equals the delivering one
(conversation::conversation_allows_cold_stub — no "no-context → legacy"
escape, because across a process boundary an unknown conversation cannot prove
the content is in context; this keeps #954's cross-chat hazard closed). A host
compaction drops the whole index synchronously (the conversation's context was
summarised away), mirroring SessionCache::reset_delivery_flags. Content is
always re-read from disk — only delivery bookkeeping persists — so tool-output
determinism (#498) is untouched. Side benefit: because the index outlives the
idle clear, same-conversation re-reads after idle no longer re-deliver either.
Kill-switch LEAN_CTX_STUB_PERSIST=0.
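
The cold-stub gate can be sketched as a pure predicate. The record fields mirror the list above, but the struct, the `DiskState` helper, and the function name are assumptions made for illustration; the real check lives in `conversation::conversation_allows_cold_stub` and the stub path.

```rust
/// Hypothetical mirror of the persisted bookkeeping (never the file content).
#[derive(Clone, Debug)]
struct StubRecord {
    path: String,
    md5: String,
    mtime: u64,
    line_count: usize,
    file_ref: String,
    delivered_conversation: Option<String>,
}

/// What a fresh stat and hash of the file on disk report (assumed helper type).
struct DiskState<'a> {
    md5: &'a str,
    mtime: u64,
}

/// A cold stub is allowed only if the file is unchanged on disk AND both
/// conversations are known and equal. An unknown conversation on either side
/// denies the stub: there is no legacy fallback on the cold path.
fn cold_stub_allowed(
    rec: &StubRecord,
    disk: &DiskState,
    current_conversation: Option<&str>,
) -> bool {
    let unchanged = rec.mtime == disk.mtime && rec.md5 == disk.md5;
    let same_conversation = match (rec.delivered_conversation.as_deref(), current_conversation) {
        (Some(delivered), Some(current)) => delivered == current,
        _ => false,
    };
    unchanged && same_conversation
}
```

If the predicate returns false, the caller would fall back to a full read from disk, which is always safe.
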
Deterministic JSON crusher core — core::json_crush (gitlab #934/#935,
Headroom "Smart Crusher" port). Real JSON payloads (API responses, kubectl get -o json, DB dumps, RAG chunks) are dominated by arrays of objects that
repeat the same keys and values on every row. The new single-source module
factors that redundancy out: crush_lossless hoists every key present in all
items of an array to its dominant value (a _defaults block) and keeps only
per-item deviations, so it is exactly reconstructible via reconstruct;
crush_lossy additionally records near-unique high-entropy columns
(timestamps/UUIDs) in _dropped for out-of-band CCR recovery. Output is a pure
function of the input Value — no timestamps, counters, randomness, or hash-map
order leakage (candidate keys walk a BTreeSet, value frequencies a BTreeMap)
— and it never inflates (a no-op returns None). This is the deterministic,
byte-stable answer to Headroom's statistical crusher (#498).
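
The hoisting idea can be shown on a simplified data model. The sketch below assumes rows are flat string maps rather than a JSON `Value`, and its `crush_lossless` and `reconstruct` only borrow the names from the note: the signatures, the `Crushed` struct, and the tie-breaking rule are assumptions, and the real never-inflate size check is not modeled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified row: a flat map of string fields (assumption for this sketch).
type Row = BTreeMap<String, String>;

/// Crushed form: shared defaults plus per-item deviations.
struct Crushed {
    defaults: Row,
    items: Vec<Row>,
}

/// Hoists keys present in every row to their most frequent value.
/// Returns None when nothing can be hoisted.
fn crush_lossless(rows: &[Row]) -> Option<Crushed> {
    let first = rows.first()?;
    // Candidate keys: present in all rows; BTreeSet gives a deterministic walk.
    let candidates: BTreeSet<&String> = first
        .keys()
        .filter(|k| rows.iter().all(|r| r.contains_key(*k)))
        .collect();
    let mut defaults = Row::new();
    for key in candidates {
        // Value frequencies in a BTreeMap, so ties resolve to the smallest value.
        let mut freq: BTreeMap<&String, usize> = BTreeMap::new();
        for r in rows {
            *freq.entry(&r[key]).or_insert(0) += 1;
        }
        let mut best: Option<(&String, usize)> = None;
        for (v, n) in &freq {
            if best.map_or(true, |(_, bn)| *n > bn) {
                best = Some((*v, *n));
            }
        }
        if let Some((value, count)) = best {
            // Only hoist values shared by at least two rows.
            if count > 1 {
                defaults.insert(key.clone(), value.clone());
            }
        }
    }
    if defaults.is_empty() {
        return None;
    }
    // Keep only fields that deviate from the defaults.
    let items: Vec<Row> = rows
        .iter()
        .map(|r| {
            r.iter()
                .filter(|(k, v)| defaults.get(*k) != Some(*v))
                .map(|(k, v)| (k.clone(), v.clone()))
                .collect::<Row>()
        })
        .collect();
    Some(Crushed { defaults, items })
}

/// Exact inverse: overlay each item's deviations on the defaults.
fn reconstruct(c: &Crushed) -> Vec<Row> {
    c.items
        .iter()
        .map(|item| {
            let mut row = c.defaults.clone();
            for (k, v) in item {
                row.insert(k.clone(), v.clone());
            }
            row
        })
        .collect()
}
```

Reconstruction is exact because a key is hoisted only when every row carries it: a row either matches the default (restored from `defaults`) or keeps its own value in `items`.
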
Opt-in lossless JSON crushing for verbatim data commands (gitlab #936). A
new crush_verbatim_json config key (env LEAN_CTX_CRUSH_VERBATIM_JSON, default
off) lets the array-heavy JSON of otherwise byte-verbatim data commands
(gh api, jq, kubectl get -o json, curl JSON) flow through the lossless
crusher when it at least halves the payload. Off by default keeps those outputs
verbatim; on, they are reshaped into a compact, fully reconstructible form and
never lose a datum. The gate is a pure, unit-tested function and only ever
touches Verbatim data commands — Passthrough (auth flows, dev servers,
streaming) is never reshaped.
Active prompt-cache breakpoint injection for Anthropic (gitlab #939,
Headroom "cache aligner" adjacent). A new opt-in cache_breakpoint proxy
config key (env LEAN_CTX_PROXY_CACHE_BREAKPOINT, default off) makes the
proxy add a single cache_control: {type:"ephemeral"} breakpoint to the
system field of Anthropic requests only when the client set none of its
own — so a raw API client's large, stable system prompt bills later turns at
the cached rate instead of full price every turn (the cache win it left on the
table). It is Anthropic-only by construction: OpenAI and Gemini cache prefixes
automatically and ignore the marker, so those paths stay byte-unchanged. The
injection is deterministic (a pure function of the body, so the prefix it
creates is itself byte-stable, #498), never adds a second breakpoint (it defers
to any client cache_control and to a client-cached message prefix), and is
skipped below Anthropic's minimum cacheable size so it never churns bytes for no
cache benefit. It runs even on an otherwise meter-only/byte-passthrough proxy (the one
sanctioned mutation), and every injection is counted on a dedicated
breakpoints_injected gauge in /status cache_safety — a pure win signal,
never against the cache-safe ratio.
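
The conditions above amount to a small pure predicate. The following is a hedged sketch: `RequestFacts`, the `Upstream` enum, and the `min_cacheable_tokens` parameter are hypothetical, and the actual minimum cacheable size depends on the model, so it is passed in rather than hard-coded.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Upstream {
    Anthropic,
    OpenAi,
    Gemini,
}

/// Hypothetical summary of the request facts the decision depends on.
struct RequestFacts {
    upstream: Upstream,
    system_tokens: usize,
    client_set_cache_control: bool,
    client_cached_message_prefix: bool,
}

/// Inject a breakpoint only when the feature is on, the upstream is Anthropic,
/// the client set no cache markers of its own, and the system prompt is large
/// enough to be cacheable at all.
fn should_inject_breakpoint(
    enabled: bool,
    min_cacheable_tokens: usize,
    req: &RequestFacts,
) -> bool {
    enabled
        && req.upstream == Upstream::Anthropic
        && !req.client_set_cache_control
        && !req.client_cached_message_prefix
        && req.system_tokens >= min_cacheable_tokens
}
```

Since the decision depends only on the request, two identical requests always get the same answer, which is what keeps the resulting prefix byte-stable.
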
Cache-aligner volatile-field telemetry (gitlab #940, Headroom "cache aligner"
stage 1, telemetry-first). A single volatile token in an otherwise-stable
system prompt — today's date, a fresh UUID, a git SHA — shifts the prefix bytes
and busts the provider cache on every turn. A new opt-in cache_aligner proxy
config key (env LEAN_CTX_PROXY_CACHE_ALIGNER, default off) makes the proxy
scan each unanchored Anthropic system prompt for those fields and report how
many it found on /status cache_safety (volatile_system_requests,
volatile_fields_detected), so a user can quantify how much prompt-cache benefit their
prompt leaks. The scan is measurement only — the request body is never
mutated, so it stays strictly cache-safe — and deterministic (matches are
collected, sorted, and overlapping spans merged, so a full timestamp counts
once). This is the honest precursor to an opt-in tail-relocate, which is
deliberately deferred until the data shows it pays.
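
The "collected, sorted, and overlapping spans merged" step can be sketched as below. It assumes each detector reports half-open byte ranges `(start, end)`; the detectors themselves and the function name are not from the source.

```rust
/// Merges overlapping half-open spans (start, end) so that a region matched by
/// several detectors, e.g. a date inside a full timestamp, is counted once.
fn merge_spans(mut spans: Vec<(usize, usize)>) -> Vec<(usize, usize)> {
    // Sorting first makes the result independent of detector order.
    spans.sort();
    let mut merged: Vec<(usize, usize)> = Vec::new();
    for (start, end) in spans {
        if let Some(last) = merged.last_mut() {
            if start < last.1 {
                // Overlaps the previous span: extend it instead of adding a new one.
                last.1 = last.1.max(end);
                continue;
            }
        }
        merged.push((start, end));
    }
    merged
}
```

The number of volatile fields would then be the length of the merged list, so a timestamp that matches both a date pattern and a timestamp pattern contributes one field, not two.
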
Retrieve-coupled CCR learning (gitlab #941, Headroom CCR "learning" port).
When an agent keeps pulling back originals the inline compressed form dropped,
that is direct evidence the compression was too aggressive. LoopDetector now
tracks ctx_expand/ctx_retrieve re-fetches in a dedicated sliding-window
counter (retrieve_count, alongside the existing correction counter), exposed
as the ccr_retrieve_rate anomaly metric. The session auto-degrade now reacts
to the stronger of the two pressures (correction loops and CCR retrieves) and
recovers only when neither fires — so a session that over-retrieves dials
compression down to Lite (>=3) then Off (>=5) for itself. The level is
server state that feeds future CompressionLevel::effective() decisions, never
part of any tool output body, so output determinism (#498) is preserved.
Model-free JSON-crush accuracy gate (gitlab #942). A new Condition::JsonCrush
arm in the deterministic A/B eval harness (core::eval_ab) routes JSON/JSONL
through json_crush instead of whitespace-only compaction, and a committed
JSON-QA fixture (a redundant operator roster with one outlier field) plus the
gate json_crush_condition_preserves_answer_and_beats_baseline prove — with no
live model — that the crush keeps every gold answer while packing it in strictly
fewer tokens than the raw baseline. This is the deterministic accuracy floor of
the "crushed >= raw" claim, guarding against a future over-aggressive change.
Per-upstream proxy compression stats + ChatGPT Codex support (#582). The
proxy /status and lean-ctx proxy status now break compression down per
upstream — Anthropic, OpenAI, ChatGPT, Gemini — each with its own request /
byte / token-saved counters, so you can see exactly where the savings come
from. The split is purely additive: the existing top-level totals are
unchanged, and an unknown label is still counted in the totals but never
misattributed to a bucket. ChatGPT Codex traffic
(/backend-api/codex/responses) is recorded under its own ChatGPT label
while reusing the OpenAI Responses compression, usage, introspection and
holdout paths, and JSON-encoded tool-result envelopes inside Responses output
are now compressed/pruned without dropping items or breaking function_call /
function_call_output pairing (shrink-only, respects should_protect). The
research-prose squeeze cap is tunable via LEAN_CTX_RESEARCH_PROSE_CAP
(default 20000). Thanks to community contributor @ousatov-ua.
Self-observability + self-curation tooling (gitlab #959–#964). A cluster of
measurement-first additions that let lean-ctx report on — and tune — its own
context footprint: a doctor injected-context linter plus a budget-gated
per-session overhead report (#960/#964); a health per-tool value signal that
recommends disabling tools that never earn their tokens (#961); knowledge-decay
pruning and an ACTIVE-SESSION token budget so the injected session block stays
bounded (#962); a shadow-minimal rules block that trims re-teaching (#963); and
a deterministic footprint delta-eval harness for injected context (#959). All
are diagnostic/state-only — no tool-output body changes — so output determinism
(#498) is preserved.

Changed

json_schema::compress is now crush-backed (gitlab #936). The generic JSON
fallback (and the jq route) prefers the lossless json_crush form over the
value-dropping schema outline whenever the array is redundant enough to at least
halve the payload — keeping every datum reconstructible instead of collapsing it
to a structure-only sketch. Heterogeneous or low-redundancy arrays still fall
through to the compact schema outline (unchanged), so there is no regression for
those. curl's top-level array-of-objects path now defers to the same shared
core instead of its useless [object(NK); N] summary, converging the generic
JSON handling on one implementation (docker inspect and the aws
resource summarizers stay intentionally domain-specific). PATTERN_ENGINE_VERSION
is bumped (1→2) so determinism consumers detect the new output shape.
ctx_read aggressive mode compacts JSON structurally (gitlab #936). Reading
a .json file in aggressive mode (the auto-resolved mode for large non-code
data files) now routes redundant array-of-object payloads through the lossless
json_crush core instead of generic text pruning, which mangles JSON structure.
It fires only when the crush at least halves the file and shrinks the token
count; the exact bytes stay recoverable with a full/raw re-read. map mode
stays a compact structural overview (unchanged). The "must at least halve"
gate is centralized in json_crush::{crush_value_if_beneficial, crush_text_if_beneficial} (one KEEP_DATA_DIVISOR), so the shell (json_schema,
curl) and read paths can never drift.
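
A rough picture of the shared "must at least halve" gate, under stated assumptions: the divisor value of 2 is inferred from the wording, `crush_pays` and `crush_text_if_pays` are hypothetical names with hypothetical signatures, and the additional token-count check used on the read path is not modeled.

```rust
/// Assumed value: "at least halve" suggests a divisor of 2.
const KEEP_DATA_DIVISOR: usize = 2;

/// True when the crushed form is strictly smaller and at most
/// 1/KEEP_DATA_DIVISOR of the raw size.
fn crush_pays(raw_bytes: usize, crushed_bytes: usize) -> bool {
    crushed_bytes < raw_bytes
        && crushed_bytes.saturating_mul(KEEP_DATA_DIVISOR) <= raw_bytes
}

/// Applies a crusher and keeps its output only when the gate passes;
/// otherwise the caller keeps the original text.
fn crush_text_if_pays(raw: &str, crush: impl Fn(&str) -> Option<String>) -> Option<String> {
    let crushed = crush(raw)?;
    if crush_pays(raw.len(), crushed.len()) {
        Some(crushed)
    } else {
        None
    }
}
```

Routing every caller through one predicate like this is what prevents the shell and read paths from applying slightly different thresholds.
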
Unified, surgical CCR retrieve path across the whole tee store (gitlab #938).
ctx_expand now resolves every content-addressed original through one resolver
with a fixed precedence: proxy prune/live stubs (proxy_<hash>), the JSON
crusher's lossy originals (json_<hash>), and every compressed shell command's
already-teed verbatim output (<slug>_<8hex>.log) — before the reference
(ref_) and archive (hex) stores. So an agent can pull back just the slice it
needs (head/tail/search/json_path/range) from any of them instead of
re-reading the whole file; the high-compression shell footer now advertises the
ctx_expand slice form. The resolver trusts only the file name and always
rebuilds the path under {state}/tee/ (no traversal). Opt-in verbatim JSON
crushing (crush_verbatim_json) gains a lossy stage 2: when the lossless reshape
does not pay, it drops near-unique high-entropy columns (timestamps, UUIDs) and
persists the verbatim original under json_<hash>, embedding a content-addressed
ctx_expand handle so a dropped datum is never irrecoverable.
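
The fixed precedence and the traversal-safe path rebuild might look like the sketch below. The handle patterns follow the note, but the `Store` enum, `classify`, and `tee_path` are hypothetical names, and the exact matching rules of the real resolver are not known here.

```rust
use std::path::{Path, PathBuf};

/// Stores in the precedence order given in the note.
#[derive(Debug, PartialEq)]
enum Store {
    ProxyStub,
    JsonOriginal,
    ShellTee,
    Reference,
    Archive,
}

fn is_hex(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Classifies a handle by name alone, checking the stores in precedence order.
fn classify(handle: &str) -> Option<Store> {
    if handle.starts_with("proxy_") {
        return Some(Store::ProxyStub);
    }
    if handle.starts_with("json_") {
        return Some(Store::JsonOriginal);
    }
    // Shell tee files look like <slug>_<8hex>.log
    if let Some((slug, hex)) = handle
        .strip_suffix(".log")
        .and_then(|stem| stem.rsplit_once('_'))
    {
        if !slug.is_empty() && hex.len() == 8 && is_hex(hex) {
            return Some(Store::ShellTee);
        }
    }
    if handle.starts_with("ref_") {
        return Some(Store::Reference);
    }
    if is_hex(handle) {
        return Some(Store::Archive);
    }
    None
}

/// Rebuilds the path under {state}/tee/ from the final file-name component only,
/// so directory parts in the handle (including "..") are discarded.
fn tee_path(state_dir: &Path, handle: &str) -> Option<PathBuf> {
    let name = Path::new(handle).file_name()?;
    Some(state_dir.join("tee").join(name))
}
```

Because only `file_name()` is trusted, a handle such as `../../etc/passwd` can at worst name a file called `passwd` inside the tee directory.
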
ctx_search absorbs ctx_semantic_search and ctx_symbol (#509). Search
collapses to a single action-routed ctx_search: an action argument
(regex default, semantic, symbol, reindex, find_related) routes to
the same engines as before, and a missing action is inferred so existing
calls keep working. The two former tools become deprecated aliases — hidden
from tools/list but still callable for one release — which trims the
advertised surface (Standard 17→15 tools, Minimal 6→5) so a model picks the
right search on the first try. Underlying search behavior is unchanged; this is
the final step of the #509 read/search consolidation begun in 3.8.12/3.8.13.
Parallel BM25 index build and incremental rebuild (gitlab #933, #581). The
full index build now tokenizes across a rayon pool and merges deterministically
(#933); the edit-loop incremental rebuild — changed/new/removed files on a warm
index — does the same (#581). Both paths are byte-for-byte identical to the
sequential result (covered by determinism tests and a CI build-time regression
gate), so first-index and reindex-after-edit are faster with no change to what
search returns. Credit to the #581 reference work by @ousatov-ua.
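
The key property, parallel tokenization with a merge that reproduces the sequential result, can be illustrated without external crates. This sketch swaps rayon for `std::thread::scope` and builds a plain postings map rather than a full BM25 index; all names are hypothetical.

```rust
use std::collections::BTreeMap;
use std::thread;

/// term -> [(doc_id, term_frequency)], ordered by term and then by doc id.
type Postings = BTreeMap<String, Vec<(usize, usize)>>;

/// Lowercased alphanumeric tokens with their counts for one document.
fn tokenize(doc: &str) -> BTreeMap<String, usize> {
    let mut tf = BTreeMap::new();
    for w in doc.split(|c: char| !c.is_alphanumeric()).filter(|w| !w.is_empty()) {
        *tf.entry(w.to_lowercase()).or_insert(0) += 1;
    }
    tf
}

/// Reference implementation: one document after another.
fn build_sequential(docs: &[&str]) -> Postings {
    let mut idx = Postings::new();
    for (id, doc) in docs.iter().enumerate() {
        for (term, n) in tokenize(doc) {
            idx.entry(term).or_default().push((id, n));
        }
    }
    idx
}

/// Tokenizes chunks on scoped threads, then merges in original document order.
fn build_parallel(docs: &[&str], workers: usize) -> Postings {
    let workers = workers.max(1);
    let chunk = ((docs.len() + workers - 1) / workers).max(1);
    let per_doc: Vec<BTreeMap<String, usize>> = thread::scope(|s| {
        let handles: Vec<_> = docs
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().map(|d| tokenize(d)).collect::<Vec<_>>()))
            .collect();
        // Joining in spawn order keeps documents in their original order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("tokenizer thread panicked"))
            .collect()
    });
    let mut idx = Postings::new();
    for (id, tf) in per_doc.into_iter().enumerate() {
        for (term, n) in tf {
            idx.entry(term).or_default().push((id, n));
        }
    }
    idx
}
```

Only the tokenization runs concurrently; the merge is single-threaded and ordered, which is why thread scheduling cannot change the output.
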
Generated dependency lockfiles are excluded from the index (#585). npm/pnpm
lockfiles (package-lock.json, npm-shrinkwrap.json, pnpm-lock.yaml) carry
ingestible .json/.yaml extensions and used to slip into the index, where a
retrieval surface (ctx_compose, BM25 search) would inline a large
auto-generated dependency pin — a pure token sink. They are now dropped at the
ingestion front-door via a new non-ingestible IngestKind::Generated, joining
the *.lock/*.lockb files already excluded there (the scattered "lock"
extension check is removed so detection lives in one place). Detection is by
file name, so it is depth-independent — a monorepo's
frontend/package-lock.json is caught too, unlike a root-anchored ignore glob.
An explicit ctx_read/ctx_tree/ctx_glob of a lockfile is unaffected.
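
Name-based detection is what makes the rule depth-independent. A minimal sketch, assuming a hypothetical `is_generated_lockfile` helper (the real check sits behind `IngestKind::Generated`):

```rust
use std::path::Path;

/// True for the lockfile names listed in the note and for *.lock / *.lockb files.
/// Only the final path component is inspected, so nesting depth is irrelevant.
fn is_generated_lockfile(path: &Path) -> bool {
    const NAMES: [&str; 3] = ["package-lock.json", "npm-shrinkwrap.json", "pnpm-lock.yaml"];
    let name = match path.file_name().and_then(|n| n.to_str()) {
        Some(n) => n,
        None => return false,
    };
    NAMES.contains(&name) || name.ends_with(".lock") || name.ends_with(".lockb")
}
```

A root-anchored ignore glob would miss `frontend/package-lock.json`, whereas a file-name check treats it the same as a lockfile at the repository root.
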

Fixed

CI on main was red on all three Test jobs — a stale source-grep test
(gitlab #957). scenario_server_degrade_thresholds asserted the dispatch
source literally contains("correction_count >= 5") etc.; the #941
retrieve-coupled refactor renamed that to pressure = correction_count.max(retrieve_count), so the literals vanished and the assertion failed on
every platform (the rest of CI stayed green). Replaced the brittle grep with a
behavioral test backed by a new pure, total CompressionLevel::degrade_action
(Set/Clear/Leave) extracted from the dispatch — runtime behavior is
unchanged (5+ → Off, 3+ → Lite, 0 → clear, 1–2 → hold), but the threshold table
is now unit-tested and immune to internal renames.
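
The threshold table can be written as a pure, total function. The thresholds and the Set/Clear/Leave outcomes come from the note; the enum and function shapes below are assumptions and not the actual `CompressionLevel::degrade_action` signature.

```rust
/// Degraded compression levels named in the note.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Level {
    Lite,
    Off,
}

/// Outcome of one evaluation of the session pressure.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DegradeAction {
    Set(Level),
    Clear,
    Leave,
}

/// Reacts to the stronger of the two pressures:
/// 0 clears, 1 to 2 holds, 3 to 4 sets Lite, 5 or more sets Off.
fn degrade_action(correction_count: u32, retrieve_count: u32) -> DegradeAction {
    let pressure = correction_count.max(retrieve_count);
    match pressure {
        0 => DegradeAction::Clear,
        1..=2 => DegradeAction::Leave,
        3..=4 => DegradeAction::Set(Level::Lite),
        _ => DegradeAction::Set(Level::Off),
    }
}
```

A test against this function checks behavior rather than source text, so renaming an internal variable can no longer break it.
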
Subagents force-freshed every read, so re-reads were never cached inside a
Task (gitlab #956, closes the #952 series). is_subagent_context() set
effective_fresh = fresh || subagent, a blanket cold full read for the whole
subagent run — safe (a subagent must not be served a stub for content only the
parent received) but it threw away exactly the cheap [unchanged] re-read
that #946/#954/#955 reclaimed. Now that the stub is conversation-scoped, the
safety is enforced precisely instead of by bypass: a subagent runs under its
own task:{CURSOR_TASK_ID} scope (conversation::current_conversation_id), so
the stub gate withholds any stub the parent or a sibling delivered (distinct,
non-None scope → never matches), while the subagent's own re-reads of an
unchanged file collapse to the stub. The blanket force-fresh now applies only
when scoping is off (LEAN_CTX_CONVERSATION_SCOPE=0); an explicit
LEAN_CTX_FORCE_FRESH=1 still always forces fresh. Stubs stay double-gated
(mtime+md5 vs disk and conversation match), so a subagent is only ever
stubbed for a file it read itself, unchanged — never stale, never cross-agent.
auto-mode re-reads bypassed the [unchanged] cache stub and re-delivered
the whole file (gitlab #946). The cheap ~13-token re-read stub
(Fref=path [unchanged NL]) only fired for an explicit mode=full re-read;
in the default auto mode a re-read of an unchanged, already fully-delivered
file re-sent the entire body — the "re-reads aren't cached / reliability is
worse than before" regression. Cause: ctx_read resolved auto with
cache: None, so the resolver's unit-tested unchanged + full_delivered → ("full","cache_hit") short-circuit was dead code on the real read path (a
silent divergence from ctx_smart_read, which threaded the cache correctly;
introduced by the #683 deterministic cascade). resolve_auto_mode is now
cache-aware, the warm path routes an auto→full cache-hit through the same
try_stub_hit_readonly stub as an explicit full re-read, and the registered
read-lock fast path accepts auto too (self-guarded by the stub). Compressed-
first files still serve their cached compressed output on re-read — no wrong
escalation to full. Regression test
auto_reread_of_fully_delivered_file_serves_unchanged_stub.
The [unchanged] re-read stub was not conversation-scoped — a file
delivered in one chat could be stubbed for a re-read in another (gitlab #954).
The read SessionCache is shared across every chat served by one daemon, but
the stub asserts "you already have this in context" — true only within the
conversation that received the full content. A re-read from a different chat on
the same daemon could therefore receive Fref=path [unchanged NL] for content
it never saw (the idle-TTL clear only incidentally masked it). Each entry now
records the delivered_conversation (resolved from the live Cursor
conversation_id that hooks write to active_transcript.json), and
try_stub_hit_readonly serves the stub only when the current conversation
matches; a mismatch re-delivers in full and is counted by the new re-delivery
telemetry (#953). With no conversation context (hooks absent) it falls back to
the legacy process-scoped behavior, so single-chat hit rates are unchanged and
byte-stable (#498). The conversation gate is a pure, unit-tested function
(conversation::conversation_allows_stub) injected into the stub path for
deterministic, host-independent tests. Kill-switch
LEAN_CTX_CONVERSATION_SCOPE=0.
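
For the warm path, the gate reduces to a comparison with a legacy fallback. This sketch borrows the name `conversation_allows_stub` from the note, but the signature is hypothetical, and treating a missing conversation on either side as "legacy, allow" is an assumption: the note only states the fallback for the case where no conversation context is available.

```rust
/// Warm-path gate: when both conversations are known, the stub is served only
/// on an exact match. With no conversation context the sketch falls back to
/// the legacy process-scoped behavior and allows the stub (assumed semantics).
fn conversation_allows_stub(delivered: Option<&str>, current: Option<&str>) -> bool {
    match (delivered, current) {
        (Some(d), Some(c)) => d == c,
        _ => true,
    }
}
```

Under this rule a subagent running in its own `task:{CURSOR_TASK_ID}` scope never matches an entry delivered to the parent chat, so it gets a full read the first time and the stub only for its own re-reads.
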
ctx_impact missed Go and Kotlin same-package blast radius (#398 bug class).
The C#/Java fix in 3.8.13 closed one instance of a general gap: any language with
implicit same-package visibility references project types with no import, so
import edges alone leave the consumed type a false-negative leaf. For Go the
miss was total — same-package is same-directory and fully import-free, so changing
a struct used by a sibling file reported "no impact". core::type_ref_edges now
resolves Go usages directory-scoped and strict (a common name like
Config/Server declared in many packages still resolves to the one true
same-package definer, with no cross-package leak) and Kotlin usages by
declared package, both durable through the graph_index mirror and emitted by the
ctx_impact builder. The old coarse Go package heuristic — one arbitrary
same-directory edge per file, silently parsed as a top-weight imports edge in
the mirror — is removed: it both missed the real consumer and pulled
non-consumers (e.g. an unrelated logger.go) into the blast radius. Precise
type_ref edges replace it, and a genuinely unused file now falls to the standard
low-weight sibling rescue like every other language. Per-language scope is
centralized in one resolve_scope (previously the namespace logic was duplicated
across three call sites). GRAPH_ENGINE_VERSION is bumped (3→4) so stale graphs
self-heal. (gitlab #920–#924)
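
Directory-scoped resolution for Go can be sketched as follows. The `GoFile` input (declared and used type names per file) is a hypothetical stand-in for what a parser would provide, and `same_package_edges` is not the real `core::type_ref_edges` API.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::path::Path;

/// Hypothetical per-file summary produced by a parser.
struct GoFile<'a> {
    path: &'a str,
    declares: &'a [&'a str],
    uses: &'a [&'a str],
}

/// Directory part of a path; Go packages are one directory each.
fn dir_of(path: &str) -> String {
    Path::new(path)
        .parent()
        .map(|d| d.to_string_lossy().into_owned())
        .unwrap_or_default()
}

/// Emits (consumer, definer) edges only when both files share a directory,
/// so a common name like `Config` never links across packages.
fn same_package_edges(files: &[GoFile]) -> BTreeSet<(String, String)> {
    // (directory, type name) -> defining file
    let mut definers: BTreeMap<(String, &str), &str> = BTreeMap::new();
    for f in files {
        for ty in f.declares {
            definers.insert((dir_of(f.path), *ty), f.path);
        }
    }
    let mut edges = BTreeSet::new();
    for f in files {
        for used in f.uses {
            if let Some(definer) = definers.get(&(dir_of(f.path), *used)) {
                if *definer != f.path {
                    edges.insert((f.path.to_string(), (*definer).to_string()));
                }
            }
        }
    }
    edges
}
```

In this model a sibling file that does not use the changed type produces no edge at all, in contrast to the removed heuristic that linked one arbitrary same-directory file.
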
Project-root resolution unified for search and the MCP path jail (#580,
#948). An index built at the git root but searched from a sub-directory
resolved to a different namespace hash and returned zero hits; separately, an
MCP server launched from an agent-config directory (.copilot / .cursor /
.windsurf / .gemini) adopted that directory as the project root and then
rejected in-tree reads with "path escapes project root". A single
git-promotion resolver is now the one source of truth for the root, an explicit
sub-directory becomes a result filter rather than its own namespace, and an
agent-config CWD auto-reroots to the real project. PathJail enforcement is
unchanged — only root derivation is corrected. Adopted from reference PR #581
by @ousatov-ua.
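
A sketch of the unified root derivation, under assumptions: `resolve_root` and its return shape are hypothetical, the git check is injected as a closure so the example stays filesystem-free, and re-rooting an agent-config directory to its parent is a simplification of whatever the real resolver does.

```rust
use std::path::{Path, PathBuf};

/// Agent-config directory names listed in the note.
const AGENT_CONFIG_DIRS: [&str; 4] = [".copilot", ".cursor", ".windsurf", ".gemini"];

/// Returns (project_root, optional sub-directory filter relative to the root).
fn resolve_root(cwd: &Path, is_git_root: impl Fn(&Path) -> bool) -> (PathBuf, Option<PathBuf>) {
    // Step out of an agent-config directory before looking for the project.
    let start: &Path = match cwd.file_name().and_then(|n| n.to_str()) {
        Some(name) if AGENT_CONFIG_DIRS.contains(&name) => cwd.parent().unwrap_or(cwd),
        _ => cwd,
    };
    // Promote to the nearest ancestor that is a git root, if there is one.
    let root: &Path = start
        .ancestors()
        .find(|p| is_git_root(*p))
        .unwrap_or(start);
    // A sub-directory becomes a result filter instead of its own namespace.
    let filter = start
        .strip_prefix(root)
        .ok()
        .filter(|rel| !rel.as_os_str().is_empty())
        .map(Path::to_path_buf);
    (root.to_path_buf(), filter)
}
```

Searching from `crates/core` inside a repository then resolves to the same root (and hence the same index namespace) as searching from the top, with `crates/core` applied as a filter on the results.
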
lean-ctx call ctx_tools … panicked on the CLI call path (#583). Invoking
the ctx_tools meta-tool from the CLI crashed with "there is no reactor
running" because the runtime was resolved via Handle::current(), which only
exists on the MCP path (handlers there run inside block_in_place). It now
uses Handle::try_current(): the ambient handle is reused on the MCP path and
a one-shot runtime is built on the CLI path. Pure control-flow fix — MCP
behavior and output bytes are unchanged.
ctx_shell could silently drop output when a child held the pipe open
(gitlab #945). A process that kept the write end of the pipe open past its
own exit truncated the captured output; the reader now drains to EOF so the
full output is compressed and returned.
lean-ctx update failed with UnknownIssuer behind TLS-inspecting proxies
(#578). The updater now validates TLS against the OS trust store via ureq's
PlatformVerifier, so corporate roots installed in the system keychain/store
are honored.
gain --deep reported "Daemon: offline" on Windows while the daemon was
running (#576). The footer's daemon-status probe used a Unix-only check; it
now reports the daemon state correctly on Windows too.

Upgrade

lean-ctx update                 # recommended (auto-downloads + refreshes shell hooks)
cargo install lean-ctx          # or
npm update -g lean-ctx-bin      # or
brew upgrade lean-ctx

Note: After upgrading via cargo/npm/brew, run lean-ctx setup to refresh shell aliases. lean-ctx update does this automatically.

Full Changelog: v3.8.14...v3.8.14