github yvgude/lean-ctx v3.8.14

latest release: v3.8.15
4 hours ago

Added

  • Write-time memory admission — dedup-merge + salience floor (gitlab #969/#970).
    A capped knowledge store used to fill with paraphrases of facts it already held,
    forcing eviction to drop a good fact to make room for a near-duplicate. The
    agent-facing ctx_knowledge remember path now runs a server-side admission gate
    (ProjectKnowledge::remember_admitted) before committing: a new value that is at
    least auto_merge_similarity (word-Jaccard, default 0.9) similar to an existing
    same-category fact under a different key is merged into it (a confirmation bump, no
    new row), and a value whose content salience falls below min_salience (default
    0 = off, lossless) is rejected with a clear reason. Internal restorers (archive
    rehydrate, cognition auto-promotion) keep using the ungated remember, so
    admission only disciplines fresh agent writes. Same-key confirm/supersede
    (contradictions) is untouched. Tunable via [memory.admission] /
    LEAN_CTX_ADMISSION_{ENABLED,MERGE_SIMILARITY,MIN_SALIENCE}.
  • Cluster compaction — collapse low-value fact piles into recoverable digests
    (gitlab #969/#971).
    Decay + the cap kept a busy store churning at 100% but
    never actually shrank it. A new cognition-loop step (8c, hourly, lean-ctx-driven)
    collapses a same-category cluster of faded (< max_confidence), barely-confirmed
    (<= max_confirmations), never-frequently/recently-retrieved facts — at least
    min_cluster of them (default 4) — into a single content-addressed digest fact,
    archiving the originals so they rehydrate on recall. Digests and synthesized
    summaries are never re-compacted. The digest key/value are byte-stable functions
    of the cluster (#498). Surfaced as compacted= on the cognition-loop report.
    Tunable via [memory.compaction] / LEAN_CTX_COMPACTION_*; runs only in the
    background loop, never on the remember hot path.
  • Self-curating memory defaults + actionable capacity guidance (gitlab #969/#972).
    prune_unretrieved_after_days now defaults to a conservative, recoverable
    90 days (was off), so genuinely cold single-confirmation facts are archived
    instead of accumulating. lean-ctx doctor capacity warnings are no longer a dead
    end: a store at its cap now prints that this is healthy by design (eviction
    holds it there) and which lever to pull, while an over-cap CRIT tells the
    operator to run the cognition loop or raise the cap.
  • Read-cache re-delivery telemetry (gitlab #953). Turns the subjective
    "re-reads feel unreliable" signal into data: every event that drops a
    fully-delivered cache entry — forcing the next read to re-send the whole file
    instead of the cheap [unchanged] stub — increments a process-global counter
    grouped by cause (compaction, idle, eviction, conversation), surfaced
    as a re-deliveries forced: line in ctx_cache status. The counters live only
    in that diagnostic, never in a cacheable tool-output body, so output
    determinism (#498) is preserved. Pure measurement — no behavioral change.
  • Persistent, conversation-scoped [unchanged] stub index — survives daemon
    restarts and idle clears (gitlab #955).
    The in-memory read cache is wiped on
    every daemon restart and emptied by the idle-TTL clear, so until now the first
    unchanged re-read afterwards re-delivered the whole file — the single biggest
    remaining source of the "re-reads aren't reliable" feeling. A new focused
    module core::read_stub_index persists the minimal bookkeeping needed to emit
    the ~13-token stub
    ({path, md5, mtime, line_count, file_ref, delivered_conversation}, never the
    content) to
    {data_dir}/read_cache/stub_index.json (atomic tmp+rename, LRU-capped at 1024
    records). It is write-through on every full delivery, flushed on the
    batch/idle/shutdown save cadence, and rehydrated at startup, so a re-read of an
    unchanged file in the same conversation now collapses to the stub even across
    a restart. Correctness is gated harder than the warm path: a cold stub (no live
    entry) is served only when the file's mtime and md5 still match disk and
    the current conversation equals the delivering one
    (conversation::conversation_allows_cold_stub — no "no-context → legacy"
    escape, because across a process boundary an unknown conversation cannot prove
    the content is in context; this keeps #954's cross-chat hazard closed). A host
    compaction drops the whole index synchronously (the conversation's context was
    summarised away), mirroring SessionCache::reset_delivery_flags. Content is
    always re-read from disk — only delivery bookkeeping persists — so tool-output
    determinism (#498) is untouched. Side benefit: because the index outlives the
    idle clear, same-conversation re-reads after idle no longer re-deliver either.
    Kill-switch LEAN_CTX_STUB_PERSIST=0.
  • Deterministic JSON crusher core — core::json_crush (gitlab #934/#935,
    Headroom "Smart Crusher" port).
    Real JSON payloads (API responses, kubectl get -o json, DB dumps, RAG chunks)
    are dominated by arrays of objects that repeat the same keys and values on every
    row. The new single-source module
    factors that redundancy out: crush_lossless hoists every key present in all
    items of an array to its dominant value (a _defaults block) and keeps only
    per-item deviations, so it is exactly reconstructible via reconstruct;
    crush_lossy additionally records near-unique high-entropy columns
    (timestamps/UUIDs) in _dropped for out-of-band CCR recovery. Output is a pure
    function of the input Value — no timestamps, counters, randomness, or hash-map
    order leakage (candidate keys walk a BTreeSet, value frequencies a BTreeMap)
    — and it never inflates (a no-op returns None). This is the deterministic,
    byte-stable answer to Headroom's statistical crusher (#498).
  • Opt-in lossless JSON crushing for verbatim data commands (gitlab #936). A
    new crush_verbatim_json config key (env LEAN_CTX_CRUSH_VERBATIM_JSON, default
    off) lets the array-heavy JSON of otherwise byte-verbatim data commands
    (gh api, jq, kubectl get -o json, curl JSON) flow through the lossless
    crusher when it at least halves the payload. Off by default keeps those outputs
    verbatim; on, they are reshaped into a compact, fully reconstructible form and
    never lose a datum. The gate is a pure, unit-tested function and only ever
    touches Verbatim data commands — Passthrough (auth flows, dev servers,
    streaming) is never reshaped.
  • Active prompt-cache breakpoint injection for Anthropic (gitlab #939,
    Headroom "cache aligner" adjacent).
    A new opt-in cache_breakpoint proxy
    config key (env LEAN_CTX_PROXY_CACHE_BREAKPOINT, default off) makes the
    proxy add a single cache_control: {type:"ephemeral"} breakpoint to the
    system field of Anthropic requests only when the client set none of its
    own — so a raw API client's large, stable system prompt bills later turns at
    the cached rate instead of full price every turn (the cache win it left on the
    table). It is Anthropic-only by construction: OpenAI and Gemini cache prefixes
    automatically and ignore the marker, so those paths stay byte-unchanged. The
    injection is deterministic (a pure function of the body, so the prefix it
    creates is itself byte-stable, #498), never adds a second breakpoint (it defers
    to any client cache_control and to a client-cached message prefix), and is
    skipped below Anthropic's minimum cacheable size so it never churns bytes for no
    cache. It runs even on an otherwise meter-only/byte-passthrough proxy (the one
    sanctioned mutation), and every injection is counted on a dedicated
    breakpoints_injected gauge in /status cache_safety — a pure win signal,
    never against the cache-safe ratio.
  • Cache-aligner volatile-field telemetry (gitlab #940, Headroom "cache aligner"
    stage 1, telemetry-first).
    A single volatile token in an otherwise-stable
    system prompt — today's date, a fresh UUID, a git SHA — shifts the prefix bytes
    and busts the provider cache on every turn. A new opt-in cache_aligner proxy
    config key (env LEAN_CTX_PROXY_CACHE_ALIGNER, default off) makes the proxy
    scan each unanchored Anthropic system prompt for those fields and report how
    many it found on /status cache_safety (volatile_system_requests,
    volatile_fields_detected), so a user can quantify how much prompt-cache their
    prompt leaks. The scan is measurement only — the request body is never
    mutated, so it stays strictly cache-safe — and deterministic (matches are
    collected, sorted, and overlapping spans merged, so a full timestamp counts
    once). This is the honest precursor to an opt-in tail-relocate, which is
    deliberately deferred until the data shows it pays.
  • Retrieve-coupled CCR learning (gitlab #941, Headroom CCR "learning" port).
    When an agent keeps pulling back originals the inline compressed form dropped,
    that is direct evidence the compression was too aggressive. LoopDetector now
    tracks ctx_expand/ctx_retrieve re-fetches in a dedicated sliding-window
    counter (retrieve_count, alongside the existing correction counter), exposed
    as the ccr_retrieve_rate anomaly metric. The session auto-degrade now reacts
    to the stronger of the two pressures (correction loops and CCR retrieves) and
    recovers only when neither fires — so a session that over-retrieves dials
    compression down to Lite (>=3) then Off (>=5) for itself. The level is
    server state that feeds future CompressionLevel::effective() decisions, never
    part of any tool output body, so output determinism (#498) is preserved.
  • Model-free JSON-crush accuracy gate (gitlab #942). A new Condition::JsonCrush
    arm in the deterministic A/B eval harness (core::eval_ab) routes JSON/JSONL
    through json_crush instead of whitespace-only compaction, and a committed
    JSON-QA fixture (a redundant operator roster with one outlier field) plus the
    gate json_crush_condition_preserves_answer_and_beats_baseline prove — with no
    live model — that the crush keeps every gold answer while packing it in strictly
    fewer tokens than the raw baseline. This is the deterministic accuracy floor of
    the "crushed >= raw" claim, guarding against a future over-aggressive change.
  • Per-upstream proxy compression stats + ChatGPT Codex support (#582). The
    proxy /status and lean-ctx proxy status now break compression down per
    upstream — Anthropic, OpenAI, ChatGPT, Gemini — each with its own request /
    byte / token-saved counters, so you can see exactly where the savings come
    from. The split is purely additive: the existing top-level totals are
    unchanged, and an unknown label is still counted in the totals but never
    misattributed to a bucket. ChatGPT Codex traffic
    (/backend-api/codex/responses) is recorded under its own ChatGPT label
    while reusing the OpenAI Responses compression, usage, introspection and
    holdout paths, and JSON-encoded tool-result envelopes inside Responses output
    are now compressed/pruned without dropping items or breaking function_call /
    function_call_output pairing (shrink-only, respects should_protect). The
    research-prose squeeze cap is tunable via LEAN_CTX_RESEARCH_PROSE_CAP
    (default 20000). Thanks to community contributor @ousatov-ua.
  • Self-observability + self-curation tooling (gitlab #959–#964). A cluster of
    measurement-first additions that let lean-ctx report on — and tune — its own
    context footprint: a doctor injected-context linter plus a budget-gated
    per-session overhead report (#960/#964); a health per-tool value signal that
    recommends disabling tools that never earn their tokens (#961); knowledge-decay
    pruning and an ACTIVE-SESSION token budget so the injected session block stays
    bounded (#962); a shadow-minimal rules block that trims re-teaching (#963); and
    a deterministic footprint delta-eval harness for injected context (#959). All
    are diagnostic/state-only — no tool-output body changes — so output determinism
    (#498) is preserved.
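
To make the write-time admission item above concrete, here is a minimal sketch of the dedup-merge check only. It is not lean-ctx's implementation: Fact, Admission, and admit are hypothetical names, the >= comparison is an assumption, and the salience floor and same-key confirm/supersede paths are left out. Only the word-Jaccard measure and the 0.9 default come from the notes.

```rust
use std::collections::BTreeSet;

/// Word-level Jaccard similarity: |A ∩ B| / |A ∪ B| over lowercased word sets.
fn word_jaccard(a: &str, b: &str) -> f64 {
    let words = |s: &str| -> BTreeSet<String> {
        s.split_whitespace().map(|w| w.to_lowercase()).collect()
    };
    let (wa, wb) = (words(a), words(b));
    if wa.is_empty() && wb.is_empty() {
        return 1.0; // assumption: two empty values count as identical
    }
    let shared = wa.intersection(&wb).count() as f64;
    let total = wa.union(&wb).count() as f64;
    shared / total
}

/// Hypothetical fact row; the real store's schema is not shown in these notes.
struct Fact {
    key: String,
    category: String,
    value: String,
    confirmations: u32,
}

/// Hypothetical outcome of the admission gate.
#[derive(Debug, PartialEq)]
enum Admission {
    Merged { into_key: String },
    Inserted,
}

/// Dedup-merge only: a near-duplicate of a same-category fact under a
/// different key bumps that fact instead of adding a row.
fn admit(
    store: &mut Vec<Fact>,
    key: &str,
    category: &str,
    value: &str,
    merge_similarity: f64,
) -> Admission {
    for fact in store.iter_mut() {
        let near_duplicate = fact.category == category
            && fact.key != key
            && word_jaccard(&fact.value, value) >= merge_similarity;
        if near_duplicate {
            fact.confirmations += 1; // confirmation bump, no new row
            return Admission::Merged { into_key: fact.key.clone() };
        }
    }
    store.push(Fact {
        key: key.to_string(),
        category: category.to_string(),
        value: value.to_string(),
        confirmations: 1,
    });
    Admission::Inserted
}
```

With a threshold of 0.9, a paraphrase that differs only in letter case merges into the existing fact, while an unrelated value in the same category is stored as a new row.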

Changed

  • json_schema::compress is now crush-backed (gitlab #936). The generic JSON
    fallback (and the jq route) prefers the lossless json_crush form over the
    value-dropping schema outline whenever the array is redundant enough to at least
    halve the payload — keeping every datum reconstructible instead of collapsing it
    to a structure-only sketch. Heterogeneous or low-redundancy arrays still fall
    through to the compact schema outline (unchanged), so there is no regression for
    those. curl's top-level array-of-objects path now defers to the same shared
    core instead of its useless [object(NK); N] summary, converging the generic
    JSON handling on one implementation (docker inspect and the aws
    resource summarizers stay intentionally domain-specific). PATTERN_ENGINE_VERSION
    is bumped (1→2) so determinism consumers detect the new output shape.
  • ctx_read aggressive mode compacts JSON structurally (gitlab #936). Reading
    a .json file in aggressive mode (the auto-resolved mode for large non-code
    data files) now routes redundant array-of-object payloads through the lossless
    json_crush core instead of generic text pruning, which mangles JSON structure.
    It fires only when the crush at least halves the file and shrinks the token
    count; the exact bytes stay recoverable with a full/raw re-read. map mode
    stays a compact structural overview (unchanged). The "must at least halve"
    gate is centralized in
    json_crush::{crush_value_if_beneficial, crush_text_if_beneficial} (one
    KEEP_DATA_DIVISOR), so the shell (json_schema, curl) and read paths can never
    drift.
  • Unified, surgical CCR retrieve path across the whole tee store (gitlab #938).
    ctx_expand now resolves every content-addressed original through one resolver
    with a fixed precedence: proxy prune/live stubs (proxy_<hash>), the JSON
    crusher's lossy originals (json_<hash>), AND every compressed shell command's
    already-teed verbatim output (<slug>_<8hex>.log) — before the reference
    (ref_) and archive (hex) stores. So an agent can pull back just the slice it
    needs (head/tail/search/json_path/range) from any of them instead of
    re-reading the whole file; the high-compression shell footer now advertises the
    ctx_expand slice form. The resolver trusts only the file name and always
    rebuilds the path under {state}/tee/ (no traversal). Opt-in verbatim JSON
    crushing (crush_verbatim_json) gains a lossy stage 2: when the lossless reshape
    does not pay, it drops near-unique high-entropy columns (timestamps, UUIDs) and
    persists the verbatim original under json_<hash>, embedding a content-addressed
    ctx_expand handle so a dropped datum is never irrecoverable.
  • ctx_search absorbs ctx_semantic_search and ctx_symbol (#509). Search
    collapses to a single action-routed ctx_search: an action argument
    (regex default, semantic, symbol, reindex, find_related) routes to
    the same engines as before, and a missing action is inferred so existing
    calls keep working. The two former tools become deprecated aliases — hidden
    from tools/list but still callable for one release — which trims the
    advertised surface (Standard 17→15 tools, Minimal 6→5) so a model picks the
    right search on the first try. Underlying search behavior is unchanged; this is
    the final step of the #509 read/search consolidation begun in 3.8.12/3.8.13.
  • Parallel BM25 index build and incremental rebuild (gitlab #933, #581). The
    full index build now tokenizes across a rayon pool and merges deterministically
    (#933); the edit-loop incremental rebuild — changed/new/removed files on a warm
    index — does the same (#581). Both paths are byte-for-byte identical to the
    sequential result (covered by determinism tests and a CI build-time regression
    gate), so first-index and reindex-after-edit are faster with no change to what
    search returns. Credit to the #581 reference work by @ousatov-ua.
  • Generated dependency lockfiles are excluded from the index (#585). npm/pnpm
    lockfiles (package-lock.json, npm-shrinkwrap.json, pnpm-lock.yaml) carry
    ingestible .json/.yaml extensions and used to slip into the index, where a
    retrieval surface (ctx_compose, BM25 search) would inline a large
    auto-generated dependency pin — a pure token sink. They are now dropped at the
    ingestion front-door via a new non-ingestible IngestKind::Generated, joining
    the *.lock/*.lockb files already excluded there (the scattered "lock"
    extension check is removed so detection lives in one place). Detection is by
    file name, so it is depth-independent — a monorepo's
    frontend/package-lock.json is caught too, unlike a root-anchored ignore glob.
    An explicit ctx_read/ctx_tree/ctx_glob of a lockfile is unaffected.
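
The name-based lockfile detection described in the last item above can be sketched as follows. This is an illustration under assumptions, not the project's code: IngestKind::Generated is named in the notes, but the Other variant, the LOCKFILE_NAMES constant, and the classify function are hypothetical.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum IngestKind {
    Generated, // named in the notes: dropped at the ingestion front door
    Other,     // hypothetical stand-in for every ingestible kind
}

// File names listed in the notes; matched on the final path component only.
const LOCKFILE_NAMES: [&str; 3] = ["package-lock.json", "npm-shrinkwrap.json", "pnpm-lock.yaml"];

/// Classify by file name and extension, ignoring the directory part, so the
/// result does not depend on how deep the file sits in the tree.
fn classify(path: &Path) -> IngestKind {
    let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
    let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
    if LOCKFILE_NAMES.contains(&name) || ext == "lock" || ext == "lockb" {
        IngestKind::Generated
    } else {
        IngestKind::Other
    }
}
```

Because only the final path component is inspected, frontend/package-lock.json in a monorepo is classified the same way as a root-level package-lock.json, which a root-anchored ignore glob would miss.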

Fixed

  • CI on main was red on all three Test jobs — a stale source-grep test
    (gitlab #957).
    scenario_server_degrade_thresholds asserted the dispatch
    source literally contains("correction_count >= 5") etc.; the #941
    retrieve-coupled refactor renamed that to
    pressure = correction_count.max(retrieve_count), so the literals vanished and
    the assertion failed on every platform (the rest of CI stayed green). Replaced
    the brittle grep with a
    behavioral test backed by a new pure, total CompressionLevel::degrade_action
    (Set/Clear/Leave) extracted from the dispatch — runtime behavior is
    unchanged (5+ → Off, 3+ → Lite, 0 → clear, 1–2 → hold), but the threshold table
    is now unit-tested and immune to internal renames.
  • Subagents force-freshed every read, so re-reads were never cached inside a
    Task (gitlab #956, closes the #952 series).
    is_subagent_context() set
    effective_fresh = fresh || subagent, a blanket cold full read for the whole
    subagent run — safe (a subagent must not be served a stub for content only the
    parent received) but it threw away exactly the cheap [unchanged] re-read
    that #946/#954/#955 reclaimed. Now that the stub is conversation-scoped, the
    safety is enforced precisely instead of by bypass: a subagent runs under its
    own task:{CURSOR_TASK_ID} scope (conversation::current_conversation_id), so
    the stub gate withholds any stub the parent or a sibling delivered (distinct,
    non-None scope → never matches), while the subagent's own re-reads of an
    unchanged file collapse to the stub. The blanket force-fresh now applies only
    when scoping is off (LEAN_CTX_CONVERSATION_SCOPE=0); an explicit
    LEAN_CTX_FORCE_FRESH=1 still always forces fresh. Stubs stay double-gated
    (mtime+md5 vs disk and conversation match), so a subagent is only ever
    stubbed for a file it read itself, unchanged — never stale, never cross-agent.
  • auto-mode re-reads bypassed the [unchanged] cache stub and re-delivered
    the whole file (gitlab #946).
    The cheap ~13-token re-read stub
    (Fref=path [unchanged NL]) only fired for an explicit mode=full re-read;
    in the default auto mode a re-read of an unchanged, already fully-delivered
    file re-sent the entire body — the "re-reads aren't cached / reliability is
    worse than before" regression. Cause: ctx_read resolved auto with
    cache: None, so the resolver's unit-tested
    unchanged + full_delivered → ("full","cache_hit") short-circuit was dead code
    on the real read path (a silent divergence from ctx_smart_read, which threaded
    the cache correctly; introduced by the #683 deterministic cascade).
    resolve_auto_mode is now cache-aware, the warm path routes an auto→full
    cache-hit through the same
    try_stub_hit_readonly stub as an explicit full re-read, and the registered
    read-lock fast path accepts auto too (self-guarded by the stub). Compressed-
    first files still serve their cached compressed output on re-read — no wrong
    escalation to full. Regression test
    auto_reread_of_fully_delivered_file_serves_unchanged_stub.
  • The [unchanged] re-read stub was not conversation-scoped — a file
    delivered in one chat could be stubbed for a re-read in another (gitlab #954).
    The read SessionCache is shared across every chat served by one daemon, but
    the stub asserts "you already have this in context" — true only within the
    conversation that received the full content. A re-read from a different chat on
    the same daemon could therefore receive Fref=path [unchanged NL] for content
    it never saw (the idle-TTL clear only incidentally masked it). Each entry now
    records the delivered_conversation (resolved from the live Cursor
    conversation_id that hooks write to active_transcript.json), and
    try_stub_hit_readonly serves the stub only when the current conversation
    matches; a mismatch re-delivers in full and is counted by the new re-delivery
    telemetry (#953). With no conversation context (hooks absent) it falls back to
    the legacy process-scoped behavior, so single-chat hit rates are unchanged and
    byte-stable (#498). The conversation gate is a pure, unit-tested function
    (conversation::conversation_allows_stub) injected into the stub path for
    deterministic, host-independent tests. Kill-switch
    LEAN_CTX_CONVERSATION_SCOPE=0.
  • ctx_impact missed Go and Kotlin same-package blast radius (#398 bug class).
    The C#/Java fix in 3.8.13 closed one instance of a general gap: any language with
    implicit same-package visibility references project types with no import, so
    import edges alone leave the consumed type a false-negative leaf. For Go the
    miss was total — same-package is same-directory and fully import-free, so changing
    a struct used by a sibling file reported "no impact". core::type_ref_edges now
    resolves Go usages directory-scoped and strict (a common name like
    Config/Server declared in many packages still resolves to the one true
    same-package definer, with no cross-package leak) and Kotlin usages by
    declared package, both durable through the graph_index mirror and emitted by the
    ctx_impact builder. The old coarse Go package heuristic — one arbitrary
    same-directory edge per file, silently parsed as a top-weight imports edge in
    the mirror — is removed: it both missed the real consumer and pulled
    non-consumers (e.g. an unrelated logger.go) into the blast radius. Precise
    type_ref edges replace it, and a genuinely unused file now falls to the standard
    low-weight sibling rescue like every other language. Per-language scope is
    centralized in one resolve_scope (previously the namespace logic was duplicated
    across three call sites). GRAPH_ENGINE_VERSION is bumped (3→4) so stale graphs
    self-heal. (gitlab #920–#924)
  • Project-root resolution unified for search and the MCP path jail (#580,
    #948).
    An index built at the git root but searched from a sub-directory
    resolved to a different namespace hash and returned zero hits; separately, an
    MCP server launched from an agent-config directory (.copilot / .cursor /
    .windsurf / .gemini) adopted that directory as the project root and then
    rejected in-tree reads with "path escapes project root". A single
    git-promotion resolver is now the one source of truth for the root, an explicit
    sub-directory becomes a result filter rather than its own namespace, and an
    agent-config CWD auto-reroots to the real project. PathJail enforcement is
    unchanged — only root derivation is corrected. Adopted from reference PR #581
    by @ousatov-ua.
  • lean-ctx call ctx_tools … panicked on the CLI call path (#583). Invoking
    the ctx_tools meta-tool from the CLI crashed with "there is no reactor
    running" because the runtime was resolved via Handle::current(), which panics
    unless an ambient runtime exists, and one exists only on the MCP path (handlers
    there run inside block_in_place). It now
    uses Handle::try_current(): the ambient handle is reused on the MCP path and
    a one-shot runtime is built on the CLI path. Pure control-flow fix — MCP
    behavior and output bytes are unchanged.
  • ctx_shell could silently drop output when a child held the pipe open
    (gitlab #945).
    A process that kept the write end of the pipe open past its
    own exit truncated the captured output; the reader now drains to EOF so the
    full output is compressed and returned.
  • lean-ctx update failed with UnknownIssuer behind TLS-inspecting proxies
    (#578).
    The updater now validates TLS against the OS trust store via ureq's
    PlatformVerifier, so corporate roots installed in the system keychain/store
    are honored.
  • gain --deep reported "Daemon: offline" on Windows while the daemon was
    running (#576).
    The footer's daemon-status probe used a Unix-only check; it
    now reports the daemon state correctly on Windows too.
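
The degrade threshold table from the CI fix above (gitlab #957) is small enough to write out as a pure function. The sketch below assumes a free function and a hypothetical DegradeAction enum; the notes name CompressionLevel::degrade_action and the Set/Clear/Leave outcomes but do not show its signature. The thresholds and the max-of-two-pressures rule are taken from the notes.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum CompressionLevel {
    Lite,
    Off,
}

/// Hypothetical return type; the variant names follow the notes.
#[derive(Debug, PartialEq)]
enum DegradeAction {
    Set(CompressionLevel),
    Clear,
    Leave,
}

/// Pure and total: every (correction, retrieve) pair maps to one action.
/// The session reacts to the stronger of the two pressures.
fn degrade_action(correction_count: u32, retrieve_count: u32) -> DegradeAction {
    let pressure = correction_count.max(retrieve_count);
    match pressure {
        0 => DegradeAction::Clear,                            // neither fires: recover
        1..=2 => DegradeAction::Leave,                        // hold the current level
        3..=4 => DegradeAction::Set(CompressionLevel::Lite),  // 3+ dials down to Lite
        _ => DegradeAction::Set(CompressionLevel::Off),       // 5+ turns compression off
    }
}
```

Testing the function directly, rather than grepping the dispatch source for a literal such as "correction_count >= 5", keeps the check valid when internal variable names change.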

Upgrade

lean-ctx update                 # recommended (auto-downloads + refreshes shell hooks)
cargo install lean-ctx          # or
npm update -g lean-ctx-bin      # or
brew upgrade lean-ctx

Note: After upgrading via cargo/npm/brew, run lean-ctx setup to refresh shell aliases. lean-ctx update does this automatically.

Full Changelog: v3.8.13...v3.8.14
