[2.5.3] - 2026-05-28
Features
- qmd get now accepts a :from:count suffix on a path or docid (e.g.
  qmd get "#abc123:120:40" reads 40 lines starting at line 120). Explicit
  --from/-l flags still override the suffix. The MCP get tool accepts the
  same suffix.
- qmd get and qmd multi-get are now line-numbered by default and print
  the document's #docid and qmd:// path in the output header. Disable line
  numbers with --no-line-numbers. The MCP get/multi_get tools default
  lineNumbers to true to match.
- qmd multi-get now includes the #docid in every output format
  (--md, --json, --csv, --xml, --files, and the default CLI view),
  consistent with qmd search.
- qmd get and qmd multi-get accept --full-path, which replaces the
  qmd:// path + #docid with the document's on-disk filesystem path (handy for
  piping into Read/Edit/an editor). Falls back to the canonical qmd:// +
  docid header when the file no longer exists on disk.
- qmd search/qmd query now show a clearer hit identifier: the default CLI
  view (and the new **file:** line in --md output) always prints the full
  qmd://collection/path URI so you can pipe it straight back into qmd get.
- qmd search/qmd query accept --full-path with the same semantics as
  qmd get: the result label becomes the file's on-disk path (a ./-prefixed
  relative path when the file lives in a subfolder of $PWD, the absolute
  realpath otherwise), and the per-result #docid is dropped because the path
  is the identifier. The leading ./ is intentional so the output is
  unambiguously a filesystem path. Applies to all output formats.
- qmd get and qmd multi-get now also use the ./-prefixed convention when
  --full-path renders a path under $PWD, matching search/query.
- New --format <kind> flag selects the output format
  (cli|json|csv|md|xml|files) for search, query, and multi-get. The legacy
  boolean aliases (--json/--csv/--md/--xml/--files) still work but are
  no longer in --help; prefer --format.
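
As an illustration of the ./-prefixed --full-path convention described above, here is a minimal sketch. The function name fullPathLabel is hypothetical and not taken from the QMD source. It assumes POSIX-style separators and an already-resolved absolute path, and it treats any path below the working directory as "under $PWD"; the real implementation may differ on each of these points.

```typescript
// Hypothetical helper (not QMD's actual code): renders a result label in the
// style the --full-path notes describe. Assumes POSIX "/" separators and that
// absolutePath has already been resolved (e.g. via realpath) by the caller.
function fullPathLabel(absolutePath: string, cwd: string): string {
  // Normalise the working directory so "/home/u" does not match "/home/user".
  const base = cwd.endsWith("/") ? cwd : cwd + "/";
  if (absolutePath.startsWith(base)) {
    // Under the working directory: emit a "./"-prefixed relative path so the
    // label is unambiguously a filesystem path.
    return "./" + absolutePath.slice(base.length);
  }
  // Outside the working directory: keep the absolute path.
  return absolutePath;
}
```

Under these assumptions, a file at /home/u/notes/docs/a.md viewed from /home/u/notes would be labelled ./docs/a.md, while /etc/hosts would keep its absolute form.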
Fixes
- Launcher: source-mode runner selection now prefers Node + tsx over Bun when
  both package-lock.json and bun.lock are present in the package root,
  mirroring the dist-mode "npm priority" rule. Fixes pnpm-global installs that
  copy the entire working tree (including .git and bun.lock) into the
  install dir and previously routed through Bun, causing ABI mismatches with
  the Node-built better-sqlite3/sqlite-vec native modules.
- Darwin Metal: llama-using commands (query, vsearch, embed) no longer
  dump a multi-kB GGML/Metal backtrace at process exit after otherwise
  successful output. The libggml-metal static ggml_metal_device destructor
  asserts [rsets->data count] == 0 during __cxa_finalize_ranges, but the
  buffer-free path never calls the symmetric ggml_metal_device_rsets_rm
  to remove released rsets from the device collection (upstream
  ggml-org/llama.cpp#22593, one-line fix open as PR #22595). The assertion
  only fires when process.exit() skips Node's beforeExit hook, which is
  what node-llama-cpp uses to auto-dispose Metal contexts. Primary fix:
  finishSuccessfulCliCommand now sets process.exitCode = 0 and returns
  instead of calling process.exit(0), so beforeExit fires and the native
  binding cleans up before libc's static destructor runs. Defense-in-depth:
  the launcher (bin/qmd) and the npm test driver (scripts/test-all.mjs, i.e.
  the test:bun/test:unit package.json scripts) also set
  GGML_METAL_NO_RESIDENCY=1 on darwin before spawning node/bun, covering
  error paths and tests that still terminate via process.exit(). The env
  var must be set before node/bun start, because libggml-metal reads it via
  libc getenv at module-load time and Bun does not propagate process.env
  mutations to libc setenv, so it lives in the launcher rather than in
  test-preload. Residency sets give no measurable speedup for QMD's
  short-lived CLI workflow (benchmarked on M3 Pro). Opt back in with
  QMD_METAL_KEEP_RESIDENCY=1 for long-lived qmd processes (e.g. the MCP
  daemon may benefit on hot reload) or to triage the upstream fix.
  qmd doctor reports the mitigation state. Minimal reproduction:
  scripts/repro-metal-rsets-crash.mjs.
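
The defense-in-depth step above comes down to deciding the child environment before the runtime starts. The sketch below shows one way a launcher could do that. The helper name childEnv and the Env type are hypothetical, and the exact value QMD checks for QMD_METAL_KEEP_RESIDENCY is assumed to be "1" based on the note above; this is not the actual bin/qmd code.

```typescript
// Environment maps as a launcher would see them (undefined = unset).
type Env = Record<string, string | undefined>;

// Hypothetical helper (not the real bin/qmd logic): returns the environment to
// hand to the spawned node/bun process. The variable must be present before
// the child starts, since the native library reads it at module-load time.
function childEnv(platform: string, env: Env): Env {
  const out: Env = { ...env }; // never mutate the caller's environment
  // Assumption: the opt-out is recognised when set to exactly "1".
  const keepResidency = env.QMD_METAL_KEEP_RESIDENCY === "1";
  if (platform === "darwin" && !keepResidency) {
    out.GGML_METAL_NO_RESIDENCY = "1";
  }
  return out;
}
```

A launcher would then pass something like childEnv(process.platform, process.env) as the env option when spawning the runtime, rather than mutating process.env after startup.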
Docs
- qmd skill: emphasize reading line ranges with get's built-in
  :from:count suffix / --from/-l flags instead of piping through
  sed/head/tail; cite the docid and line numbers now present in retrieval
  output; and author structured intent:/lex:/vec:/hyde: queries yourself
  rather than relying on built-in query expansion.
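
To make the :from:count suffix and its interaction with explicit flags concrete, here is a minimal parsing sketch. The names parseLineSuffix, resolveRange, and LineRange are hypothetical, and treating -l as a line count is an assumption; QMD's own parser may accept more forms than the two-number suffix handled here.

```typescript
// Hypothetical shape for a parsed `qmd get` target (illustration only).
interface LineRange {
  target: string;            // path or docid with the suffix removed
  from: number | undefined;  // 1-based starting line, if given
  count: number | undefined; // number of lines to read, if given
}

// Splits a trailing ":from:count" suffix off a path or docid. Only a suffix
// of two all-digit fields is recognised, so the colon in "qmd://" is safe.
function parseLineSuffix(arg: string): LineRange {
  const match = /^(.*):(\d+):(\d+)$/.exec(arg);
  if (!match) {
    return { target: arg, from: undefined, count: undefined };
  }
  return { target: match[1], from: Number(match[2]), count: Number(match[3]) };
}

// Explicit flags win over the suffix, mirroring the documented precedence.
// Assumption: `lines` stands for the value given to -l.
function resolveRange(
  arg: string,
  flags: { from?: number; lines?: number },
): LineRange {
  const parsed = parseLineSuffix(arg);
  return {
    target: parsed.target,
    from: flags.from ?? parsed.from,
    count: flags.lines ?? parsed.count,
  };
}
```

With this sketch, "#abc123:120:40" resolves to docid #abc123, starting line 120, and 40 lines, and passing from: 10 as an explicit flag would replace only the starting line.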
[2.5.2] - 2026-05-22
Fixes
- Launcher: rewrite bin/qmd as a Node-based shebang polyglot to fix execution failures of global npm installs on Windows (#668 / #452), while supporting seamless fallback to Bun in Node-less environments.
[2.5.1] - 2026-05-20
Changes
- Release: publish from GitHub Actions via npm Trusted Publishing/OIDC instead of a long-lived NPM_TOKEN secret.
[2.5.0] - 2026-05-19
Changes
- Dependencies: update core SQLite/config/chunking packages (better-sqlite3, yaml, web-tree-sitter, tree-sitter-go, and tree-sitter-python) while keeping incompatible zod, tsx, and vitest majors pinned.
- Agent skills: add qmd skills list|get|path to serve version-matched runtime skill instructions from the installed CLI, and make qmd skill install write a stable discovery stub so installed agent skills do not go stale after QMD upgrades.
- CLI: add qmd doctor for index/runtime diagnostics, including SQLite/sqlite-vec versions, embedding fingerprint freshness, mixed-fingerprint detection, safe legacy fingerprint adoption, and content-hash sampling.
Fixes
- Launcher: prefer runnable TypeScript source in git checkouts even when ignored dist/ artifacts exist, while packaged installs continue to run dist/.
- GPU: keep node-llama-cpp's documented gpu: "auto" initialization as the primary path, then perform no-build packaged CUDA/Vulkan/Metal probes only if auto falls back to CPU.
- CLI: move GPU/CPU runtime diagnostics out of qmd status; use qmd doctor for device probing and related environment guidance.
- CLI: point unexpected command/setup failures toward qmd doctor so diagnostics are the default next step when QMD behaves incorrectly.
- Doctor: explicitly warn when content_vectors contains multiple non-empty embedding fingerprint names, with the per-fingerprint document/chunk breakdown.
- Embed: make the TTY progress line label byte-based input progress explicitly, show embedded chunks as a count, and shorten the displayed model name.
- Embed: retain per-chunk failure details, retry failed chunks after later successful embeds and again when no other chunks remain, clear recovered errors, and cap retries to avoid endless loops.
- Tests: expand the container smoke harness to cover npm-global, npx-style, and Bun-global install scenarios, always checking auto and QMD_FORCE_CPU=1 doctor modes, with opt-in tiny qmd embed and GPU probe runs for supported container runtimes.
- Embedding: fingerprint vector metadata using the active embedding model and formatting/chunking parameters so stale vectors are treated as pending after search semantics change. Legacy content_vectors columns are migrated lazily on first vector-health/write use to preserve fast QMD startup.
- Skill: expand the packaged QMD skill with retrieval-first workflows, structured query examples, wiki/source collection guidance, and safe fallbacks when model-backed search is unavailable.
- Tests: make bun run test execute the local unit suite under both Node/Vitest and Bun (test:node + test:bun) so runtime-specific regressions are caught before CI.
- Model config: centralize embedding/rerank/generation model resolution so qmd embed, status, query, vsearch, pull, SDK vector search, and bench use the same active .qmd/index.yaml model hints and environment fallbacks.
- GPU/status: qmd status now uses the same embedding model identity as qmd embed when computing pending embeddings, so URI-backed embeddings are not incorrectly reported as pending under the legacy embeddinggemma alias.
- GPU status: qmd status now always shows GPU mode/configuration without unsafe native probing, and CPU-fallback warnings point to QMD_STATUS_DEVICE_PROBE=1 qmd status for an actual backend probe. The no-GPU warning is emitted once per process instead of once per LLM instance during benchmarks.
- GPU: add QMD_FORCE_CPU=1 / --no-gpu to bypass CUDA/Vulkan/Metal probing entirely, and route native llama.cpp stdout noise to stderr so JSON output stays parseable during search/query commands.
- Snippet line numbers: qmd_query (MCP), HTTP /query, and qmd query
  (CLI JSON output and snippet headers) now return absolute source-file
  line numbers instead of chunk-local ones, so the line field can be
  passed back to qmd_get as fromLine without a separate lookup.
  Snippet selection remains scoped to the best matching chunk
  (preserves #149).
- CLI: qmd query --full now emits the full document body in all output
  formats (json, csv, md, xml), restoring the documented behavior of the
  flag. Previously it returned only the best matching chunk (~3.6KB max
  per result). Output payload for --full queries is now proportional
  to total document size.
- macOS Metal: qmd query --json now flushes successful JSON output and uses a safe immediate-exit path on Darwin to avoid ggml Metal finalizer aborts; other commands still dispose LLM contexts/models before the llama runtime. #368
- Embedding: require complete chunk coverage before treating a document as
  embedded, remove partial vectors when chunk/session failures leave a
  document incomplete, and keep qmd status pending counts honest after
  interrupted long embed runs. #637 #378
- Embedding: qmd embed -c <collection> now scopes pending-doc selection
  to the requested collection instead of embedding global pending work.
  Scoped --force clears only collection-owned vectors, preserves shared
  hashes referenced by sibling collections, and drops vectors_vec only
  when the scoped clear empties all vectors.
- Hybrid search: weight RRF lists by query type so original FTS and original vector evidence get the intended 2x boost, instead of accidentally boosting the first lexical expansion. #591
- MCP: seed llama.cpp/GGML quiet env vars before launching qmd mcp so native logs cannot pollute stdio JSON-RPC framing. #593
- CLI: remove CommonJS require() calls from ESM index path normalization so qmd --index <path> no longer crashes with ERR_AMBIGUOUS_MODULE_SYNTAX on Node 22+. #634
- Windows CUDA: serialize llama.cpp embedding/reranking contexts by default to avoid intermittent ggml-cuda.cu:98 crashes in qmd query; set QMD_EMBED_PARALLELISM to opt back into parallel contexts if your driver is stable. #519
- MCP: make qmd mcp --index <name> use the selected index for both foreground and daemon HTTP servers instead of falling back to the default store. #343
- Embedding: respect QMD_EMBED_MODEL consistently for vector indexing and vector-backed search, with default-model fallback when unset.
- Config: use one home-directory resolver for YAML config and the default SQLite cache path, avoiding Windows CLI/MCP split-brain when HOME is unset.
- GPU: respect explicit QMD_LLAMA_GPU=metal|vulkan|cuda backend overrides instead of always using auto GPU selection. #529
- Fix: preserve original filename case in handelize(). The previous
  .toLowerCase() call made indexed paths unreachable on case-sensitive
  filesystems (Linux). qmd update automatically migrates legacy
  lowercase paths without re-embedding.
- CLI: make qmd status skip native node-llama-cpp device probing by
  default so status stays safe on machines with broken or unsupported GPU
  drivers. Set QMD_STATUS_DEVICE_PROBE=1 to opt in.
- CLI: lazy-load node-llama-cpp so lightweight commands such as
  qmd status do not import native ML dependencies or trigger llama.cpp
  builds on ARM/no-GPU machines. #491
- Store: keep content rows referenced by inactive documents during orphan
  cleanup so qmd update preserves soft-deleted tombstones for removed
  files. #585
- Packaging: install AST grammar WASM packages as required dependencies so
  Bun global installs include TypeScript/TSX/JavaScript grammars, and add a
  smoke:package-grammars verification command. #595
- Launcher: add wrapper smoke coverage for scoped package, npm/npx,
  Homebrew/Linuxbrew, Bun global symlink layouts, and $BUN_INSTALL
  false-positive runtime selection regressions. #351 #353 #354 #356 #358 #359
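
The hybrid-search fix in the list above (#591) concerns how reciprocal rank fusion (RRF) lists are weighted. The sketch below shows the general technique: each ranked list contributes weight / (k + rank) per document, with original-query lists weighted 2x. The names RankedList and weightedRrf are hypothetical, and k = 60 is only the conventional RRF constant; QMD's actual constant and data structures are not stated in this changelog.

```typescript
// One ranked result list, tagged by where its query came from (illustrative).
interface RankedList {
  kind: "original" | "expansion"; // original FTS/vector query vs. an expansion
  ids: string[];                  // document ids, best match first
}

// Weighted reciprocal rank fusion: sums weight / (k + rank) across lists.
// Assumptions: k defaults to the conventional 60, and the weight depends on
// the list's kind rather than on its position in the input array.
function weightedRrf(lists: RankedList[], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    const weight = list.kind === "original" ? 2 : 1;
    list.ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return scores;
}
```

Keying the weight on the list's kind instead of its index is the point of the sketch: reordering the lists cannot move the boost onto an expansion list.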