## [1.0.5] - 2026-02-16
The npm package now ships compiled JavaScript instead of raw TypeScript,
removing the tsx runtime dependency. A new /release skill automates the
full release workflow with changelog validation and git hook enforcement.
### Changes
- Build: compile TypeScript to `dist/` via `tsc` so the npm package no longer
  requires `tsx` at runtime. The `qmd` shell wrapper now runs `dist/qmd.js`
  directly.
- Release tooling: new `/release` skill that manages the full release
  lifecycle — validates changelog, installs git hooks, previews release notes,
  and cuts the release. Auto-populates `[Unreleased]` from git history when
  empty.
- Release tooling: `scripts/extract-changelog.sh` extracts cumulative notes
  for the full minor series (e.g. 1.0.0 through 1.0.5) for GitHub releases.
  Includes `[Unreleased]` content in previews.
- Release tooling: `scripts/release.sh` renames `[Unreleased]` to a versioned
  heading and inserts a fresh empty `[Unreleased]` section automatically (see
  the sketch after this list).
- Release tooling: pre-push git hook blocks `v*` tag pushes unless the
  `package.json` version matches the tag, a changelog entry exists, and CI
  passed on GitHub.
- Publish workflow: GitHub Actions now builds TypeScript, creates a GitHub
  release with cumulative notes extracted from the changelog, and publishes
  to npm with provenance.
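
For illustration only, here is a minimal TypeScript sketch of the `[Unreleased]` rename step. The actual implementation is the `scripts/release.sh` shell script, so the path, date format, and heading pattern below are assumptions:

```ts
// Hypothetical sketch of the rename step scripts/release.sh performs on
// CHANGELOG.md: turn the current "[Unreleased]" heading into a versioned one
// and insert a fresh, empty "[Unreleased]" section above it.
import { readFileSync, writeFileSync } from "node:fs";

function cutRelease(changelogPath: string, version: string): void {
  const today = new Date().toISOString().slice(0, 10); // e.g. "2026-02-16"
  const changelog = readFileSync(changelogPath, "utf8");

  // Replace the first "## [Unreleased]" heading with an empty Unreleased
  // section followed by the new versioned heading.
  const updated = changelog.replace(
    /^## \[Unreleased\][ \t]*$/m,
    `## [Unreleased]\n\n## [${version}] - ${today}`,
  );

  writeFileSync(changelogPath, updated);
}

cutRelease("CHANGELOG.md", "1.0.5");
```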
## [1.0.0] - 2026-02-15
QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking
through parallel GPU contexts. GPU auto-detection replaces the unreliable
gpu: "auto" with explicit CUDA/Metal/Vulkan probing.
### Changes
- Runtime: support Node.js (>=22) alongside Bun via a cross-runtime SQLite
  abstraction layer (`src/db.ts`): `bun:sqlite` on Bun, `better-sqlite3` on
  Node. The `qmd` wrapper auto-detects a suitable Node.js install via PATH,
  then falls back to mise, asdf, nvm, and Homebrew locations (a sketch of the
  driver selection follows this list).
- Performance: parallel embedding & reranking via multiple LlamaContext
  instances — up to 2.7x faster on multi-core machines.
- Performance: flash attention for ~20% less VRAM per reranking context,
  enabling more parallel contexts on GPU.
- Performance: right-sized reranker context (40960 → 2048 tokens, 17x less
  memory) since chunks are capped at ~900 tokens.
- Performance: adaptive parallelism — context count computed from available
  VRAM (GPU) or CPU math cores rather than hardcoded (see the second sketch
  after this list).
- GPU: probe for CUDA, Metal, and Vulkan explicitly at startup instead of
  relying on node-llama-cpp's `gpu: "auto"`. `qmd status` shows device info.
- Tests: reorganized into a flat `test/` directory with vitest for Node.js
  and bun test for Bun. New `eval-bm25` and `store.helpers.unit` suites.
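
As a rough illustration of the runtime split, here is a sketch of how driver selection between `bun:sqlite` and `better-sqlite3` can work. The interface and function names are invented for the example and are not the actual `src/db.ts` API:

```ts
// Illustrative only: pick bun:sqlite on Bun and better-sqlite3 on Node.js
// behind one tiny interface. SqliteHandle and openDatabase are hypothetical
// names, not the real src/db.ts exports.
export interface SqliteHandle {
  run(sql: string, ...params: unknown[]): void;
  all(sql: string, ...params: unknown[]): unknown[];
}

export async function openDatabase(path: string): Promise<SqliteHandle> {
  // Bun sets process.versions.bun; plain Node.js does not.
  if (process.versions.bun) {
    const { Database } = await import("bun:sqlite");
    const db = new Database(path);
    return {
      run: (sql, ...params) => void db.query(sql).run(...(params as any[])),
      all: (sql, ...params) => db.query(sql).all(...(params as any[])) as unknown[],
    };
  }

  const { default: Database } = await import("better-sqlite3");
  const db = new Database(path);
  return {
    run: (sql, ...params) => void db.prepare(sql).run(...(params as any[])),
    all: (sql, ...params) => db.prepare(sql).all(...(params as any[])) as unknown[],
  };
}
```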
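
And a minimal sketch of the adaptive-parallelism arithmetic. The cap, the per-context size, and how free VRAM is obtained are all assumptions; the real code derives these from node-llama-cpp and the hardware:

```ts
// Illustrative arithmetic only: derive the number of parallel contexts from
// free VRAM on GPU, or from CPU parallelism when no GPU is available. The
// cap and the per-context size are made-up example values.
import { availableParallelism } from "node:os";

function contextCount(freeVramBytes: number | null, perContextBytes: number): number {
  const cap = 8; // assumed upper bound to avoid oversubscribing the machine
  if (freeVramBytes !== null) {
    // GPU path: as many contexts as fit in free VRAM, but at least one.
    return Math.max(1, Math.min(cap, Math.floor(freeVramBytes / perContextBytes)));
  }
  // CPU path: one context per available core, capped.
  return Math.max(1, Math.min(cap, availableParallelism()));
}

// Example: ~6 GiB of free VRAM at ~1.5 GiB per reranking context -> 4 contexts.
console.log(contextCount(6 * 1024 ** 3, 1.5 * 1024 ** 3));
```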
### Fixes
- Prevent VRAM waste from duplicate context creation during concurrent
  `embedBatch` calls — initialization lock now covers the full path (a sketch
  of the locking pattern follows this list).
- Collection-aware FTS filtering so scoped keyword search actually restricts
  results to the requested collection.
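
The duplicate-context fix is an instance of a common async initialization-lock pattern. The sketch below is a generic illustration with invented names (`ensureContexts`, `Contexts`), not the actual `embedBatch` implementation:

```ts
// Generic async init-lock pattern illustrating the fix: concurrent callers of
// embedBatch share a single in-flight initialization promise, so a second
// caller can never create a duplicate set of GPU contexts. Names and types
// are assumptions; error handling (resetting the lock on failure) is omitted.
type Contexts = { embed(texts: string[]): Promise<number[][]> };

let contexts: Contexts | null = null;
let initializing: Promise<Contexts> | null = null;

async function ensureContexts(create: () => Promise<Contexts>): Promise<Contexts> {
  if (contexts) return contexts;
  if (!initializing) {
    // First caller starts initialization; later callers reuse the promise.
    initializing = create().then((created) => {
      contexts = created;
      return created;
    });
  }
  return initializing;
}

async function embedBatch(texts: string[], create: () => Promise<Contexts>): Promise<number[][]> {
  const ctx = await ensureContexts(create); // the lock covers the full path
  return ctx.embed(texts);
}
```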