## [1.0.5] - 2026-02-16
The npm package now ships compiled JavaScript instead of raw TypeScript,
removing the tsx runtime dependency. A new /release skill automates the
full release workflow with changelog validation and git hook enforcement.
### Changes
- Build: compile TypeScript to `dist/` via `tsc` so the npm package no longer
  requires `tsx` at runtime. The `qmd` shell wrapper now runs `dist/qmd.js`
  directly.
- Release tooling: new `/release` skill that manages the full release
  lifecycle — validates changelog, installs git hooks, previews release notes,
  and cuts the release. Auto-populates `[Unreleased]` from git history when
  empty.
- Release tooling: `scripts/extract-changelog.sh` extracts cumulative notes
  for the full minor series (e.g. 1.0.0 through 1.0.5) for GitHub releases.
  Includes `[Unreleased]` content in previews.
- Release tooling: `scripts/release.sh` renames `[Unreleased]` to a versioned
  heading and inserts a fresh empty `[Unreleased]` section automatically (see
  the sketch after this list).
- Release tooling: pre-push git hook blocks `v*` tag pushes unless the
  `package.json` version matches the tag, a changelog entry exists, and CI
  passed on GitHub.
- Publish workflow: GitHub Actions now builds TypeScript, creates a GitHub
  release with cumulative notes extracted from the changelog, and publishes
  to npm with provenance.
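
For illustration only, here is a minimal TypeScript sketch of the `[Unreleased]` rename step. The actual implementation is the `scripts/release.sh` shell script, so the path, date format, and heading pattern below are assumptions:

```ts
// Hypothetical sketch of the rename step scripts/release.sh performs on
// CHANGELOG.md: turn the current "[Unreleased]" heading into a versioned one
// and insert a fresh, empty "[Unreleased]" section above it.
import { readFileSync, writeFileSync } from "node:fs";

function cutRelease(changelogPath: string, version: string): void {
  const today = new Date().toISOString().slice(0, 10); // e.g. "2026-02-16"
  const changelog = readFileSync(changelogPath, "utf8");

  // Replace the first "## [Unreleased]" heading with an empty Unreleased
  // section followed by the new versioned heading.
  const updated = changelog.replace(
    /^## \[Unreleased\][ \t]*$/m,
    `## [Unreleased]\n\n## [${version}] - ${today}`,
  );

  writeFileSync(changelogPath, updated);
}

cutRelease("CHANGELOG.md", "1.0.5");
```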
## [1.0.0] - 2026-02-15
QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking
through parallel GPU contexts. GPU auto-detection replaces the unreliable
gpu: "auto" with explicit CUDA/Metal/Vulkan probing.
### Changes
- Runtime: support Node.js (>=22) alongside Bun via a cross-runtime SQLite
  abstraction layer (`src/db.ts`): `bun:sqlite` on Bun, `better-sqlite3` on
  Node. The `qmd` wrapper auto-detects a suitable Node.js install via PATH,
  then falls back to mise, asdf, nvm, and Homebrew locations (a sketch of the
  driver selection follows this list).
- Performance: parallel embedding & reranking via multiple LlamaContext
  instances — up to 2.7x faster on multi-core machines.
- Performance: flash attention for ~20% less VRAM per reranking context,
  enabling more parallel contexts on GPU.
- Performance: right-sized reranker context (40960 → 2048 tokens, 17x less
  memory) since chunks are capped at ~900 tokens.
- Performance: adaptive parallelism — context count computed from available
  VRAM (GPU) or CPU math cores rather than hardcoded (see the second sketch
  after this list).
- GPU: probe for CUDA, Metal, and Vulkan explicitly at startup instead of
  relying on node-llama-cpp's `gpu: "auto"`. `qmd status` shows device info.
- Tests: reorganized into a flat `test/` directory with vitest for Node.js
  and bun test for Bun. New `eval-bm25` and `store.helpers.unit` suites.
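
As a rough illustration of the runtime split, here is a sketch of how driver selection between `bun:sqlite` and `better-sqlite3` can work. The interface and function names are invented for the example and are not the actual `src/db.ts` API:

```ts
// Illustrative only: pick bun:sqlite on Bun and better-sqlite3 on Node.js
// behind one tiny interface. SqliteHandle and openDatabase are hypothetical
// names, not the real src/db.ts exports.
export interface SqliteHandle {
  run(sql: string, ...params: unknown[]): void;
  all(sql: string, ...params: unknown[]): unknown[];
}

export async function openDatabase(path: string): Promise<SqliteHandle> {
  // Bun sets process.versions.bun; plain Node.js does not.
  if (process.versions.bun) {
    const { Database } = await import("bun:sqlite");
    const db = new Database(path);
    return {
      run: (sql, ...params) => void db.query(sql).run(...(params as any[])),
      all: (sql, ...params) => db.query(sql).all(...(params as any[])) as unknown[],
    };
  }

  const { default: Database } = await import("better-sqlite3");
  const db = new Database(path);
  return {
    run: (sql, ...params) => void db.prepare(sql).run(...(params as any[])),
    all: (sql, ...params) => db.prepare(sql).all(...(params as any[])) as unknown[],
  };
}
```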
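
And a minimal sketch of the adaptive-parallelism arithmetic. The cap, the per-context size, and how free VRAM is obtained are all assumptions; the real code derives these from node-llama-cpp and the hardware:

```ts
// Illustrative arithmetic only: derive the number of parallel contexts from
// free VRAM on GPU, or from CPU parallelism when no GPU is available. The
// cap and the per-context size are made-up example values.
import { availableParallelism } from "node:os";

function contextCount(freeVramBytes: number | null, perContextBytes: number): number {
  const cap = 8; // assumed upper bound to avoid oversubscribing the machine
  if (freeVramBytes !== null) {
    // GPU path: as many contexts as fit in free VRAM, but at least one.
    return Math.max(1, Math.min(cap, Math.floor(freeVramBytes / perContextBytes)));
  }
  // CPU path: one context per available core, capped.
  return Math.max(1, Math.min(cap, availableParallelism()));
}

// Example: ~6 GiB of free VRAM at ~1.5 GiB per reranking context -> 4 contexts.
console.log(contextCount(6 * 1024 ** 3, 1.5 * 1024 ** 3));
```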
### Fixes
- Prevent VRAM waste from duplicate context creation during concurrent
  `embedBatch` calls — initialization lock now covers the full path (a sketch
  of the locking pattern follows this list).
- Collection-aware FTS filtering so scoped keyword search actually restricts
  results to the requested collection.
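
The duplicate-context fix is an instance of a common async initialization-lock pattern. The sketch below is a generic illustration with invented names (`ensureContexts`, `Contexts`), not the actual `embedBatch` implementation:

```ts
// Generic async init-lock pattern illustrating the fix: concurrent callers of
// embedBatch share a single in-flight initialization promise, so a second
// caller can never create a duplicate set of GPU contexts. Names and types
// are assumptions; error handling (resetting the lock on failure) is omitted.
type Contexts = { embed(texts: string[]): Promise<number[][]> };

let contexts: Contexts | null = null;
let initializing: Promise<Contexts> | null = null;

async function ensureContexts(create: () => Promise<Contexts>): Promise<Contexts> {
  if (contexts) return contexts;
  if (!initializing) {
    // First caller starts initialization; later callers reuse the promise.
    initializing = create().then((created) => {
      contexts = created;
      return created;
    });
  }
  return initializing;
}

async function embedBatch(texts: string[], create: () => Promise<Contexts>): Promise<number[][]> {
  const ctx = await ensureContexts(create); // the lock covers the full path
  return ctx.embed(texts);
}
```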