github tobi/qmd v1.0.5

latest release: v1.0.6
8 hours ago

[1.0.5] - 2026-02-16

The npm package now ships compiled JavaScript instead of raw TypeScript,
removing the tsx runtime dependency. A new /release skill automates the
full release workflow with changelog validation and git hook enforcement.

Changes

  • Build: compile TypeScript to dist/ via tsc so the npm package no longer
    requires tsx at runtime. The qmd shell wrapper now runs dist/qmd.js
    directly.
  • Release tooling: new /release skill that manages the full release
    lifecycle — validates changelog, installs git hooks, previews release notes,
    and cuts the release. Auto-populates [Unreleased] from git history when
    empty.
  • Release tooling: scripts/extract-changelog.sh extracts cumulative notes
    for the full minor series (e.g. 1.0.0 through 1.0.5) for GitHub releases.
    Includes [Unreleased] content in previews.
  • Release tooling: scripts/release.sh renames [Unreleased] to a versioned
    heading and inserts a fresh empty [Unreleased] section automatically.
  • Release tooling: pre-push git hook blocks v* tag pushes unless
    package.json version matches the tag, a changelog entry exists, and CI
    passed on GitHub.
  • Publish workflow: GitHub Actions now builds TypeScript, creates a GitHub
    release with cumulative notes extracted from the changelog, and publishes
    to npm with provenance.

[1.0.0] - 2026-02-15

QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking
through parallel GPU contexts. GPU auto-detection replaces the unreliable
gpu: "auto" with explicit CUDA/Metal/Vulkan probing.

Changes

  • Runtime: support Node.js (>=22) alongside Bun via a cross-runtime SQLite
    abstraction layer (src/db.ts). bun:sqlite on Bun, better-sqlite3 on
    Node. The qmd wrapper auto-detects a suitable Node.js install via PATH,
    then falls back to mise, asdf, nvm, and Homebrew locations.
  • Performance: parallel embedding & reranking via multiple LlamaContext
    instances — up to 2.7x faster on multi-core machines.
  • Performance: flash attention for ~20% less VRAM per reranking context,
    enabling more parallel contexts on GPU.
  • Performance: right-sized reranker context (40960 → 2048 tokens, 17x less
    memory) since chunks are capped at ~900 tokens.
  • Performance: adaptive parallelism — context count computed from available
    VRAM (GPU) or CPU math cores rather than hardcoded.
  • GPU: probe for CUDA, Metal, Vulkan explicitly at startup instead of
    relying on node-llama-cpp's gpu: "auto". qmd status shows device info.
  • Tests: reorganized into flat test/ directory with vitest for Node.js and
    bun test for Bun. New eval-bm25 and store.helpers.unit suites.

Fixes

  • Prevent VRAM waste from duplicate context creation during concurrent
    embedBatch calls — initialization lock now covers the full path.
  • Collection-aware FTS filtering so scoped keyword search actually restricts
    results to the requested collection.

Don't miss a new qmd release

NewReleases is sending notifications on new releases.