github gkamradt/needle-in-a-haystack v2.0.0
v2.0.0 — Clean refactor



A from-scratch rewrite. The v1 CLI (needlehaystack.run_test) is gone;
the new entry point is niah. Existing v1 result files are preserved in
original_results/.

Highlights

  • New niah CLI: run, validate, reconstruct
  • Two-YAML config model: one model spec + one run spec
    (replaces v1's 25-flag invocation)
  • New tasks: single, multi, uuid, uuid_chain (multi-hop UUID
    chain for testing multi-step long-context reasoning)
  • Modular extension points via small Protocols + registries — add
    a task, provider, haystack, or scorer without touching the runner
  • JSONL result store with recipe-based reconstruction: rebuild the
    exact context a model saw without storing megabytes per row
  • uv-based packaging, Python 3.12+, 207-test suite with FakeProvider
    (zero API keys needed to develop)
  • Bug fix: multi-needle depth-percent reporting
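To make three of these highlights concrete (registry-based extension points, the uuid_chain task, and recipe-based reconstruction), here is a minimal standard-library sketch of how the pieces can fit together. It is an illustration under stated assumptions, not the niah API. Every name in it (Task, TASKS, register_task, UuidChainTask, make_recipe, reconstruct, digest), the recipe fields (task, seed, hops), and the needle wording are hypothetical. The key idea is that a small recipe plus a seeded RNG deterministically regenerates the same context, so a result row only needs the recipe and a hash.

```python
import hashlib
import io
import json
import random
import uuid
from typing import Protocol


class Task(Protocol):
    """Hypothetical task interface: seeded RNG -> (context, question, answer)."""

    name: str

    def build(self, rng: random.Random, hops: int) -> tuple[str, str, str]: ...


# Hypothetical registry: tasks register themselves by name, so a runner can
# look one up from a recipe without importing each task explicitly.
TASKS: dict[str, type[Task]] = {}


def register_task(cls: type[Task]) -> type[Task]:
    TASKS[cls.name] = cls
    return cls


@register_task
class UuidChainTask:
    name = "uuid_chain"

    def build(self, rng: random.Random, hops: int) -> tuple[str, str, str]:
        # hops + 1 deterministic UUIDs forming ids[0] -> ids[1] -> ... -> ids[hops]
        ids = [str(uuid.UUID(int=rng.getrandbits(128), version=4)) for _ in range(hops + 1)]
        needles = [f"The key {a} maps to {b}." for a, b in zip(ids, ids[1:])]
        rng.shuffle(needles)  # scatter the hops so their order gives no hint
        haystack = [f"Filler sentence {i}." for i in range(20)]
        for needle in needles:
            haystack.insert(rng.randrange(len(haystack) + 1), needle)
        question = f"Starting from {ids[0]}, follow the mappings. Where does the chain end?"
        return " ".join(haystack), question, ids[-1]


def make_recipe(task: str, seed: int, hops: int) -> dict:
    # Hypothetical recipe: just enough to regenerate the context exactly.
    return {"task": task, "seed": seed, "hops": hops}


def reconstruct(recipe: dict) -> tuple[str, str, str]:
    # Only task/seed/hops are read; extra keys in a stored row are ignored.
    task = TASKS[recipe["task"]]()
    return task.build(random.Random(recipe["seed"]), recipe["hops"])


def digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


# Run once and store only the recipe, the answer, and a context hash as JSONL.
recipe = make_recipe("uuid_chain", seed=42, hops=3)
context, question, answer = reconstruct(recipe)
store = io.StringIO()  # stands in for a results .jsonl file
store.write(json.dumps({**recipe, "answer": answer, "context_sha256": digest(context)}) + "\n")

# Later: read the row back and rebuild the exact context from the recipe alone.
row = json.loads(store.getvalue().splitlines()[0])
rebuilt, _, rebuilt_answer = reconstruct(row)
assert digest(rebuilt) == row["context_sha256"]
assert rebuilt_answer == row["answer"]
```

Storing a hash next to the recipe is one way to detect drift: if the generator code changes between the run and the reconstruction, the rebuilt context no longer matches and the check fails rather than silently producing a different prompt.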

Install

pip install needlehaystack==2.0.0
# or: uv add needlehaystack==2.0.0

Upgrade from v1

# before:
needlehaystack.run_test --provider openai --model_name gpt-4o ...

# after:
niah run configs/runs/single_needle.example.yaml

See the README quick-start for the YAML config shape.
