v2.0.0 — Clean refactor

A from-scratch rewrite. The v1 CLI (needlehaystack.run_test) is gone;
the new entry point is niah. Existing v1 result files are preserved in
original_results/.

Highlights

- New niah CLI: run, validate, reconstruct
- Two-YAML config model: one model spec + one run spec
  (replaces v1's 25-flag invocation)
- New tasks: single, multi, uuid, uuid_chain (multi-hop UUID
  chain for testing multi-step long-context reasoning)
- Modular extension points via small Protocols + registries: add
  a task, provider, haystack, or scorer without touching the runner
- JSONL result store with recipe-based reconstruction: rebuild the
  exact context a model saw without storing megabytes per row
- uv-based packaging, Python 3.12+, 207-test suite with FakeProvider
  (zero API keys needed to develop)
- Bug fix: multi-needle depth-percent reporting
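
To make the "Protocols + registries" and "recipe-based reconstruction" items concrete, here is a minimal sketch of how the two ideas can fit together. It is not the project's actual code: every name in it (Haystack, HAYSTACKS, register, Recipe, build_context, to_jsonl_row, reconstruct) is hypothetical and chosen only for illustration. The assumption is that a haystack builder is deterministic given a seed, so a result row only needs to store the small recipe rather than the full context.

```python
import json
import random
from dataclasses import asdict, dataclass
from typing import Callable, Protocol


class Haystack(Protocol):
    """Hypothetical extension point: deterministic filler-text builder."""

    def build(self, n_sentences: int, seed: int) -> list[str]: ...


# Hypothetical registry mapping a name to a factory (not the real niah registry).
HAYSTACKS: dict[str, Callable[[], Haystack]] = {}


def register(name: str):
    """Decorator that adds a haystack factory to the registry under `name`."""
    def decorator(factory):
        HAYSTACKS[name] = factory
        return factory
    return decorator


@register("numbered")
class NumberedHaystack:
    def build(self, n_sentences: int, seed: int) -> list[str]:
        rng = random.Random(seed)  # seeded, so the same recipe gives the same text
        return [f"Filler sentence {rng.randrange(10_000)}." for _ in range(n_sentences)]


@dataclass(frozen=True)
class Recipe:
    """Everything needed to rebuild a context; a few bytes, not megabytes."""
    haystack: str
    n_sentences: int
    seed: int
    needle: str
    depth_percent: float


def build_context(recipe: Recipe) -> str:
    # Look the builder up by name, so the runner never imports it directly.
    sentences = HAYSTACKS[recipe.haystack]().build(recipe.n_sentences, recipe.seed)
    position = round(len(sentences) * recipe.depth_percent / 100)
    sentences.insert(position, recipe.needle)
    return " ".join(sentences)


def to_jsonl_row(recipe: Recipe, response: str) -> str:
    # One JSON object per line: the recipe plus the model's response.
    return json.dumps({"recipe": asdict(recipe), "response": response})


def reconstruct(row: str) -> str:
    # Rebuild the exact context from the stored recipe alone.
    return build_context(Recipe(**json.loads(row)["recipe"]))
```

In this sketch, adding a new haystack means writing one class and decorating it with register; build_context and reconstruct stay unchanged. The real niah registries and recipe fields may differ, so treat the README as the authority.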

Install

pip install needlehaystack==2.0.0
# or: uv add needlehaystack==2.0.0

Upgrade from v1

# before:
needlehaystack.run_test --provider openai --model_name gpt-4o ...

# after:
niah run configs/runs/single_needle.example.yaml

See the README quick-start for the YAML config shape.

gkamradt/needle-in-a-haystack v2.0.0 on GitHub