v2.0.0 — Clean refactor
A from-scratch rewrite. The v1 CLI (needlehaystack.run_test) is gone;
the new entry point is niah. Existing v1 result files are preserved in
original_results/.
Highlights
- New niah CLI: run, validate, reconstruct
- Two-YAML config model: one model spec + one run spec (replaces v1's 25-flag invocation)
- New tasks: single, multi, uuid, uuid_chain (multi-hop UUID chain for testing multi-step long-context reasoning)
- Modular extension points via small Protocols + registries: add a task, provider, haystack, or scorer without touching the runner
- JSONL result store with recipe-based reconstruction: rebuild the exact context a model saw without storing megabytes per row
- uv-based packaging, Python 3.12+, 207-test suite with FakeProvider (zero API keys needed to develop)
- Bug fix: multi-needle depth-percent reporting
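The Protocol-plus-registry pattern from the highlights can be sketched in plain Python. This is a minimal illustration, not needlehaystack's actual API. Scorer, SCORERS, register_scorer, and run_scoring are hypothetical names. In this pattern, a new scorer only has to match the protocol's method shape and register itself under a name. The runner then looks the scorer up by that name and never imports concrete classes.

```python
from typing import Callable, Protocol


class Scorer(Protocol):
    """Hypothetical scorer interface: any class with this method qualifies."""

    def score(self, expected: str, response: str) -> float: ...


# Hypothetical registry mapping a config-level name to a scorer class.
SCORERS: dict[str, Callable[[], Scorer]] = {}


def register_scorer(name: str) -> Callable[[type], type]:
    """Class decorator that records a scorer class under `name`."""

    def decorator(cls: type) -> type:
        if name in SCORERS:
            raise ValueError(f"scorer {name!r} already registered")
        SCORERS[name] = cls
        return cls

    return decorator


@register_scorer("exact")
class ExactMatchScorer:
    def score(self, expected: str, response: str) -> float:
        # Full credit only when the trimmed strings match exactly.
        return 1.0 if expected.strip() == response.strip() else 0.0


@register_scorer("contains")
class ContainsScorer:
    def score(self, expected: str, response: str) -> float:
        # Full credit when the expected answer appears anywhere in the response.
        return 1.0 if expected in response else 0.0


def run_scoring(scorer_name: str, expected: str, response: str) -> float:
    # The "runner" resolves scorers by name only; it has no import of
    # ExactMatchScorer or ContainsScorer.
    scorer = SCORERS[scorer_name]()
    return scorer.score(expected, response)
```

Under this layout, a run spec could name a scorer by string, and the runner would resolve it through the registry. The same shape would carry over to tasks, providers, and haystacks.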
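Recipe-based reconstruction can be illustrated with a seeded generator. Each row stores only the small set of inputs that determined the context, here a seed, a length, a depth, and the needle. The full context can then be regenerated on demand instead of stored. The sketch below assumes a toy word-list haystack. Recipe, FILLER, build_context, to_jsonl_row, and reconstruct are hypothetical names, and the real recipe fields in v2.0.0 may differ.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Recipe:
    """Hypothetical per-row recipe: the inputs needed to rebuild a context."""

    seed: int
    context_words: int
    depth_percent: float
    needle: str


# Hypothetical filler vocabulary standing in for a real haystack corpus.
FILLER = ["alpha", "beta", "gamma", "delta", "epsilon"]


def build_context(recipe: Recipe) -> str:
    # A seeded RNG yields the same filler words every time for the same seed.
    rng = random.Random(recipe.seed)
    words = [rng.choice(FILLER) for _ in range(recipe.context_words)]
    # Insert the needle at the requested depth (0 = start, 100 = end).
    pos = round(len(words) * recipe.depth_percent / 100)
    words.insert(pos, recipe.needle)
    return " ".join(words)


def to_jsonl_row(recipe: Recipe, response: str) -> str:
    # Persist only the small recipe plus the model's response, not the context.
    return json.dumps({"recipe": asdict(recipe), "response": response})


def reconstruct(row: str) -> str:
    # Rebuild the exact context from a stored JSONL row.
    recipe = Recipe(**json.loads(row)["recipe"])
    return build_context(recipe)
```

Exact reconstruction only holds while the haystack source and the generation code stay unchanged. A real recipe would therefore likely also need to pin a haystack identifier or version.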
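Here is one plausible shape for a multi-hop UUID chain task. Each needle maps one UUID to the next, and the needles are shuffled. The model is asked for the value reached from a starting UUID, which takes one lookup per hop. This sketch is an assumption about the idea, not the uuid_chain implementation. make_uuid_chain, resolve_chain, and the needle sentence template are hypothetical.

```python
import random
import uuid


def make_uuid_chain(hops: int, seed: int) -> tuple[list[str], str, str]:
    """Hypothetical generator: returns (needle sentences, start UUID, final UUID)."""
    rng = random.Random(seed)
    # Seeded 128-bit ints give reproducible version-4-formatted UUIDs.
    ids = [str(uuid.UUID(int=rng.getrandbits(128), version=4)) for _ in range(hops + 1)]
    # Each needle links one UUID to the next: ids[0] -> ids[1] -> ... -> ids[-1].
    needles = [f"The value of {a} is {b}." for a, b in zip(ids, ids[1:])]
    # Shuffle so the hops cannot be read off in document order.
    rng.shuffle(needles)
    return needles, ids[0], ids[-1]


def resolve_chain(needles: list[str], start: str) -> str:
    """Reference solver: follow the chain one lookup per hop."""
    mapping: dict[str, str] = {}
    for sentence in needles:
        # Parse "The value of <a> is <b>." into a -> b.
        body = sentence.removeprefix("The value of ").removesuffix(".")
        a, b = body.split(" is ")
        mapping[a] = b
    current = start
    while current in mapping:
        current = mapping[current]
    return current
```

A scorer for such a task would compare the model's answer against the final UUID, so a model that stops after one hop gets no credit.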
Install
pip install needlehaystack==2.0.0
# or: uv add needlehaystack==2.0.0
Upgrade from v1
# before:
needlehaystack.run_test --provider openai --model_name gpt-4o ...
# after:
niah run configs/runs/single_needle.example.yaml
See the README quick-start for the YAML config shape.