V3.0.1 ships ATLAS as an interactive coding assistant you can download and run today. Type atlas in any project directory and start building — powered by a local 9B model on your own GPU. No API keys, no cloud, no data leaves the machine.
What's New
Interactive CLI with Tool-Call Agent Loop
The biggest change in V3.0.1: ATLAS is no longer just a benchmark runner. It's a full interactive coding assistant.
- `atlas` command: starts all services and drops you into an Aider-powered coding session
- Grammar-constrained agent loop: the model emits structured JSON tool calls (`write_file`, `edit_file`, `run_command`, etc.) with llama-server's `response_format: json_object` guaranteeing 100% valid output
- 8 tools: `read_file`, `write_file`, `edit_file`, `delete_file`, `run_command`, `search_files`, `list_directory`, `plan_tasks`
- Per-file V3 routing: config files and short files write instantly (T1), feature files with complex logic automatically route through the full V3 pipeline (T2) for diverse candidate generation, build verification, and energy-based selection
- Real-time streaming: every tool call, V3 pipeline stage, and build verification visible in the terminal as it happens
- 95.8% reliability across 8 difficulty levels (24 test iterations)
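As a rough illustration of the agent loop described above, the sketch below parses one JSON tool call and dispatches it to a handler. The tool names are the eight listed in these notes. The `{"tool": ..., "args": {...}}` envelope and the `parse_tool_call` and `run_agent_step` helpers are assumptions made for this example, not the actual atlas-proxy wire format. JSON mode ensures the response parses, but a loop like this would still need to check the tool name and argument shape itself.

```python
import json

# Tool names as listed in the release notes.
TOOLS = {
    "read_file", "write_file", "edit_file", "delete_file",
    "run_command", "search_files", "list_directory", "plan_tasks",
}


def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Parse one model response into (tool_name, arguments).

    The {"tool": ..., "args": {...}} shape is an assumption for illustration.
    """
    call = json.loads(raw)  # JSON mode should make this parse succeed
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("tool")
    args = call.get("args", {})
    # Valid JSON is not necessarily a valid tool call, so check the name.
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")
    return name, args


def run_agent_step(raw: str, handlers: dict) -> object:
    """Dispatch a parsed tool call to the matching handler function."""
    name, args = parse_tool_call(raw)
    return handlers[name](**args)
```

A caller would supply one handler per tool, for example `run_agent_step(raw, {"read_file": my_read_file})`, and feed the handler's result back to the model on the next turn.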
Docker Compose Deployment
Five commands to a working system:
git clone https://github.com/itigges22/ATLAS.git && cd ATLAS
wget https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q6_K.gguf -O models/Qwen3.5-9B-Q6_K.gguf
pip install -e .
cp .env.example .env
docker compose up -d
atlas

5 containerized services: llama-server (CUDA), geometric-lens (C(x)/G(x) scoring), v3-service (pipeline), sandbox (8-language code execution), atlas-proxy (agent loop). All orchestrated with health checks and dependency ordering.
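Docker Compose handles health-gated startup itself through `depends_on` and `healthcheck`. The sketch below only illustrates the ordering idea in Python. The service names come from the list above, but the dependency edges in `DEPENDS_ON` are guesses for illustration and are not read from the project's compose file. `start` and `is_healthy` are hypothetical callbacks.

```python
import time
from graphlib import TopologicalSorter

# Service names are from the release notes; the edges below are ASSUMED.
DEPENDS_ON = {
    "llama-server": set(),
    "sandbox": set(),
    "geometric-lens": {"llama-server"},
    "v3-service": {"llama-server", "geometric-lens", "sandbox"},
    "atlas-proxy": {"llama-server", "v3-service"},
}


def startup_order(depends_on):
    """Return an order in which each service follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def bring_up(depends_on, start, is_healthy, attempts=30, delay=1.0):
    """Start services in dependency order, gating each on a health probe."""
    started = []
    for name in startup_order(depends_on):
        start(name)
        for _ in range(attempts):
            if is_healthy(name):
                break
            time.sleep(delay)
        else:
            # The probe never succeeded, so dependents must not start.
            raise RuntimeError(f"{name} did not become healthy")
        started.append(name)
    return started
```

Under these assumed edges, atlas-proxy always starts last, and a service that never reports healthy stops the rollout before anything that depends on it is launched.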
Documentation Overhaul
Every documentation file rewritten from scratch and verified against source code:
- ARCHITECTURE.md — 13 Mermaid diagrams including sequence diagrams showing actual HTTP calls between services
- API.md — every endpoint across all 5 services with verified request/response formats
- CLI.md — streaming output format, workflow examples, troubleshooting guide
- CONFIGURATION.md — every environment variable verified against source code
- MAP.md — every file in the repo with clickable links and descriptions
- SETUP.md — Docker Compose, bare metal, and K3s deployment guides
- TROUBLESHOOTING.md — 20+ issue scenarios with verified fixes
Bug Fixes
- Geometric Lens Dockerfile port mismatch: container listened on 8001 but docker-compose expected 8099. Fresh Docker Compose deploys had a broken Lens service. Fixed.
- Python CLI default RAG port: `atlas/cli/client.py` defaulted to port 31144 (K3s) instead of 8099 (Docker Compose). Fixed.
- Missing Aider config files: `.aider.model.settings.yml` and `.aider.model.metadata.json` were not in the repo. The `atlas` launcher would fail without them. Restored.
- `hostname -I` fails on Arch Linux (#6): replaced with a portable fallback chain (`ip addr` -> `hostname -I` -> `hostname -i`)
- `rag-api/models/` does not exist (#10): resolved by V3.0.1 restructuring (`rag-api/` -> `geometric-lens/`)
- No Lens weight documentation (#11): added training docs to SETUP.md + HuggingFace dataset link
- `docker image exists` not a real command (#12, PR #13 by @g0dnerd): fixed to `docker image inspect`
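The hostname fallback chain from the fix for #6 can be sketched in Python as follows. The command order matches the one given above. The regex-based parsing, the loopback filter, and the helper names (`run`, `first_non_loopback_ipv4`, `detect_host_ip`) are assumptions for this example. The actual launcher fix may extract the address differently.

```python
import re
import subprocess

# Commands in the order given in the release notes.
COMMANDS = (["ip", "addr"], ["hostname", "-I"], ["hostname", "-i"])
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def run(cmd):
    """Run a command; return stdout, or '' if it is missing or fails."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return proc.stdout if proc.returncode == 0 else ""


def first_non_loopback_ipv4(text):
    """Heuristic: first dotted-quad in the text that is not 127.x.x.x."""
    for addr in IPV4.findall(text):
        if not addr.startswith("127."):
            return addr
    return None


def detect_host_ip(runner=run):
    """Try each command in turn; return the first usable address or None."""
    for cmd in COMMANDS:
        addr = first_non_loopback_ipv4(runner(cmd))
        if addr:
            return addr
    return None
```

Passing the runner as a parameter keeps the chain testable on machines where some of these commands are unavailable, which is the situation the original bug report describes.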
V3 Pipeline (unchanged from V3.0)
The same pipeline that scored 74.6% LiveCodeBench pass@1-v(k=3) on frozen Qwen3-14B is now integrated into the interactive CLI:
- Phase 0: Probe with progressive retry (light -> standard -> /nothink)
- Phase 1: PlanSearch (3 plans) + DivSampling (12 perturbations) + Budget Forcing (5 tiers)
- Phase 2: Build verification + C(x)/G(x) scoring + S* tiebreaking
- Phase 3: PR-CoT repair + Refinement Loop + Derivation Chains
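The Phase 0 progressive retry can be pictured as a short loop over the three modes named above. This is a minimal sketch, not the v3-service code: `generate` and `passes` are hypothetical callbacks standing in for the model call and the verification step, and the mode names are treated as opaque labels. Returning `None` so the caller can move on to later phases is also an assumption.

```python
# Mode labels in the order given in the release notes.
PROBE_MODES = ("light", "standard", "/nothink")


def probe(task, generate, passes):
    """Try each probe mode in order and stop at the first passing candidate.

    Returns (mode, candidate) on success, or None if every mode fails.
    """
    for mode in PROBE_MODES:
        candidate = generate(task, mode)  # hypothetical model call
        if passes(candidate):             # hypothetical verification step
            return mode, candidate
    return None
```

The point of the ordering is that later modes are only attempted when the earlier ones fail, so a task that passes in the first mode costs a single generation.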
Hardware Requirements
| Resource | Minimum |
|---|---|
| GPU VRAM | 16 GB (NVIDIA, CUDA) |
| System RAM | 14 GB |
| Disk | 20 GB |
| OS | Linux (RHEL, Ubuntu, Arch, Debian) |
Tested on RTX 5060 Ti 16GB. See SETUP.md for detailed instructions.
Benchmark Results (V3.0, Qwen3-14B)
| Benchmark | Score | Tasks |
|---|---|---|
| LiveCodeBench v5 | 74.6% pass@1-v(k=3) | 599 |
| GPQA Diamond | 47.0% | 198 |
| SciCode | 14.7% (sub-problems) | 341 |
The CLI currently runs Qwen3.5-9B with the same V3 pipeline. Formal benchmarks on the 9B model are V3.1 work.
Full ablation data: v3_ablation_results/ | Traces: HuggingFace
Contributors
Thanks to @g0dnerd for PR #13 fixing the Docker image existence check.
Thanks to @aaronetz (#6), @nguyenhoangthuan99 (#10), and @namp (#11) for reporting issues that made ATLAS better.

