itigges22/ATLAS v3.0.1 on GitHub

V3.0.1 ships ATLAS as an interactive coding assistant you can download and run today. Type atlas in any project directory and start building — powered by a local 9B model on your own GPU. No API keys, no cloud, no data leaves the machine.

What's New

Interactive CLI with Tool-Call Agent Loop

The biggest change in V3.0.1: ATLAS is no longer just a benchmark runner. It's a full interactive coding assistant.

atlas command — starts all services and drops you into an Aider-powered coding session
Grammar-constrained agent loop in which the model emits structured JSON tool calls (write_file, edit_file, run_command, etc.); llama-server's response_format:json_object guarantees that every reply parses as valid JSON
8 tools: read_file, write_file, edit_file, delete_file, run_command, search_files, list_directory, plan_tasks
Per-file V3 routing — config files and short files write instantly (T1), feature files with complex logic automatically route through the full V3 pipeline (T2) for diverse candidate generation, build verification, and energy-based selection
Real-time streaming — every tool call, V3 pipeline stage, and build verification visible in the terminal as it happens
95.8% reliability across 8 difficulty levels (24 test iterations)
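
To make the agent loop above more concrete, here is a minimal sketch of how a single JSON tool call might be validated and dispatched. The eight tool names come from the list above; the {"name": ..., "args": ...} message shape, the function names, and the handler table are assumptions made for illustration and are not the actual atlas-proxy wire format.

```python
import json

# Tool names as listed in the release notes.
TOOLS = {
    "read_file", "write_file", "edit_file", "delete_file",
    "run_command", "search_files", "list_directory", "plan_tasks",
}


def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Parse one model reply into (tool_name, args).

    The {"name": ..., "args": {...}} shape is hypothetical.
    """
    # JSON mode can ensure the reply parses; the tool name and
    # arguments still need to be checked by the caller.
    call = json.loads(raw)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("name")
    args = call.get("args", {})
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")
    return name, args


def dispatch(raw: str, handlers: dict) -> object:
    """Run the handler registered for the requested tool."""
    name, args = parse_tool_call(raw)
    return handlers[name](**args)
```

A caller would register one handler per tool, for example `dispatch(reply, {"read_file": lambda path: open(path).read()})`, and feed the result back to the model as the next message.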

Docker Compose Deployment

Six commands to a working system:

git clone https://github.com/itigges22/ATLAS.git && cd ATLAS
wget https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q6_K.gguf -O models/Qwen3.5-9B-Q6_K.gguf
pip install -e .
cp .env.example .env
docker compose up -d
atlas

5 containerized services: llama-server (CUDA), geometric-lens (C(x)/G(x) scoring), v3-service (pipeline), sandbox (8-language code execution), atlas-proxy (agent loop). All orchestrated with health checks and dependency ordering.
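
Dependency ordering of the kind described above can be illustrated with a topological sort. The five service names are from these notes, but the dependency edges below are guesses for illustration only; the real ones are defined in the repository's Docker Compose file.

```python
from graphlib import TopologicalSorter

# service -> services it waits for.
# These edges are ASSUMED for the example, not taken from ATLAS.
ASSUMED_DEPENDS_ON = {
    "llama-server": set(),
    "sandbox": set(),
    "geometric-lens": {"llama-server"},
    "v3-service": {"llama-server", "geometric-lens", "sandbox"},
    "atlas-proxy": {"llama-server", "v3-service"},
}


def startup_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())
```

Docker Compose performs this ordering itself; the sketch only shows why a service such as atlas-proxy would come up last under these assumed edges.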

Documentation Overhaul

Every documentation file rewritten from scratch and verified against source code:

ARCHITECTURE.md — 13 Mermaid diagrams including sequence diagrams showing actual HTTP calls between services
API.md — every endpoint across all 5 services with verified request/response formats
CLI.md — streaming output format, workflow examples, troubleshooting guide
CONFIGURATION.md — every environment variable verified against source code
MAP.md — every file in the repo with clickable links and descriptions
SETUP.md — Docker Compose, bare metal, and K3s deployment guides
TROUBLESHOOTING.md — 20+ issue scenarios with verified fixes

Bug Fixes

Geometric Lens Dockerfile port mismatch — container listened on 8001 but docker-compose expected 8099. Fresh Docker Compose deploys had a broken Lens service. Fixed.
Python CLI default RAG port — atlas/cli/client.py defaulted to port 31144 (K3s) instead of 8099 (Docker Compose). Fixed.
Missing Aider config files — .aider.model.settings.yml and .aider.model.metadata.json were not in the repo. The atlas launcher would fail without them. Restored.
hostname -I fails on Arch Linux (#6) — replaced with portable fallback chain: ip addr -> hostname -I -> hostname -i
rag-api/models/ does not exist (#10) — resolved by V3.0.1 restructuring (rag-api/ -> geometric-lens/)
No Lens weight documentation (#11) — added training docs to SETUP.md + HuggingFace dataset link
docker image exists is not a real command (#12, PR #13 by @g0dnerd); the check now uses docker image inspect
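
The fallback chain from the hostname fix (#6) follows a common pattern: try each command in turn and keep the first usable answer. The sketch below shows that pattern in Python. It is not the launcher's actual code; `first_ip`, `run_command`, and the IPv4 regex are names and choices invented here, and the way ATLAS parses each command's output is not described in these notes.

```python
import re
import subprocess

# Order follows the release notes: ip addr -> hostname -I -> hostname -i.
COMMANDS = [
    ["ip", "addr"],
    ["hostname", "-I"],
    ["hostname", "-i"],
]

# Matches dotted-quad IPv4 addresses (illustrative, not strict validation).
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def run_command(argv: list[str]) -> str:
    """Return stdout of argv, or "" if the command is missing or fails."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return done.stdout if done.returncode == 0 else ""


def first_ip(run=run_command):
    """Return the first non-loopback IPv4 address any command reports, else None."""
    for argv in COMMANDS:
        for addr in IPV4.findall(run(argv)):
            if not addr.startswith("127."):
                return addr
    return None
```

Passing `run` as a parameter keeps the chain testable without touching the host: a fake runner can simulate a distribution where `hostname -I` is unsupported.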

V3 Pipeline (unchanged from V3.0)

The same pipeline that scored 74.6% pass@1-v(k=3) on LiveCodeBench v5 with a frozen Qwen3-14B is now integrated into the interactive CLI:

Phase 0: Probe with progressive retry (light -> standard -> /nothink)
Phase 1: PlanSearch (3 plans) + DivSampling (12 perturbations) + Budget Forcing (5 tiers)
Phase 2: Build verification + C(x)/G(x) scoring + S* tiebreaking
Phase 3: PR-CoT repair + Refinement Loop + Derivation Chains
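
As a rough illustration of the Phase 2 idea, the sketch below keeps only candidates that pass build verification and then picks the one with the lowest energy score. It assumes that lower energy is better and collapses C(x)/G(x) into a single number. The `Candidate` type, its field names, and the tiebreak rule (earliest candidate wins) are placeholders and do not reproduce the S* tiebreaking used by ATLAS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    code: str
    builds: bool   # outcome of build verification
    energy: float  # hypothetical scalar score; lower is assumed better


def select(candidates: list[Candidate]) -> Optional[Candidate]:
    """Pick the lowest-energy candidate among those that build."""
    viable = [c for c in candidates if c.builds]
    if not viable:
        return None  # nothing passed; a repair phase would take over here
    # min() returns the first minimum, so ties go to the earliest candidate.
    return min(viable, key=lambda c: c.energy)
```

Filtering on the build result before scoring means a low-energy candidate that does not compile can never be selected.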

Hardware Requirements

Resource	Minimum
GPU VRAM	16 GB (NVIDIA, CUDA)
System RAM	14 GB
Disk	20 GB
OS	Linux (RHEL, Ubuntu, Arch, Debian)

Tested on RTX 5060 Ti 16GB. See SETUP.md for detailed instructions.
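
A preflight check against the minimums in the table might look like the sketch below. The thresholds are taken from the table; the function name and the choice to pass measured values in as arguments are decisions made here, and how those values are obtained (for example from nvidia-smi or the operating system) is left out.

```python
# Minimums from the hardware table, in GB.
MINIMUMS_GB = {"gpu_vram": 16, "system_ram": 14, "disk": 20}


def unmet_requirements(measured_gb: dict[str, float]) -> list[str]:
    """Return the names of resources that fall below the documented minimum.

    A resource missing from measured_gb is treated as 0 GB, so it is reported.
    """
    return [
        name
        for name, minimum in MINIMUMS_GB.items()
        if measured_gb.get(name, 0) < minimum
    ]
```

An empty list means all three minimums are met; anything else names the resource to fix before running docker compose up.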

Benchmark Results (V3.0, Qwen3-14B)

Benchmark	Score	Tasks
LiveCodeBench v5	74.6% pass@1-v(k=3)	599
GPQA Diamond	47.0%	198
SciCode	14.7% (sub-problems)	341

The CLI currently runs Qwen3.5-9B with the same V3 pipeline. Formal benchmarks on the 9B model are planned for V3.1.

Full ablation data: v3_ablation_results/ | Traces: HuggingFace

Contributors

Thanks to @g0dnerd for PR #13 fixing the Docker image existence check.

Thanks to @aaronetz (#6), @nguyenhoangthuan99 (#10), and @namp (#11) for reporting issues that made ATLAS better.
