github itigges22/ATLAS v3.0.1
V3.0.1 — Interactive CLI + Documentation Overhaul

7 hours ago

V3.0.1 ships ATLAS as an interactive coding assistant you can download and run today. Type atlas in any project directory and start building — powered by a local 9B model on your own GPU. No API keys, no cloud, no data leaves the machine.


What's New

Interactive CLI with Tool-Call Agent Loop

The biggest change in V3.0.1: ATLAS is no longer just a benchmark runner. It's a full interactive coding assistant.

  • atlas command — starts all services and drops you into an Aider-powered coding session
  • Grammar-constrained agent loop — the model emits structured JSON tool calls (write_file, edit_file, run_command, etc.) with llama-server's response_format:json_object guaranteeing 100% valid output
  • 8 tools: read_file, write_file, edit_file, delete_file, run_command, search_files, list_directory, plan_tasks
  • Per-file V3 routing — config files and short files write instantly (T1), feature files with complex logic automatically route through the full V3 pipeline (T2) for diverse candidate generation, build verification, and energy-based selection
  • Real-time streaming — every tool call, V3 pipeline stage, and build verification visible in the terminal as it happens
  • 95.8% reliability across 8 difficulty levels (24 test iterations)
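JSON mode ensures a response parses, but it does not by itself ensure the call names a real tool, so a small validation step before dispatch is still useful. The sketch below is a minimal illustration of that step. The tool names come from the list above; the `{"tool": ..., "args": {...}}` shape and the `parse_tool_call` helper are hypothetical and may not match the actual ATLAS wire format.

```python
import json

# The eight tool names listed in the release notes.
TOOLS = {
    "read_file", "write_file", "edit_file", "delete_file",
    "run_command", "search_files", "list_directory", "plan_tasks",
}


def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Parse one model response into (tool_name, arguments).

    The {"tool": ..., "args": {...}} schema is an assumption made for
    illustration; the real ATLAS format may differ.
    """
    call = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("tool")
    args = call.get("args", {})
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")
    return name, args


name, args = parse_tool_call(
    '{"tool": "write_file", "args": {"path": "app.py", "content": "print(1)"}}'
)
print(name, sorted(args))
```

Rejecting unknown tool names early keeps a malformed call from reaching the file system or the shell.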

Docker Compose Deployment

Five setup commands, then run atlas:

git clone https://github.com/itigges22/ATLAS.git && cd ATLAS
wget https://huggingface.co/unsloth/Qwen3.5-9B-GGUF/resolve/main/Qwen3.5-9B-Q6_K.gguf -O models/Qwen3.5-9B-Q6_K.gguf
pip install -e .
cp .env.example .env
docker compose up -d
atlas

5 containerized services: llama-server (CUDA), geometric-lens (C(x)/G(x) scoring), v3-service (pipeline), sandbox (8-language code execution), atlas-proxy (agent loop). All orchestrated with health checks and dependency ordering.
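Dependency ordering means each service starts only after the services it relies on. The sketch below shows the idea with Python's standard `graphlib`. The service names are from the release notes, but the dependency edges are assumptions made for illustration and are not copied from the project's docker-compose.yml.

```python
from graphlib import TopologicalSorter

# Service names come from the release notes. The edges (who waits for whom)
# are hypothetical and only illustrate dependency ordering.
DEPENDS_ON = {
    "llama-server": set(),
    "sandbox": set(),
    "geometric-lens": {"llama-server"},
    "v3-service": {"llama-server", "geometric-lens", "sandbox"},
    "atlas-proxy": {"llama-server", "v3-service"},
}


def startup_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every service follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


order = startup_order(DEPENDS_ON)
print(" -> ".join(order))
```

In Docker Compose itself, the same effect is usually expressed with `depends_on` plus a health check condition rather than computed by hand.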

Documentation Overhaul

Every documentation file rewritten from scratch and verified against source code:

  • ARCHITECTURE.md — 13 Mermaid diagrams including sequence diagrams showing actual HTTP calls between services
  • API.md — every endpoint across all 5 services with verified request/response formats
  • CLI.md — streaming output format, workflow examples, troubleshooting guide
  • CONFIGURATION.md — every environment variable verified against source code
  • MAP.md — every file in the repo with clickable links and descriptions
  • SETUP.md — Docker Compose, bare metal, and K3s deployment guides
  • TROUBLESHOOTING.md — 20+ issue scenarios with verified fixes

Bug Fixes

  • Geometric Lens Dockerfile port mismatch — container listened on 8001 but docker-compose expected 8099. Fresh Docker Compose deploys had a broken Lens service. Fixed.
  • Python CLI default RAG port: atlas/cli/client.py defaulted to port 31144 (K3s) instead of 8099 (Docker Compose). Fixed.
  • Missing Aider config files: .aider.model.settings.yml and .aider.model.metadata.json were not in the repo. The atlas launcher would fail without them. Restored.
  • hostname -I fails on Arch Linux (#6) — replaced with portable fallback chain: ip addr -> hostname -I -> hostname -i
  • rag-api/models/ does not exist (#10) — resolved by V3.0.1 restructuring (rag-api/ -> geometric-lens/)
  • No Lens weight documentation (#11) — added training docs to SETUP.md + HuggingFace dataset link
  • docker image exists is not a real command (#12, PR #13 by @g0dnerd): replaced with docker image inspect
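The hostname fix above is a fallback chain: try each command in order and take the first usable answer. A minimal Python sketch of that pattern follows. The command order comes from the release notes; the `run` and `first_host_ip` helpers, the IPv4-only regex, and the loopback filter are assumptions for illustration, and the actual launcher may implement this differently.

```python
import re
import subprocess
from typing import Callable, Optional

# Commands in the order given in the release notes.
FALLBACK_CHAIN = [
    ["ip", "addr"],
    ["hostname", "-I"],
    ["hostname", "-i"],
]

# Loose IPv4 pattern; good enough for a sketch, not a strict validator.
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def run(cmd: list[str]) -> Optional[str]:
    """Run a command and return stdout, or None if it is missing or fails."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return done.stdout if done.returncode == 0 else None


def first_host_ip(
    runner: Callable[[list[str]], Optional[str]] = run,
) -> Optional[str]:
    """Try each command in turn; return the first non-loopback IPv4 found."""
    for cmd in FALLBACK_CHAIN:
        output = runner(cmd)
        if not output:
            continue  # command unavailable or failed: fall through to the next
        for ip in IPV4.findall(output):
            if not ip.startswith("127."):
                return ip
    return None


print(first_host_ip())
```

Passing the runner as a parameter makes the chain easy to test without touching the host's network tools.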

V3 Pipeline (unchanged from V3.0)

The same pipeline that scored 74.6% LiveCodeBench pass@1-v(k=3) on frozen Qwen3-14B is now integrated into the interactive CLI:

  • Phase 0: Probe with progressive retry (light -> standard -> /nothink)
  • Phase 1: PlanSearch (3 plans) + DivSampling (12 perturbations) + Budget Forcing (5 tiers)
  • Phase 2: Build verification + C(x)/G(x) scoring + S* tiebreaking
  • Phase 3: PR-CoT repair + Refinement Loop + Derivation Chains

Hardware Requirements

Resource   | Minimum
GPU VRAM   | 16 GB (NVIDIA, CUDA)
System RAM | 14 GB
Disk       | 20 GB
OS         | Linux (RHEL, Ubuntu, Arch, Debian)

Tested on RTX 5060 Ti 16GB. See SETUP.md for detailed instructions.
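A quick preflight check against the RAM and disk minimums listed above might look like the sketch below. It is an illustration, not part of ATLAS: the helper names are hypothetical, it treats GB as GiB, and it leaves out the GPU VRAM check, which would need a vendor tool.

```python
import os
import shutil

GIB = 1024 ** 3

# Minimums taken from the hardware requirements; GB is treated as GiB here,
# which is an assumption.
MIN_RAM_GIB = 14
MIN_DISK_GIB = 20


def system_ram_gib() -> float:
    """Total physical memory in GiB (Linux, via POSIX sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GIB


def free_disk_gib(path: str = ".") -> float:
    """Free space in GiB on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def shortfalls(ram_gib: float, disk_gib: float) -> list[str]:
    """Return a human-readable message for each minimum that is not met."""
    problems = []
    if ram_gib < MIN_RAM_GIB:
        problems.append(f"RAM {ram_gib:.1f} GiB < {MIN_RAM_GIB} GiB")
    if disk_gib < MIN_DISK_GIB:
        problems.append(f"free disk {disk_gib:.1f} GiB < {MIN_DISK_GIB} GiB")
    return problems


print(shortfalls(system_ram_gib(), free_disk_gib()) or "ok")
```

Note that the operating system reports slightly less than the installed RAM, so a machine sitting exactly at the minimum may show up as a shortfall.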


Benchmark Results (V3.0, Qwen3-14B)

Benchmark        | Score                | Tasks
LiveCodeBench v5 | 74.6% pass@1-v(k=3)  | 599
GPQA Diamond     | 47.0%                | 198
SciCode          | 14.7% (sub-problems) | 341

The CLI currently runs Qwen3.5-9B with the same V3 pipeline. Formal benchmarks on the 9B model are V3.1 work.

Full ablation data: v3_ablation_results/ | Traces: HuggingFace


Contributors

Thanks to @g0dnerd for PR #13 fixing the Docker image existence check.

Thanks to @aaronetz (#6), @nguyenhoangthuan99 (#10), and @namp (#11) for reporting issues that made ATLAS better.
