EXO v1.0.68 Release Notes

This is the biggest EXO release to date. We wanted to make sure we address the stability issues users were running into on previous versions and we think we've achieved that with this release. This release also comes with a whole load of new features and UX improvements, full list below. Thank you to everyone who submitted bug reports over the past few weeks - it helps us to improve EXO much faster.

Models

Add support for custom models from Huggingface (#1368)
Add support for Qwen3-Coder-Next (#1367)
Add support for Step 3.5 Flash (#1460)
Add support for GLM 5 (#1526), (#1529)
Add support for MiniMax M2.5 (#1514)

API

Add support for Claude Messages API, enabling tools like Claude Code (#1167)
Add support for OpenAI Responses API (#1167)
Add usage and generation stats to API, enabling clients like OpenCode to consume stats including prompt tokens, completion tokens and total tokens (#1333), (#1461)
Cancel text generation when API request is closed (#1276)
Add support for Ollama API (#1560)

Web Dashboard

Add redesigned model picker modal (#1369), (#1377), (#1440), (#1470)
Display alternative tokens / logprobs visualizer in chat responses (#1180)
Redesign downloads page as model x node table (#1465), (#1589), (#1581)
Add prefill progress bar for long prompts (#1181), (#1557)
A new onboarding flow when running EXO for the first time (#1533)
Automatic model selection / model recommendations in web dashboard (#1590)

Quality of Life

Show a more informative message in macOS app when installing network location (#1309)
Migrate model cards to .toml files (#1354)
Clean up exo gracefully on shutdown, preventing memory not being cleaned up on exit (#1388)
Make topology updates more responsive by yielding from reachability checks instead of waiting for all checks (#1427)
Allow typing in chat input while response is generating (#1433)
Add log rotation, now exo logs get written to ~/.exo/exo_logs (#1438), (#1439), (#1442)
Distinguish between model fits in available memory and fits in total memory in model picker (#1441 h/t @Hmbown)
Add enable_thinking toggle for models that support thinking/non-thinking (#1457)
Show a warning in the web dashboard when macOS versions of nodes in a cluster are incompatible (#1436)
Show macOS version in debug mode on web dashboard
Add cancellation button and cancel during prefill (#1540), (#1575)
Strip Claude headers to improve prefix cache hit rates (#1552)
Prioritise Thunderbolt for Ring (TCP/IP) instances (#1556)
Show paused downloads with completion % in web dashboard (#1564)
Add support for loading models from arbitrary paths with EXO_MODELS_PATH environment variable (#1574)

Image Generation (Experimental)

Add support for non-streaming image generation (#1328)
Add support for parallel classifier-free guidance (CFG) for Qwen image models, embarrassingly parallel image generation with improved performance (#1361)
Add more image dimensions for image generation (1024x1365 and 1365x1024) (#1395)
Add support for FLUX.1-Kontext-dev, an image editing variant of FLUX.1-dev (#1394)

Bug Fixes

Fix GPU locks, caused by MLX_METAL_FAST_SYNCH. This would often cause nodes to be stuck at 100% GPU utilization where the only fix is to reboot the machine (#1429), (#1489), (#1515)
Fix Pipeline instances crashes when MLX_METAL_FAST_SYNCH was enabled (previously only RDMA, now also Ring instances) (#1620), (#1622))
Fix ConfigData validation for kimi-k2, fixing loading issues (#1314)
Fix uninstall button not working in some cases (#1306)
Load pipeline layers sequentially, preventing instances getting stuck in LOADING (#1329)
Make node ids unique per session, fixing double identities when users copy their entire ~/.exo folder to another node (#1338)
Skip duplicate tasks in worker, preventing tasks from getting processed twice causing instances to get stuck sometimes (#1342), (#1381)
Fix out of sync issues with prefix caching on multiple nodes (#1341)
Remove mDNS discovered peers from appearing in state, fixing issues with broken one-sided connections (#1312)
Create config home when checking for config file, fixing crashes on Linux systems (#1353)
Fix tool calling edge case when max_tokens truncates generation mid-tool-call, causing request to hang (#1344)
Fix Kimi tool calling id, fix GPT-OSS tool calling bug (#1413), (#1487), (#1529)
Ensure EXO works with no internet (#1363), (#1402)
Cancel downloads for deleted instances, preventing downloads from continuing silently in the background (#1393)
Retry transient download errors (#1398)
Move event log from unbounded in-memory list to disk, fixing a memory leak (#1432)
Never save to app directory, preventing local network access permission from being silently lost requiring users to manually toggle this in System Settings (#1435)
Fix tensor parallel sharding for MiniMax, Qwen3Next, Qwen3MoE and GLM4MoE models (#1318), (#1411), (#1595), (#1604)
Slow down catchup, preventing too large message being sent over Gossipsub for long-running clusters. This speeds up and improves stability when nodes join or re-join long running clusters (#1407)
Fix setrlimit crash when hard file descriptor limit < 65535 (#1430 h/t @mustafalpyilmaz)
Fix RDMA debug info in debug mode on web dashboard (#1437)
Prevent DownloadModel event flood, fixing issues with event logs growing rapidly, causing cluster to get stuck (#1452)
Don't time out node identifies, preventing nodes getting stuck as a hexagon in the topology after disconnecting and rejoining a cluster with the same node id (#1493)
Ensure memory gets released on shut down, prevent crashes on instance shutdown (#1555)
Turn on MLX_METAL_FAST_SYNCH for Ring instances, speeding up generation for large prompts with Ring instances (#1594)
Prevent runners silently dying and detect GPU timeouts and out of memory errors (#1592)

Security

Make trust_remote_code opt-in (default false) for custom models loaded from Huggingface (#1603 h/t @kevthehermit)

Full Changelog: v1.0.67...v1.0.68

exo-explore/exo v1.0.68 on GitHub