EXO v1.0.64 Release Notes
This release comes with support for GLM-4.7-Flash, IP-less RDMA discovery (removing the need for custom network locations) and OpenAI-compatible tool calling via the API. It also includes bug fixes for auto parallelism, fixing various models including Qwen, GPT-OSS and MiniMax that were getting stuck in LOADING / WARMING UP, as well as better error messages when things go wrong.
Model Support
- Added support for GLM-4.7-Flash (#1214)
API
- Added tool calling support to the OpenAI-compatible chat completions API (#1233)
UX
- Add proxy and custom SSL certificate support for corporate networks (#1189)
- More responsive node info by splitting information sources and state (#928, #1209)
- Less spammy logs (#1218, #1225)
- Better error messages when things go wrong (#1198, #1173, #1177)
- Detect Thunderbolt Bridge cycles and show a native prompt to disable Thunderbolt Bridge (#1222)
- Add the ability to filter nodes for placement in the dashboard by clicking them (#1248)
Bug Fixes
- Fix placement edge cases for heterogeneous devices with lopsided memory availabilities (#1200)
- Fix various issues with auto parallel for loading and sharding models (#1202, #1201, #1206, #1211, #1229, #1223)
- Prepend think tags for certain models that include a think tag in their chat template e.g. GLM 4.7 (#1186)
Full Changelog: v1.0.63...v1.0.64