gpustack/gpustack v2.0.1 on GitHub

Enhancements

Optimized the onboarding and installation process. The server no longer includes an embedded worker by default; we recommend running the server and workers separately for easier operation and to avoid exposing internal server ports. This change affects only new installations; existing setups keep their current behavior after upgrading. See issue #3529.
Improved installation UX, including:
- Better error message when docker.sock is missing. See issue #3517.
- Prevent users from using the API port directly. See issue #3519.
- Added health checks and improved error messages for the gateway. See issues #3525, #3548.
- Support re-registering workers with legacy worker tokens. See issue #3528.
- Moved default Higress config from /opt/data to /etc to avoid errors when /opt is mounted. See issue #3585.
- Relaxed Ray port range constraint. See issue #3620.
Support space-separated patterns in backend parameters, aligning the UX with vLLM/SGLang. See issue #2961.
The copy-images command now supports syncing images to an HTTP registry. See issue #3479.
Support configuring container registry authentication for downloading runner images. See issue #3662.
Improved DigitalOcean OS image selection. See issue #3665.
Support image_url input in the Playground UI. See issue #3627.
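To illustrate the space-separated parameter support (issue #2961): argparse-based CLIs such as those of vLLM and SGLang accept a long option either as "--flag=value" or as "--flag value". The sketch below is a hypothetical helper, not GPUStack's implementation. It shows how parameter entries written in either form could be split into argv tokens with shlex so that both produce the same parsed result. The helper name and the demo options are assumptions chosen for illustration.

```python
import argparse
import shlex


def to_argv(entries: list[str]) -> list[str]:
    """Hypothetical helper: expand backend parameter entries into argv tokens.

    Accepts both "--flag=value" and space-separated "--flag value" entries;
    shlex.split keeps quoted values containing spaces as a single token.
    """
    argv: list[str] = []
    for entry in entries:
        argv.extend(shlex.split(entry))
    return argv


def build_demo_parser() -> argparse.ArgumentParser:
    # Demo options only; they mimic the style of backend flags.
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-model-len", type=int)
    parser.add_argument("--enforce-eager", action="store_true")
    return parser


if __name__ == "__main__":
    parser = build_demo_parser()
    equals_form = parser.parse_args(to_argv(["--max-model-len=4096", "--enforce-eager"]))
    space_form = parser.parse_args(to_argv(["--max-model-len 4096", "--enforce-eager"]))
    # Both spellings yield the same namespace.
    print(equals_form == space_form, space_form.max_model_len)
```

Splitting each entry with shlex rather than a plain str.split means a quoted value such as '--served-model-name "my model"' stays intact as one token.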

Bug Fixes

Fixed inability to use CP and DP together in MindIE. See issue #3495.
Fixed installation reporting an unsupported OS error. See issue #3499.
Fixed embedded worker potentially failing to register because the server API is not ready. See issue #3503.
Fixed several GPU detection issues. See issues #3510, #3511, #3514, #3590.
Fixed model deployment in Kubernetes returning a Forbidden error. See issue #3513.
Fixed "connect call failed" errors when starting the server. See issue #3535.
Fixed model deployment with a local path getting stuck in "preparing files". See issue #3544.
Fixed worker becoming unready after running for a period of time in some setups. See issue #2631.
Fixed OIDC login with Azure AD, which had stopped working. See issue #3560.
Fixed embedded worker failing to start when migrating from v0.7.1. See issue #2762.
Fixed migration failure when server and worker mounted the same data directory. See issue #3613.
Fixed auto-scheduling selecting TP7 for DeepSeek-V3.2, which caused the deployment to fail. See issue #3640.
Fixed GPU devices API returning "cannot get array length of a scalar" error. See issue #3637.
Fixed manual scheduling failing to distribute GPUs across multiple replicas. See issue #3648.
Fixed vLLM distributed inference in Kubernetes resulting in a duplicate volume name error. See issue #3672.
Fixed model deployment failure when using a MySQL database. See issue #3682.
Fixed worker registration failing when hosts in Kubernetes have identical hostnames. See issue #3700.
Fixed installation failure in WSL environments. See issue #3549.
Fixed the server failing to start if the data directory is set during an upgrade. See issue #3474.
Fixed incorrect storage path shown when downloading a model to a custom directory. See issue #3413.
Fixed downloads not resuming automatically after GPUStack is restarted mid-download. See issue #1621.
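For context on the DeepSeek-V3.2 TP7 fix (issue #3640): as far as we understand, vLLM requires the model's total number of attention heads to be divisible by the tensor-parallel size. A TP size of 7 therefore cannot work for a model with 128 attention heads, which is the head count we assume for the DeepSeek-V3 family. The sketch below shows the kind of divisibility filter a scheduler could apply when choosing TP sizes. It is a hypothetical illustration, not the change made in GPUStack, and the function name is invented for this example.

```python
def valid_tp_sizes(num_attention_heads: int, max_gpus: int) -> list[int]:
    """Hypothetical filter: TP sizes up to max_gpus that evenly divide the head count.

    Assumes the backend (e.g. vLLM) rejects a tensor-parallel size that does
    not divide the total number of attention heads.
    """
    return [tp for tp in range(1, max_gpus + 1) if num_attention_heads % tp == 0]


if __name__ == "__main__":
    # Assumed head count of 128 on a node with 8 GPUs: TP7 is excluded.
    print(valid_tp_sizes(128, 8))
```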

Others

Updated built-in inference backend versions:
- vLLM:
  - NVIDIA CUDA 12.6/12.8/12.9 → v0.11.2
  - AMD ROCm 7.0/6.4 → v0.11.2
- SGLang:
  - NVIDIA CUDA 12.8 → v0.5.5.post3
  - AMD ROCm 6.4 → v0.5.5.post3
- MindIE:
  - Ascend CANN 8.2 A3(910C)/A2(910B)/310P → 2.2.rc1