github gpustack/gpustack 0.2.0

Release Notes

Enhancements

  • Distributed Inference: Introduced support for distributed inference. See issues #117, #118.
  • Legacy GPU Support: Added support for NVIDIA GPUs with Compute Capability below 8.0. See issue #216.
  • CPU Inference: Implemented support for CPU-based inference. See issue #141.
  • Sharded Model Files: Added support for sharded model files. See issue #190.
  • Enhanced Scheduling: Improved scheduling capabilities, including binpack and spread strategies, scheduling to specific GPUs, and worker label-based scheduling. See issue #99.
  • Replica Scaling: Added functionality to scale model replicas from the model list page. See issue #210.
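
The binpack and spread strategies and label-based scheduling mentioned above can be illustrated with a minimal sketch. This is not GPUStack's scheduler code: the `Gpu` class, the `pick_gpu` function, the memory figures, and the `zone` label are all hypothetical names chosen for illustration, and the sketch assumes a request is sized only by GPU memory.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    # Hypothetical model of one schedulable GPU (not a GPUStack type).
    name: str
    total_mib: int
    used_mib: int = 0
    labels: dict = field(default_factory=dict)

    @property
    def free_mib(self) -> int:
        return self.total_mib - self.used_mib


def pick_gpu(gpus, required_mib, strategy="spread", selector=None):
    """Return the GPU chosen for a request, or None if nothing fits."""
    selector = selector or {}
    # Keep only GPUs with enough free memory whose labels match the selector.
    candidates = [
        g for g in gpus
        if g.free_mib >= required_mib
        and all(g.labels.get(k) == v for k, v in selector.items())
    ]
    if not candidates:
        return None
    if strategy == "binpack":
        # Pack onto the fullest GPU that still fits, leaving others free.
        return min(candidates, key=lambda g: g.free_mib)
    if strategy == "spread":
        # Place on the emptiest GPU to balance load.
        return max(candidates, key=lambda g: g.free_mib)
    raise ValueError(f"unknown strategy: {strategy}")


gpus = [
    Gpu("gpu-0", total_mib=24576, used_mib=16384, labels={"zone": "a"}),
    Gpu("gpu-1", total_mib=24576, used_mib=4096, labels={"zone": "b"}),
]
print(pick_gpu(gpus, 6000, "binpack").name)  # gpu-0
print(pick_gpu(gpus, 6000, "spread").name)   # gpu-1
```

In this sketch, binpack favors consolidation so that large models can still find an empty GPU later, while spread favors even utilization. Passing `selector={"zone": "b"}` restricts either strategy to GPUs carrying that label.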

Bug Fixes

  • Model Deployment Issue: Resolved an issue where model instances could not be deployed after updating worker names. See issue #191.
  • GPU Detection: Fixed GPU detection problems when the nvidia-drm module was not loaded. See issue #212.
  • Custom System Reserved Parameter: Addressed failures when using custom --system-reserved parameters. See issue #152.
  • Ollama Models Download: Fixed an issue that prevented non-library Ollama models from being downloaded correctly. See issue #230.
  • GPU Index Assignment: Fixed an issue where incorrect GPU indices were assigned to model instances. See issue #221.
  • GLIBC Version Not Found Error: Fixed a bug where model deployment failed with "GLIBC version not found" errors. See issue #270.
