Overview
Core 2.10.1 is a patch release fixing a significant partial scheduling bug that has existed since 2.9.0 but only became visible to users in 2.10.0.
We also eliminate a set of scenarios where the scheduler experienced a slow start because it was waiting for connections from server replicas that were never created due to configuration errors (i.e., updating a ServerConfig after Servers with .spec.replicas > 0 referencing that config have been deployed as StatefulSets).
Bugfix details:
Starting in 2.10.0, after pods of an inference server hosting a model were restarted (irrespective of the reason), the model ended up scheduled only on approximately model.spec.minReplicas server replicas rather than the requested (and expected) model.spec.replicas. The variation in the actual number of model replicas being scheduled depended on the timing and sequencing of server replica connections to the scheduler after restart.
This regression appeared because an existing bug in the partial scheduling logic (present since 2.9.0) started manifesting itself consistently after a data race bug (not directly related to partial scheduling) was fixed in 2.10.0. Before that fix, the data race bug was difficult to trigger under most cluster operation scenarios, so it was not experienced by users.
In 2.10.1, we fix the underlying bug so that partial scheduling works as expected.
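As an illustration of the expected behaviour, the following is a minimal Python sketch of a partial scheduling decision. It is not the scheduler's actual code: the function names (replicas_to_schedule, reconcile) and the simplified rule are assumptions made for this example. It assumes a model is placed on all requested replicas when enough server replicas are connected, on fewer replicas when at least minReplicas are connected, and that placement is re-evaluated as more server replicas connect.

```python
from typing import List, Optional


def replicas_to_schedule(requested: int, min_replicas: int,
                         available_servers: int) -> Optional[int]:
    """Hypothetical helper: how many model replicas to place right now.

    Returns None when not even a partial placement is possible.
    """
    if available_servers >= requested:
        # Enough server replicas are connected: schedule the full request.
        return requested
    if min_replicas > 0 and available_servers >= min_replicas:
        # Partial scheduling: use whatever is connected, at least min_replicas.
        return available_servers
    return None


def reconcile(requested: int, min_replicas: int,
              connected_counts: List[int]) -> List[Optional[int]]:
    """Re-evaluate placement each time the connected server count changes."""
    return [replicas_to_schedule(requested, min_replicas, n)
            for n in connected_counts]


# Server replicas reconnect one by one after a restart.
print(reconcile(requested=4, min_replicas=2, connected_counts=[1, 2, 3, 4]))
# prints [None, 2, 3, 4]
```

In this simplified model, the symptom described above corresponds to placement stopping at roughly the minReplicas step instead of converging on the requested replica count once all server replicas have reconnected.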
Upgrading from previous Core 2 versions
No CRD or configuration changes are introduced in this patch release. If you are upgrading from a version earlier than 2.10.0, you should first read the 2.10.0 release notes.
Changelog
Generated by auto-changelog.
v2.10.1
20 October 2025