What's changed
- Bump CUDA base image version to 12.3.2
- Add cdi-cri device list strategy. This uses the CDIDevices CRI field to request CDI devices instead of annotations.
- Set MPS memory limit by device index and not device UUID. This is a workaround for an issue where these limits are not applied for devices if set by UUID.
- Update MPS sharing to disallow requests for multiple devices if MPS sharing is configured.
- Set MPS device memory limit by index.
- Explicitly set sharing.mps.failRequestsGreaterThanOne = true.
- Run tail -f for each MPS daemon to output logs.
- Enforce replica limits for MPS sharing.
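
As a rough illustration of the multiple-device restriction above, the check can be sketched as follows. This is not the plugin's actual code: `MPSSharingConfig` and `validate_request` are hypothetical names, and the only detail taken from these notes is that requests for more than one device are rejected when sharing.mps.failRequestsGreaterThanOne is true.

```python
from dataclasses import dataclass


@dataclass
class MPSSharingConfig:
    """Hypothetical stand-in for the sharing.mps section of the plugin config."""

    # Mirrors sharing.mps.failRequestsGreaterThanOne, which this release sets to true.
    fail_requests_greater_than_one: bool = True


def validate_request(requested: int, config: MPSSharingConfig) -> None:
    """Raise ValueError if a container's device request is not allowed under MPS sharing."""
    if requested < 1:
        raise ValueError("a request must ask for at least one device")
    if config.fail_requests_greater_than_one and requested > 1:
        raise ValueError(
            f"requested {requested} devices, but only 1 is allowed when MPS sharing is configured"
        )


# Usage: a single-device request passes, a two-device request is rejected.
validate_request(1, MPSSharingConfig())
try:
    validate_request(2, MPSSharingConfig())
except ValueError as err:
    print(f"rejected: {err}")
```

The flag is modelled as a plain boolean so the sketch also shows the behaviour when it is unset: the same two-device request would pass the check.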