🍱 BentoML v1.0.19 is released with enhanced GPU utilization and expanded ML framework support.
- Optimized GPU resource utilization: Enabled scheduling of multiple instances of the same runner using the `workers_per_resource` scheduling strategy configuration, which is 1 by default. The following configuration schedules 2 instances of the "iris" runner per GPU:

  ```yaml
  runners:
    iris:
      resources:
        nvidia.com/gpu: 1
      workers_per_resource: 2
  ```
- New ML framework support: We've added support for EasyOCR and Detectron2 to our growing list of supported ML frameworks (see the sketch after this list).
- Enhanced runner communication: Implemented PEP 574 out-of-band pickling to improve runner communication by eliminating memory copying, resulting in better performance and efficiency (a short sketch follows this list).
- Backward compatibility for Hugging Face Transformers: Resolved compatibility issues with Hugging Face Transformers versions prior to `v4.18`, ensuring a seamless experience for users on older versions.
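To try one of the new frameworks, here is a minimal sketch of saving an EasyOCR model and loading it back as a runner. It assumes `bentoml.easyocr` follows the same `save_model`/`get`/`to_runner` pattern as BentoML's other framework modules; the model name `en_reader` is illustrative.

```python
import bentoml
import easyocr

# Save an EasyOCR reader to the BentoML model store, then load it back
# as a runner. Assumes bentoml.easyocr mirrors the standard framework API.
reader = easyocr.Reader(["en"], gpu=False)
bentoml.easyocr.save_model("en_reader", reader)

runner = bentoml.easyocr.get("en_reader:latest").to_runner()
```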
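For context on the runner communication change, this is what PEP 574 out-of-band pickling (pickle protocol 5) looks like in plain Python. BentoML applies the same mechanism to runner payloads, so large arrays can cross process boundaries without being copied into the pickle stream:

```python
import pickle
import numpy as np

# With protocol 5, large buffers are handed to buffer_callback instead
# of being serialized inline, enabling zero-copy transfer when sender
# and receiver share memory.
arr = np.ones((1024, 1024), dtype="float32")

buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The receiver reattaches the out-of-band buffers on load.
restored = pickle.loads(payload, buffers=buffers)
assert (restored == arr).all()
```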
⚙️ With the release of Kubeflow 1.7, BentoML now has native integration with Kubeflow, allowing developers to leverage BentoML's cloud-native components. Previously, developers were limited to exporting and deploying a Bento as a single container. With this integration, models trained in Kubeflow can easily be packaged, containerized, and deployed to a Kubernetes cluster as microservices. This architecture enables the individual models to run in their own pods, utilizing the optimal hardware for their respective tasks and enabling independent scaling.
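As a rough illustration of that workflow, the sequence below builds and containerizes a Bento, then deploys the image to a cluster with plain kubectl. The Bento tag, registry, and deployment names are illustrative, and the Kubeflow integration can manage these steps for you:

```bash
# A minimal sketch: build a Bento, containerize it, and run it on
# Kubernetes as its own deployment. Names and tags are illustrative.
bentoml build
bentoml containerize iris_classifier:latest -t my-registry/iris-classifier:latest
docker push my-registry/iris-classifier:latest
kubectl create deployment iris-classifier --image=my-registry/iris-classifier:latest
kubectl expose deployment iris-classifier --port=3000  # BentoML's default HTTP port
```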
💡 With each release, we consistently update our blog, documentation and examples to empower the community in harnessing the full potential of BentoML.
- Learn more about the scheduling strategy to achieve better resource utilization.
- Learn more about model monitoring and drift detection in BentoML and its integration with various monitoring frameworks (a short example follows this list).
- Learn more about using NVIDIA Triton Inference Server as a runner to improve your application's performance and throughput (a sketch follows).
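As a taste of the monitoring API, the sketch below logs features and predictions from inside an API function with `bentoml.monitor`; the service, field names, and placeholder prediction are illustrative, and the logged records can be shipped to a drift-detection backend via BentoML's monitoring exporters:

```python
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

svc = bentoml.Service("iris_classifier")

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(features: np.ndarray) -> np.ndarray:
    with bentoml.monitor("iris_classifier_prediction") as mon:
        # Record an input feature and the prediction for later drift analysis.
        mon.log(float(features[0][0]), name="sepal_length", role="feature", data_type="numerical")
        pred = 0  # placeholder for a real model call in this sketch
        mon.log(pred, name="pred", role="prediction", data_type="categorical")
    return np.array([pred])
```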
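And a minimal sketch of the Triton integration, assuming `bentoml.triton.Runner` takes a runner name and a path to a Triton model repository; the repository path and service name here are placeholders:

```python
import bentoml

# Wrap a Triton Inference Server model repository as a BentoML runner.
# The repository path is a placeholder for this sketch; models in the
# repository are exposed as methods on the runner.
triton_runner = bentoml.triton.Runner(
    "triton_runner",
    "/path/to/model_repository",
)

svc = bentoml.Service("triton_demo", runners=[triton_runner])
```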
What's Changed
- fix(env): using `python -m` to run pip commands by @frostming in #3762
- chore(deps): bump pytest from 7.3.0 to 7.3.1 by @dependabot in #3766
- feat: lazy load `bentoml.server` by @aarnphm in #3763
- fix(client): service route prefix by @aarnphm in #3765
- chore: add test with many requests by @sauyon in #3768
- fix: using http config for grpc server by @aarnphm in #3771
- feat: apply pep574 out-of-band pickling to DefaultContainer by @larme in #3736
- fix: passing serve_cmd and passthrough kwargs by @aarnphm in #3764
- feat: Detectron by @aarnphm in #3711
- chore(dispatcher): (re-)factor out training code by @sauyon in #3767
- feat: EasyOCR by @aarnphm in #3712
- feat(build): support 3.11 by @aarnphm in #3774
- patch: backports module availability for transformers<4.18 by @aarnphm in #3775
- fix(dispatcher): set wait to 0 while training by @sauyon in #3664
- chore(deps): bump ruff from 0.0.261 to 0.0.262 by @dependabot in #3778
- feat: add `model#load_model` method by @parano in #3780
- feat: Allow spawning more than 1 worker on each resource by @frostming in #3776
- docs: Fix TensorFlow `save_model` parameter order by @ssheng in #3781
- chore(deps): bump yamllint from 1.30.0 to 1.31.0 by @dependabot in #3782
- chore(deps): bump imageio from 2.27.0 to 2.28.0 by @dependabot in #3783
- chore(deps): bump ruff from 0.0.262 to 0.0.263 by @dependabot in #3790
- fix: allow import service defined under a Python package by @parano in #3794
New Contributors
- @frostming made their first contribution in #3762
Full Changelog: v1.0.18...v1.0.19