BentoML - v1.0.8

🍱 BentoML v1.0.8 is released with a list of improvements we hope you’ll find useful.

  • Introduced Bento Client for easy access to the BentoML service over HTTP. Both sync and async calls are supported. See the Bento Client Guide for more details.

    import numpy as np

    from bentoml.client import Client

    # Create a client from the address of a running BentoML HTTP server
    client = Client.from_url("http://localhost:3000")

    # Sync call
    response = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))

    # Async call
    response = await client.async_classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
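
    The classify call above maps to an API on the serving side; a minimal sketch of such a service, assuming the iris quickstart model tag and endpoint name (not part of this release), could look like this:

    # service.py -- a minimal sketch of the server side the client example calls;
    # the "iris_clf" model tag and the classify endpoint are assumptions for illustration.
    import numpy as np

    import bentoml
    from bentoml.io import NumpyNdarray

    iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()

    svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

    @svc.api(input=NumpyNdarray(), output=NumpyNdarray())
    def classify(input_series: np.ndarray) -> np.ndarray:
        return iris_clf_runner.predict.run(input_series)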
  • Introduced custom metrics support for easy instrumentation of Prometheus metrics from your services. See the Metrics Guide for more details.

    # Histogram metric
    inference_duration = bentoml.metrics.Histogram(
        name="inference_duration",
        documentation="Duration of inference",
        labelnames=["nltk_version", "sentiment_cls"],
    )
    
    # Counter metric
    polarity_counter = bentoml.metrics.Counter(
        name="polarity_total",
        documentation="Count total number of analysis by polarity scores",
        labelnames=["polarity"],
    )

    Full Prometheus style syntax is supported for instrumenting custom metrics inside API and Runner definitions.

    # Histogram
    inference_duration.labels(
        nltk_version=nltk.__version__, sentiment_cls=self.sia.__class__.__name__
    ).observe(time.perf_counter() - start)
    
    # Counter
    polarity_counter.labels(polarity=is_positive).inc()
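
    For context, these metrics can be observed from inside a custom runner; the sketch below assumes the NLTK sentiment analysis example (the runnable, analyzer, and scoring logic are illustrative, not part of this release):

    # A sketch of a custom runnable instrumented with the inference_duration
    # Histogram defined above; the NLTK analyzer and scoring logic are assumptions.
    import time

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    import bentoml

    class NLTKSentimentRunnable(bentoml.Runnable):
        SUPPORTED_RESOURCES = ("cpu",)
        SUPPORTS_CPU_MULTI_THREADING = False

        def __init__(self):
            self.sia = SentimentIntensityAnalyzer()

        @bentoml.Runnable.method(batchable=False)
        def is_positive(self, input_text: str) -> bool:
            start = time.perf_counter()
            scores = [
                self.sia.polarity_scores(sentence)["compound"]
                for sentence in nltk.sent_tokenize(input_text)
            ]
            inference_duration.labels(
                nltk_version=nltk.__version__,
                sentiment_cls=self.sia.__class__.__name__,
            ).observe(time.perf_counter() - start)
            return sum(scores) / len(scores) > 0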
  • Improved health checking to also cover the status of runners, so that a healthy status is no longer returned before the runners are ready.
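
    For example, a readiness probe now succeeds only once the runners are up; a small sketch, assuming the HTTP server's /readyz endpoint and default port:

    # Poll the readiness endpoint, which now also reflects runner status.
    import requests

    response = requests.get("http://localhost:3000/readyz")
    assert response.status_code == 200  # returns 200 only once the runners are ready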

  • Added SSL/TLS support to gRPC serving.

    bentoml serve-grpc --ssl-certfile=credentials/cert.pem --ssl-keyfile=credentials/key.pem --production --enable-reflection
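
    On the client side, a TLS connection can then be opened with grpcio; a minimal sketch, assuming the same certificate and the default gRPC address:

    # Open a secure channel to the TLS-enabled gRPC server started above;
    # the certificate path and target address are assumptions for illustration.
    import grpc

    with open("credentials/cert.pem", "rb") as f:
        creds = grpc.ssl_channel_credentials(root_certificates=f.read())

    channel = grpc.secure_channel("localhost:3000", creds)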
  • Added channelz support for easier debugging of gRPC serving.

  • Allowed nested requirements with the -r syntax.

    # requirements.txt
    -r nested/requirements.txt
    
    pydantic
    Pillow
    fastapi
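
    The top-level requirements.txt, nested includes and all, is resolved at build time; a hedged sketch of a programmatic build, where the "service:svc" import string is an assumption:

    # Build a Bento whose Python dependencies come from requirements.txt,
    # including the files it pulls in via -r; "service:svc" is an assumed import string.
    import bentoml

    bento = bentoml.bentos.build(
        service="service:svc",
        python={"requirements_txt": "./requirements.txt"},
    )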
  • Improved the adaptive batching dispatcher's auto-tuning to avoid sporadic request failures caused by batching at the beginning of the runner lifecycle.
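
    Adaptive batching applies to runner methods declared as batchable; a sketch of such a method, where the runnable class and its computation are illustrative:

    # A batchable runner method scheduled by the adaptive batching dispatcher;
    # the runnable class and computation are assumptions for illustration.
    import numpy as np

    import bentoml

    class MyRunnable(bentoml.Runnable):
        SUPPORTED_RESOURCES = ("cpu",)
        SUPPORTS_CPU_MULTI_THREADING = True

        @bentoml.Runnable.method(batchable=True, batch_dim=0)
        def predict(self, input_arr: np.ndarray) -> np.ndarray:
            # Individual requests are combined along dim 0 before reaching this method.
            return input_arr.sum(axis=1, keepdims=True)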

  • Fixed a bug where runners would raise a TypeError when overloaded. An HTTP 503 Service Unavailable is now returned when a runner is overloaded.

    File "python3.9/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 188, in async_run_method
        return tuple(AutoContainer.from_payload(payload) for payload in payloads)
    TypeError: 'Response' object is not iterable

💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.

🥂 We’d like to thank the community for your continued support and engagement.
