bentoml/BentoML v0.9.0 on GitHub

What's New

TL;DR

New input/output adapter design that lets users choose between batch and non-batch implementations
Faster Docker image builds for the API model server
Changed the recommended import path for artifact classes: they should now be imported from bentoml.frameworks.*
Improved Python pip package management
Huggingface/Transformers support!!
Manage packaged models with the Labels API
Support for GCS (Google Cloud Storage) as a model storage backend in YataiService
Current Roadmap for feedback: #1128

New Input/Output adapter design

A massive refactoring of BentoML's inference API and a redesign of the input/output adapters, led by @bojiang with help from @akainth015.

BREAKING CHANGE: API definitions now require declaring whether the API is a batch or a non-batch API:

from typing import List
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import JsonSerializable  # type annotations are optional

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('classifier')])
class MyPredictionService(BentoService):

    @api(input=JsonInput(), batch=True)
    def predict_batch(self, parsed_json_list: List[JsonSerializable]):
        # one model call for the whole list; must return one result per input
        results = self.artifacts.classifier.predict([j['text'] for j in parsed_json_list])
        return results

    @api(input=JsonInput())  # default batch=False
    def predict_non_batch(self, parsed_json: JsonSerializable):
        # a single input item: wrap it in a list for the model, unwrap the result
        results = self.artifacts.classifier.predict([parsed_json['text']])
        return results[0]

For APIs with batch=True, the user-defined API function is required to process a list of input items at a time and return a list of results of the same length. By contrast, @api uses batch=False by default, which processes one input item at a time. Implementing a batch API allows your workload to benefit from BentoML's adaptive micro-batching mechanism when serving online traffic, and also speeds up offline batch inference jobs. We recommend using batch=True if performance and throughput are a concern. Non-batch APIs are usually easier to implement, and are a good fit for quick POCs, simple use cases, and deployment on serverless platforms such as AWS Lambda, Azure Functions, and Google Knative.
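The batch=True contract can be illustrated without BentoML itself. The following is a minimal plain-Python sketch, not BentoML's implementation: `as_batch` and `run_batch` are hypothetical helpers showing how a non-batch function relates to a batch one, and the "one result per input, same order" rule that a batch API must satisfy. It leaves out the request queueing and timing logic that adaptive micro-batching adds on top.

```python
from typing import Callable, List, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


def as_batch(fn: Callable[[In], Out]) -> Callable[[List[In]], List[Out]]:
    """Hypothetical helper: wrap a non-batch function (one item in, one
    result out) so that it maps a list of items to a list of results."""

    def batch_fn(items: List[In]) -> List[Out]:
        return [fn(item) for item in items]

    return batch_fn


def run_batch(requests: List[In], batch_fn: Callable[[List[In]], List[Out]]) -> List[Out]:
    """Hypothetical dispatcher: call the batch function once for all queued
    requests and enforce exactly one result per input."""
    results = list(batch_fn(requests))
    if len(results) != len(requests):
        raise ValueError(
            f"batch function returned {len(results)} results for {len(requests)} inputs"
        )
    return results


queued = [{"text": "hi"}, {"text": "hello"}]
lengths = run_batch(queued, as_batch(lambda j: len(j["text"])))
# lengths == [2, 5]
```

Wrapping a per-item function this way satisfies the contract but gains no speed; the throughput benefit comes from a batch function that hands the whole list to the model in one call, as `predict_batch` does in the example above.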

Read more about this change and example usage here: https://docs.bentoml.org/en/latest/api/adapters.html

BREAKING CHANGE: Users of `DataframeInput` and `TfTensorInput` must now add `batch=True`

DataframeInput and TfTensorInput are special input types that only support accepting a batch of inputs at a time.

Input data validation while handling batch input

When the API function receives a list of inputs, it is now possible to reject a subset of them and return an error code to the client if that input data is invalid or malformed. Users can do this via the InferenceTask#discard API; here's an example:

from typing import List
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import JsonSerializable, InferenceTask  # type annotations are optional

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('classifier')])
class MyPredictionService(BentoService):

    @api(input=JsonInput(), batch=True)
    def predict_batch(self, parsed_json_list: List[JsonSerializable], tasks: List[InferenceTask]):
        model_input = []
        for parsed_json, task in zip(parsed_json_list, tasks):
            if "text" in parsed_json:
                model_input.append(parsed_json['text'])
            else:
                # reject only this item; the rest of the batch is still processed
                task.discard(http_status=400, err_msg="input json must contain `text` field")

        if not model_input:  # every task was discarded
            return []
        results = self.artifacts.classifier.predict(model_input)

        return results

The number of discarded tasks plus the length of the returned results list must equal the length of the input list; this allows BentoML to match the results back to the tasks that have not been discarded.
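To make the matching rule concrete, here is a minimal plain-Python sketch. `Task` and `attach_results` are hypothetical stand-ins, not BentoML's `InferenceTask` or its internal matching code; the sketch only assumes what the sentence above states: results are assigned, in order, to the tasks that were not discarded, and the counts must add up.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Task:
    """Hypothetical stand-in for an inference task (not BentoML's InferenceTask)."""

    data: Any
    is_discarded: bool = False
    http_status: Optional[int] = None
    err_msg: str = ""
    result: Any = None

    def discard(self, http_status: int, err_msg: str) -> None:
        # Mark the task as rejected; it will receive no model result.
        self.is_discarded = True
        self.http_status = http_status
        self.err_msg = err_msg


def attach_results(tasks: List[Task], results: List[Any]) -> None:
    """Assign results, in order, to the tasks that were not discarded."""
    remaining = [t for t in tasks if not t.is_discarded]
    if len(remaining) != len(results):
        discarded = len(tasks) - len(remaining)
        raise ValueError(
            f"{discarded} discarded + {len(results)} results != {len(tasks)} inputs"
        )
    for task, result in zip(remaining, results):
        task.result = result
        task.http_status = 200


# Usage mirroring the example above: the second input lacks a "text" field.
tasks = [Task({"text": "hi"}), Task({}), Task({"text": "yo"})]
model_input = []
for task in tasks:
    if "text" in task.data:
        model_input.append(task.data["text"])
    else:
        task.discard(http_status=400, err_msg="input json must contain `text` field")
attach_results(tasks, [s.upper() for s in model_input])
# tasks[0].result == "HI", tasks[1].http_status == 400, tasks[2].result == "YO"
```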

API functions can also take fine-grained control of the HTTP response, CLI inference job output, etc. by returning InferenceResult or InferenceError objects. E.g.:

import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import InferenceResult, InferenceError
from bentoml.types import JsonSerializable, InferenceTask  # type annotations are optional

@bentoml.artifacts([SklearnModelArtifact('model')])
class MyService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=False)
    def predict(self, parsed_json: JsonSerializable, task: InferenceTask) -> InferenceResult:
        # .get() avoids a KeyError when the client sends no Accept header
        if task.http_headers.get('Accept') == "application/json":
            predictions = self.artifacts.model.predict([parsed_json])
            return InferenceResult(
                data=predictions[0],
                http_status=200,
                http_headers={"Content-Type": "application/json"},
            )
        else:
            return InferenceError(err_msg="application/json output only", http_status=400)

Or when batch=True:

from typing import List

import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import InferenceResult, InferenceError
from bentoml.types import JsonSerializable, InferenceTask  # type annotations are optional

@bentoml.artifacts([SklearnModelArtifact('model')])
class MyService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=True)
    def predict(self, parsed_json_list: List[JsonSerializable], tasks: List[InferenceTask]) -> List[InferenceResult]:
        rv = []
        predictions = self.artifacts.model.predict(parsed_json_list)
        for task, prediction in zip(tasks, predictions):
            if task.http_headers.get('Accept') == "application/json":
                rv.append(
                    InferenceResult(
                        data=prediction,
                        http_status=200,
                        http_headers={"Content-Type": "application/json"},
                    )
                )
            else:
                rv.append(InferenceError(err_msg="application/json output only", http_status=400))
                # alternatively, call task.discard(err_msg="application/json output only", http_status=400)
                # instead of appending, so the result count still matches the non-discarded tasks
        return rv

Other adapter changes:

Added 3 base adapters for implementing advanced adapters: FileInput, StringInput, MultiFileInput
Implementing new adapters that support micro-batching is a lot easier now: https://github.com/bentoml/BentoML/blob/v0.9.0.pre/bentoml/adapters/base_input.py
Per-inference-task prediction logs #1089
More adapters now support launching batch inference jobs from the BentoML CLI run command; see the API reference for detailed examples: https://docs.bentoml.org/en/latest/api/adapters.html

Docker Build Improvements

Optimize Docker image build time (#1081), kudos to @ZeyadYasser!!
Per-Python-minor-version base images to speed up image builds #1101 #1096, thanks @gregd33!!
Add "latest" tag to all user-facing docker base images (#1046)

Improved pip package management

Setting pip install options in BentoService `@env` specification

As suggested in #1036 (comment). Thanks @danield137 for suggesting the pip_extra_index_url option!

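from bentoml import env, artifacts, BentoService
from bentoml.frameworks.sklearn import SklearnModelArtifact

@env(
    infer_pip_packages=True,  # new name for auto_pip_dependencies
    pip_index_url='my_pypi_host_url',
    pip_trusted_host='my_pypi_host_url',
    pip_extra_index_url='extra_pypi_index_url'
)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):
    ...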

BREAKING CHANGE: Due to this change, we have removed the previous Docker build args PIP_INDEX_URL and PIP_TRUSTED_HOST, because they may conflict with settings in the base image #1036
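The three `@env` options correspond to standard `pip install` flags (`--index-url`, `--trusted-host`, `--extra-index-url`). As a rough sketch, assuming the values are passed straight through to pip when installing packages during the image build (an assumption; these notes do not describe the mechanism), the mapping looks like this. `pip_install_options` is a hypothetical helper, not part of BentoML.

```python
from typing import List, Optional


def pip_install_options(
    pip_index_url: Optional[str] = None,
    pip_trusted_host: Optional[str] = None,
    pip_extra_index_url: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: translate @env-style pip options into the
    corresponding `pip install` command-line flags."""
    args: List[str] = []
    if pip_index_url:
        args += ["--index-url", pip_index_url]  # replaces the default PyPI index
    if pip_trusted_host:
        args += ["--trusted-host", pip_trusted_host]  # allow this host without valid HTTPS
    if pip_extra_index_url:
        args += ["--extra-index-url", pip_extra_index_url]  # searched in addition to the index
    return args


# e.g. ["pip", "install", "-r", "requirements.txt"] + pip_install_options(...)
```

Passing these as pip options rather than as Docker build args keeps them scoped to the pip invocation, which is consistent with the stated reason for removing the PIP_INDEX_URL and PIP_TRUSTED_HOST build args.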

Support passing a conda environment.yml file to @env, as suggested in #725
Packages listed in pip_packages without a version are pinned to the version found in the current Python session; this now also applies to packages added by adapter and artifact classes
Support specifying package requirement ranges, e.g.:

@env(pip_packages=["abc==1.3", "foo>1.2,<=1.4"])

Any pip version requirement specifier is supported: https://pip.pypa.io/en/stable/reference/pip_install/#requirement-specifiers
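The pinning rule for unversioned packages can be sketched as follows. This is a simplified illustration, not BentoML's implementation: `pin_requirements` is a hypothetical helper that treats any entry containing a specifier character as already constrained and pins bare names using `importlib.metadata.version` (Python 3.8+). It does not handle extras such as `pkg[extra]` or environment markers.

```python
from importlib import metadata
from typing import Callable, List

# Characters that begin a pip version specifier (==, >=, <=, !=, ~=, >, <).
_SPECIFIER_CHARS = "=<>!~"


def pin_requirements(
    packages: List[str],
    lookup: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Pin bare package names to the version installed in this Python session;
    leave entries that already carry a version specifier untouched."""
    pinned = []
    for req in packages:
        if any(ch in req for ch in _SPECIFIER_CHARS):
            pinned.append(req)  # e.g. "abc==1.3" or "foo>1.2,<=1.4"
        else:
            # metadata.version raises PackageNotFoundError if req is not installed
            pinned.append(f"{req}=={lookup(req)}")
    return pinned
```

The `lookup` parameter is there so the rule can be exercised without depending on what happens to be installed, e.g. `pin_requirements(["numpy", "foo>1.2,<=1.4"], lookup={"numpy": "1.19.2"}.__getitem__)`.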

Renamed pip_dependencies to pip_packages and auto_pip_dependencies to infer_pip_packages. The old names still work but will eventually be deprecated.
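If you build `@env` arguments programmatically (for example from a config file), a small helper can rewrite the old keyword names to the new ones before they are passed in. This is a hypothetical migration sketch, not BentoML's own deprecation handling; it relies only on the two renames listed above.

```python
import warnings
from typing import Any, Dict

# Old @env keyword -> new keyword, per the renames in these release notes.
RENAMED_ENV_KWARGS = {
    "pip_dependencies": "pip_packages",
    "auto_pip_dependencies": "infer_pip_packages",
}


def normalize_env_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with old names rewritten to the new ones,
    emitting a DeprecationWarning for each old name that was used."""
    out = dict(kwargs)
    for old, new in RENAMED_ENV_KWARGS.items():
        if old in out:
            if new in out:
                raise TypeError(f"pass either {old!r} or {new!r}, not both")
            warnings.warn(f"{old!r} is renamed to {new!r}", DeprecationWarning, stacklevel=2)
            out[new] = out.pop(old)
    return out


# e.g. @env(**normalize_env_kwargs({"auto_pip_dependencies": True}))
```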

GCS support in YataiService

Added Google Cloud Storage (GCS) support as a storage backend in YataiService, as an alternative to AWS S3, MinIO, or a POSIX file system. #1017 - Thank you @Korusuke @PrabhanshuAttri for creating the GCS support!

YataiService Labels API for model management

Manage packaged models in YataiService with the Labels API, implemented in #1064

Add labels to BentoService.save

    svc = MyBentoService()
    svc.save(labels={'my_key': 'my_value', 'test': 'passed'})

Label queries are supported in the following CLI commands:

bentoml get BENTO_NAME, bentoml list, bentoml deployment list, bentoml lambda list, bentoml sagemaker list, bentoml azure-functions list
Label queries support the =, !=, In, NotIn, Exists, and DoesNotExist operators
- e.g. key1=value1, key2!=value2, env In (prod, staging), Key Exists, Another_key DoesNotExist
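To clarify what each operator means, here is a small label-selector matcher in plain Python. It is an illustrative sketch, not YataiService's parser: `matches`, `parse_label_query`, and the semantics are assumptions modeled on Kubernetes-style selectors. Comma-separated expressions are combined with AND (commas inside `In (...)`/`NotIn (...)` are not separators), and a missing key satisfies `!=` and `NotIn` but not `=` or `In`.

```python
import re
from typing import Callable, Dict, List

Labels = Dict[str, str]
Predicate = Callable[[Labels], bool]

_SET_EXPR = re.compile(r"^\s*(\S+)\s+(In|NotIn)\s*\((.*)\)\s*$")
_UNARY_EXPR = re.compile(r"^\s*(\S+)\s+(Exists|DoesNotExist)\s*$")
_EQ_EXPR = re.compile(r"^\s*([^=!\s]+)\s*(!=|=)\s*(\S*)\s*$")


def _split_top_level(query: str) -> List[str]:
    """Split on commas that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(query):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(query[start:i])
            start = i + 1
    parts.append(query[start:])
    return [p for p in parts if p.strip()]


def _compile(expr: str) -> Predicate:
    """Turn one expression such as 'env In (prod, staging)' into a predicate."""
    m = _SET_EXPR.match(expr)
    if m:
        key, op, raw = m.groups()
        values = {v.strip() for v in raw.split(",") if v.strip()}
        if op == "In":
            return lambda labels: labels.get(key) in values
        return lambda labels: labels.get(key) not in values
    m = _UNARY_EXPR.match(expr)
    if m:
        key, op = m.groups()
        if op == "Exists":
            return lambda labels: key in labels
        return lambda labels: key not in labels
    m = _EQ_EXPR.match(expr)
    if m:
        key, op, value = m.groups()
        if op == "=":
            return lambda labels: labels.get(key) == value
        return lambda labels: labels.get(key) != value
    raise ValueError(f"cannot parse label expression: {expr!r}")


def parse_label_query(query: str) -> List[Predicate]:
    return [_compile(expr) for expr in _split_top_level(query)]


def matches(query: str, labels: Labels) -> bool:
    """True if the labels satisfy every expression in the query (logical AND)."""
    return all(pred(labels) for pred in parse_label_query(query))
```

For example, labels saved with `svc.save(labels={'my_key': 'my_value', 'test': 'passed'})` would match `test=passed, my_key Exists` under these assumed semantics.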