🍱 BentoML `v1.0.13` is released, featuring a preview of batch inference with Spark.
- Run the batch inference job using the `bentoml.batch.run_in_spark()` method. This method takes the bento, the API name, the Spark DataFrame containing the input data, and the Spark session as parameters, and it returns a DataFrame containing the results of the batch inference job.

  ```python
  import bentoml

  # Import the bento from a repository or get the bento from the bento store
  bento = bentoml.import_bento("s3://bentoml/quickstart")

  # Run the run_in_spark function with the bento, API name, input DataFrame, and Spark session.
  # `df` (the input Spark DataFrame) and `spark` (the active SparkSession) must already exist.
  results_df = bentoml.batch.run_in_spark(bento, "classify", df, spark)
  ```
- Internally, what happens when you run `run_in_spark` is as follows:
  - First, the bento is distributed to the cluster. Note that if the bento has already been distributed, i.e. you have already run a computation with that bento, this step is skipped.
  - Next, a process function is created, which starts a BentoML server on each of the Spark workers, then uses a client to process all the data. This is done so that the workers take advantage of the batch processing features of the BentoML server. PySpark pickles this process function and dispatches it, along with the relevant data, to the workers.
  - Finally, the function is evaluated on the given dataframe. Once all methods that the user defined in the script have been executed, the data is returned to the master node.
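
The "start a server on the worker, then use a client to process the data" step can be illustrated with a minimal, standard-library-only sketch. It is not BentoML's implementation: `LengthHandler` is a hypothetical stand-in for a model server, and `process_partition`, the `/predict` path, and `batch_size` are names assumed purely for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class LengthHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a model server: returns len() of each input."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        payload = json.dumps([len(item) for item in json.loads(body)]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence per-request logging


def process_partition(rows, batch_size=2):
    """Start a local server, send the rows to it in batches, yield the results."""
    server = HTTPServer(("127.0.0.1", 0), LengthHandler)  # port 0 picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    url = f"http://127.0.0.1:{server.server_port}/predict"
    # Bypass any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        rows = list(rows)
        for start in range(0, len(rows), batch_size):
            request = urllib.request.Request(
                url,
                data=json.dumps(rows[start:start + batch_size]).encode(),
                headers={"Content-Type": "application/json"},
            )
            with opener.open(request) as response:
                yield from json.loads(response.read())
    finally:
        # Stop the server once the whole partition has been processed.
        server.shutdown()
        server.server_close()
        thread.join()


results = list(process_partition(["a", "bb", "ccc"]))
print(results)  # [1, 2, 3]
```

A function of this shape, taking an iterator of rows and yielding results, is the kind that PySpark can pickle and apply per partition (for example through `RDD.mapPartitions`); whether `run_in_spark` uses that particular entry point is not stated in these notes.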
⚠️ The `bentoml.batch` API may undergo incompatible changes until general availability is announced in a later minor version release.
🥂 Shout out to jeffthebear, KimSoungRyoul, Robert Fernandez, Marco Vela, Quan Nguyen, and y1450 from the community for their contributions in this release.
What's Changed
- docs: add inline notes and better exception by @bojiang in #3296
- chore(deps): bump pytest-asyncio from 0.20.2 to 0.20.3 by @dependabot in #3334
- feat: bentoserver client by @qu8n in #3321
- fix(transformers): check for task aliases by @jeffthebear in #3337
- chore(framework): add partial_kwargs to picklable and pytorch runners by @bojiang in #3338
- feat: protobuf shim by @aarnphm in #3333
- fix: CI breakage by @aarnphm in #3350
- chore(deps): bump black[jupyter] from 22.10.0 to 22.12.0 by @dependabot in #3354
- chore(deps): bump isort from 5.10.1 to 5.11.1 by @dependabot in #3355
- feat(http server): pass-through openapi of mounted apps by @bojiang in #3358
- fix(pytorch): runnable method collision by @bojiang in #3357
- fix(torchscript): runnable method collision by @bojiang in #3364
- chore(deps): bump isort from 5.11.1 to 5.11.2 by @dependabot in #3361
- chore(deps): bump isort from 5.11.2 to 5.11.3 in /requirements by @dependabot in #3374
- chore(deps): bump bufbuild/buf-setup-action from 1.9.0 to 1.10.0 by @dependabot in #3370
- chore(deps): bump coverage[toml] from 6.5.0 to 7.0.0 in /requirements by @dependabot in #3373
- chore(deps): bump pylint from 2.15.8 to 2.15.9 in /requirements by @dependabot in #3372
- chore(deps): bump imageio from 2.22.4 to 2.23.0 in /requirements by @dependabot in #3371
- fix: make sure to handle relative path for templates by @aarnphm in #3375
- fix(containerize): fs path format on windows by @bojiang in #3378
- chore(deps): bump isort from 5.11.3 to 5.11.4 by @dependabot in #3380
- docs: tracing and configuration by @aarnphm in #3067
- fix: use relative urls in swagger UI by @sauyon in #3381
- chore(deps): bump bufbuild/buf-setup-action from 1.10.0 to 1.11.0 by @dependabot in #3382
- chore(deps): bump coverage[toml] from 7.0.0 to 7.0.1 by @dependabot in #3383
- chore(config): ignore blank lines in bentoml config options by @bojiang in #3385
- chore(deps): bump coverage[toml] from 7.0.1 to 7.0.2 by @dependabot in #3386
- fix: log error when runnable instantiation fails by @sauyon in #3388
- chore(deps): bump coverage[toml] from 7.0.2 to 7.0.3 by @dependabot in #3390
- fix: don't use logger for CLI output by @sauyon in #3395
- fix: allow passing server URLs with paths by @sauyon in #3394
- fix(sdk): handling container platform from CLI separately by @aarnphm in #3366
- fix: wrong self annotations by @aarnphm in #3397
- chore(deps): bump imageio from 2.23.0 to 2.24.0 by @dependabot in #3410
- chore(deps): bump coverage[toml] from 7.0.3 to 7.0.4 by @dependabot in #3409
- chore(deps): bump pylint from 2.15.9 to 2.15.10 by @dependabot in #3407
- fix: serve missing logic from #3321 by @aarnphm in #3336
- chore(deps): bump coverage[toml] from 7.0.4 to 7.0.5 by @dependabot in #3413
- chore(deps): bump yamllint from 1.28.0 to 1.29.0 by @dependabot in #3414
- fix: regression f-string by @aarnphm in #3416
- fix(runner): log correct error types during model validation by @characat0 in #3421
- fix(client): make sure tags is available in specs by @KimSoungRyoul in #3359
- fix: handling KeyError when accessing IODescriptor spec by @aarnphm in #3398
- chore(deps): bump build[virtualenv] from 0.9.0 to 0.10.0 by @dependabot in #3419
- feat: support bentos and tags in bentoml.bentos.serve by @sauyon in #3424
- feat: add endpoints list to client by @sauyon in #3423
- fix: #3399 during `containerize` by @aarnphm in #3400
- feat: add context manager support for `bentoml.client` by @y1450 in #3402
- chore: migrate to newer API in docstring by @KimSoungRyoul in #3429
- chore(deps): bump bufbuild/buf-setup-action from 1.11.0 to 1.12.0 by @dependabot in #3430
- chore(deps): bump pytest from 7.2.0 to 7.2.1 by @dependabot in #3433
- feat: openapi_components method for Multipart by @RobbieFernandez in #3438
- ci: disable 3.10 e2e for gRPC on Mac X86 by @aarnphm in #3441
- chore(exportable): update exception message and errors imports by @aarnphm in #3435
- feat: make `load_bento` take Tag and Bento by @sauyon in #3444
- chore: add setuptools-scm as dev deps by @aarnphm in #3443
- fix: load_bento Tag import by @sauyon in #3445
- feat: support batch inference with Spark by @sauyon in #3425
- chore: add pandas-stubs as dev-dependencies by @aarnphm in #3442
- fix: raise more specific error in `from_spec` by @sauyon in #3447
- fix(cli): overriding memoized options via `--opt` by @aarnphm in #3401
- fix(exception): wrong variable reference by @aarnphm in #3450
- fix: make sure to run migration for envvar by @aarnphm in #3339
- feat: YataiClient context to communicate with multiple Yatai instances by @ssheng in #3448
New Contributors
- @characat0 made their first contribution in #3421
- @y1450 made their first contribution in #3402
- @RobbieFernandez made their first contribution in #3438
Full Changelog: v1.0.12...v1.0.13