We are happy to present the new 2.46.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.46.0, check out the detailed release notes.
Highlights
- Java SDK containers migrated to Eclipse Temurin
as a base. This change migrates away from the deprecated OpenJDK
container. Eclipse Temurin is currently based upon Ubuntu 22.04 while the OpenJDK
container was based upon Debian 11. - RunInference PTransform will accept model paths as SideInputs in Python SDK. (#24042)
- RunInference supports ONNX runtime in Python SDK (#22972)
- Tensorflow Model Handler for RunInference in Python SDK (#25366)
- Java SDK modules migrated to use
:sdks:java:extensions:avro
(#24748)
I/Os
- Added in JmsIO a retry policy for failed publications (Java) (#24971).
- Support for
LZMA
compression/decompression of text files added to the Python SDK (#25316) - Added ReadFrom/WriteTo Csv/Json as top-level transforms to the Python SDK.
New Features / Improvements
- Add UDF metrics support for Samza portable mode.
- Option for SparkRunner to avoid the need of SDF output to fit in memory (#23852).
This helps e.g. with ParquetIO reads. Turn the feature on by adding experimentuse_bounded_concurrent_output_for_sdf
. - Add
WatchFilePattern
transform, which can be used as a side input to the RunInference PTransfrom to watch for model updates using a file pattern. (#24042) - Add support for loading TorchScript models with
PytorchModelHandler
. The TorchScript model path can be
passed to PytorchModelHandler usingtorch_script_model_path=<path_to_model>
. (#25321) - The Go SDK now requires Go 1.19 to build. (#25545)
- The Go SDK now has an initial native Go implementation of a portable Beam Runner called Prism. (#24789)
- For more details and current state see https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/runners/prism.
Breaking Changes
- The deprecated SparkRunner for Spark 2 (see 2.41.0) was removed (#25263).
- Python's BatchElements performs more aggressive batching in some cases,
capping at 10 second rather than 1 second batches by default and excluding
fixed cost in this computation to better handle cases where the fixed cost
is larger than a single second. To get the old behavior, one can pass
target_batch_duration_secs_including_fixed_cost=1
to BatchElements.
Deprecations
- Avro related classes are deprecated in module
beam-sdks-java-core
and will be eventually removed. Please, migrate to a new modulebeam-sdks-java-extensions-avro
instead by importing the classes fromorg.apache.beam.sdk.extensions.avro
package.
For the sake of migration simplicity, the relative package path and the whole class hierarchy of Avro related classes in new module is preserved the same as it was before.
For example, importorg.apache.beam.sdk.extensions.avro.coders.AvroCoder
class instead oforg.apache.beam.sdk.coders.AvroCoder
. (#24749).
List of Contributors
According to git shortlog, the following people contributed to the 2.46.0 release. Thank you to all contributors!
Ahmet Altay
Alan Zhang
Alexey Romanenko
Amrane Ait Zeouay
Anand Inguva
Andrew Pilloud
Brian Hulette
Bruno Volpato
Byron Ellis
Chamikara Jayalath
Damon
Danny McCormick
Darkhan Nausharipov
David Katz
Dmitry Repin
Doug Judd
Egbert van der Wal
Elizaveta Lomteva
Evan Galpin
Herman Mak
Jack McCluskey
Jan Lukavský
Johanna Öjeling
John Casey
Jozef Vilcek
Junhao Liu
Juta Staes
Katie Liu
Kiley Sok
Liam Miller-Cushon
Luke Cwik
Moritz Mack
Ning Kang
Oleh Borysevych
Pablo E
Pablo Estrada
Reuven Lax
Ritesh Ghorse
Robert Bradshaw
Robert Burke
Ruslan Altynnikov
Ryan Zhang
Sam Rohde
Sam Whittle
Sam sam
Sergei Lilichenko
Shivam
Shubham Krishna
Theodore Ni
Timur Sultanov
Tony Tang
Vachan
Veronica Wasson
Vincent Devillers
Vitaly Terentyev
William Ross Morrow
Xinyu Liu
Yi Hu
ZhengLin Li
Ziqi Ma
ahmedabu98
alexeyinkin
aliftadvantage
bullet03
dannikay
darshan-sj
dependabot[bot]
johnjcasey
kamrankoupayi
kileys
liferoad
nancyxu123
nickuncaged1201
pablo rodriguez defino
tvalentyn
xqhu