We are happy to present the new 2.42.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.
For more information on changes in 2.42.0, check out the detailed release notes.
Highlights
- Added support for stateful DoFns to the Go SDK.
New Features / Improvements
- Added support for Zstd compression to the Python SDK.
- Added support for Google Cloud Profiler to the Go SDK.
- Added support for stateful DoFns to the Go SDK.
Breaking Changes
- The Go SDK's Row Coder now uses a different single-precision float encoding for float32 types to match Java's behavior (#22629).
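To make the kind of change described above concrete, here is a minimal sketch of a single-precision encoding at the byte level. It assumes a 4-byte big-endian IEEE 754 layout, which is what Java's DataOutputStream.writeFloat produces. The release note itself does not spell out the byte layout, so treat that as an assumption. The helper names are hypothetical and are not part of any Beam SDK. The 8-byte double form is shown only for contrast and is not a claim about what the Go SDK emitted before this release.

```python
import struct


def encode_float32_be(value: float) -> bytes:
    """Encode a value as a 4-byte big-endian IEEE 754 single-precision float.

    Assumption: this mirrors the layout of Java's DataOutputStream.writeFloat.
    """
    return struct.pack(">f", value)


def encode_float64_be(value: float) -> bytes:
    """Encode a value as an 8-byte big-endian IEEE 754 double, for contrast."""
    return struct.pack(">d", value)


def decode_float32_be(data: bytes) -> float:
    """Decode a 4-byte big-endian single-precision payload back to a float."""
    (value,) = struct.unpack(">f", data)
    return value


if __name__ == "__main__":
    single = encode_float32_be(1.5)
    double = encode_float64_be(1.5)
    print(len(single), single.hex())  # 4 3fc00000
    print(len(double), double.hex())  # 8 3ff8000000000000
    print(decode_float32_be(single))  # 1.5
```

The two payloads differ in both width and bit pattern, so a reader expecting one form cannot decode the other. That is the general reason an encoding change like this is listed as breaking.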
Bugfixes
- Fixed the Python cross-language JDBC IO connector being unable to read or write rows containing Timestamp type values (#19817).
Known Issues
- The Go SDK doesn't yet support the Slowly Changing Side Input pattern (#23106).
- See a full list of open issues that affect this version.
What's Changed
- Remove stripping of step name and replace with substring search by @AnandInguva in #22415
- [Website] Remove beam-summit 2022 by @bullet03 in #22444
- Add read/write PubSub integration example fhirio pipeline by @lnogueir in #22306
- [Go SDK]: Remove deprecated Session runner by @jrmccluskey in #22505
- Add Go test status to the PR template by @jrmccluskey in #22508
- Fix typo in Datastore V1ReadIT test by @yixiaoshen in #22484
- Remove unnecessary reference to use_runner_v2 experiment in x-lang examples and documentation by @chamikaramj in #22376
- Relax the google-api-core dependency. by @tvalentyn in #22513
- Bump google.golang.org/protobuf from 1.28.0 to 1.28.1 in /sdks by @dependabot in #22517
- Bump google.golang.org/api from 0.89.0 to 0.90.0 in /sdks by @dependabot in #22518
- Change _build import from setuptools to distutils by @AnandInguva in #22503
- Remove stringx package by @damccorm in #22534
- Improve concrete error message by @damccorm in #22536
- Exclude grpcio==1.48.0 by @tvalentyn in #22539
- Fix JDBCIOIT by @Abacn in #22304
- Update pytest to support Python 3.10 by @AnandInguva in #22055
- Update the imprecise link. by @tvalentyn in #22549
- Remove normalization in Pytorch Image Segmentation example by @yeandy in #22371
- Downgrade less informative logs during write to files by @Abacn in #22273
- Add zstd compression/decompression support by @grufino in #22419
- Beam ml notebooks by @AnandInguva in #22510
- [Go SDK]: Add clearer error message for xlang transforms on the Go Direct Runner by @jrmccluskey in #22562
- [CdapIO] Add integration tests for CdapIO (Batch) by @Amar3tto in #22313
- Bugfix: Fix broken assertion in PipelineTest by @mosche in #22485
- Mention Java RunInference support in the Website by @chamikaramj in #22557
- Update run_inference_basic.ipynb by @AnandInguva in #22567
- Update CHANGE.md after 2.41.0 cut by @Abacn in #22577
- Convert to BeamSchema type from ReadfromBQ by @svetakvsundhar in #17159
- Fix deleteTimer in InMemoryTimerInternals and enable VR tests for GroupIntoBatches. by @mosche in #22525
- Update Dataflow container version by @yeandy in #22580
- [22188]Set allowed timestamp skew by @reuvenlax in #22347
- Added experimental annotation to fixes #22564 by @ryanthompson591 in #22565
- [BEAM-14117, #21519] Delete vendored bytebuddy gradle build by @lukecwik in #22594
- Add Import transform to Go FhirIO by @lnogueir in #22460
- Moving misplaced CHANGES from template to 2.41.0 by @Abacn in #22581
- Allow unsafe triggers for python nexmark benchmarks by @y1chi in #22596
- pubsublite: Fix max offset for computing backlog by @dpcollins-google in #22585
- Add support when writing to locked buckets by handling retentionPolicyNotMet error by @ahmedabu98 in #22138
- [BEAM-14118, #21639] Vendor gRPC 1.48.1 by @lukecwik in #22607
- [21894] Validates inference_args early by @ryanthompson591 in #22282
- Return type for _ExpandIntoRanges DoFn should be Iterable. by @jonathanasdf in #22548
- Add PyDoc buttons to the top and bottom of the Machine Learning page by @rszper in #22458
- [Playground]: Modified WithKeys Playground Example by @VladMatyunin in #22326
- [Playground][Backend][Bug]: Moving the initialization of properties file by @vchunikhin in #22310
- [Playground] Remove Beam Summit banner from Playground by @miamihotline in #22410
- Bump cloud.google.com/go/bigquery from 1.36.0 to 1.37.0 in /sdks by @dependabot in #22598
- Minor: Clean up an assertion in schemas_test by @TheNeuralBit in #22613
- Exclude testWithShardedKeyInGlobalWindow on streaming runner v1 by @TheNeuralBit in #22593
- Add an example for Distinct PTransform by @shhivam in #22417
- Pub/Sub Schema Transform Read Provider by @damondouglas in #22145
- Update BigQuery URI validation to allow more valid URIs through by @TheMichaelHu in #22452
- Fix bug in StructUtils of SpannerIO by @manitgupta in #22429
- Add units tests for SpannerIO by @manitgupta in #22428
- Bump google.golang.org/api from 0.90.0 to 0.91.0 in /sdks by @dependabot in #22568
- Fix for #22631 KafkaIO considers readCommitted() as it would commit back the offsets, which it doesn't by @nbali in #22633
- [CdapIO] Add CdapIO dashboard in Grafana by @Amar3tto in #22641
- Fix retaining unsaved pipeline options (#22075) by @alexeyinkin in #22098
- Add information on how to take/close issues in the contribution guide. by @damccorm in #22640
- Removed VladMatyunin from beam collaborators by @olehborysevych in #22634
- Skip dataflow_exercise_metrics_pipeline_test.ExerciseMetricsPipelineTest.test_metrics_it by @yeandy in #22623
- Add stdlib distutils env variable while building the wheels by @AnandInguva in #22635
- Persist ghprbPullId parameter in seed job by @TheNeuralBit in #22579
- Adhoc: Fix logging in Spark runner to avoid unnecessary creation of strings by @mosche in #22638
- Improve exception when requested error tag does not exist (#22401) by @bvolpato in #22405
- Reimplement Pub/Sub Lite's I/O using UnboundedSource. by @dpcollins-google in #22612
- [Website] update contribution content collapse by @bullet03 in #22468
- Clean up checkstyle suppressions.xml by @Abacn in #22649
- [Playground] [Infrastructure] Uniform code style for python scripts by @vchunikhin in #22291
- Minor: Add helpful names for parameterized tests in dataframe.schemas_test by @TheNeuralBit in #22630
- [BEAM-14118, fixes #21639] Use vendored gRPC 1.48.1 by @lukecwik in #22628
- Change Python PostCommits timeout by @yeandy in #22655
- Revert "Persist ghprbPullId parameter in seed job" by @damccorm in #22656
- Bump actions/setup-java from 2 to 3 by @dependabot in #22666
- Bump actions/labeler from 3 to 4 by @dependabot in #22670
- Bump actions/setup-node from 2 to 3 by @dependabot in #22671
- Bump actions/setup-go from 2 to 3 by @dependabot in #22669
- Bump actions/setup-python from 2 to 4 by @dependabot in #22668
- Bump actions/checkout from 2 to 3 by @dependabot in #22667
- Fix broken link to Retry Policy blog by @nikhilnadig28 in #22554
- Include total in header of issue report by @kennknowles in #22475
- [Playground] Share any code feature frontend by @alexeyinkin in #22477
- Update vendored gRPC version for SpannerTransformRegistrarTest by @chamikaramj in #22677
- Remove subprocess.PIPE usage by using a temp file by @chamikaramj in #22654
- [#22647] Upgrade org.apache.samza to 1.6 by @kw2542 in #22648
- Fix seed job by @damccorm in #22687
- Bump actions/stale from 3 to 5 by @dependabot in #22684
- Bump actions/upload-artifact from 2 to 3 by @dependabot in #22682
- Bump actions/download-artifact from 2 to 3 by @dependabot in #22683
- Add shunts for Beam typehints to apache_beam.dataframe.schemas by @TheNeuralBit in #22680
- Fix wordcount setup-java by @damccorm in #22700
- Bump google.golang.org/api from 0.91.0 to 0.92.0 in /sdks by @dependabot in #22681
- Bump cloud.google.com/go/storage from 1.24.0 to 1.25.0 in /sdks by @dependabot in #22705
- [Website] update runners table content overflow by @bullet03 in #22470
- Bump mongo_java_driver to 3.12.11 and embed.mongo to 3.0.0 by @Abacn in #22674
- [Go SDK]: Implement standalone single-precision float encoder by @jrmccluskey in #22664
- [Playground] [Backend] added validation for snippet endpoints to avoid error panic by @vchunikhin in #22686
- Add GeneratedClassRowTypeConstraint by @TheNeuralBit in #22679
- [Playground] [Backend] Removing unused snippets manually and using the scheduled task by @vchunikhin in #22389
- Implement PubsubSchemaTransformWriteConfiguration by @damondouglas in #22262
- Add support for FLOAT to Python RowCoder by @TheNeuralBit in #22626
- Bump up python container versions by @damccorm in #22697
- fix minor unreachable code caused by log.Fatal by @Abirdcfly in #22618
- Attempt to fix SpannerIO test flakes by @johnjcasey in #22688
- Add a dataflow override for runnerv1 to still use SDF on runnerv2. by @dpcollins-google in #22661
- [Playground] Result filter bug by @miamihotline in #22215
- [Website] update case-studies layout by @bullet03 in #22342
- Fix UpdateSchemaDestination when source format is set to AVRO by @steveniemitz in #22390
- Implement KafkaSchemaTransformReadConfiguration by @damondouglas in #22403
- [Go SDK]: Handle single-precision float values in the standard coders tests by @jrmccluskey in #22716
- [BEAM-13015, #21250] Remove looking up thread local metrics container holder and object creation on hot path by @lukecwik in #22627
- [fixes #22731] Publish nightly snapshot of legacy Dataflow worker jar by @lukecwik in #22732
- [fixes #22744] Update hadoop library patch versions to 2.10.2 and 3.2.4 by @lukecwik in #22745
- Update beam-master version for legacy by @lukecwik in #22741
- Remove assert schema.id by @yeandy in #22750
- Bump google.golang.org/api from 0.92.0 to 0.93.0 in /sdks by @dependabot in #22752
- Fix direct running mode multi_processing on win32 by @Abacn in #22730
- Improve error message on schema issues by @pabloem in #22469
- sklearn runinference regression example by @ryanthompson591 in #22088
- [Website] add Intuit case-study by @bullet03 in #22757
- Avoid panic on type assert. by @lostluck in #22767
- [#21935] Reject ill formed GroupByKey coders during pipeline.run validation within Beam Java SDK. by @lukecwik in #22702
- [fixes #22431] Don't use batch interface for single object operations by @steveniemitz in #22432
- Label kata changes with the language they're modifying by @damccorm in #22764
- [Website] Add GitHub issue link in Contribution guide by @hurutoriya in #22774
- Fix some typos in the ML doc by @damccorm in #22763
- Go stateful DoFns user side changes by @damccorm in #22761
- fixed column width in tables in Getting started from Spark guide by @pcoet in #22770
- Testing authentication for Playground by @pabloem in #22782
- Downgrade bytebuddy version to 1.11.0 by @cushon in #22765
- [BEAM-12776, fixes #21095] Limit parallel closes from the prior element to the next element. by @lukecwik in #22645
- [BEAM-13015, #21250] Reuse buffers when possible when writing on Dataflow streaming hot paths. by @lukecwik in #22780
- [Website] update Available contact channels table content by @bullet03 in #22754
- [Website] update commits link by @bullet03 in #22608
- [Website] scroll to correct position if anchor is present by @bullet03 in #22235
- [Go SDK] Fix go lint errors by @riteshghorse in #22796
- [BEAM-8701] bump commons-io to 2.7 by @masahitojp in #22433
- Modify RunInference to return PipelineResult for the benchmark tests by @AnandInguva in #22164
- Fix lint issues by @damccorm in #22800
- Bump cloud.google.com/go/bigquery from 1.37.0 to 1.38.0 in /sdks by @dependabot in #22734
- Add Release category to release announcement blogs by @Abacn in #22785
- [BEAM-13657] Update Python version used by mypy. by @tvalentyn in #22804
- Align neo4j error messages with API by @RustedBones in #22812
- Add Python nexmark to gradle by @apilloud in #22801
- E2E basic state support by @damccorm in #22798
- Add state integration test by @damccorm in #22815
- Evaluate proper metric in TextIOIT by @Abacn in #22740
- [Playground] Setup Datastore in Playground project using Terraform - change main.tf file by @MakarkinSAkvelon in #22506
- Update Beam 2.41.0 release docs by @kileys in #22706
- Add bag state support by @damccorm in #22816
- [BEAM-10496] Eliminate some null errors from sdks/java/core by @kennknowles in #17819
- Fix dates for 2.41.0 release by @kileys in #22830
- Added link to setup instructions in WordCount example by @pcoet in #22832
- Bump google.golang.org/api from 0.93.0 to 0.94.0 in /sdks by @dependabot in #22839
- Bump cloud.google.com/go/bigquery from 1.38.0 to 1.39.0 in /sdks by @dependabot in #22837
- Add an integration test for bag state by @damccorm in #22827
- Fix a few linting issues by @damccorm in #22842
- Updates old releases to use archive.apache.org by @chamikaramj in #22835
- Add combining state support by @damccorm in #22826
- Bump cloud.google.com/go/pubsub from 1.24.0 to 1.25.1 in /sdks by @dependabot in #22850
- Bump google.golang.org/grpc from 1.48.0 to 1.49.0 in /sdks by @dependabot in #22838
- [Website] Update videos section by @bullet03 in #22772
- Update Dataflow fnapi_container_version by @jrmccluskey in #22852
- Go SDK Katas: Update beam module dependency by @damondouglas in #22753
- unskip sklearn IT test by @AnandInguva in #22825
- [Website] add Python implementation to KinesisIO by @bullet03 in #22841
- Combining state integration test by @damccorm in #22846
- Small lint fixes by @damccorm in #22890
- Yield BatchElement batches at end of window. by @robertwb in #22834
- Preserve state on SDK switch (#22430) by @alexeyinkin in #22735
- Update to Byte Buddy 1.12.14 by @cushon in #22814
- Pass user specified destination type to UpdateSchemaDestination by @ahmedabu98 in #22624
- [Go SDK] Stream decode values in single iterations by @lostluck in #22904
- Sharding vortex by @Naireen in #16795
- Update wordcount_minimal.py by removing pipeline_args.extend by @liferoad in #22786
- Add map state in the Go Sdk by @damccorm in #22897
- [BEAM-12164] Feat: Added support to Cloud Spanner Change Streams connector for including transaction tags in the Change Stream records by @dedocibula in #22769
- Add set state in Go by @damccorm in #22919
- Go Map State integration test by @damccorm in #22898
- Add clear function for bag state types by @damccorm in #22917
- [BEAM-22923] Allow sharding specification for dataframe writes. by @robertwb in #22925
- [Playground] Update build_playground_backend.yml - add "Index creation" in backend pipeline by @MakarkinSAkvelon in #22724
- [Playground] [Backend] added SDK validation to save a code snippet by @vchunikhin in #22792
- Fix linting violations by @damccorm in #22934
- [akvelon][tour-of-beam] backend bootstraps by @eantyshev in #22556
- Bump up postcommit timeout by @damccorm in #22937
- Handle stateful windows correctly + integration test by @damccorm in #22918
- Support Timestamp type in xlang JDBC Read and Write by @Abacn in #22561
- Automatically infer state keys from their field name by @damccorm in #22922
- Updates to multi-lang Java quickstart by @chamikaramj in #22927
- Fix yaml duplicated mapping key by @Abacn in #22952
- [Playground] [Infrastructure] Adding the Cloud Datastore client to save playground examples by @vchunikhin in #22721
- Fix jdbc date conversion offset 1 day by @Abacn in #22738
- Set state integration test by @damccorm in #22935
- Minor: Fix option_from_runner_api typehint by @TheNeuralBit in #22946
- Filter out unsupported state tests by @damccorm in #22963
- Add ability to remove/clear map and set state by @damccorm in #22938
- Fix gpu to cpu conversion with warning logs by @yeandy in #22795
- Add Go stateful DoFns to CHANGES.md and fix linting violations by @damccorm in #22958
- 22805: Upgrade Jackson version from 2.13.0 to 2.13.3 by @perkss in #22806
- Add BatchConverter implementations for pandas types, use Batched DoFns in DataFrame convert utilities by @TheNeuralBit in #22575
- Run cred rotation every month by @damccorm in #22977
- [BEAM-12164] Synchronize access queue in ThroughputEstimator and reenable integration tests by @nancyxu123 in #22921
- Make RowTypeConstraint callable, test nested optional row in schemas and RowCoder by @TheNeuralBit in #22899
- Add some explanatory comments to the wordcount registration by @damccorm in #22989
- Move Go examples under the cookbook directory to generic registration by @zaneli in #22988
- Improve BQ test utils to support JSON in a more simple manner by @pabloem in #22942
- [fixes #22980] Migrate BeamFnLoggingClient to the new execution state sampler. by @lukecwik in #22981
- Add initial read_gbq wrapper by @svetakvsundhar in #22616
- Update proto generation script due to BEAM-13939. by @robertwb in #22993
- Minor: Fix lint failure by @TheNeuralBit in #22998
- [Tour Of Beam][backend] get unit content by @eantyshev in #22967
- Allows to use databaseio with postgres driver by @wannabehero in #22941
- Bump cloud.google.com/go/storage from 1.25.0 to 1.26.0 in /sdks by @dependabot in #22954
- [BEAM-22859] Allow the specification of extra packages for external Python transforms. by @robertwb in #22991
- [Tour of Beam]: Welcome Screen frontend layout by @nausharipov in #22794
- Unsickbay testEventTimeTimerSetWithinAllowedLateness test by @y1chi in #22861
- Adding support for Beam Schema Rows with BQ DIRECT_READ by @pabloem in #22926
- Add java Bigquery IO known issue to beam 2.40 release blogpost by @johnjcasey in #22611
- [Playground] - Remove authentication step for Example pipeline by @MakarkinSAkvelon in #23015
- Add run-inference component for autolabeling by @damccorm in #22971
- [Playground] [Infrastructure] Deleting the Cloud Storage Client by @vchunikhin in #22722
- Updates Java RunInference to infer Python dependencies when possible by @chamikaramj in #23017
- Adding TensorFlow support to the Machine Learning overview page by @rszper in #22949
- [#19857] Migrate to using a memory aware cache within the Python SDK harness by @lukecwik in #22924
- Generalize interface of InfluxDBPublisher to support more use cases (test-utils) by @mosche in #22260
- Revert "Remove subprocess.PIPE usage by using a temp file" by @chamikaramj in #23013
- Allow users to pass ClassLoader to dynamically load JDBC drivers. by @pranavbhandari24 in #22929
- Fix withCheckStopReadingFn to not cause the pipeline to crash by @johnjcasey in #22962
- Inference benchmark tests by @AnandInguva in #21738
- [Go SDK]: Add support for Google Cloud Profiler for pipelines by @jrmccluskey in #22824
- Listen to window messages to switch SDK and to load content by @alexeyinkin in #22959
- [Playground] [Backend] Missing DB index creation step condition by @olehborysevych in #23037
- Minor: Use typehints in benchmark utilities by @TheNeuralBit in #22943
- Disable singleIterate by @damccorm in #23042
- Allow expansion service to choose pickler. by @robertwb in #22999
- [BEAM-22856] PythonService Beam version compatibility by @ihji in #22982
- Upgrade to Gradle 7.5.1 by @kennknowles in #22479
- Fixes RunInference test failure by @chamikaramj in #23051
- Bump github.com/lib/pq from 1.10.6 to 1.10.7 in /sdks by @dependabot in #23061
- Allowing more flexible precision for TIMESTAMP, DATETIME fields. by @ahmedabu98 in #22559
- Reenable run-inference tests on windows by @damccorm in #23044
- [BEAM-12164] Support new value capture types NEW_ROW NEW_VALUES for s… by @ChangyuLi28 in #23053
- Fix example registration input arity by @damccorm in #23059
- Clarify inference example docs by @damccorm in #23018
- [Playground] [Backend] Datastore queries and mappers to get examples by @vchunikhin in #22955
- BEAM-13468 allow non-lts jvm version by @guillaumecle in #17274
- Keep stale action from closing issues by @damccorm in #23067
- [BEAM-11205] Update Libraries BOM dependencies to version 26.1.1 by @rajatbhatta in #22996
- Use cloudpickle for Java Python transforms. by @robertwb in #23093
- [release-2.42.0] Use existing pickle_library flag in expansion service. by @tvalentyn in #23118
- Revert "[#19857] Migrate to using a memory aware cache within the Python SDK harness (#22924)" by @lukecwik in #23107
- Pin cloudpickle to 2.1.0 in preparation for Beam 2.42.0 release. by @tvalentyn in #23121
- [release-2.42.0] Only close committers after all CheckpointMarks have gone away. by @tvalentyn in #23188
- Fix IllegalStateException in StorageApiWriteUnshardedRecords error ha… by @scwhittle in #23233
- [cherry-pick][release-2.42.0][BEAM-12164]: fix throughput estimator to only report backlog bytes on data records (#23493) by @nancyxu123 in #23530
- [CherryPick][#23062] Fix Java Examples_Flink in 2.42.0 release. by @ahmedabu98 in #23605
New Contributors
- @grufino made their first contribution in #22419
- @jonathanasdf made their first contribution in #22548
- @shhivam made their first contribution in #22417
- @TheMichaelHu made their first contribution in #22452
- @manitgupta made their first contribution in #22429
- @nikhilnadig28 made their first contribution in #22554
- @Abirdcfly made their first contribution in #22618
- @hurutoriya made their first contribution in #22774
- @dedocibula made their first contribution in #22769
- @wannabehero made their first contribution in #22941
- @ChangyuLi28 made their first contribution in #23053
- @guillaumecle made their first contribution in #17274
- @rajatbhatta made their first contribution in #22996
Full Changelog: v2.41.0...v2.42.0