Overview
- Support for schema registries, through bytewax.connectors.kafka.registry.RedpandaSchemaRegistry and bytewax.connectors.kafka.registry.ConfluentSchemaRegistry.
- Custom Kafka operators in bytewax.connectors.kafka.operators: input, output, deserialize_key, deserialize_value, deserialize, serialize_key, serialize_value and serialize.
- Breaking change: KafkaSource now emits a special KafkaSourceMessage to allow access to all data on consumed messages. KafkaSink now consumes KafkaSinkMessage to allow setting additional fields on produced messages.
- Non-linear dataflows are now possible. Each operator method returns a handle to the Streams it produces; add further steps by calling operator functions on those returned handles, not the root Dataflow. See the migration guide for more info.
- Auto-complete and type hinting on operators, inputs, outputs, streams, and logic functions now work.
- A ton of new operators: collect_final, count_final, count_window, flatten, inspect_debug, join, join_named, max_final, max_window, merge, min_final, min_window, key_on, key_assert, key_split, unary. Documentation for all operators is in bytewax.operators now.
- New operators can be added in Python, made by grouping existing operators. See the bytewax.dataflow module docstring for more info.
- Breaking change: Operators are now stand-alone functions; import bytewax.operators as op and use e.g. op.map("step_id", upstream, lambda x: x + 1).
- Breaking change: All operators must take a step_id argument now.
- Breaking change: fold and reduce operators have been renamed to fold_final and reduce_final. They now only emit on EOF and are only for use in batch contexts.
- Breaking change: batch operator renamed to collect, so as to not be confused with runtime batching. Behavior is unchanged.
- Breaking change: output operator does not forward its items downstream. Add operators on the upstream handle instead.
- next_batch on input partitions can now return any Iterable, not just a List.
- inspect operator now has a default inspector that prints out items with the step ID.
- collect_window operator now can collect into sets and dicts.
- Adds a get_fs_id argument to {Dir,File}Source to allow handling non-identical files per worker.
- Adds TestingSource.EOF and TestingSource.ABORT sentinel values you can use to test recovery.
- Breaking change: Adds a datetime argument to FixedPartitionSource.build_part, DynamicSource.build_part, StatefulSourcePartition.next_batch, and StatelessSourcePartition.next_batch. You can now use this to update your next_awake time easily.
- Breaking change: Window operators now emit WindowMetadata objects downstream. These objects can be used to introspect the open_time and close_time of windows. This changes the output type of windowing operators from (key, values) to (key, (metadata, values)).
- Breaking change: IO classes and connectors have been renamed to better reflect their semantics and match up with documentation.
- Moves the ability to start multiple Python processes with the -p or --processes flag to the bytewax.testing module.
- Breaking change: SimplePollingSource moved from bytewax.connectors.periodic to bytewax.inputs since it is an input helper.
- SimplePollingSource's align_to argument now works.
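The window output change listed above, from (key, values) to (key, (metadata, values)), is the one most likely to break existing downstream logic. The snippet below is a minimal sketch of how code written for the old shape might adapt. It uses only the standard library: StandInWindowMetadata is a hypothetical stand-in for WindowMetadata that carries just the open_time and close_time fields named in these notes, and drop_metadata and window_length are illustrative helper names, not part of Bytewax.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Tuple


@dataclass(frozen=True)
class StandInWindowMetadata:
    """Hypothetical stand-in for WindowMetadata (assumption: only these two fields)."""

    open_time: datetime
    close_time: datetime


def drop_metadata(item: Tuple[str, Tuple[Any, Any]]) -> Tuple[str, Any]:
    """Turn a new-style (key, (metadata, values)) item back into (key, values)."""
    key, (_metadata, values) = item
    return key, values


def window_length(item: Tuple[str, Tuple[Any, Any]]) -> timedelta:
    """Use the metadata to compute how long the window was open."""
    _key, (metadata, _values) = item
    return metadata.close_time - metadata.open_time


# Build one item in the new output shape.
open_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
meta = StandInWindowMetadata(open_time, open_time + timedelta(minutes=1))
new_style_item = ("user-1", (meta, [1, 2, 3]))

old_style_item = drop_metadata(new_style_item)
print(old_style_item)
print(window_length(new_style_item))
```

A function like drop_metadata could be applied with op.map directly after a windowing step, so that existing downstream steps keep receiving (key, values) while new steps read the metadata where they need it.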
What's Changed
- Error cleanups by @davidselassie in #302
- More error fixing by @davidselassie in #303
- Add initial metrics implementation. by @whoahbot in #296
- Adds batching getter input helpers by @davidselassie in #304
- Move Python multiprocessing execution mode to the testing namespace by @whoahbot in #305
- We don't actually depend on bincode ourselves anymore by @davidselassie in #308
- Rename IO classes by @davidselassie in #307
- Move SimplePollingSource and fix align_to argument by @davidselassie in #309
- Add window metadata object to the output of windowing operators by @whoahbot in #311
- Flushes StdOutSink by @davidselassie in #314
- Deterministically awaken keys by @davidselassie in #315
- Passes a now argument to build_part and next_batch by @davidselassie in #316
- Adds TestingSource.{ABORT, EOF} by @davidselassie in #317
- Adds get_fs_id argument to {Dir,File}Source by @davidselassie in #320
- Adds vermin pre-commit by @davidselassie in #323
- Adds a simple retry functionality to SimplePollingSource by @davidselassie in #324
- Fix all examples by @Psykopear in #322
- Non-linear dataflows and Python operators by @davidselassie in #321
- New docs structure by @cra in #325
- Move module docstring sections into markdown docs by @davidselassie in #330
- Removes key_split operator; fixes type annotations by @davidselassie in #331
- Runtime typecheck Stream arguments to operators by @davidselassie in #334
- Changes snapshot and backup interval to saner defaults by @davidselassie in #335
- Performance work by @davidselassie in #333
- Handle Optional and more complex types in operator signatures by @davidselassie in #338
- Tuples are faster than lists by @davidselassie in #339
- Update getting started by @whoahbot in #336
- flat_map_batch operator by @davidselassie in #341
- Add an overload to branch operator so if it gets a TypeGuard, the output streams are typed correctly by @davidselassie in #342
- Add documentation for joins and wordcount to getting started by @whoahbot in #340
- Bumps rusqlite deps by @davidselassie in #345
- Uses ruff format instead of black by @davidselassie in #346
- Kafka connector revamp, schema registry support, py-operators by @Psykopear in #332
- Renames batch operator to collect by @davidselassie in #351
- Joins concepts documentation and join_window fixes by @davidselassie in #344
- Properly handle resume from EOF by @davidselassie in #352
- Add getting started guides for execution and snapshot by @whoahbot in #350
- Start working on 0.18 migration guide by @whoahbot in #337
- Changelog updated for kafka connector by @Psykopear in #348
- Update container guide by @Psykopear in #349
- Long Format Documentation Update by @awmatheson in #347
- Prepare 0.18 by @whoahbot in #354
New Contributors
Full Changelog: v0.17.1...v0.18.0