Arroyo 0.2.0
Arroyo is a new, state-of-the-art stream processing engine that makes it easy to build complex real-time data pipelines with SQL. This is our first versioned release of Arroyo since we open-sourced the engine in April.
We're excited to welcome three new contributors to the project:
- @rtyler made their first contribution in #8
- @akennedy4155 made their first contribution in #49
- @jbeisen made their first contribution in #77
With the 0.2.0 release, we are continuing to push forward on features, stability, and productionization. We’ve added native Kubernetes support and easy deployment via a Helm chart, expanded our SQL support with features like JSON functions and windowless joins, and made many more fixes and improvements detailed below.
Looking forward to the 0.3.0 release, we will continue to improve our SQL support with the ability to create sources and sinks directly as SQL tables, along with views, UDFs, and external joins. We will also be adding a native Pulsar connector and making continued improvements in performance and reliability.
Excited to be part of the future of stream processing? Come chat with the team on our Discord, check out a starter issue and submit a PR, and let us know what you’d like to see next in Arroyo!
Features
Native Kubernetes support
As of release 0.2.0, Arroyo can natively target Kubernetes as a scheduler for running pipelines. We have also made it easy to run the Arroyo control plane on Kubernetes with our new Helm chart.
Getting started is as easy as:
$ helm repo add arroyo https://arroyosystems.github.io/helm-repo
$ helm install arroyo arroyo/arroyo \
--set s3.bucket=my-bucket,s3.region=us-east-1
See the docs for all the details.
Nomad deployments
Arroyo has long had first-class support for Nomad as a scheduler, where we take advantage of its very low-latency, lightweight scheduling. We now also support Nomad as an easy deployment target for the control plane via a Nomad pack.
See the docs for more details.
SQL features
With this release we are making big improvements in SQL completeness. Notably, we’ve made our JSON support much more flexible with the introduction of SQL JSON functions including get_json_objects, get_first_json_object, and extract_json_string.
We’ve also added support for windowless joins.
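To make the idea of a join without a window concrete, here is a minimal Python sketch of an inner join over two unbounded streams. It is a toy illustration of the general technique, not Arroyo's implementation: the class name, method names, and sample events are all hypothetical, and state here grows without bound, which a real engine would need to limit or expire.

```python
from collections import defaultdict


class WindowlessJoin:
    """Toy inner join over two unbounded streams (hypothetical, not Arroyo's code).

    Every record is kept in per-key state, so a new record on one side can be
    matched against everything seen so far on the other side.
    """

    def __init__(self):
        self.left = defaultdict(list)   # join key -> left-side values seen so far
        self.right = defaultdict(list)  # join key -> right-side values seen so far

    def on_left(self, key, value):
        """Store a left-side record and return the joined rows it produces."""
        self.left[key].append(value)
        return [(key, value, r) for r in self.right.get(key, [])]

    def on_right(self, key, value):
        """Store a right-side record and return the joined rows it produces."""
        self.right[key].append(value)
        return [(key, l, value) for l in self.left.get(key, [])]


join = WindowlessJoin()
print(join.on_left("u1", "click"))      # no right-side record for u1 yet
print(join.on_right("u1", "purchase"))  # matches the earlier click
print(join.on_left("u1", "view"))       # matches the earlier purchase
```

Unlike a windowed join, nothing here groups records by time: a match is emitted as soon as both sides have a record for the same key, however far apart they arrive.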
Here are some of the highlights:
- Initial JSON functions and raw Kafka Source by @jacksonrnewhouse in #86
- Windowless Joins by @jacksonrnewhouse in #61
- String functions by @jacksonrnewhouse in #17
- Hashing Functions by @akennedy4155 in #49
- Casting between numeric types and strings by @jacksonrnewhouse in #5
- Casting timestamps to text by @jacksonrnewhouse in #32
- String Concat Operator || in SQL by @akennedy4155 in #55
- Add COALESCE, NULLIF, MAKE_ARRAY by @jacksonrnewhouse in #89
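As a quick illustration of how a few of these standard SQL features behave, the sketch below runs ||, COALESCE, and NULLIF in Python's built-in SQLite, used purely as a stand-in because it is easy to run anywhere. The table and column names are hypothetical, and Arroyo's own behavior may differ in its details. MAKE_ARRAY is left out because SQLite has no equivalent.

```python
import sqlite3

# In-memory SQLite database as a stand-in SQL engine (not Arroyo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT, nickname TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("Ada", "Lovelace", ""), ("Alan", "Turing", "Prof")],
)

rows = conn.execute(
    """
    SELECT
        first_name || ' ' || last_name AS full_name,               -- string concatenation
        COALESCE(NULLIF(nickname, ''), first_name) AS display_name -- empty nickname falls back to first name
    FROM users
    ORDER BY first_name
    """
).fetchall()

print(rows)
```

NULLIF turns the empty nickname into NULL, and COALESCE then falls back to the first non-NULL argument, so each user gets a usable display name.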
Connectors, Web UI, and platform support
Arroyo now supports SASL authentication for Kafka connections, runs on FreeBSD, and lets you change pipeline parallelism from the Web UI.
- Add FreeBSD support by @rtyler in #8, #19
- Add SASL authentication support for Kafka connections by @jacksonrnewhouse in #20
- Add support for changing pipeline parallelism in the Web UI by @jbeisen in #77
Fixes
- Fix filter on partition_by parsing by @jacksonrnewhouse in #27
- Make parquet state management more reliable by @jacksonrnewhouse in #23
- Fix the quoting of types in the sql package by @jacksonrnewhouse in #64
Improvements
- SQL macro testing by @jacksonrnewhouse in #10
- Add a SQL IR and factor out optimizations by @jacksonrnewhouse in #80
- Multi-arch builds for Docker by @jacksonrnewhouse in #11
- Prometheus and pushgateway in the Docker image for working metrics by @mwylde in #16
- Bump datafusion to 23.0, arrow to 37.0 by @jacksonrnewhouse in #92
- Run compiler service locally, compile in debug mode if DEBUG is set by @jacksonrnewhouse in #83
- Replace shelling out to rustfmt with prettyplease by @jacksonrnewhouse in #87
See the full changelog: https://github.com/ArroyoSystems/arroyo/commits/release-0.2.0