redpanda-data/redpanda v22.3.1 on GitHub

Features

Partitions of topics enabled for remote storage now follow the topic's retention policy specified via retention.bytes and retention.ms. If those are unspecified, cluster level defaults are used. The migration of existing topics is done in such a way that they preserve the previous behavior (i.e. data will not be deleted from cloud storage for topics created before v22.3). However, note that for new topics retention is applied and data can be expired from cloud storage automatically. by @VladLazar in #6833
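The fallback order described in the entry above can be modeled with a small sketch. It only illustrates the behavior stated in this note and is not Redpanda's implementation: the function name, the `created_before_v22_3` flag, and the use of `None` to mean "never expire from cloud storage" are all assumptions made for the example.

```python
from typing import Optional


def cloud_retention_ms(
    topic_retention_ms: Optional[int],
    cluster_default_ms: Optional[int],
    created_before_v22_3: bool,
) -> Optional[int]:
    """Model which retention.ms value governs cloud storage for a topic.

    Hypothetical helper: returns None when data is never expired
    from cloud storage.
    """
    # Topics created before v22.3 keep the previous behavior:
    # nothing is deleted from cloud storage.
    if created_before_v22_3:
        return None
    # A topic-level retention.ms takes precedence when it is set.
    if topic_retention_ms is not None:
        return topic_retention_ms
    # Otherwise the cluster-level default applies.
    return cluster_default_ms
```

The same precedence would apply to retention.bytes; only the time-based property is shown to keep the sketch short.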
Tiered Storage will now clean up objects in S3 when topics are deleted. This may be avoided by disabling Tiered Storage on the topic before deleting it. by @jcsp in #6683
Introduce retention.local-target.ms and retention.local-target.bytes topic configuration options. They control the retention policy of each partition within the topic and are only relevant for topics with remote write enabled. by @VladLazar in #6613
Adds HTTP Basic Auth to all Schema Registry endpoints by @BenPope in #6639
Adds HTTP Basic Auth to all Pandaproxy endpoints and uses the kafka client cache on /brokers by @NyaliaLui in #6452
Support HTTP basic authentication from Redpanda console to schema registry by @pvsune in #7144
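For the HTTP Basic Auth entries above, a client has to send an `Authorization` header built from the username and password. The sketch below uses only the Python standard library and constructs the request without sending it. The URL (host, port, and path) and the credentials are placeholders chosen for illustration, not values taken from these notes.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Create (but do not send) a request carrying Basic Auth credentials."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(username, password))
    return req


# Placeholder endpoint and credentials, assumed for this example only.
request = build_request("http://localhost:8081/subjects", "user", "pass")
```

Passing `request` to `urllib.request.urlopen` would perform the call against a running endpoint.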
Allow authentication method per kafka listener. by @RafalKorepta in #6940
Enables transactions feature (enable_transactions=true) by default on the service side. by @bharathv in #6770
#4824 #4826 Transactions are now supported on compacted topics. by @bharathv in #6664
Schema Registry and REST Proxy can now use ephemeral credentials to authenticate with Redpanda. by @BenPope in #6931
Users are no longer required to supply a node ID in each node's node configuration file. All nodes must be upgraded before using this feature. Node IDs on existing nodes will be preserved when using this. by @andrwng in #6659
#2760 Add defaulting webhook for Console by @pvsune in #6282
#2760 Introduce SecretKeyRef type to reference Secret objects by @pvsune in #6282
#2760 Introduces NamespaceNameRef type instead of using the full glory of corev1.ObjectReference by @pvsune in #6282
#333 Configurations of all nodes across the cluster can be identical by @dlex in #6744
#333 Seed driven cluster bootstrap mode. Disable empty_seed_starts_cluster to use it. That will allow the set of servers listed as seeds to start a cluster together without a root node. All seed servers must be available for a cluster to be created, with identical node configuration. Afterwards, none of the seed servers will try to form another new cluster if their local storage is wiped out, unless all seed nodes are wiped together at the same time. Cluster now gets a cluster UUID reflected by a new controller log message, and stored in kvstore. by @dlex in #6744
#333 Cluster is bootstrapped with all its seed servers by @dlex in #6744
#333 Wiped out seed cluster members will not start their own new cluster by @dlex in #6744
#6355 Added rack awareness constraint repair in the continuous partition balancing mode. by @ztlpn in #6845
Kubernetes Operator: it's now possible to specify a TLS issuer for Pandaproxy API by @nicolaferraro in #6637
Kubernetes Operator: external ports can be explicitly specified for admin API, Panda proxy and schema registry by @nicolaferraro in #6564
Support AlterConfig/IncrementalAlterConfig request for replication.factor property. by @VadimPlh in #6460

Bug Fixes

#5163 Fix compaction for group_*_tx log records by @VadimPlh in #6086
Fix possible shadow indexing manifest corruption under memory pressure. by @ztlpn in #6507
Time queries are more reliable on topics using client-set timestamps via the CreateTime mode by @jcsp in #6606
Improve robustness of Schema Registry and HTTP Proxy under std::errc::broken_pipe. by @BenPope in #6687
#6561 It's now possible to set the log level for kafka/client and r/heartbeat through the admin API. by @BenPope in #6688
#6508 Returning retryable kafka error code in raft replication failure events that require the client to retry by @graphcareful in #6712
#6827 Fix rack aware placement after node rack id changes. by @ztlpn in #6900
Fix an issue where state machine snapshots could degrade performance under certain workloads (#6854) by @jcsp in #6932
Fixes license setting in Redpanda cluster by @pvsune in #7110
Fixed a bug that prevented redpanda from uploading the last batch in the log to cloud storage if timeboxed uploads were enabled and the batch contained exactly one message. by @ztlpn in #7096
Redpanda shutdown is more prompt when client reads to S3 are in progress at the same time as Redpanda shuts down. by @jcsp in #7181
Fix init_producer_id timeouts by @rystsov in #6312
Fix a bug that configures all clusters as development clusters by @joejulian in #7107
Fix bug in remote read/write enablement. Topic level overrides are now respected in all cases. by @VladLazar in #6663
Fix for flex request parsing failure when request header client id is empty or null by @graphcareful in #6585
Fix incorrect assertion in vote_stm that in some situations may lead to a redpanda crash by @mmaslankaprv in #6546
#6018 Fix consistency violation caused by split-brain of the txn coordinator by @rystsov in #6019
#6063 Fix a need for retrying truncation of compacted topic partition when it failed by @mmaslankaprv in #6071
#6795 #5507 Enable cluster config editing when IAM roles are used by @abhijat in #6864

Improvements

New configuration property storage_strict_data_init. When the storage_strict_data_init property is enabled a user will have to manually add an empty magic file called .redpanda_data_dir to Redpanda's data directory for RP to start. by @ballard26 in #6786
New metric for partition movement available bandwidth by @ZeDRoman in #6110
#4871 New metrics for partition movements amount: redpanda_cluster_partition_moving_to_node, redpanda_cluster_partition_moving_from_node, redpanda_cluster_partition_node_cancelled_movements by @ZeDRoman in #5749
Improve moving partitions at scale by @mmaslankaprv in #6905
Add fields for RedpandaCloud login provider under .spec.login.redpandaCloud by @pvsune in #6359
#3278 Support safe epoch incrementing for idempotent/transactional producer in retries cases by @VadimPlh in #5362
Tunable cluster configuration properties are added to set bounds on the segment.bytes topic property. If log_segment_size_min and/or log_segment_size_max are set, then any segment.bytes outside these bounds will be silently clamped to the permitted range. This prevents poorly-chosen configurations from inducing the cluster to create very large numbers of small segment files, or extremely large segment files. by @jcsp in #6492
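The clamping described in the entry above can be illustrated with a minimal sketch. The function name and the byte values used in the test are hypothetical, and treating an unset bound as `None` is an assumption of this example rather than a description of Redpanda's internals.

```python
from typing import Optional


def clamp_segment_bytes(
    segment_bytes: int,
    size_min: Optional[int] = None,
    size_max: Optional[int] = None,
) -> int:
    """Clamp a topic's segment.bytes into [size_min, size_max].

    A bound of None means the corresponding cluster property
    (log_segment_size_min / log_segment_size_max) is not set.
    """
    if size_min is not None and segment_bytes < size_min:
        return size_min
    if size_max is not None and segment_bytes > size_max:
        return size_max
    return segment_bytes
```

With both bounds unset the value passes through unchanged, which matches the "if ... are set" wording of the entry.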
A new tunable cluster property log_segment_size_jitter_percent is added, to enable greater determinism in test/benchmark environments by disabling jitter. The default 5% jitter is the same as in previous versions. by @jcsp in #6515
Kubernetes Operator: Install Console from the operator that connects to Redpanda Kafka API via mTLS. by @pvsune in #6280
rpk topic consume now supports %a to print attributes; see rpk topic consume --help for more details by @twmb in #6894
rpk topic consume now has --print-control-records to opt into printing control records (for advanced use cases) by @twmb in #6894
rpk cloud has a new byoc command, which manages the byoc plugin directly and makes it easier to use by @twmb in #7102
rpk redpanda admin config log-level has been updated for v22.3 loggers by @twmb in #7197
rpk topic produce now has --allow-auto-topic-creation, which can create non-existent topics if the cluster has auto_create_topics_enabled set to true by @twmb in #7197
#6844 Introduced a configurable limit for the number of segments pending deletion from the cloud. This limit is controlled by the cloud_storage_max_segments_pending_deletion cluster config. by @VladLazar in #7191
Support RedpandaAdmin in the Console CR by @pvsune in #6667
Adds license field in Cluster spec by @pvsune in #6863
Improved admin API error handling to reduce 500 errors on internal RPC failures. by @mmaslankaprv in #5916
Schema Registry: Disable compression on the _schemas topic to better support manually creating schemas. by @BenPope in #6156
Simplified schema registry deployment: the schema registry now always runs as part of a redpanda cluster. Running it separately is no longer supported. by @jcsp in #4324
Simplified HTTP proxy deployment: the HTTP proxy now always runs as part of a redpanda cluster. Running it separately is no longer supported. by @jcsp in #4324
Internal topics are created with an appropriate replication factor more reliably on clusters with at least 3 nodes, whereas previously in some circumstances they could exist in a single-replica state for a period of time before being upgraded to a replicated state. by @jcsp in #6299
Configuration property id_allocator_replication is deprecated in favor of internal_topic_replication_factor by @jcsp in #6299
Configuration property transaction_coordinator_replication is deprecated in favor of internal_topic_replication_factor by @jcsp in #6299
The Schema Registry topic's default replication factor is now controlled by internal_topic_replication_factor rather than default_topic_replication. by @jcsp in #6299
#2760 Use Redpanda wildcard certificate for Ingress since Console is exposed through https://console. by @pvsune in #6282
#2760 Rename ClusterKeyRef to ClusterRef by @pvsune in #6282
Kubernetes Operator: added option to customize the external advertised address of Redpanda nodes by @nicolaferraro in #6304
The cluster configuration properties raft_heartbeat_interval_ms and raft_heartbeat_timeout_ms may now be modified without restarting redpanda. by @jcsp in #6426
#5154 During shutdown, spurious "offset_monitor::wait_aborted" log error messages are no longer emitted. by @jcsp in #6419
#5460 Replicas of __consumer_offsets partitions are distributed evenly across brokers, resulting in better (although not perfect) distribution of consumer group coordinators. by @dlex in #6251
Kubernetes Operator: added options to configure generated Ingress resources for Console and Pandaproxy by @nicolaferraro in #6456
Suppress logging for harmless 404 responses from S3 while probing for transaction range objects by @jcsp in #6526
Logging verbosity is reduced when S3 backends unexpectedly close connections by @jcsp in #6524
Console deletion is fully independent of Cluster by @pvsune in #6474
Redpanda now cleans up empty directories in the tiered storage cache directory on startup, as well as after removing segments. by @jcsp in #6533
RedpandaCloud AllowedOrigins can be set as a list by @pvsune in #6679
Improved shadow indexing memory efficiency by @Lazin in #6558
Incorporates the kafka client cache on /consumers Pandaproxy endpoints. The cache supports multiple authenticated connections with HTTP Basic Auth. by @NyaliaLui in #6693
Incorporate the kafka client cache on /topics Pandaproxy endpoints. The cache supports multiple authenticated connections with HTTP Basic Auth. by @NyaliaLui in #6618
Improve robustness of Schema Registry and HTTP Proxy under std::errc::timed_out. by @BenPope in #6885
rpk now allows setting hostnames with dashes or numbers in the final domain segment by @twmb in #6894
rpk now seeks to end offsets if you seek to a future timestamp, rather than -1 by @twmb in #6894
rpk now supports using basic auth while creating a new acl user (--password for basic auth, --new-password or -p for the new user's password) by @twmb in #6894
rpk now defaults to SCRAM-SHA-256 if SASL is specified, and now rejects invalid SASL mechanisms by @twmb in #7197
#6495 rpk redpanda config bootstrap no longer changes configuration settings that have already been manually modified (e.g., redpanda.kafka_api[0].port) by @twmb in #7026
The properties cloud_storage_enable_remote_[read|write] are now applied to topics at creation time, and if they subsequently change, then existing topics' properties do not change. To modify the tiered storage mode of existing topics, you may set the redpanda.remote.[read|write] properties on the topic. by @jcsp in #6950
Support retrieving credentials from kube2iam. by @missingcharacter in #7030
#6892 #7025 #7016 faster recovery from rolling restart by @mmaslankaprv in #7017
Recreate Console's referenced ConfigMap if manually deleted. by @pvsune in #7077
#6111 #6023 Improved stability under random read workloads to tiered storage topics. by @jcsp in #7042
Improved stability under read workloads touching many tiered storage segments in quick succession by @jcsp in #7082
#6111 #6023 A new cluster configuration property cloud_storage_max_readers_per_shard is added, which controls the maximum number of cloud storage reader objects that may exist per CPU core. This may be tuned downward to reduce memory consumption at the possible cost of read throughput. The default setting is one per partition (i.e. the value of topic_partitions_per_shard is used). by @jcsp in #7042
A new cluster configuration property cloud_storage_max_segments_per_shard is added, which controls the maximum number of segments per CPU core that may be promoted into a readable state from cloud storage. This may be tuned downward to reduce memory consumption at the possible cost of read throughput. The default setting is two per partition (i.e. the value of topic_partitions_per_shard multiplied by 2 is used). by @jcsp in #7082
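As a worked example of the two defaults described above, the sketch below derives both limits from topic_partitions_per_shard. The function names are hypothetical, and the value 1000 is an arbitrary illustration, not a statement about the actual default of topic_partitions_per_shard.

```python
def default_max_readers_per_shard(topic_partitions_per_shard: int) -> int:
    """Default for cloud_storage_max_readers_per_shard: one per partition."""
    return topic_partitions_per_shard


def default_max_segments_per_shard(topic_partitions_per_shard: int) -> int:
    """Default for cloud_storage_max_segments_per_shard: two per partition."""
    return topic_partitions_per_shard * 2


# Illustrative value only; check your cluster's actual setting.
partitions_per_shard = 1000
readers_limit = default_max_readers_per_shard(partitions_per_shard)
segments_limit = default_max_segments_per_shard(partitions_per_shard)
```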
Two new metrics are added to the /public_metrics endpoint: redpanda_cloud_storage_active_segments and redpanda_cloud_storage_readers. by @jcsp in #7082
Two new metrics are introduced to help track the lifetime of segments uploaded to the cloud by @VladLazar in #7133
- redpanda_cloud_storage_segments:
  - Description: Total number of accounted segments in the cloud for the topic
  - Labels: redpanda_namespace, redpanda_topic
  - Type: gauge
- redpanda_cloud_storage_segments_pending_deletion:
  - Description: Total number of segments awaiting deletion from the cloud for the topic
  - Labels: redpanda_namespace, redpanda_topic
  - Type: gauge
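To show how the two gauges above might be read from a metrics scrape, here is a small parser for Prometheus-style text lines. The sample payload is fabricated for illustration (the namespace, topic, and values are placeholders), and the parser handles only the simple `name{labels} value` shape used in the sample, not the full exposition format.

```python
import re

# Hypothetical scrape output; label values and numbers are made up.
SAMPLE = """\
# TYPE redpanda_cloud_storage_segments gauge
redpanda_cloud_storage_segments{redpanda_namespace="kafka",redpanda_topic="orders"} 42
# TYPE redpanda_cloud_storage_segments_pending_deletion gauge
redpanda_cloud_storage_segments_pending_deletion{redpanda_namespace="kafka",redpanda_topic="orders"} 3
"""

LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_gauges(text: str) -> dict:
    """Map (metric, namespace, topic) to the sampled float value."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/metadata lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (
            match.group("name"),
            labels.get("redpanda_namespace"),
            labels.get("redpanda_topic"),
        )
        result[key] = float(match.group("value"))
    return result


gauges = parse_gauges(SAMPLE)
```

A steadily growing pending-deletion gauge relative to the total segment gauge would be one signal worth alerting on.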
pandaproxy: consumer fetch: More gracefully handle partition movement by @BenPope in #7210
pandaproxy: Shut down consumers more gracefully during shutdown. by @BenPope in #7210
Set Kafka SASL password as environment variable by @pvsune in #7112
Improve transaction metadata handling in Shadow Indexing by @Lazin in #6001
Improves logging in tx subsystem by moving everything under one logger and adds additional partition context. by @bharathv in #6556
Print the timestamp along with the version info at startup by @daisukebe in #6321
#5324 Recover from failures quickly by cleaning up resources. by @bharathv in #5730
#6214 Transactions can now span leadership changes of transaction coordinator. by @bharathv in #6252
#6795 #5507 Extend validation to make sure secrets do not get supplied when they are not used by Redpanda. by @abhijat in #6864
#7119 Includes changes to make /v1/features/license (GET) include the checksum of the loaded license in the response by @graphcareful in #7130
#7119 Includes changes to make /v1/features/license (PUT) call totally idempotent by @graphcareful in #7130
Added new property kafka_request_max_bytes to control the maximum size of a request processed by server. by @mmaslankaprv in #6283
Controller log limiting mechanism. by @ZeDRoman in #6641
The rate (requests per second) of requests that create entries in the controller log can now be limited. by @ZeDRoman in #6641

Full Changelog: v22.2.7...v22.3.1