github redpanda-data/redpanda v22.1.1

latest releases: v24.2.10, v24.2.10-rc1, v24.2.9...
2 years ago

Features

Transparent tiered storage

  • #2818 Add more detailed shadow indexing metrics. by @ztlpn in #3881
  • #3351 Fix raft bootstrap failure after topic recovery. by @Lazin in #3195
  • #3507 Fix a rare crash that could happen when doing a shadow indexing fetch concurrently with topic deletion. by @ztlpn in #3813

Centralized configuration

  • Redpanda now has an internal store for configuration properties and many properties can be set without restarting redpanda. When you upgrade Redpanda, your existing configuration will be imported automatically. Please consult the documentation for more detail on this feature. New rpk subcommands for handling cluster configuration: rpk cluster config [edit|import|export|set|get|status|force-reset|lint]. by @jcsp in #3760
  • Added support for centralized configuration in the operator. by @nicolaferraro in #3978
  • operator: added drift detection controller to prevent external change of properties set in the CR. by @nicolaferraro in #4140

Maintenance mode

Rack awareness

  • #3695 Implement rack-aware replica assignment. Add enable_rack_awareness parameters that enables the feature. by @Lazin in #4025
  • #4036 Improve rack-aware replica placement. by @Lazin in #4142

Consumer offsets

Idempotency

Other

  • #1275 Kafka server send group topic partition offset metric to prometheus. by @ZeDRoman in #3181
  • #2166 Three new metrics have been added: vectorized_storage_disk_total_bytes, vectorized_storage_disk_free_bytes, vectorized_storage_disk_free_space_alert. More direct monitoring of Redpanda's disk usage and free space. A simple alert metric field vectorized_storage_free_space_alert which is non-zero when space is running low. Continuing improvements around full disk handling. by @ajfabbri in #3885
  • #2707 Implemented simple metrics reporter. by @mmaslankaprv in #3066
  • #2876 The minimum disk space allocation size is now configurable via the segment_fallocation_step property. The default is unchanged from the previous behavior (32MB). You may wish to decrease this property if creating large numbers of partitions on systems with limited disk space. by @ZeDRoman in #3288
  • #2987 Add get transaction request to admin_api. by @VadimPlh in #3145
  • #3274 #3337 Creating topics with larger partition counts is made safer, by validating that the cluster has sufficient resources to fulfill the request before trying to create partitions. New checks for sufficient RAM and sufficient file handles can be overridden if necessary with the new topic_memory_per_partition and topic_fds_per_partition settings respectively. The defaults are to require 1MB of RAM and 10 open file handles per partition. This functionality helps to avoid situations where a redpanda cluster can become unstable when the partitions created outstrip available system resources. by @jcsp in #3398
  • #3412 Memory utilization on systems with large number of partitions can now be tweaked using configuration properties storage_read_buffer_size (default 128KiB) and storage_read_readahead_count (default 10). These properties may be decreased to more conservative values such as buffer_size=16KiB, readahead_count=1 to reduce the per-partition memory overhead and improve stability when the number of partitions is large (e.g. more than 10000). by @jcsp in #3421
  • #3544 Implement get_transaction request for admin api. It returns information about all transactions. by @VadimPlh in #3659
  • #3544 #3699 Add delete partition from transaction request. by @VadimPlh in #3661
  • #3688 Added new redpanda options: rpc_server_connection_rate_limit - Maximum connections per second for one core; rpc_server_connection_rate_limit_overrides - Overrides for specific ips for maximum connections per second for one core. (It should be array of strings like ['127.0.0.1:90', '50.20.1.1:40']). by @VadimPlh in #3922
  • #3689 Added full support for leader epoch in Redpanda. by @mmaslankaprv in #3788
  • #3698 It is now possible to impose limits on the number of client connections. Cluster configuration properties kafka_connections_max and kafka_connections_max_per_ip are added to control this behavior. By default both properties are switched off (i.e. the connection count is unlimited). These limits apply on a per-node basis. Note that the number of connections is not exactly equal to the number of clients: clients typically open 2-3 connections each. by @jcsp in #3901
  • #3704 Redpanda upgrades are made more robust by tracking all node versions, such that new features can wait until all nodes are up to date before activating. by @jcsp in #2938
  • #3704 The v1/features admin API endpoint is added, which can be used by automation scripts to query an internal logical cluster version, and feature flags for newly added functionality. by @jcsp in #2938
  • #3751 The Redpanda Admin REST API now includes optional username/password authentication. This is in addition to the existing mTLS option, which remains available. To ensure backward compatibility for existing systems, username/password authentication is disabled by default, and may be enabled using the admin_api_require_auth cluster configuration property. The enable_admin_api configuration property is deprecated and will be ignored. The Admin API is now necessary for making subsequent Redpanda configuration changes, and security conscious users have the option to enable TLS&password authentication. by @jcsp in #3819
  • #3790 Introduces an optional field preferredAddressType to the cluster CRD. Allows to specify the preferred node address type to advertise for Kafka API when the subdomain is empty. by @dimitriscruz in #3794
  • #3892 Adds an option for configuring a bootstrapping load-balancer targeting the external Kafka API listener. by @dimitriscruz in #3896
  • #3991 Added set of metrics to increase visibility of partition movement process. by @mmaslankaprv in #4014
  • #4073 Improves rpk topic consume -o, allowing consuming by time, and allows consuming to exit when the ends of partitions are reached. by @twmb in #4091
  • #4146 Schema Registry: Support references for Avro schema. by @BenPope in #4154
  • #4301 Add /v1/debug/reset_leaders request to reset leaders info on node. by @VadimPlh in #4237
  • #4301 New admin api request /v1/debug/partition_leaders_table. by @VadimPlh in #4259
  • #4333 rpk: add broker version info in redpanda admin broker ls. by @r-vasquez in #4409
  • #4402 operator: added hooks for rolling restarts and upgrades. by @nicolaferraro in #4385
  • operator: Allow Redpanda resources to be directly configured. by @BenPope in #3773
  • schema_registry: Support GET /subjects/{subject}/versions/{version}/referencedBy. by @BenPope in #3299
  • Add cluster level configuration parameters cloud_storage_enable_remote_read and cloud_storage_enable_remote_write that can be used to enable shadow indexing for all topics. by @Lazin in #3233
  • Add settings for max_old_gen_size. by @VadimPlh in #3167
  • Admin API requests sent to non-leader nodes are redirected more reliably to the leader. by @jcsp in #3571
  • Cluster status stanza will have version field for convenience. by @pvsune in #4207
  • Configuration property kafka_connections_max_overrides is added, enabling setting connection count limits on individual client IPs. by @jcsp in #4221
  • Kafka: return shadow indexing configs in a topic describe handler. by @LenaAn in #3192
  • Kubernetes operator: All Redpanda nodes created through the Kubernetes operator will have a default cloud storage maximum upload interval (cloud_storage_segment_max_upload_interval_sec) of 30 minutes. by @0x5d in #3218
  • Made min_free_memory parameter of background reclaimer configurable. by @mmaslankaprv in #3410
  • New admin API endpoint PUT /v1/features/<feature>, for use on future feature flags which may not be enabled by default. by @jcsp in #3936
  • Leader epoch is now supported in fetch requests. by @mmaslankaprv in #4030
  • Several numeric configuration properties have improved bounds checking, preventing cluster instability resulting from invalid property values. by @jcsp in #3637

Bug Fixes

  • #4423 Fixes assertion in rpc after parsing failure. by @dotnwat in #4355
  • #1892 #3866 #3884 #4045 Fixes for node operations fuzzy test. by @mmaslankaprv in #4089
  • #2246 Fixes a corner case that might happen when partition was re-balanced. by @mmaslankaprv in #3242
  • #2246 Fixes for node operations fuzzy tests. by @mmaslankaprv in #2737
  • #2397 list_offsets by time now ignores control batches. by @BenPope in #4246
  • #2619 k8s: After cluster config is changed in CR, the redpanda pods will get restarted for the config to get used. by @alenkacz in #3262
  • #2911 Various error-level log messages are fixed. by @jcsp in #3543
  • #2987 Add mark tx expired request to admin_api by @VadimPlh in #3460
  • #2989 Fix rare data corruption when compacting multiple segments. by @ztlpn in #3636
  • #3030 When adding new nodes to the cluster, the added nodes will no longer attempt to serve Kafka requests before they are up to date with the cluster metadata. This avoids a case where clients might experience transient timeouts if accessing the cluster during node joins. by @jcsp in #3716
  • #3079 #3382 Fixed incorrect return error code for topic deletion when topic does not exist. by @dotnwat in #3413
  • #3098 Health monitor abortable refresh. by @mmaslankaprv in #3132
  • #3211 Schema Registry: Fix a crash during compatibility checks. by @BenPope in #3214
  • #3235 Logs at debug level for authorization failures that are expected (e.g. when performing authorization to establish client visibility rather than authorizing for a specific resource requested by a client). Logs the authenticated principal when authorization fails. by @dotnwat in #3234
  • #3263 k8s: fixed bug with nodeport handling for clusters that have external connectivity with fixed port. by @alenkacz in #3268
  • #3277 #3328 Fixes for node decommissioning in face of failures. by @mmaslankaprv in #3355
  • #3301 k8s: fixed a bug with attempting to patch PDB selector when not required. by @alenkacz in #3303
  • #3323 Fix rare crash that could happen when log segments eviction happened concurrently with fetching near the start of the log. by @ztlpn in #3372
  • #3336 Fixed assertition when handling fetch requests. by @mmaslankaprv in #4271
  • #3360 Fix the deadlock that can be triggered by uploading the manifest right before a shutdown. by @Lazin in #3769
  • #3383 k8s: fixed bug in external port binding when port is specified in configuration. by @alenkacz in #3385
  • #3400 Move offset translator state. by @mmaslankaprv in #3433
  • #3407 wasm: Fix bug for incorrect calculation of record batch size. by @graphcareful in #3411
  • #3409 Consuming from a very large number of partitions at once is now subject to a per-request size limit, reducing the risk of exhausting memory if a client specifies a partition count and per-partition size limit that is greater than available RAM. The size limit per-fetch is set via the kafka_max_bytes_per_fetch configuration property, default 64MiB. by @jcsp in #3420
  • #3428 Improved stability when doing large kafka fetch requests under low memory conditions. by @jcsp in #3435
  • #3432 Fixed listing brokers when some of the nodes are down. Fixed describing groups when some of the nodes are down. by @mmaslankaprv in #3422
  • #3474 cluster: Handle gate_closed_exception in handle_leadership_notification. by @VadimPlh in #3488
  • #3476 #3880 Stopping consumer group pending operations. by @mmaslankaprv in #4048
  • #3486 Fix an issue where nodes may have stale leadership metadata for a short period after a node restarts. @jcsp in #3487
  • #3494 k8s: fix how schema reg node cert is mounted. by @simon0191 in #3496
  • #3528 Fixed possible deadlock of raft groups. by @mmaslankaprv in #3537
  • #3539 Fix for possible segmentation fault that might happen when moving group underling partition. by @mmaslankaprv in #3553
  • #3559 Schema Registry: Fix a crash when publishing multiple protobuf schema. by @BenPope in #3596
  • #3562 kafka: use descriptive error type for auth fails. by @NyaliaLui in #3536
  • #3581 #3583 Pandaproxy: Creating a consumer with an existing name now fails with {"error_code": 40902}. Pandaproxy: An inactive consumer is now timed out after pandaproxy.consumer_instance_timeout_ms. by @BenPope in #3584
  • #3588 Improved handling of configurations where advertised_kafka_api or kafka_api property has different names between nodes, for example during a configuration change & rolling restart. by @jcsp in #3589
  • #3615 RPK commands that use the Redpanda admin API are more robust when a node is offline or a leadership transfer is in process. by @jcsp in #3635
  • #3633 Schema Registry: Support protobuf encoded protobuf schema for compatibility with Protobuf Serializers. by @BenPope in #3663
  • #3639 Kubernetes operator: the webhook now guards against decrease of the number of assigned cores. by @nicolaferraro in #3640
  • #3644 Checks for node CPU count decreases are more robust, to guard against partition unavailability resulting from incorrectly decreasing the CPU count of an existing redpanda node. by @jcsp in #3645
  • #3720 Metrics reporter will work with TLS secured endpoints requiring SNI extension. by @mmaslankaprv in #3721
  • #3772 rpk: add missing loggers. by @daisukebe in #3950
  • #4045 Fixed cleaning up consumer groups state. by @mmaslankaprv in #4192
  • #4071 Fix stolen heartbits during big timeout for reconnect. by @VadimPlh in #4180
  • #4113 s/parser: fixed reading batches with header_crc equal to 0. by @mmaslankaprv in #4099
  • #4120 Schema Registry: Support default null type in union for Avro. by @BenPope in #4129
  • #4171 delete_retention_ms interprets -1 as infinite retention, i.e. never delete data. This is Kafka-compatible and in line with existing documentation. by @LenaAn in #4227
  • #4185 Fixed possible deadlock in raft::mux_state_machine by @mmaslankaprv in #4153
  • #4224 Fix retention settings not working with acks=1 and enabled shadow indexing. by @ztlpn in #4277
  • #4228 #4236 Fixes issue in leadership draining that may allow some raft group leadership to return. by @dotnwat in #4284
  • #4308 ListOffset returns the earliest offset with timestamp greater or equal to the timestamp specifed. If no such an offset is found, offsetsForTimes() method in Consumer should return null. See KIP-79. by @LenaAn in #4322
  • #4310 Accept partition_count, replication_factor and redpanda.datapolicy in alter config handler. by @ZeDRoman in #4313
  • #4494 storage: assertion failure in offset_translator_state.cc. by @ztlpn in #4497
  • #4352 vote_stm: do not step down when replicating configuration failed. by @mmaslankaprv in #4342
  • #4266 dissemination: do not query topic metadata for ntp. by @mmaslankaprv in #4389
  • #4181 k/fetch: validate fetch offset against high watermark. by @mmaslankaprv in #4567
  • Do not update connections with old config. by @mmaslankaprv in #4225
  • Fix bug in topic recovery. by @Lazin in #4312
  • Fix consumer group recovery. by @mmaslankaprv in #3732
  • Fix operator can't be deployed via Helm Chart when another kubebuilder operator is deployed to the same namespace. by @rawkode in #3871
  • Fix partial truncation. by @mmaslankaprv in #3722
  • Fix self leadership transfer. by @VadimPlh in #3446
  • Fixed assertion triggered when fetching from empty topics with transactions enabled. by @mmaslankaprv in #3159
  • Fixed bug that may lead to situation in which partition will not be able to elect a leader since follower heartbeats are suppressed. by @mmaslankaprv in #3259
  • Fixed cleanup policy application when using cloud storage or transactions. by @mmaslankaprv in #3743
  • Fixed data loss in shadow indexing archived data that could occur after quick partition leadership transfer back and forth between two nodes. Compatibility note: previous redpanda versions won't be able to read shadow indexing data archived by newer versions. by @ztlpn in #3365
  • Fixed error preventing redpanda from starting if the only partition movement operation involved x-core move. by @mmaslankaprv in #3558
  • Fixed incorrect handling of failed snapshot delivery that may lead to situation in which snapshot is being redelivered in tight loop. by @mmaslankaprv in #3245
  • Fixed regression that caused an assertion to be triggered during Fetch request handling. by @mmaslankaprv in #3587
  • Fixed time based offset queries in ListOffsetsRequest handler. by @mmaslankaprv in #3161
  • Fixes a consistency issue with transactions. by @rystsov in #3232
  • Fixes a panic in rpk group seek if there was an error during the offset commit. by @twmb in #3404
  • Fixes a potential bug in consumer groups in which a pending member is stuck in a group because redpanda did not set an expiration time for pending members. by @dotnwat in #3761
  • Fixes falsely aborted transactions. by @rystsov in #3616
  • Makes Redpanda transactions compatible with Sarama. by @rystsov in #3189
  • The cluster metrics reporter now works on single node redpanda clusters as well as multi-node redpanda clusters. by @jcsp in #3675
  • Using partition revision_id to clear out partition leadership metadata by @mmaslankaprv in #3834
  • Validation of replica set passed into the move partition admin API. Fixed removing partition that is being moved. by @mmaslankaprv in #3846
  • Querying partition leader via Metadata API will return correct data. by @mmaslankaprv in #3676

Improvements

  • #2142 #2568 Better handling of consumer group related errors. by @mmaslankaprv in #3205
  • #3269 redpanda/cluster: Improve logging for leader_balancer. by @BenPope in #3271
  • #3270 #3304 Gracefully handle a situation when segment in S3 is truncated. Before client would be stuck in a cycle trying to proceed, now client will get an error. by @LenaAn in #3280
  • #3333 Downgrade log message severity in case of http client disconnect since it's a normal mode of operation. by @LenaAn in #3356
  • #3429 #3561 Better stack traces in case of crash on low memory. by @LenaAn in #3570
  • #3542 under_replicated_partitions metric now reflects the number of follower which are actually behind the leader. by @mmaslankaprv in #3667
  • #3548 Shadow indexing memory utilization was optimised. by @Lazin in #3607
  • #3964 "Fetch requested very large response" log messages are reduced from INFO to DEBUG severity. by @jcsp in #3965
  • #4083 Improved behavior for restarted raft groups that reduces election churn. by @dotnwat in #4151
  • #4149 Alter topic configuration not fail if gets unsupported properties. by @ZeDRoman in #4223
  • #4172 Leader rebalancing is linear time in the total number of shards. by @travisdowns in #4218
  • #4187 Faster leader election during partition moves. by @mmaslankaprv in #4157
  • #4210 Make debugging raft issues easier. by @mmaslankaprv in #4206
  • #4376 rpk: improve output on configuration updates. by @dotnwat in #4515
  • #3939 Support mTLS principal propagation. by @BenPope in #4549
  • k8s/operator: Reserve 10% of memory for the OS by default, to avoid OOMKill. by @BenPope in #3831
  • rpk: try to send write requests to Leader but fallback to broadcast to all brokers. by @simon0191 in #3565
  • Ability to monitor bytes read/written to the partition. by @mmaslankaprv in #3530
  • Add compressed in-memory index for segments accessed via Shadow Indexing. by @Lazin in #3830
  • Add schema registry to kubernetes samples. by @BenPope in #3869
  • Adjust rpk help text. by @Deflaimun in #3465
  • Changed decay coefficient of target priority. This way the target priority will decay faster allowing other nodes to became leader with less failed leader election rounds. by @mmaslankaprv in #4070
  • Cloud storage key is now redacted when logging redpanda configuration. by @jcsp in #3219
  • Faster recovery of partition replicas which are behind the leader. by @mmaslankaprv in #2683
  • Improved error handling when manipulating JSON in low memory situations. by @BenPope in #4076
  • Logging low storage space condition, to help alert users to adjust retention policies, even if they don't have other external monitoring systems set up. by @ajfabbri in #3715
  • Logging verbosity during leadership transfer is decreased. by @jcsp in #3315
  • Pandaproxy REST: Improve header validation and Swagger API. by @BenPope in #3664
  • Remove some misleading error-level log messages that occurred during node shutdown. by @jcsp in #3226
  • Replace ZSTD allocation_error to bad_alloc. Catch all exception in fiber inside rps_simple_protocol. by @VadimPlh in #3567
  • The append_chunk_size configuration property now has an upper bound of 32MiB, to avoid issues when this was erroneously set to very high values. by @jcsp in #4107
  • Use updated seastar. by @tchaikov in #4304
  • coproc: Move materialized partitions by @graphcareful in #3212
  • coproc: New abstraction for safe shutdown of materialized logs by @graphcareful in #2940
  • coproc: Yield if inputs haven't been hydrated by @graphcareful in #3576
  • rpk: Various improvements to CLI and upgraded base franz-go. by @twmb in #4576

Full Changelog: https://github.com/redpanda-data/redpanda/compare/v21.11.15..v22.1.1

Don't miss a new redpanda release

NewReleases is sending notifications on new releases.