github redpanda-data/redpanda v25.1.1

8 months ago

Features

  • Add custom partitioning of Iceberg tables: partition spec of a newly created Iceberg table is now determined by the redpanda.iceberg.partition.spec topic property. by @ztlpn in #24952
  • During Iceberg translation, records that fail to translate to the Iceberg schema (missing schema, unsupported schema, corrupted data) are written to a table with the ~dlq suffix. A cluster config, which acts as the default, and a per-topic property are introduced to control this behavior. Available options are "drop" and "dlq_table" (default). by @nvartolomei in #24980
  • When translation of Kafka records to Iceberg/Parquet fails, the records are written in key_value mode to a redpanda.<topic>~dlq table. by @nvartolomei in #24893
  • Redpanda now supports the SASL/PLAIN authentication mechanism. To enable it, add "PLAIN" to the sasl_mechanisms cluster configuration list. by @michael-redpanda in #24525
  • Schema Registry: Support ?normalize=true for protobuf by @BenPope in #24719
  • metrics: Consumer Group Lag can now be enabled by @BenPope in #25216
  • pre-restart check API endpoint by @bashtanov in #24928
  • rpk: add support for --format in rpk cluster info by @r-vasquez in #25223
  • Improved partition density: this release increases the number of partitions Redpanda can handle. It also changes the behavior and defaults of the related topic properties topic_memory_per_partition (new default 200KiB) and topic_partitions_per_shard (new default 5000). To calculate the total allowable number of partition replicas, Redpanda no longer divides the total memory space by topic_memory_per_partition; instead it divides an explicit amount of memory reserved for partitions, set via the new property topic_partitions_memory_allocation_percent (default 10%). By default this effectively allows double the number of partitions compared to before. Redpanda heuristically tries to detect whether these settings were previously overridden and, if so, applies the previous logic. by @StephanDollberg in #24472
  • rpk debug bundle will now collect the startup_log file and crash_reports directory if they are present in the data directory. by @r-vasquez in #24778
  • rpk: Introduce rpk debug remote-bundle to gather a debug bundle from a remote cluster. by @r-vasquez in #23986
  • rpk: rpk group describe now supports printing instance IDs by @daisukebe in #24881
  • Redpanda now includes information about previous crashes in the telemetry data. by @pgellert in #25173
  • Redpanda now supports selecting the format in which the client certificate's subject DN is parsed. The new cluster config tls_certificate_name_format takes one of two options: legacy or rfc2253. See the documentation of the cluster config to understand the difference in how an X.509 certificate's subject DN will be formatted. by @michael-redpanda in #25312
  • Add cluster config cloud_storage_enable_segment_uploads that can be used to pause Tiered Storage safely. by @Lazin in
  • Adds a new rpk topic analyze command that determines information like batch rate and size for a given set of topics. by @ballard26 in #25164
  • New cluster config cloud_storage_enable_remote_allow_gaps is added. If it is set to True, Redpanda is allowed to create gaps in the offset range when Tiered Storage (TS) is paused. by @Lazin in #24871
  • New metric redpanda_cloud_storage_paused_archivers is added. It tracks the number of partitions with paused segment uploads. by @Lazin in #24871
  • New topic property redpanda.remote.allow_gaps is added. If it is set to True, Redpanda is allowed to create gaps in the offset range when Tiered Storage (TS) is paused. by @Lazin in #24871
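
The partition density arithmetic described in the list above can be sketched in Python. This is an illustrative calculation only, not Redpanda's implementation: the function names are hypothetical, the 4 MiB legacy value for topic_memory_per_partition is an assumption (the notes only state the new 200KiB default), and the sketch ignores the separate topic_partitions_per_shard cap.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def max_replicas_legacy(total_memory_bytes, memory_per_partition_bytes):
    """Previous logic: divide the whole memory space by the per-partition cost."""
    return total_memory_bytes // memory_per_partition_bytes


def max_replicas_new(total_memory_bytes,
                     memory_per_partition_bytes=200 * KIB,
                     allocation_percent=10):
    """New logic: only an explicit share of memory is reserved for partitions.

    The defaults mirror topic_memory_per_partition (200KiB) and
    topic_partitions_memory_allocation_percent (10%) from the notes.
    """
    reserved_bytes = total_memory_bytes * allocation_percent // 100
    return reserved_bytes // memory_per_partition_bytes


if __name__ == "__main__":
    memory = 8 * GIB
    # ASSUMPTION: 4 MiB stands in for the old per-partition default.
    print("legacy:", max_replicas_legacy(memory, 4 * MIB))
    print("new:   ", max_replicas_new(memory))
```

Under these assumptions, 8 GiB of memory allows roughly twice as many partition replicas with the new defaults, which matches the "double" figure quoted in the notes.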

Bug Fixes

  • Addresses startup behavior in SR that may result in inconsistent SR state between nodes by @michael-redpanda in #25403
  • Avoid large allocations for the kafka response sequencing map. by @pgellert in #24725
  • Ensure the redpanda_cloud_storage_cloud_log_size metric is consistent across all replicas. It was previously updated only seldom, and only from the leader replica, which led to inconsistent or stale values. by @nvartolomei in #24342
  • Fixed an issue where, when Redpanda was under extremely heavy load, the incorrect certificate was used for mTLS authorization, resulting in authorization errors returned to the client. by @michael-redpanda in #25149
  • Fix a bug in group management behaviour when the last member of a group expires while there are pending members in the group. by @pgellert in #25270
  • Fix the endianness of snappy_java_compressor headers to match that of snappy-java by @WillemKauf in #25092
  • Fixed a bug in which sliding window compaction may become stuck on failing to build an index map for a single segment. by @WillemKauf in #24323
  • Fixed a bug over-restricting bundled json schemas to treat both "id" and "$id" as restricted keywords, regardless of the active draft. by @IoannisRP in #25020
  • Fixed an issue where creating a topic with a huge number of partitions could lead to a crash. by @IoannisRP in #24135
  • Fixes Iceberg metadata serialization to avoid writing an extraneous empty Avro block. This would previously prevent some query engines (e.g. BigQuery) from reading tables created by Iceberg Topics. by @andrwng in #24913
  • Fixes a bug in Redpanda's Iceberg manifest list Avro definition that previously resulted in an end-of-file (EOF) error when reading manifest list Avro files written by other engines. This could previously crash Redpanda or block Redpanda from appending Iceberg data, and could also prevent certain query engines from successfully reading Iceberg data written by Redpanda. by @andrwng in #24602
  • Fixes a bug in which a segment being rolled and closed could race, leading to a triggered vassert. by @WillemKauf in #24483
  • Fixes a bug in which a segment being added to a log while the log was being closed could race, leading to a triggered vassert(). by @WillemKauf in #24635
  • Fixes a bug in which segments which may have tombstones in them were not considered eligible for self-compaction by @WillemKauf in #24187
  • Fixes a bug in which urgent garbage collection of multiple partitions may only consider one partition in the set. by @WillemKauf in #25257
  • Fixes a bug that could prevent topic recovery on ABS object storage when there are objects in a bucket from multiple clusters (e.g. following a whole cluster restore). by @andrwng in #24439
  • Fixes a bug where rpk wasn't parsing --help when used alongside --redpanda-id in rpk cloud <provider> byoc apply by @r-vasquez in #24369
  • Fixes a bug where failing to audit an authentication event could lead to a broker crash. by @pgellert in #24727
  • Fixes a bug where serializing manifests for Iceberg topics with decimal fields could cause Redpanda to crash or upload invalid manifests by @oleiman in #24463
  • Fixes a bug which may lead to archival_metadata_stm inconsistencies when reconfiguring clusters with recovered compacted topics. by @mmaslankaprv in #24664
  • Fixes a crash during partition shutdown. This can happen during partition moves (cross core/broker) or at broker shutdown. by @bharathv in #24936
  • Fixes a crash resulting from incorrect cleanup of log readers used for iceberg translation. by @bharathv in #24572
  • Fixes a crash that could happen when an error occurred during translation in an Iceberg Topic. by @andrwng in #25215
  • Fixes a race that could prevent Iceberg translation from happening following a leadership change. by @andrwng in #24556
  • Fixes accounting of the Iceberg commit lag metric, which could remain erroneously high in some cases even though translation is fully caught up. Additionally, the change ensures that only partition leaders emit lag metrics while followers emit 0 lag. by @bharathv in #24568
  • Fixes an issue that blocked the compaction of consumer offsets with group transactions. by @bharathv in #24637
  • Fixes an issue where transactions incorrectly timed out due to incorrect cleanup of evicted producers. by @bharathv in #24852
  • If a discrete disk is used for cloud storage cache Redpanda previously rejected writes if that disk (cache disk) was full (in degraded state). This is incorrect since the cache disk isn't in the way of writes. From now on, reject writes only if the data disk is full (in degraded state). by @nvartolomei in #24436
  • Previously if redpanda was configured with different mountpoints for data and cache directory we would report metrics only for the cache directory. Now, the original storage_disk_{total,free}_bytes metric will report metrics for the data directory mountpoint and a new storage_cache_disk_{total,free}_bytes metric will report metrics for the cache directory mountpoint. Metrics will be equivalent if both are on the same mountpoint. by @nvartolomei in #24138
  • Remove partial kvstore snapshots at startup. by @ztlpn in #24815
  • Schema Registry/Protobuf: Fix a regression with maps. by @BenPope in #25001
  • The vectorized_internal_rpc_produce_bad_create_time metric has been removed. This metric was introduced in error and never had a value other than zero. by @travisdowns in #25341
  • #16649 Schema Registry: fixes a bug in the Avro compatibility check reader_field_missing_default_value where it was too lenient for missing default values of null-able types. by @pgellert in #24032
  • #23363 rpk: fixes a bug where rpk incorrectly handles IPv6 by adding extra brackets. by @r-vasquez in #24982
  • #23661 Redpanda neglected to include ECDSA based ciphers in the cipher strings used for TLSv1.2 and below. This caused TLS connections that used ECDSA based certificates to fail cipher negotiation when using TLSv1.2 and below. ECDSA ciphers are now in the list of supported ciphers. by @michael-redpanda in #24191
  • #24543 Redpanda will now permit topics to be created with redpanda.remote.[read|write] set to true when a license is expired or missing provided that the cluster config cloud_storage_enabled is set to false. by @michael-redpanda in #24570
  • #24782 Fixes integer overflow issues when given a schema via the POST /subject/{subject}/version where version was > INT_MAX or a negative value was provided. by @michael-redpanda in #24860
  • Fixes a very rare situation in which a Raft leader can enter an infinite loop while trying to recover a follower. by @mmaslankaprv in #25018
  • Fixes a rare bug leading to offset translation inconsistency in recovered topics. by @mmaslankaprv in #24618
  • rpk: fixes a bug that prevented serverless cloud profiles from using rpk security user commands. by @r-vasquez in #25371
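
The IPv6 fix listed above (#24982) concerns extra brackets being added around addresses. As a minimal sketch of the general technique, and not rpk's actual code (rpk is written in Go), the hypothetical helper below joins a host and port so that an IPv6 literal ends up with exactly one pair of brackets, whether or not the input was already bracketed.

```python
import ipaddress


def join_host_port(host: str, port: int) -> str:
    """Join host and port, bracketing IPv6 literals exactly once.

    Hypothetical helper for illustration; the name and behavior are
    assumptions, not taken from rpk.
    """
    # Strip one surrounding pair of brackets, if present.
    bare = host[1:-1] if host.startswith("[") and host.endswith("]") else host
    try:
        is_ipv6 = ipaddress.ip_address(bare).version == 6
    except ValueError:
        # Not an IP literal (for example a DNS name): leave it untouched.
        is_ipv6 = False
    if is_ipv6:
        return f"[{bare}]:{port}"
    return f"{host}:{port}"


if __name__ == "__main__":
    for h in ("::1", "[::1]", "10.0.0.1", "redpanda-0.example.com"):
        print(join_host_port(h, 9092))
```

Normalizing to the bare address first makes the function idempotent with respect to brackets, which is the property the bug fix is about.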

Improvements

  • The client quota cluster configs are removed (kafka_client_group_byte_rate_quota, kafka_client_group_fetch_byte_rate_quota, target_quota_byte_rate, target_fetch_quota_byte_rate, kafka_admin_topic_api_rate). Please follow the migration docs to migrate from cluster configuration quotas to Kafka API-based quotas. by @pgellert in #24697
  • Introduces the node config crash_loop_sleep_sec, which sets the time the broker sleeps before terminating the process when the limit on the number of consecutive times a broker can crash has been reached. This is most useful in Kubernetes environments where setting this value allows customers to have ssh access into a crash looping pod for a short window of time. by @pgellert in #24787
  • Make Iceberg and topic mount/unmount work well together by @bashtanov in #24780
  • A new metric, vectorized_kafka_produced_bytes, is added to the /metrics endpoint which breaks down bytes produced to Redpanda at the Kafka API by batch compression type. by @travisdowns in #25340
  • Improve the user messages when the topic_partitions_reserve_shard0 cluster config is used and a user tries to create a topic with more partitions than the core-based partition limit. by @pgellert in #24378
  • Added metrics for pandaproxy resource usage. by @IoannisRP in #24537
  • Adds a chunked compaction routine to local storage, which is used as a fallback in the case that we fail to index a single segment during sliding window compaction. by @WillemKauf in #24423
  • Adds a number of new metrics to improve observability of the compaction subsystem: by @WillemKauf in #24187
    • Added the _segment_cleanly_compacted metric to segment::probe.
    • Added the _segments_marked_tombstone_free metric to segment::probe.
    • Added the _num_rounds_window_compaction metric to segment::probe.
  • Adds additional debug log messages in the datalake coordinator regarding files to be committed to Iceberg. by @andrwng in #24555
  • Adds logging to mention data removed by compaction. by @andrwng in #24659
  • Adds the observable metrics dirty_segment_bytes and closed_segment_bytes to the storage layer. by @WillemKauf in #24649
  • Adds the tunable log_compaction_adjacent_merge_self_compaction_count, which allows for adjacent merge compaction to make more forward progress in all situations when log_compaction_use_sliding_window is disabled. by @WillemKauf in #25119
  • Allow for optional scheduling of compaction via min_cleanable_dirty_ratio by @WillemKauf in #24991
  • Beta version of Iceberg support was incorrectly classified as "enterprise only". by @oleiman in #24421
  • Disable datalake services in recovery mode by @ztlpn in #24446
  • Enables full struct schema evolution for Iceberg topics by @oleiman in #24862
  • Fixes an overly restrictive condition for retention in Iceberg-enabled topics. by @WillemKauf in #24610
  • Generated parquet files for Iceberg Topics are now compressed with zstd compression. by @rockwotj in #24933
  • Iceberg data files are now uploaded to the table location prefix instead of /datalake-iceberg/ prefix. by @nvartolomei in #24960
  • Iceberg/Datalake: Adds support for schema evolution by primitive type promotion by @oleiman in #24561
  • Implement deleting iceberg tables on topic deletion. by @ztlpn in #24119
  • Improved TLS connection related error messages by @michael-redpanda in #24749
  • Introduce iceberg_target_lag_ms topic property by @oleiman in #25056
  • Leader balancer: don't treat each core as independent and balance total number of leaders on each node as well. by @ztlpn in #24403
  • Makes urgent garbage collection and compaction of logs separate, concurrent processes. by @WillemKauf in #25134
  • Move failed authorization log statements from the kafka logger to a new kafka/authz logger, allowing for fine grained control over log statements for failed authorization. by @rockwotj in #24712
  • Redpanda will now periodically remove expired snapshots from Iceberg Topic tables. by @andrwng in #24813
  • Redpanda will now schedule local segment merges of compacted topics, even when windowed compaction has occurred in a given housekeeping round. This ensures progress in reducing segment count in compacted topics with high produce traffic. by @andrwng in #24874
  • Schema Registry: Add some metrics for resource usage taken by in-memory schemas by @BenPope in #23794
  • Show leader id in /v1/cluster/partitions response. by @ztlpn in #24565
  • Use streaming parsing of transaction range manifests in tiered storage to avoid large allocations which lead to OOMs with heavy use of transactions. by @nvartolomei in #24728
  • rpk security user is now available for users with Cloud profiles. by @r-vasquez in #24671
  • rpk topic describe now supports the --format flag to display the output in either JSON or YAML. by @r-vasquez in #24387
  • controlling Iceberg translation backlog by @mmaslankaprv in #24990
  • kafka/server: Remove throughput throttling v1 by @BenPope in #16823
  • rpk: improvements to rpk cluster health: rpk now exits with code 10 if the cluster is unhealthy, the command supports --format, and the cluster UUID is added to the command's output. by @r-vasquez in #25213
  • rpk now supports well-known protobuf types when encoding/decoding records using Schema Registry. by @r-vasquez in #24480
  • rpk: save the output of uptime to bare-metal debug bundles created through rpk. by @r-vasquez in #24686
  • schema_registry/protobuf: Rewrite protobuf normalization to improve compatibility with Java client by @BenPope in #25094
  • stable leadership under load by @mmaslankaprv in #24590
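
The new exit code for rpk cluster health (#25213) makes the command easier to use from scripts. The sketch below shows one way a wrapper might interpret it. Only the value 10 for an unhealthy cluster comes from these notes; treating 0 as healthy and every other code as a generic error is an assumption, and the function names are hypothetical.

```python
import subprocess

UNHEALTHY_EXIT_CODE = 10  # stated in the release notes for an unhealthy cluster


def classify_exit_code(returncode: int) -> str:
    """Map an exit code to a coarse status string."""
    if returncode == 0:
        return "healthy"  # ASSUMPTION: 0 means the cluster is healthy
    if returncode == UNHEALTHY_EXIT_CODE:
        return "unhealthy"
    return "error"  # ASSUMPTION: any other code is a command failure


def check_cluster_health(cmd=("rpk", "cluster", "health")) -> str:
    """Run the health command and classify its exit code."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    return classify_exit_code(result.returncode)


if __name__ == "__main__":
    try:
        print(check_cluster_health())
    except FileNotFoundError:
        print("rpk not found on PATH")
```

Keeping the classification separate from the subprocess call lets the mapping be unit tested without a running cluster.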

Full Changelog: v24.3.1...v25.1.1