Features
- Cloud Topics: add `redpanda.storage.mode` by @WillemKauf in PR #29352
- cloud_topics_reconciliation_parallelism: Controls the maximum number of L1 objects the reconciler can build concurrently per shard. Defaults to 8. The per-shard memory reserved for reconciliation is part_size × parallelism. by @wdberkeley in #29671
- cloud_topics_reconciliation_upload_part_size: Controls the multipart upload part size used when the reconciler builds L1 objects. Defaults to 5 MiB (the minimum allowed by cloud object storage providers). Together with the parallelism setting, determines the per-shard memory reserved for reconciliation. by @wdberkeley in #29671
- Cloud Topics: Level Zero GC delete pipeline by @oleiman in PR #29050
- Cloud Topics: Sharded level zero GC workers by @oleiman in PR #29088
- Cloud Topics: Object path random prefix by @oleiman in PR #29066
- Cloud Topics: Introduce Admin API for Level 0 GC by @oleiman in PR #29388
- ct: `describe-storage` impl for cloud topics by @WillemKauf in PR #29864
- ct: cloud topics compaction configs by @WillemKauf in PR #29287
- ct: hook cloud topics recovery into whole cluster restore by @andrwng in PR #29515
- Cloud Topics: Admin RPCs GetEpochInfo and AdvanceEpoch by @oleiman in PR #29515
- Cloud Topics: Epoch advance frontend by @oleiman in PR #29536
- Cloud Topics: Metrics for L0 GC by @oleiman in PR #29556
- Cloud Topics: Level Zero GC force reset (escape hatch) by @oleiman in PR #29774
- Cloud Topics: L0 GC Safety Monitor by @oleiman
- group principal support by @michael-redpanda in PR #28906
- group role authz by @michael-redpanda in PR #29259
- Consider group membership for authz checks by @michael-redpanda in PR #29157
- admin: add group role members to v2 RBAC API by @nguyen-andrew in PR #29236
- Process group claims from OIDC tokens by @michael-redpanda in PR #29000
- security/role: Add group role member type by @nguyen-andrew in PR #29217
- Support Group ACLs in Schema Registry by @michael-redpanda in PR #29476
- `rpk security group assign <group> --role <role>` by @graham-rp in #29738
- `rpk security group describe <group>` by @graham-rp in #29738
- `rpk security group list` by @graham-rp in #29738
- `rpk security group unassign <group> --role <role>` by @graham-rp in #29738
- User-based client quotas. by @IoannisRP in #29425
- Allow leadership pinning to racks in priority order by @joe-redpanda in #29366
- Add `--schema-context` flag to `rpk registry` to scope all registry operations to a specific context by @c-julin in #29689
- Add `rpk registry context list` and `rpk registry context delete` commands for managing schema registry contexts by @c-julin in #29689
- schema_registry: Support storing and retrieving Metadata Properties alongside the Schema. by @BenPope in #28845
- rpk now supports Schema Registry metadata properties. by @r-vasquez in #29431
- Adds support for SASL/PLAIN authentication for Shadow Linking by @michael-redpanda in #28708
- Iceberg: Support customizing Iceberg catalog namespace (defaults to `redpanda`) at cluster level. by @nvartolomei in #29113
- Iceberg: Add `$ref` keyword support to JSON Schema integration. Limited to internal references and known keywords for now. by @nvartolomei in #29272
- Support JSON Schema additionalProperties with object sub-schemas, mapping them to Iceberg map types with string keys. by @nvartolomei in #29735
- Iceberg: JSON Schema to Iceberg conversion now supports `oneOf` for expressing nullable fields (oneOf: [T, null]). by @nvartolomei in #29590
- Remote Read Replicas now support cross-region access by specifying region and endpoint overrides in the read replica topic's bucket property (e.g. `bucket-name?region=us-east-1&endpoint=s3.us-east-1.amazonaws.com`), enabling RRR to read from buckets in different regions on AWS S3, and other S3 compatible backends.
- Allow nodes to automatically decommission after a certain timeout by @joe-redpanda in #28946
- Adds native support for buf's protovalidate to Schema Registry by @michael-redpanda in #29404
- Coordinate transaction end markers removal by @bashtanov in PR #29043
- rpk cluster health: high_disk_usage_nodes is now reported. by @r-vasquez in #29480
- Adds the `cloud_storage_prefetch_segments_max` cluster config which can be used to enable small segment prefetching in cloud storage. by @ballard26 in #29496
- #29347 Added `delete_topic_enable` cluster configuration property to globally enable or disable topic deletion via the Kafka API. When set to `false`, all topic deletion requests are rejected with error code `TOPIC_DELETION_DISABLED` (73). Default is `true` for backward compatibility. This setting works independently of `kafka_nodelete_topics`, which continues to protect specific topics regardless of this setting. by @michael-redpanda in #29365
- Added OIDC security to Admin API v2 with identity resolution, cluster-wide key refresh, and session revocation capabilities. Administrators can now manage OIDC authentication security through the new v2 endpoints with proper cluster coordination and error handling. by @nguyen-andrew in #28973
- Added SCRAM credential management in Admin API v2, including create, list, get, update, and delete SCRAM credentials. by @nguyen-andrew in #29174
- Added role management in Admin API v2, including create, list, get, and delete roles, and manage role members. by @nguyen-andrew in #29072
- `rpk shadow` now has visual feedback on long-running operations. by @r-vasquez in #29060
- rpk: now supports the RPK_PROFILE environment variable to select your current profile. by @r-vasquez in #29195
- rpk: now has --ignore-profile, which makes rpk ignore the rpk.yaml and redpanda.yaml config files. by @r-vasquez in #29215
- rpk: rpk can now be used to create Redpanda roles in Redpanda Cloud using the `rpk security role` command. by @r-vasquez in #29162
- Rpk can now generate a dashboard for monitoring Redpanda Serverless clusters via `rpk generate grafana-dashboard --dashboard serverless`. by @nicolaferraro in #28858
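
The two cloud_topics_reconciliation settings in the list above combine into a per-shard memory reservation of part_size × parallelism. A minimal sketch of that arithmetic, using the defaults quoted in the notes (5 MiB and 8); the function name `reconciliation_memory_bytes` is hypothetical and is not a Redpanda API:

```python
MIB = 1024 * 1024


def reconciliation_memory_bytes(part_size_bytes: int, parallelism: int) -> int:
    """Per-shard memory reserved for reconciliation: part_size x parallelism.

    Hypothetical helper for illustration only.
    """
    if part_size_bytes <= 0 or parallelism <= 0:
        raise ValueError("part size and parallelism must be positive")
    return part_size_bytes * parallelism


# Defaults stated in the release notes: 5 MiB part size, parallelism of 8.
default_reservation = reconciliation_memory_bytes(5 * MIB, 8)
print(default_reservation // MIB, "MiB per shard")  # 40 MiB per shard
```

Raising either setting grows the reservation linearly, so doubling the parallelism to 16 at the default part size would reserve 80 MiB per shard.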
Bug Fixes
- Addresses an issue where a failure of the OIDC service to communicate with the IdP could prevent controller leadership election by @michael-redpanda in #28608
- Adds better validation to arrays and strings being read off the wire from `kafka` clients. by @WillemKauf in #29208
- Attempt to fail over a topic that isn't being Shadowed now returns a `404` rather than a `409` by @michael-redpanda in #28737
- Avoids runaway replicators after topics got deleted. Additionally improve some observability related to shadow requests. by @bharathv in #28825
- Compatibility lookups on non-existent subjects can now fall through to context-level resolution with `defaultToGlobal=true` instead of always returning an error. by @nguyen-andrew in #29713
- Do not wait for EOF from OIDC server on TLS connection shutdown. by @pgellert in #28425
- Fix a cache collision in in-memory spillover manifest cache which could lead to new topic reading data from the old topic with the same name for a short period of time. by @nvartolomei in #29068
- Fix a rare bug where adjacent segment merging in tiered storage fails with `Candidate creation error: candidate creation error: failed to seek end offset` which could result in degraded tiered storage read performance. by @nvartolomei in #28602
- Fix an issue where an unhandled exception in archival could lead to an input_stream asserting in its destructor. by @oleiman in #28886
- Fixed a bug where updating a Shadow Link with the same password cleared the password-set-at timestamp. Now, if the same password is used, the timestamp keeps its previous value by @michael-redpanda in #28855
- Fixed a rare bug that causes timequeries against partitions using tiered storage to incorrectly return no result when the partition's local log is empty but retains an active segment. by @wdberkeley in #28642
- Fixed admin API service restart to handle pandaproxy and schema_registry on the correct shard. by @pgellert in #29519
- Fixed an issue where Kafka fetch session responses could omit partitions with changed metadata (high watermark, log start offset, last stable offset) when the fetch retried internally due to min_bytes not being satisfied. by @pgellert in #29304
- Fixed buggy race detection logic in archival segment collection that could miss races between collection and compaction. by @oleiman in #29721
- Fixed cluster link replication attempting prefix truncation on partitions that do not support it (e.g., cloud topics) by @michael-redpanda in #29274
- Fixed file cleanup issue in `datalake::record_multiplexer`. by @ballard26 in #29249
- Fixed shadow link replicating all historical data when `start_at_timestamp` is set to a timestamp beyond the end of the source partition log. The link now falls back to the last stable offset and replicates only new data. by @nguyen-andrew in #29861
- Fixes a bug in XML parsing for S3 and ABS clients which could lead to partial reads of XML responses by @WillemKauf in #28552
- Fixes a bug in which `tristate` topic properties would be considered disabled if set with the value `0` during a `CreateTopic` or `AlterConfig` request. Now, topic properties set with `0` will reflect that value exactly. This affects the following topic properties: `segment.ms`, `retention.bytes`, `retention.ms`, `retention.local.target.bytes`, `retention.local.target.ms`, `initial.retention.local.target.bytes`, `initial.retention.local.target.ms`, `delete.retention.ms`, `min.cleanable.dirty.ratio`. by @WillemKauf in #28811
- Fixes a bug in which a `segment` could be incorrectly marked as having finished self compaction during a race with a `log` shutdown by @WillemKauf in #28730
- Fixes a bug where truncation does not propagate to sink if source start offset has moved ahead of sink high watermark. by @bharathv in #29358
- Fixes a long-standing bug in which compaction in the presence of leadership transfers/restarts could punch a hole in idempotency. Compaction is now disallowed from compacting the last batch for an idempotent producer. by @WillemKauf in #29001
- Fixes a possible cause of offset translator inconsistency arising from incorrect truncation of the log. by @bharathv in #29849
- Fixes a race in http client during shutdown by @rockwotj in #28438
- Fixes an issue where kafka clients that don't set a max_wait_ms (such as the schema registry client) would time out immediately and not return any data. by @rockwotj in #29708
- Fixes duplicate message writes in Iceberg when disk or memory reservation errors occur. by @ballard26 in #29251
- Fixes incorrect reporting of partition movement progress in decommission/reconfiguration status output. by @bharathv in #29047
- Iceberg: Allow bucket names with dots. Also, reject URIs returned by catalog that look like `s3://bucket.s3.amazonaws.com/path/to/object`. by @nvartolomei in #29071
- Iceberg: Fix incorrect duplicate keyword detection in JSON Schema that only caught adjacent duplicates. by @nvartolomei in #29592
- Prevent a slew of races between operations in the `storage` layer that could lead to attempting to append to a closed segment or a failed assert by @WillemKauf in #28472
- Prevent a very narrow race condition in the `partition_balancer_backend`. by @WillemKauf in #28460
- Reduces a window in which Redpanda may end up missing data when using the `object_storage` catalog for Iceberg Topics. Reminder: the `object_storage` is generally unsafe and should not be used in production. by @andrwng in #28414
- Repair tiered-storage manifests with misaligned archive offsets during archiver startup. This unblocks retention when such a manifest was present. by @nvartolomei in #29831
- Resolve deadlock in force_reconfigure edge case by @joe-redpanda in #29386
- Schema Registry: Fixes a bug in Avro schema compilation where external references were not properly resolved from the schema store, causing compilation failures, compatibility check failures or server-side schema id validation failures for schemas with reference dependencies. by @pgellert in #28780
- Updated C-Ares to 1.34.6 to address CVE-2025-62408 by @michael-redpanda in #29036
- Upgrade libxml2 to v2.15.2 to fix CVE-2026-0990 (Uncontrolled Recursion via xmlCatalogXMLResolveURI() in XML catalog processing). by @tyson-redpanda in #29788
- Variety of fixes to reduce noise in log concerning internal kafka client authentication and authorization by @michael-redpanda in #28852
- #29189 Fix the `partition_response` for a produce request to a `LogAppendTime` topic to accurately reflect the `timestamp_type`. by @WillemKauf in #29286
- deflake ManyTopicsTest.test_decommission_node_unsafely by @joe-redpanda in #28518
- deflake offset for leader epoch by @joe-redpanda in #28389
- firm up start offset update logic by @joe-redpanda in #28309
- fix dangling reference in many partitions edge case by @joe-redpanda in #29909
- guarantee invalid lso gets converted to a retryable error when relevant by @joe-redpanda in #29112
- makes CFR cleanup manual to avoid nodewise recovery race by @joe-redpanda in #28307
- prevents state machine from loading local snapshot with offsets greater than log dirty offset by @mmaslankaprv in #28890
- pulls in seastar fix for ossl by @joe-redpanda in #28503
- schema_registry/swagger: Fix type for `version` in GET `/compatibility/subjects/{subject}/versions/{version}` by @BenPope in #28297
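
The `tristate` fix in the list above comes down to keeping three states distinct: unset, disabled, and an explicit value (where `0` is a legitimate value). A minimal sketch of that distinction follows. The `Tristate` class, the `parse_tristate` helper, and the use of a negative number as the "disabled" sentinel are assumptions made for illustration, not Redpanda's actual implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tristate:
    """Hypothetical three-state holder: unset, disabled, or an explicit value."""
    disabled: bool = False
    value: Optional[int] = None


def parse_tristate(raw: Optional[int]) -> Tristate:
    # Assumption for this sketch: None means "not set" and any negative
    # number means "disabled". Zero is kept as a real value, which is the
    # behavior the fix describes.
    if raw is None:
        return Tristate()
    if raw < 0:
        return Tristate(disabled=True)
    return Tristate(value=raw)


print(parse_tristate(0))
print(parse_tristate(-1))
print(parse_tristate(None))
```

The buggy behavior described in the note corresponds to collapsing the `raw == 0` case into the disabled branch.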
Improvements
- rpk profile edit now documents each available field for an rpk profile by @r-vasquez in #29309
- rpk shadow config generate: now supports --print-template along with --for-cloud by @r-vasquez in #29098
- Snapshot id correlated with table update transaction is now available in Datalake REST API by @mmaslankaprv in #29180
- Cluster health report now includes node IDs of nodes that exceed the disk usage reporting thresholds. by @bharathv in #29136
- rpk: WCR supports `--cluster-uuid-override` by @daisukebe in #29153
- Add `rpk profile validate` to check your profile for correctness. by @r-vasquez in #29357
- Add batch delete support for ABS cloud storage clients by @oleiman in #29143
- Add batch delete support for GCS cloud storage clients by @oleiman in #29246
- Added IO queue configuration metrics exposing iotune rates per Seastar IO queue by @travisdowns in #29893
- Added `host_metrics_info` gauge with labels describing device resolution and configuration state by @travisdowns in #29893
- Adds automatic retry with backoff for transient HTTP connection errors in Cloud API and S3 package download calls, reducing intermittent CDT test failures caused by RemoteDisconnected errors. by @cjayani in #29238
- Adds common-sense validation to intervals relevant to partition balancing by @joe-redpanda in #29282
- Aliases the following enum variants: by @michael-redpanda in #28807
  - `ACLPattern.ACL_PATTERN_PREFIX` to `ACLPattern.ACL_PATTERN_PREFIXED`
  - `PatternType.PATTERN_TYPE_PREFIXED` to `PatternType.PATTERN_TYPE_PREFIX`
- Archive garbage collection now deletes segments in bounded batches (controlled by `cloud_storage_gc_max_segments_per_run`, default `300`) instead of attempting to delete the entire backlog in a single housekeeping run. This prevents unbounded object storage delete requests on partitions with large expired archives, ensures each run commits incremental progress, and automatically schedules fast follow-up runs until the backlog is fully drained. by @nvartolomei in #29870
- Better prioritize compaction of `__consumer_offsets` partitions when heavy-weight compactions are present on a broker. by @WillemKauf in #29382
- Bump ListOffsets API support to v6 by @michael-redpanda in #28722
- Calling `ListBrokers` in the Admin v2 API doesn't require all brokers to be up. by @rockwotj in #29133
- Connection errors now report the resolved address where possible. by @BenPope in #29201
- Fixes a vulnerability in aioboto3 by pinning the urllib3 dependency. by @tyson-redpanda in #28907
- Iceberg manifest serialization now handles all data_file fields from the v2 spec, ensuring full compatibility/no optional metadata loss during `merge_append_action` manifest rewriting. by @nvartolomei in #29680
- Improve cluster linking shadow topic prefix truncation error message. by @pgellert in #29289
- Improve performance of Shadow Link when scaling to thousands of topics by making internal data structures copy on write by @michael-redpanda in #28956
- Improve timeout logic for failover link to not timeout when processing thousands of shadow topics by @michael-redpanda in #28986
- Improved default TLS Cipher Suites - Remove RSA key-exchange, and CBC, CCM by @BenPope in #27912
- Increase frequency of topic reconciliation loop to reduce time it takes to failover topics by @michael-redpanda in #28986
- Leader balancer improved to issue fewer moves to reach balanced state by @bashtanov in #29566
- Low-level metrics for the segment appender are now available, such as bytes requested/written, IOs issued, etc. This lets us calculate various types of write amplification at the appender level and otherwise monitor the appender behavior. by @travisdowns in #29395
- Minor improvement to speed up feature activation after major version upgrades. by @pgellert in #29824
- Minor improvements to http client for better connection reuse in certain edge cases. by @nvartolomei in #28788
- On rpk generate app, set `go` as the default language. by @paulohtb6 in #28246
- Partition balancer reallocation failures now include moves which lost source quorum by @joe-redpanda in #28751
- Reduce log noise by @michael-redpanda in #28986
- Reduces oscillating behavior in partition balancing by @joe-redpanda in #29241
- Remove a copy of a potentially large datastructure when starting a transaction by @rockwotj in #29834
- Reply HTTP 400 not 500 to admin endpoints when the underlying error is `cluster::errc::topic_already_exists`. It deflakes data migration tests, and is consistent with existing HTTP 400 on `cluster::errc::topic_not_exists`. by @bashtanov in #29630
- Resolve deadlock in force_reconfigure edge case by @joe-redpanda in #29279
- SCRAM credentials now track `password_set_at` timestamps, exposed via admin API v2. This enables clients like the Kubernetes operator to verify credential state changes have propagated and perform accurate reconciliation. by @nguyen-andrew in #29328
- Schema Registry: Optimise compatibility checking for json to allow more nested schemas. by @BenPope in #29290
- Scoped host diskstat metrics to the data and cache directory devices, reporting partition-level and whole-disk I/O separately by @travisdowns in #29893
- Shadow Linking: allow creating shadow links with TLS settings where the CA/key/cert strings exceed 128KiB. by @pgellert in #29718
- The message printed by `rpk config export` is now written to stderr instead of stdout, preventing it from interfering with the exported configuration when stdout is used as the output destination, e.g. `rpk config export --all -f /dev/stdout` by @simon0191 in #28760
- A restart is no longer required when the cloud storage auth token is invalidated. by @Lazin in #28556
- Updated default console image version in rpk container commands to v3.3.0 by @vbotbuildovich in #28635
- Updated default console image version in rpk container commands to v3.3.2 by @vbotbuildovich in #28779
- Updated default console image version in rpk container commands to v3.4.0 by @vbotbuildovich in #29151
- Updated default console image version in rpk container commands to v3.5.1 by @vbotbuildovich in #29473
- Upgrade libxml2 to latest minor version by @tyson-redpanda in #29423
- #15206 rpk redpanda admin config log-level set: Added --help-loggers flag to display the list of available loggers without setting any log level. by @r-vasquez in #29601
- #15206 rpk redpanda admin config log-level set: now dynamically discovers available loggers from the local Redpanda binary or the Admin API. by @r-vasquez in #29601
- fix shard placement table race condition by @joe-redpanda in #28575
- kafka/server: Fix a memory leak when the keytab cannot be found. by @BenPope in #28447
- kafka/server: Fix a memory leak when the keytab cannot be found. by @BenPope in #28469
- kafka/server: Fix rare uncaught exceptional future on connection shutdown by @BenPope in #28445
- none by @travisdowns in #29560
- partitions with an in-flight move and original replica quorum loss will now be reported as immutable by @joe-redpanda in #28393
- potentially better behavior when working with some hardware RAID controllers and portworx by @mmaslankaprv in #28249
- prevent partition balancer oscillations by @joe-redpanda in #29554
- quality of life logging for shard_placement_table_test by @joe-redpanda in #28697
- reduce test failures on data_migrations_api_test by @joe-redpanda in #28325
- rpk cluster config set: Poll operation status before returning by @paulzhang97 in #28553
  The rpk cluster config set command for Redpanda Cloud clusters now polls for operation completion and displays real-time progress, eliminating the need to manually check status with `rpk cluster config status` in most cases.
  What changed:
  - The command now waits up to 10 seconds (configurable via --timeout) for configuration updates to complete
  - Progress indication shows elapsed time: Processing configuration... (3s elapsed)
  - Clear success/failure messages upon completion
  - For operations taking longer than the timeout, the command displays the operation ID for manual status checking
  Example:
      $ rpk cluster config set kafka_connections_max_per_ip 12345
      Processing configuration... (3s elapsed)
      Configuration update completed successfully. Operation ID: abc123
  New flag:
  - --timeout: Maximum time to wait for operation completion before displaying the operation ID (default: 10s)
  This improvement reduces the need to manually check operation status for quick configuration changes while maintaining backward compatibility for longer-running operations.
- rpk profile edit, rpk profile edit-globals, rpk shadow update now report when an updated file contains unknown fields. by @r-vasquez in #29245
- rpk profile prompt: add 'raw' as a modifier option to our prompt parsing. by @r-vasquez in #29219
- rpk redpanda admin brokers decommission: `--force` flag has been deprecated and renamed to `--skip-liveness-check`. by @r-vasquez in #29369
- rpk shadow create [cloud]: now rpk validates if the secret exists in the Shadow cluster before sending a request to create by @r-vasquez in #29077
- rpk shadow describe now shows Schema Registry sync mode. by @r-vasquez in #29023
- rpk shadow describe now supports the --format json/yaml flag. by @r-vasquez in #29111
- rpk shadow is now supported in Redpanda Cloud by @r-vasquez in #28748
- rpk will warn users when they interact with a cloud cluster using an rpk profile that isn't properly configured for cloud. by @r-vasquez in #28594
- rpk: add --if-not-exists flag to topic create by @nvartolomei in #28193
- rpk: consume warns when requested end offset is lower than available start offset by @daisukebe in #28979
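
The archive garbage collection item in the list above describes deleting a backlog in bounded batches, with follow-up runs until it is drained. A minimal sketch of that pattern, assuming an in-memory backlog and a caller-supplied delete function; the names `gc_run` and `delete_batch` are hypothetical, and only the default of 300 is taken from the notes:

```python
from typing import Callable, List, Sequence


def gc_run(
    backlog: List[str],
    delete_batch: Callable[[Sequence[str]], None],
    max_segments_per_run: int = 300,
) -> bool:
    """Delete at most max_segments_per_run segments from the front of the
    backlog. Returns True when a follow-up run is still needed."""
    batch = backlog[:max_segments_per_run]
    if batch:
        delete_batch(batch)
        # Drop the deleted entries so each run records incremental progress.
        del backlog[: len(batch)]
    return bool(backlog)


# Illustration: a backlog of 750 expired segments drains in three runs.
backlog = [f"segment-{i}" for i in range(750)]
requests: List[Sequence[str]] = []
while gc_run(backlog, requests.append):
    pass
print([len(r) for r in requests])  # [300, 300, 150]
```

Each delete request is capped at the batch size, so a large expired archive produces several small requests rather than one unbounded request.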