github rabbitmq/rabbitmq-server v4.2.0
RabbitMQ 4.2.0

one day ago

RabbitMQ 4.2.0 is a new feature release.

Breaking Changes and Compatibility Notes

Default value for AMQP 1.0 durable field.

Starting with RabbitMQ 4.2, if a sending client omits the header section, RabbitMQ assumes the durable field to be false, complying with the AMQP 1.0 spec:

<field name="durable" type="boolean" default="false"/>

AMQP 1.0 apps or client libraries must set the durable field of the header section to true to mark the message as durable.

Team RabbitMQ recommends that client libraries send messages as durable by default.
All AMQP 1.0 client libraries maintained by Team RabbitMQ send messages as durable by default.
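
The defaulting rule can be modelled in a few lines of Python. This is only a sketch of the rule, not RabbitMQ's implementation or any client library's API; the function name `effective_durable` and the dict standing in for the header section are hypothetical.

```python
from typing import Optional


def effective_durable(header: Optional[dict]) -> bool:
    """Return the durability assumed for a message under the 4.2 rule.

    `header` is a hypothetical stand-in for the AMQP 1.0 header section;
    None means the sender omitted the section entirely.
    """
    if header is None:
        # Header section omitted: the spec default (false) applies.
        return False
    # Section present: an unset durable field also falls back to the default.
    return bool(header.get("durable", False))
```

A publisher that needs durable messages therefore has to send a header section with durable set to true; omitting the section is no longer enough.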

Mandatory flag in Direct Reply-To

Starting with RabbitMQ 4.2, if an AMQP 0.9.1 Direct Reply-To responder (RPC server) publishes with the mandatory flag set, then amq.rabbitmq.reply-to.* is treated as a queue.
Whether the requester (RPC client) is still there to consume the reply is not checked at routing time.
In other words, if the responder publishes to only this queue name, then the message will be considered "routed" and RabbitMQ will therefore not send a basic.return.

Very Rarely Used *.cacerts Settings are Removed from rabbitmq.conf

*.cacerts (not to be confused with cacertfile) settings in rabbitmq.conf did not have the expected effect and were removed
to eliminate confusion.

Quorum Queue Metric Changes

Metrics emitted for Ra-based components (quorum queues, Khepri, Stream Coordinator)
have changed. Some metrics were removed, many were added, some changed their names.
Users relying on Prometheus metrics starting with rabbitmq_raft or rabbitmq_detailed_raft
will need to update their dashboards and/or alerts. If you are using the
RabbitMQ-Quorum-Queues-Raft dashboard,
please update it to the latest version for RabbitMQ 4.2 compatibility.

Release Highlights

SQL Filter Expression for Streams

AMQP 1.0 clients can now define SQL-like filter expressions when consuming from streams, enabling server-side message filtering.
RabbitMQ will only dispatch messages that match the provided filter expression, reducing network traffic and client-side processing overhead.
SQL filter expressions are a more powerful alternative to the AMQP Property Filter Expressions introduced in RabbitMQ 4.1.

RabbitMQ implements a subset of AMQP Filter Expressions Version 1.0 Committee Specification Draft 01 Section 6 including support for:

  • Comparison operators (=, !=, <>, >, <, >=, <=)
  • Logical operators (AND, OR, NOT)
  • Arithmetic operators (+, -, *, /, %)
  • Special operators (LIKE, IN, IS NULL)
  • UTC function
  • Access to the properties and application-properties sections

Examples

Simple expression:

header.priority > 4

Complex expression:

order_type IN ('premium', 'express') AND
(customer_region LIKE 'EU-%' OR customer_region = 'US-CA') AND
UTC() < properties.absolute-expiry-time AND
NOT cancelled

To learn more, check out the new documentation guide on Stream Filtering.
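
To make the semantics of the complex expression concrete, here is a rough plain-Python equivalent. It is a sketch under several assumptions: unqualified identifiers are taken to refer to application properties, timestamps are milliseconds since the epoch, and missing fields fall back to simple defaults, whereas the filter specification has its own rules for absent values. The function and parameter names are hypothetical; the real filtering happens on the broker.

```python
def matches(app_props: dict, absolute_expiry_time_ms: int, now_ms: int) -> bool:
    """Plain-Python reading of the complex filter expression above."""
    region = app_props.get("customer_region", "")
    return (
        # order_type IN ('premium', 'express')
        app_props.get("order_type") in ("premium", "express")
        # customer_region LIKE 'EU-%' OR customer_region = 'US-CA'
        and (region.startswith("EU-") or region == "US-CA")
        # UTC() < properties.absolute-expiry-time
        and now_ms < absolute_expiry_time_ms
        # NOT cancelled
        and not app_props.get("cancelled", False)
    )
```

With a server-side filter, only messages for which the expression holds are dispatched to the consumer, so a check like this no longer has to run in the client.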

Pull Request: #14184

Direct Reply-To for AMQP 1.0

RabbitMQ 4.2 adds Direct Reply-To support for AMQP 1.0, alongside the existing AMQP 0.9.1 implementation.
It also works across protocols (e.g., AMQP 1.0 requester with AMQP 0.9.1 responder, or vice versa).

For more information, read our updated documentation on Direct Reply-To.

Pull Request: #14474

New Tooling for More Automated Blue-Green Deployment Migrations from 3.13.x Clusters to 4.2.x

Blue-Green Deployment migration from RabbitMQ 3.13.x
to 4.2.0 is now easier to automate thanks to a new set of commands provided by rabbitmqadmin v2.

Incoming and Outgoing Message Interceptors for Native Protocols (AMQP 1.0, AMQP 0-9-1, MQTTv3, MQTTv5)

Incoming and outgoing messages can now be intercepted on the broker.
This works for AMQP 1.0, AMQP 0.9.1, MQTTv3, and MQTTv5.

What the interceptor does is entirely up to its implementation: for example, it can validate message metadata, add annotations, or perform arbitrary side effects.
Custom interceptors can be developed and integrated via plugins.

Two new optional built-in interceptors were added to RabbitMQ:

  1. Timestamps for outgoing messages
  2. Setting client ID of publishing MQTT client

Detailed information can be found in the Message Interceptor documentation.

Khepri Enabled by Default for New Clusters

RabbitMQ supports two databases to store metadata such as virtual hosts,
topology, runtime parameters, policies, internal users and so on: Mnesia and
Khepri. That metadata store is also at the heart of clustering in RabbitMQ. As
of RabbitMQ 4.2.0, Khepri is the default metadata store for new deployments.

Khepri is based on the
same Raft consensus algorithm used by quorum queues and streams. The goal is to
have consistent, well-defined behaviour for all queries and updates of
metadata across an entire cluster, especially when the cluster suffers
increased latency or network issues. It also comes with increased
performance in several use cases, even though this was not a goal.

A new RabbitMQ 4.2.0+ node will use Khepri by default. If you upgrade an
existing node or cluster, it will continue to use whatever metadata store it
was using so far.

If you have not enabled Khepri yet, it is recommended that you enable it:

rabbitmqctl enable_feature_flag khepri_db

Khepri will become mandatory in a future minor version. Mnesia support will be
dropped in a future major version. These exact versions are to be decided.

Local Shovels

In addition to AMQP 0-9-1 and AMQP 1.0, Shovels
now support a new "protocol" option called local.

These specialized shovels are internally based on AMQP 1.0 but instead of
separate TCP connections, use the intra-cluster connections
between cluster nodes and the internal API for consumption, publishing
and AMQP 1.0 credit flow.

Such shovels can only be used for consuming and publishing
within the same cluster, not across clusters, but can offer
higher throughput and use fewer resources per connection
than their AMQP 0-9-1 and AMQP 1.0 counterparts.

Upgrading to 4.2.0

Documentation guides on upgrades

See the Upgrading guide for documentation on upgrades and GitHub releases
for release notes of individual releases.

This release series supports upgrades from 4.1.x, 4.0.x and 3.13.x.

If upgrading from a 3.13.x cluster that uses classic mirrored queues,
take a look at what modern CLI tools can offer for such migrations away from classic mirrored queues
via Blue/Green deployments.

Blue/Green Deployment-style upgrades are available for migrations
from RabbitMQ 3.12.x series.
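
As a quick illustration, the supported upgrade paths can be encoded as a pre-flight check. This is a minimal sketch: the function name is hypothetical, and it only captures the release series listed above, not feature flags or any other upgrade prerequisites.

```python
# Release series from which 4.2.x supports a direct (in-place or rolling) upgrade.
SUPPORTED_SOURCE_SERIES = {(4, 1), (4, 0), (3, 13)}


def in_place_upgrade_supported(current_version: str) -> bool:
    """Return True if `current_version` belongs to a supported source series."""
    major, minor = (int(part) for part in current_version.split(".")[:2])
    return (major, minor) in SUPPORTED_SOURCE_SERIES
```

For example, `in_place_upgrade_supported("3.12.14")` is false, which corresponds to the Blue/Green-only path mentioned for 3.12.x.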

New Required Feature Flags

None. The required feature flag set is the same as in 4.1.x and 4.0.x.

Mixed version cluster compatibility

RabbitMQ 4.2.0 nodes can run alongside 4.1.x and 4.0.x nodes. 4.2.x-specific features can only be made available when all nodes in the cluster upgrade to 4.2.0 or a later patch release in the new series.

While operating in mixed version mode, some aspects of the system may not behave as expected.
Once all nodes are upgraded to 4.2.0, these irregularities will go away.

Mixed version clusters are a mechanism that allows rolling upgrades and are not meant to be run for extended
periods of time (no more than a few hours).

Recommended Post-upgrade Procedures

This version does not require any additional post-upgrade procedures
compared to other versions.

Changes Worth Mentioning

Core Server

Enhancements

  • In clusters with a larger number of quorum queues (say, tens of thousands),
    quorum queue leadership transfer is now performed gradually and not all at once.

    Previously tens of thousands of concurrent leader elections
    could result in timeouts and some quorum queues ending up
    without an elected leader.

    GitHub issue: #14401

  • Schema data store (Khepri) read concurrency optimizations that can lead to low double-digit percent
    throughput gains on nodes with larger numbers of cores.

    GitHub issue: #14530

  • Two new rabbitmq.conf settings, log.summarize_process_state and log.error_logger_format_depth, can be used
    to significantly reduce the amount of queue member (replica) state logged in case of an abnormal termination.

    Limiting logging helps avoid memory allocation spikes.

    GitHub issues: #14349, #14523

  • When a configured authentication or authorization backend comes from a known
    plugin but the plugin is not enabled, the node will now refuse to start.

    Previously the node would boot but client connections would fail because
    of the missing backend modules.

    GitHub issues: #13783, #14408

  • Similarly to the number of queues and virtual hosts,
    it is now possible to configure a limit on the cluster-wide number of exchanges that applications
    can create:

    # Applications won't be able to declare more than 200 exchanges
    # (including the protocol-standard pre-declared ones) in the cluster
    cluster_exchange_limit = 200

  • Routing via the fanout exchange got optimised.
    For messages published to the fanout exchange, end-to-end message throughput increases by up to 42%.

    GitHub issue: #14546

  • It is now possible to disable specific queue types.

    Clients won't be able to declare new queues or streams of the disabled types.

    GitHub issue: #14624

  • Users of 3rd party plugins now can use a dedicated directory that won't be removed
    during Mnesia to Khepri upgrades. Previously all non-whitelisted directories in the node's data directory
    would be deleted together with other Mnesia data (Khepri data is stored separately) at the end of such a migration.

    GitHub issue: #11304

Bug Fixes

  • Classic queues could run into a rare message store exception that resulted in
    a loss of a few messages.

    Special kudos to the contributors who have spent a very significant amount of time
    reproducing and debugging the issue: @lhoguin @lukebakken @trvrnrth @gomoripeti

    GitHub issues: #14181, #14576

  • Messages routed to quorum queues during or immediately before a network partition
    were, in some cases, not republished internally.

    GitHub issue: #14589

  • Quorum queues with disabled poison message handling
    (an unlimited number of redeliveries, which is not a recommended practice) could accumulate
    a significant number of Raft log segment files.

    GitHub issues: #14202, #14458

  • Certain periodic quorum queue operations now perform metadata store updates in a more defensive way
    in a network partition scenario.

    GitHub issue: #14672

  • default_password and ssl_options.password can now better distinguish between a generated random password
    value and an encrypted value.

    Encrypted values must be prefixed with encrypted:. All other values, including
    generated passwords that contain a colon (:), will be considered non-encrypted ones.

    GitHub issue: #14365

  • Import of definition files that contained topic exchange permissions failed.

    GitHub issue: #14409

  • *.cacerts (not to be confused with cacertfile) settings in rabbitmq.conf did not have the expected effect and were removed
    to eliminate confusion.

    This is a potentially breaking change.

    GitHub issue: #14655

  • Enabling the khepri_db feature flag while the Log Exchange
    was enabled could cause a RabbitMQ node to run out of memory and crash.

    GitHub issues: #14069, #14796
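
The default_password and ssl_options.password rule from the list above comes down to a prefix check. The following sketch models that rule only; the helper name is hypothetical and this is not the broker's code.

```python
ENCRYPTED_PREFIX = "encrypted:"


def is_encrypted_value(value: str) -> bool:
    """Treat a configured value as encrypted only if it carries the prefix."""
    return value.startswith(ENCRYPTED_PREFIX)
```

A generated password such as `s3cr:et` contains a colon but lacks the prefix, so under this rule it is treated as a plain value.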

Stream Plugin

Enhancements

  • Consuming from a stream now uses fewer system calls and therefore is more efficient.

    GitHub issues: #14189, rabbitmq/osiris#192

Bug Fixes

  • Stream client connections that authenticate using a JWT token (OAuth 2) have
    to periodically renew their JWT tokens. Should such an update fail,
    the RabbitMQ Stream Protocol connection will be immediately closed.

    In addition, stream connections now verify that the newly obtained JWT
    token still grants access to the virtual host the client is connected to.

    GitHub issues: #14403, #14406

MQTT Plugin

Bug Fixes

  • Resource alarm handling now uses more context: it is aware of individual resources.
    When a cluster had multiple resource alarms (namely for memory footprint and free disk space)
    in effect, the blocking state was prematurely cleared when only one of them was cleared.

    GitHub issue: #14795
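
The idea behind this fix can be sketched as tracking the set of resources currently in alarm, and unblocking only once that set is empty. The class below is a hypothetical model for illustration, not the plugin's implementation.

```python
class AlarmAwareConnection:
    """Hypothetical model of a connection that tracks alarms per resource."""

    def __init__(self) -> None:
        # Names of resources (e.g. "memory", "disk") with an alarm in effect.
        self._active_alarms = set()

    def alarm_set(self, resource: str) -> None:
        self._active_alarms.add(resource)

    def alarm_cleared(self, resource: str) -> None:
        self._active_alarms.discard(resource)

    @property
    def blocked(self) -> bool:
        # Publishing stays blocked while any resource alarm is in effect.
        return bool(self._active_alarms)
```

With both a memory and a disk alarm in effect, clearing only the memory alarm leaves `blocked` true; it becomes false once the disk alarm clears as well.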

CLI Tools

Bug Fixes

  • rabbitmqctl export_definitions could incorrectly serialize policy and operator policy
    definitions.

    GitHub issue: #14800

Enhancements

  • rabbitmq-diagnostics has a new command that provides a message size distribution.
    Use it to get an estimate of the size of the messages flowing through the cluster.

    GitHub issue: #14560

Management Plugin

Enhancements

  • Users now can be protected from deletion or modification over the HTTP API.
    To protect a user, tag it with protected:

    rabbitmqctl set_user_tags "a-user" "protected"

    To lift the protection, remove the tag using rabbitmqctl set_user_tags, or delete the user via rabbitmqctl delete_user
    and re-create it with a different set of tags.

    GitHub issues: #14282, #14545

Shovel Plugin

Enhancements

  • Shovels now support a new "protocol" called local.

    In addition to AMQP 0-9-1 and AMQP 1.0, Shovels
    now support a new "protocol" option called local.

    These specialized shovels are internally based on AMQP 1.0 but instead of
    separate TCP connections, use the intra-cluster connections
    between cluster nodes and the internal API for consumption, publishing
    and AMQP 1.0 credit flow.

    Such shovels can only be used for consuming and publishing
    within the same cluster, not across clusters, but can offer
    higher throughput and use fewer resources per connection
    than their AMQP 0-9-1 and AMQP 1.0 counterparts.

Bug Fixes

  • Direct AMQP 0-9-1 shovel connections within a cluster (not to be confused with local shovels introduced in this release)
    are now blocked by resource alarms just like their "network" counterparts are.

    GitHub issue: #14657

  • Shovels could not be deleted using rabbitmqctl in some cases.

    GitHub issue: #14623

  • The number of pending messages reported by shovels was not an integer in certain scenarios.

    GitHub issue: #14710

  • The Prometheus metric collector failed with an exception when the scraper endpoint
    was hit while one or more shovels were still starting.

AWS Peer Discovery Plugin

Bug Fixes

  • The plugin implicitly depended on ordering of networkInterfaceSet and privateIpAddressesSet EC2 API response fields,
    which could result in obscure cluster formation issues.

    GitHub issue: #14557

STOMP Plugin

Bug Fixes

  • Resource alarm handling now uses more context: it is aware of individual resources.
    When a cluster had multiple resource alarms (namely for memory footprint and free disk space)
    in effect, the blocking state was prematurely cleared when only one of them was cleared.

    GitHub issue: #14795

Web MQTT Plugin

Bug Fixes

  • Resource alarm handling now uses more context: it is aware of individual resources.
    When a cluster had multiple resource alarms (namely for memory footprint and free disk space)
    in effect, the blocking state was prematurely cleared when only one of them was cleared.

    GitHub issue: #14795

Enhancements

  • HTTP/2 is enabled for WebSocket connections by default.

    GitHub issue: #14500

Web STOMP Plugin

Enhancements

  • HTTP/2 is enabled for WebSocket connections by default.

    GitHub issue: #14500

Dependency Changes

  • ra was upgraded to 2.17.1
  • osiris was upgraded to 1.10.0
  • khepri was upgraded to 0.17.2
  • khepri_mnesia_migration was upgraded to 0.8.0
  • cowboy was upgraded to 2.14.1
  • cuttlefish was upgraded to 3.5.0

Ra Metric Changes

Metrics emitted for Ra-based components (quorum queues, Khepri, Stream Coordinator)
have changed. Some metrics were removed, many were added, some changed their names.
For most users this should not require any action. However, users relying on Prometheus
metrics starting with rabbitmq_raft or rabbitmq_detailed_raft will need to update
their dashboards and/or alerts. If you are using the
RabbitMQ-Quorum-Queues-Raft dashboard,
please update it to the latest version for RabbitMQ 4.2 compatibility.

More Accurate and Detailed Ra Metrics

Ra is an internal component implementing the Raft protocol. It's the basis
for quorum queues, as well as some internal components (currently Khepri
and the Stream Coordinator). For quite some time, Ra metrics were tracked in two places
but RabbitMQ relied on the old metric subsystem. In RabbitMQ 4.2, the old
Ra metrics subsystem has been removed and RabbitMQ now reports Ra metrics
from the new subsystem (implemented using the Seshat library).
This migration has the following benefits:

  • lower overhead, since only one subsystem is used
  • more up-to-date information - the old subsystem was only refreshed every 5 seconds,
    the new subsystem always returns the latest values
  • additional metrics are exposed, making it easier to debug the system if necessary

Aggregated metrics (/metrics endpoint)

  • rabbitmq_raft_num_segments was added; it reports the number of segment files of the internal components

  • rabbitmq_raft_max_num_segments was added; it reports the highest number of segment
    files of any of the quorum queues; per-object metrics can be used to find which queue
    has a high number of segment files

  • rabbitmq_raft_term_total has been removed;
    this metric was emitted accidentally as a side effect of metric aggregation;
    the sum of Raft terms across all Raft clusters is a meaningless number

  • some metrics contained the _log_ substring in their name, even though they are not related to the Raft log;
    hence, they were renamed to avoid the misleading part:

    • rabbitmq_raft_log_snapshot_index -> rabbitmq_raft_snapshot_index
    • rabbitmq_raft_log_last_applied_index -> rabbitmq_raft_last_applied
    • rabbitmq_raft_log_commit_index -> rabbitmq_raft_commit_index
    • rabbitmq_raft_log_last_written_index -> rabbitmq_raft_last_written_index

  • rabbitmq_raft_entry_commit_latency_seconds has been removed; it was an average latency across all Ra clusters
    in all Ra systems (RabbitMQ currently uses two separate Ra systems: one for quorum queues and one for internal
    components, currently Khepri and Stream Coordinator); it was therefore not very useful, since different
    components can have very different latencies

  • rabbitmq_raft_commit_latency_seconds was added; in case of aggregated metrics, it is only reported for
    internal components (currently Khepri and Stream Coordinator)

  • rabbitmq_raft_max_commit_latency_seconds has been added; it's the highest commit latency reported by any
    of the quorum queues. When it's high, per-object metrics can be used to find which specific queue reports high commit latency
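
When updating dashboards and alerts, the renames and removals listed above can be applied mechanically. The helper below is a sketch that covers the aggregated /metrics endpoint only; the function name is hypothetical and the tokenizing regular expression is a simplification, not a PromQL parser.

```python
import re

# Old name -> new name, as listed for the aggregated endpoint.
RENAMED = {
    "rabbitmq_raft_log_snapshot_index": "rabbitmq_raft_snapshot_index",
    "rabbitmq_raft_log_last_applied_index": "rabbitmq_raft_last_applied",
    "rabbitmq_raft_log_commit_index": "rabbitmq_raft_commit_index",
    "rabbitmq_raft_log_last_written_index": "rabbitmq_raft_last_written_index",
}

# Metrics no longer emitted by the aggregated endpoint; queries need manual review.
REMOVED = {
    "rabbitmq_raft_term_total",
    "rabbitmq_raft_entry_commit_latency_seconds",
}

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def migrate_query(query: str):
    """Return (rewritten_query, removed_metrics_found) for one query string."""
    removed_found = []

    def swap(match):
        name = match.group(0)
        if name in REMOVED:
            removed_found.append(name)
        # Renamed metrics are replaced; everything else is left untouched.
        return RENAMED.get(name, name)

    return _IDENTIFIER.sub(swap, query), removed_found
```

Queries that reference a removed metric are returned unchanged together with the offending names, so they can be rewritten by hand, for example using rabbitmq_raft_commit_latency_seconds or rabbitmq_raft_max_commit_latency_seconds where appropriate.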

Per-object metrics (/metrics/per-object endpoint)

More metrics are reported for each queue than in older versions.

Incorrect metric names were corrected as described above.

Additionally:

  • rabbitmq_raft_term_total has been renamed to rabbitmq_raft_term (the "total" suffix
    was incorrect and misleading, since the metric is reported for each specific Ra cluster)

  • rabbitmq_raft_num_segments was added; it reports the number of segment files of the internal components
    and for each quorum queue

Detailed metrics (/metrics/detailed endpoint)

When the detailed endpoint is scraped with the family=ra_metrics parameter,
more metrics are reported for each queue than in older versions.

Incorrect metric names were corrected as described above.

Additionally:

  • rabbitmq_raft_term_total has been renamed to rabbitmq_raft_term (the "total" suffix
    was incorrect and misleading, since the metric is reported for each specific Ra cluster)

  • rabbitmq_raft_num_segments was added; it reports the number of segment files of the internal components
    and for each quorum queue
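
For reference, a scrape URL for the detailed endpoint with the family=ra_metrics parameter can be built as follows. This is a sketch: the host is a placeholder, port 15692 is assumed to be the Prometheus plugin's listener port and should be adjusted for your deployment, and the function name is hypothetical.

```python
from urllib.parse import urlencode, urlunsplit


def detailed_ra_metrics_url(host: str = "localhost", port: int = 15692) -> str:
    """Build the /metrics/detailed URL restricted to the ra_metrics family."""
    query = urlencode({"family": "ra_metrics"})
    return urlunsplit(("http", f"{host}:{port}", "/metrics/detailed", query, ""))
```

The resulting URL can be used as a scrape target or fetched manually to inspect which rabbitmq_detailed_raft metrics a 4.2 node reports.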

Source Code Archives

To obtain source code of the entire distribution, please download the archive named rabbitmq-server-4.2.0.tar.xz
instead of the source tarball produced by GitHub.
