# librdkafka v2.4.0

librdkafka v2.4.0 is a feature release:
- KIP-848: The Next Generation of the Consumer Rebalance Protocol.
  Early Access: this should be used only for evaluation and must not be used
  in production. Features and contract of this KIP might change in future
  (#4610).
- KIP-467: Augment ProduceResponse error messaging for specific culprit
  records (#4583).
- KIP-516: continue partial implementation by adding a metadata cache by
  topic id and updating the topic id corresponding to the partition name
  (#4676).
- Upgrade OpenSSL to v3.0.12 (while building from source) with various
  security fixes; check the release notes.
- Integration tests can be started in KRaft mode and run against any
  GitHub Kafka branch other than the released versions.
- Fix pipeline inclusion of static binaries (#4666).
- Fix to main loop timeout calculation leading to a tight loop for a
  max period of 1 ms (#4671).
- Fixed a bug causing duplicate message consumption from a stale
  fetch start offset in some particular cases (#4636).
- Fix to metadata cache expiration on full metadata refresh (#4677).
- Fix for a wrong error returned on full metadata refresh before joining
  a consumer group (#4678).
- Fix to metadata refresh interruption (#4679).
- Fix for an undesired partition migration with stale leader epoch (#4680).
- Fix hang in cooperative consumer mode if an assignment is processed
  while closing the consumer (#4528).
## Upgrade considerations
- With KIP-467, `INVALID_MSG` (Java: CorruptRecordException) will
  be retried automatically. `INVALID_RECORD` (Java: InvalidRecordException)
  instead is not retriable and will be set only on the records that caused
  the error. The rest of the records in the batch will fail with the new
  error code `_INVALID_DIFFERENT_RECORD` (Java: KafkaException) and can be
  retried manually, depending on the application logic (#4583).
## Early Access
### KIP-848: The Next Generation of the Consumer Rebalance Protocol
With this new protocol the role of the Group Leader (a member) is removed and
the assignment is calculated by the Group Coordinator (a broker) and sent
to each member through heartbeats.

The feature is still not production-ready. It's possible to try it in a
non-production environment. A guide is available with considerations and
steps to follow to test it (#4610).
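As a minimal sketch, opting a consumer into the new protocol is done through client configuration; the `group.protocol` property name below is an assumption based on the KIP-848 guide and should be verified against the guide and your librdkafka version:

```ini
# Hypothetical opt-in configuration for the KIP-848 consumer protocol
# (early access: evaluation only, not for production)
bootstrap.servers=localhost:9092
group.id=test-group
group.protocol=consumer
```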
## Fixes
### General fixes
- Issues: confluentinc/confluent-kafka-go#981.
  In the librdkafka release pipeline a static build containing libsasl2
  could be chosen instead of the alternative one without it.
  That caused the libsasl2 dependency to be required in confluent-kafka-go
  v2.1.0-linux-musl-arm64 and v2.3.0-linux-musl-arm64.
  Solved by correctly excluding the binary configured with that library
  when targeting a static build.
  Happening since v2.0.2, with specified platforms,
  when using static binaries (#4666).
- Issues: #4684.
  When the main thread loop was awakened less than 1 ms
  before the expiration of a timeout, it was serving with a zero timeout,
  leading to increased CPU usage until the timeout was reached.
  Happening since 1.x.
- Issues: #4685.
  Metadata cache was cleared on full metadata refresh, leading to unnecessary
  refreshes and occasional `UNKNOWN_TOPIC_OR_PART` errors. Solved by updating
  the cache for existing or hinted entries instead of clearing them.
  Happening since 2.1.0 (#4677).
- Issues: #4589.
  A metadata call before a member joins the consumer group
  could lead to an `UNKNOWN_TOPIC_OR_PART` error. Solved by updating
  the consumer group following a metadata refresh only in safe states.
  Happening since 2.1.0 (#4678).
- Issues: #4577.
  Metadata refreshes without partition leader change could lead to a loop of
  metadata calls at fixed intervals. Solved by stopping metadata refresh when
  all existing metadata is non-stale. Happening since 2.3.0 (#4679).
- Issues: #4687.
  A partition migration could happen, using stale metadata, when the partition
  was undergoing a validation and being retried because of an error.
  Solved by doing a partition migration only with a non-stale leader epoch.
  Happening since 2.1.0 (#4680).
### Consumer fixes
- Issues: #4686.
  In case of a subscription change with a consumer using the cooperative
  assignor, it could resume fetching from a previous position.
  That could also happen if resuming a partition that wasn't paused.
  Fixed by ensuring that a resume operation is completely a no-op when
  the partition isn't paused.
  Happening since 1.x (#4636).
- Issues: #4527.
  While using the cooperative assignor, if an assignment is received while
  closing the consumer it's possible that it gets stuck in state
  `WAIT_ASSIGN_CALL`, while the method is converted to a full unassign.
  Solved by changing state from `WAIT_ASSIGN_CALL` to `WAIT_UNASSIGN_CALL`
  while doing this conversion.
  Happening since 1.x (#4528).
## Checksums

Release asset checksums:
- v2.4.0.zip SHA256 `24b30d394fc6ce5535eaa3c356ed9ed9ae4a6c9b4fc9159c322a776786d5dd15`
- v2.4.0.tar.gz SHA256 `d645e47d961db47f1ead29652606a502bdd2a880c85c1e060e94eea040f1a19a`