Important Update: Aeron is moving to a new GitHub organisation
Aeron is moving to a new GitHub organisation following its adoption by Adaptive in 2022. This transition marks a significant milestone in Aeron's journey, ensuring continued innovation and support for the world's leading low-latency message transport system.
You can find the new Aeron, SBE and Agrona repositories and all related resources at aeron-io.
All links to the previous repository location are automatically redirected to the new location.
However, to avoid confusion, we recommend updating any existing local clones to point to the new repository URL. You can do this by using git remote on the command line:
git remote set-url origin NEW_URL
Thank you for your continued support and contributions to the Aeron Open Source project.
Breaking changes
- [Java] Upgrade to Agrona 2.0.0.
Note: the --add-opens java.base/jdk.internal.misc=ALL-UNNAMED JVM option must be specified in order to run Aeron. This is in addition to --add-opens java.base/java.util.zip=ALL-UNNAMED, which is required to run the Aeron Archive.
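A missing --add-opens option typically only shows up as a failure at runtime, so it can help to check for it at startup. The following is a minimal sketch, not part of Aeron or Agrona: the class and method names (AddOpensCheck, isOpenToUnnamedModule, missingFlags) are hypothetical, and it assumes the application runs on the class path (i.e. in the unnamed module). It uses only the standard java.lang.Module API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical startup check; not an Aeron or Agrona API.
final class AddOpensCheck
{
    // Package that must be opened in order to run Aeron (per the note above).
    static final String AERON_PACKAGE = "jdk.internal.misc";
    // Package that must additionally be opened to run the Aeron Archive.
    static final String ARCHIVE_PACKAGE = "java.util.zip";

    // Formats the JVM option that opens a java.base package to class-path code.
    static String addOpensFlag(final String packageName)
    {
        return "--add-opens java.base/" + packageName + "=ALL-UNNAMED";
    }

    // True if java.base has opened the package to this class's unnamed module.
    static boolean isOpenToUnnamedModule(final String packageName)
    {
        final Module javaBase = Object.class.getModule();
        final Module unnamed = AddOpensCheck.class.getClassLoader().getUnnamedModule();
        return javaBase.isOpen(packageName, unnamed);
    }

    // Returns the JVM options that appear to be missing from the current process.
    static List<String> missingFlags(final boolean usesArchive)
    {
        final List<String> missing = new ArrayList<>();
        if (!isOpenToUnnamedModule(AERON_PACKAGE))
        {
            missing.add(addOpensFlag(AERON_PACKAGE));
        }
        if (usesArchive && !isOpenToUnnamedModule(ARCHIVE_PACKAGE))
        {
            missing.add(addOpensFlag(ARCHIVE_PACKAGE));
        }
        return missing;
    }

    public static void main(final String[] args)
    {
        final List<String> missing = missingFlags(true);
        if (missing.isEmpty())
        {
            System.out.println("All required packages are open.");
        }
        else
        {
            missing.forEach((flag) -> System.out.println("Missing JVM option: " + flag));
        }
    }
}
```

Running the class with and without the two options should show which of them the current JVM was started with.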
Noteworthy Changes
- Detect and terminate dormant Archive clients.
Archive will now send periodic heartbeat messages to each connected Archive client. By default this is done once per second and can be configured via the aeron.archive.session.liveness.check.interval property or programmatically via the io.aeron.archive.Archive.Context#sessionLivenessCheckIntervalNs(long) method. If the Archive detects that it cannot send such a message for longer than the connect timeout (i.e. aeron.archive.connect.timeout, which defaults to 5 seconds), it will close the corresponding control session, which will cause that Archive client to disconnect.
- Eliminate interference between Archive clients.
- C/C++ Wrapper implementation of the Archive client APIs.
In terms of feature completeness and stability, they are still marked experimental, as there is a small chance that some of the functions might change as the feature is hardened. Furthermore, a number of the async APIs have yet to be implemented.
- Fix duplicate service messages during failover/restart when using multiple services in Cluster.
When service messages are being sent from multiple services, these can be enqueued in different orders. This means that during failover/restart pending messages can be skipped or duplicated when a new leader is elected.
Upgrade procedure: Those affected will need to do a clean shutdown (with a snapshot) and restart the whole cluster with the fix.
- Invalidate Standby snapshots.
When invalidating the latest snapshot, both normal and Standby snapshots are taken into account, in order to prevent invalidated snapshots from being re-downloaded from the Standby node upon recovery.
- New log events for NAK messages sent and received.
A NAK_RECEIVED logging event was added for when a NAK request is received by the sender. The existing SEND_NAK_MESSAGE event was renamed to NAK_SENT and logs a NAK message being sent by the receiver.
- Prevent a client process crash caused by a pathologically slow consumer.
If a call to Controlled/FragmentHandler#onFragment blocks for a disproportionate amount of time, i.e. long enough for an Image to become unavailable, then the corresponding log buffer will be freed by the client conductor thread. Any further access to the log buffer would cause the client process to segfault. The Image was updated to prevent any further access once it is closed.
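The dormant-client detection described above (a periodic heartbeat, with the session closed once sends have failed for longer than the connect timeout) can be illustrated with a small stand-alone sketch. This is not the Archive's implementation: the class SessionLivenessSketch, its methods, and the BooleanSupplier standing in for "try to send a heartbeat to the client" are all hypothetical. Only the two durations (1 second interval, 5 second timeout) come from the text.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of a liveness check; not Aeron Archive code.
final class SessionLivenessSketch
{
    private final long checkIntervalNs;
    private final long connectTimeoutNs;
    private long nextCheckDeadlineNs;
    private long lastSuccessfulSendNs;
    private boolean closed;

    SessionLivenessSketch(final long checkIntervalNs, final long connectTimeoutNs, final long nowNs)
    {
        this.checkIntervalNs = checkIntervalNs;
        this.connectTimeoutNs = connectTimeoutNs;
        this.nextCheckDeadlineNs = nowNs + checkIntervalNs;
        this.lastSuccessfulSendNs = nowNs;
    }

    boolean isClosed()
    {
        return closed;
    }

    // Called on every duty cycle; returns true while the session stays open.
    // trySendHeartbeat is an assumed stand-in for offering a heartbeat message.
    boolean onDutyCycle(final long nowNs, final BooleanSupplier trySendHeartbeat)
    {
        if (closed)
        {
            return false;
        }
        if (nextCheckDeadlineNs - nowNs > 0)
        {
            return true; // the next liveness check is not due yet
        }
        nextCheckDeadlineNs = nowNs + checkIntervalNs;

        if (trySendHeartbeat.getAsBoolean())
        {
            lastSuccessfulSendNs = nowNs;
        }
        else if (nowNs - lastSuccessfulSendNs > connectTimeoutNs)
        {
            closed = true; // the client has been unreachable for too long
        }
        return !closed;
    }
}
```

With a 1 second interval and a 5 second timeout, a client whose heartbeats never get through stays connected for the checks at 1 to 5 seconds and is closed at the 6 second check, the first one that is strictly past the timeout.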
Changelog
- [Java] Speed up purgeSegments/deleteDetachedSegments operations by only deleting files in a range between the current startPosition and the previous startPosition (purge) or the oldest existing segment file position (detached files).
- [C] Fix dangling pointer in replay merge. (#1723)
- [Java] Prevent segfaults through mark file API after close.
- [Java] Trigger slow build on push to master.
- [Java] Do not close Cluster archive when doing next rounds of backup queries since the replay might still be active. Also do not switch to RESET_BACKUP state unless the current Cluster node has switched its role and therefore is no longer eligible for replay.
- [Java] Use ClusterEvent instead of ClusterException with Category.WARN.
- [Java] Use ClusterEvent to report issues when stopping recording/replay + prevent an NPE when stopping a replay as clusterArchive could have been closed while in the BACKUP_QUERY stage.
- [C/C++] Change interval of driver keepalive error reporting.
- [C] Update C driver to use the same matching logic as the Java driver for checking the validity of tagged publications and subscriptions.
- [CI] Core dump dir creation.
- [CI] Enable core dumps on Linux and MacOS.
- [CI] Collect Windows core dump files.
- [CI] Trigger slow build on PR.
- [C] compare publication stream id with link stream id when checking for matching spy subscriptions (#1722)
- [CI] Add ubuntu-24.04-arm to the build matrix for Java.
- [CI] Use env to store base Java version.
- [Java] Handle multiple PendingServiceMessageTrackers while producing consensus module patch.
- [CI] Simplify log upload.
- [CI] Fix crash log upload on Windows.
- Bug/fix error with tagged channels reresolution (#1720)
- [C] add a call to init the new fields in the logbuffer metadata (#1717)
- [Java] Add the few missing fields for logbuffer descriptor (#1721)
- [Java] Close temporary MarkFile when migrating from old version.
- [Java] Write message header before mark file header in the cluster-mark.dat file to be able to use SBE features based on the actingBlockLength and actingVersion.
- tidy up the namespace for exception_handler_t (#1715)
- [C] Handle connecting to Archive without credentials. (#1716)
- [Java] Write message header before mark file header in the archive-mark.dat file to be able to use SBE features based on the actingBlockLength and actingVersion.
- [Java] Fix config not found issue.
- [Java] Extract capturing lambda allocation to outside loop and yield when not making progress.
- Replacement of ThreadHints.onSpinWait by Thread. (#1713)
- [Java] Add archiveId to the ArchiveMarkFile.
- [Java] Fix an off-by-one error while searching for counters.
- [Java] Tidy up after #1711.
- [Java] Add a test for address re-resolution back to the initial IP address.
- fix resolution bug, when the new ip is back to udpChannel.remoteData, this can not be triggered resolution changes (#1711)
- [Java] Use different URI for early access JDK build.
- [Java] Fix Javadoc URI.
- [Java] Run Mockito as Java agent for JDK 23+ compatibility.
- [CI] Add JDK 23 to the build matrix.
- [Java] Change comment prevent JDK23 javadoc warning, FIX #1710.
- [C] Prevent double free of the aeron_exclusive_publication_t which is closed by the proxy.
- [Java] Poll for remote Archive errors while awaiting log recording session to be created.
- [Java] Add OS max/default values for SO_SNDBUF and SO_RCVBUF parameters to the log buffer metadata section.
- [C] Add OS default/max fields for the OS_SNDBUF/OS_RCVBUF to the log buffer metadata section.
- Fixes problem with socket snd/rcv buffer in logbuffer metadata. (#1707)
- [Java] Add method to convert error code to String.
- [Java] Remove remaining dynamic join APIs.
- [Java] Move config printing option to the CommonContext.
- [Java] Cleanup after #1705.
- rename the null value to compatible with C++ (#1705)
- [Java] Touch ups.
- Logbuffer metadata extra fields (#1700)
- [C] Rename SEND_NAK_MESSAGE to NAK_SENT so that it is symmetric with NAK_RECEIVED.
- [C] Add AERON_DRIVER_EVENT_NAK_RECEIVED event logging.
- [Java] Rename SEND_NAK_MESSAGE to NAK_SENT so that it is symmetric with NAK_RECEIVED.
- [Java] Add log event for when a NAK message is received.
- Fix duplicate service messages during failover/restart when using multiple services (#1703)
- [Java] Change Tests.sleep so that it uses LockSupport.parkNanos to prevent catching of InterruptedException and clearing the interrupt flag.
- [C] Remove duplicate definition of aeron_semantic_version_compose. (#1701)
- [Java] Close AeronArchive client if control response Subscription is disconnected.
- [Build] correct release gradle cache path
- [Java] Rename IngressAdapter onFragment to onMessage and remove interface to provide more appropriate naming.
- [Build] Remove OSS c/c++ binary step in release workflow
- [Java] Simplify synchronous connect.
- [Java] Use Agrona Checksum classes.
- [Java] Emit WARN event when ControlSession is closed abruptly + add reason to the ControlSession state transition log + increase default stale session check interval to 1s.
- [Java] Fix shouldRejoinAfterResting test.
- [Java] Add isConnected to Subscription.toString.
- [Java] Add more detail to AeronArchive exception when subscription is not connected.
- [Java] Add test utility for stubbing addition of counters.
- [C] Close uri after parsing.
- [C] Fix printing of the error message.
- [C] Add stream-id and pub-wnd URI parameters.
- [C] Add AERON_ERROR_CODE_IMAGE_REJECTED and AERON_ERROR_CODE_PUBLICATION_REVOKED error codes.
- [Java] Add IMAGE_REJECT and PUBLICATION_REVOKE error codes.
- [Java] Fix a race condition in shouldRecordThenBoundReplayWithCounter.
- [C] Align logging for aeron_driver_conductor_on_publication_error with the Java implementation.
- [Java] Use AeronEvent when logging onPublicationError and add additional parameters.
- [C++] Close broadcast_receiver to avoid leaking scratch buffer memory.
- [C] Make broadcast_receiver scratch buffer expandable to accommodate for large responses from the media driver.
- [C] Reset aeron_uri_t struct before parsing so that cleanup would not fail with a segfault.
- [Java] Validate that channel URI does not exceed 4095 characters.
- [Java] Allow client buffer to accept responses larger than 4KB.
- [C] Validate channel URI length, i.e. it cannot exceed 4095 characters.
- [C] Fix potential buffer overflow on error.
- [C] Add AERON_URI_MAX_LENGTH and extend AERON_MAX_PATH to 4096 bytes + cleanup.
- [CI] Add Clang 19.
- [Java] Rename extension property in AeronCluster.Context.
- [C++] Fix race conditions in Archive tests.
- [Java] Clarify Javadoc for scheduleTime/cancelTimer operations.
- [Java] Remove ControlSessionDemuxer.
- [C] Fix sender MTU validation.
- [Java] Fix error message for SO_RCVBUF validation.
- [Java] Fix logReplicationSessionStateChange.
. - [Java] Validate that control publication is exclusive.
- [C] Align driver conductor duty cycle with the Java implementation.
- [C] Set not connected status immediately when network publication has no receivers.
- [Java] Set not connected status immediately when network publication has no receivers.
- [Java] invalidate standby snapshots inside invalidateLatestSnapshot (#1692)
- [Java] Add ClusterMember to the ConsensusModuleControl.
- [Build] Remove draft release notes
- [Java] Improve checkstyle config for javadoc and apply changes.
- Update checkstyle config for types.
- Upgrade Checkstyle and BND.
- [Java] Do not fail on heartbeat response.
- [Java] Timeout ControlSession if AeronArchive is not being polled.
- [Java] Image reject reason is stored as ASCII and should be validated accordingly.
- [C] Update C++ test code to await Archive startup.
- [C] Use the same C code for the C++ wrapper tests.
- [C] Simplify Archive test setup.
- [C] Pass process handle variable into spawn/kill/await routines.
- [C] Await Archive startup + fix PID resolution on Windows.
- [C] More memory leak fixes.
- [C] Free idle strategy state if it is not explicitly assigned.
- [C] Close aeron_uri_string_builder_t to prevent a memory leak.
- [C] Add unique session-id to both request and response channels to ensure that different Archive clients are not blocking each other if not being polled. This only applies if response channel does not specify control-mode=response.
- [C] Fix aeron_uri_string_builder.
- [C] Move helper parsing functions to the aeron_parse_util.h.
- [C] Move aeron_randomised_int32 implementation to aeron_bitutil.c.
- [C] Use signals to terminate Archive process.
- [Java] Make RecordingLog.invalidateEntry an O(1) operation and non-public.
- [Java] Ensure that ENTRY_TYPE_SNAPSHOT is always sorted after ENTRY_TYPE_STANDBY_SNAPSHOT and ENTRY_TYPE_TERM.
- [Java] Move CRC classes to Agrona.
- [Java] Surface the clusterId in the ConsensusModuleControl interface.
- [Java] Add convenience method for determining if an endpoint uses a multicast address.
- [Java] Remove MDS channel transports when endpoint is closing.
- [Java] Make default secure random algorithm dependent on OS.
- [Java] Make the algorithm used for SecureRandom configurable and specify NativePRNGNonBlocking as the default.
- [C] Use the correct function pointer for on_request_setup when in non-shared modes.
- [C] add a README for the C Archive client
- [C] Use separate variable for disabling status messages.
- Remove function macro UINT8_C() from AERON_DATA_HEADER_UNFRAGMENTED definition. (#1685)
- [Java] Update RecoverPlan after standby snapshot replication completes with the replicated snapshot entries.
- [Java] Use subtract and compare to zero for all deadline calculations for consistency.
- [Java] Ensure that deadline checking is wrapping safe and use separate variable for disabling status messages.
- [C] Use next_sm_deadline_ns instead of last_sm_time_ns as the code is simpler and safer.
- [Java] Use nextSmDeadlineNs instead of lastSmTimeNs as the code is simpler and safer.
- [C] Fix image matching check on stream_id plus refactor.
- [Java] Do not log warning if the PublicationImage is active.
- [Java] Fix SM timeout check.
- [C] Stop sending status messages on draining/lingering images when a new publication image with the same channel/stream/session is created.
- [Java] Use more accurate check for sm timeout.
- [Java] Stop sending status messages on draining/lingering images when a new publication image with the same channel/stream/session is created.
- [C] Align send channel validation with Java.
- [C] Extract subscription matching logic and reuse across different methods.
- Undef if defined major() and minor() defined in sys/sysmacros.h (#1681)
- [Java] Send appVersion in the NewLeadershipTerm.
- [C] Align network subscription validation with the Java side + fix clashing subscription validation.
- [C] Extend subscription clash validation with an isResponse check, i.e. do not allow control-mode=response to match a non-response subscription.
- [Java] Extend subscription clash validation with an isResponse check, i.e. do not allow control-mode=response to match a non-response subscription.
- [C] Ensure that we use the incoming channel uri's control_mode to verify if it is a response channel request when checking for clashing subscriptions.
- [Java] Enable response channels test.
- [C] Take into account is_response field when linking network Subscriptions.
- [C] Create wildcard session interest when session-id not specified.
- [C] Free allocated memory in aeron_data_packet_dispatcher_add_subscription in case of initialization error.
- [Java] Remove session id from response channel when replaying via response channels.
- [Java] Fix a race whereby subscription might try to add an image that was already removed when Aeron client gets shutdown.
- [Java] Modify receive channel matching by delegating to isWildcardOrSessionIdMatch, i.e. take into account isResponse flag.
- [Java] Capture complete AeronStat by delaying close after a test was executed.
- [Java] Check for errors before test-level cleanup callbacks are executed, i.e. ensure that we get access to the complete set of counters before things are being closed.
- [Java] Add unique session-id to both request and response channels to ensure that different Archive clients are not blocking each other if not being polled. This only applies if response channel does not specify control-mode=response.
- enterElection reason message tidy up. (#1677)
- [C] Prevent a segfault caused by a pathologically slow consumer, i.e. when a call to Controlled/FragmentHandler#onFragment blocks until an Image becomes unavailable and the corresponding log buffers are being freed by the client conductor.
- [Java] Increase resting timeout to trigger the segfault condition.
- [Java] Add PathologicallySlowConsumerTest.
- [Java] Prevent a process segfault caused by a pathologically slow consumer, i.e. when a call to Controlled/FragmentHandler#onFragment blocks until an Image becomes unavailable and the corresponding log buffers are being freed by the client conductor.
- [Java] Deprecate aeron.cluster.members.ignore.snapshot property.
- [Java] Add commit position counter id to ConsensusModuleControl interface.
- [Java] Generate unique control response stream id for AeronArchive in the Cluster. (#1670)
- Use provided credentials on standby snapshot (#1674)
- [Java] Support time travel Cluster tests.
- [Java] Extend test to assert actual position recovery.
- Tethering join position (#1672)
- cppbuild exitcode=1 when unknown arg. (#1673)
- [Java] Send recording descriptors the same way other responses are sent, i.e. retry on failure and preserve relative order of the responses.
- [Build] Replace use of 7-zip in cppbuild.ps1 with the PowerShell built-in Expand-Archive.
- [Java] Use term-length=64k for the local control request/response channels.
- [Java] Add additional information that is useful for debugging, include the name of the aeron directory on the MediaDriver toString and the thread name for the background executor thread.
- [Java] Use sendErrorResponse when sending errors back from the Archive.
- [Java] Remove ControlResponseProxy parameter.
- [Java] Schedule RecordingSignal sending the same way the responses are scheduled.
- [Java] Track total snapshot duration on follower nodes.
- [C] Do not fail fast on I/O exceptions in the send/receive path, i.e. process remaining transports/publications/destinations in the same duty cycle.
- [Java] Do not fail fast on IOExceptions in the send/receive path, i.e. process remaining transports/publications/destinations in the same duty cycle.
- [Java] Add end-to-end test for session interest fix.
- [Java] Count bytes received when the number of transports exceeds TransportPoller#ITERATION_THRESHOLD.
- [Java] Add testing callback that will put the test method name onto the thread name.
- [Java] Do not refine response channel for a ClusterSession during a replay or loading of a snapshot.
- [C] Check subscribed sessions before removing a stream interest before removing by stream_id.
- [build] remove .CMakeLists.txt.swp
- Archive client [C and C++ wrapper] (#1636)
- [Java] Assert that synchronous ControlSession calls are only allowed from a conductor thread.
- [Java] Fix race with new Image becoming available while reject errors are asserted.
- [Java] Add debug wrapper for messageTimeout.
- [Java] Remove expensive file existence checks when scheduling segment file deletion.
- [Java] Delete segment files without renaming so that conductor thread will not be blocked for a long period of time.
- [C] Use plain read and write of the "begin" field in Dekker's algorithm as it is properly fenced.
- [Java] Use plain read and write of the "begin" field in Dekker's algorithm as it is properly fenced.
- [Java] Fix endianness when accessing string fields.
- [C] Rename AERON_GET_VOLATILE -> AERON_GET_ACQUIRE and AERON_PUT_ORDERED -> AERON_SET_RELEASE to reflect the actual semantics of those operations and to match the naming of the Java APIs.
- [C] Remove AERON_PUT_VOLATILE.
- [C] Optimize SM and loss handling by using acquire/release operations with fences.
- [Java] Optimize SM and loss handling by using acquire/release operations with fences.
- [Java] Keep publishing position updates from canvass and into nominate state so other cluster node members can take action in extended election timeout durations.
- [Java] Make it clear that appointed leader config is a testing feature only.
- [Java] Don't return error when stopReplay is called on a replay that does not exist (likely already stopped).
- [Gradle] Disable auto detection of JVMs to force a specific JVM for test execution in CI.
- [Java] Update Cluster documentation around operations that require looping until success.
- [Java] Tidy up incrementing of ClusterBackup snapshot counter.
- [Java] Expose a counter on the Cluster Backup to track the number of snapshots downloaded.
- [Java] Have ClusterTool.describeLatestConsensusModuleSnapshot return an indication of failure or success.
- [Java] Migrate atomic field updaters to var handles.
- [Java] Use storeStoreFence in HeaderWriter now VarHandle is available.
- [Java] Move CapturingPrintStream to test-support.
- [Java] Await client error before asserting static counter state.
- [Java] Suppress Checkstyle LineLength for package-info.java files.
- [Java] Suppress Checkstyle LineLength for package-info.java files.
- [Java] Update schema reference (#1664)
- [Java] Cluster tool refactoring to allow extensions (#1665)
- [Java] Introduce VarHandles to begin replacing atomic field updaters.
- [Archive] Enhance the ArchiveTool.delete-orphaned-segments method (#1661)
- [Gradle] Simplify version management.
- [Java] More robust checking of symbolic links.
- [Java] Ensure that DataCollector does not traverse symbolic links when finding files.
- [Java] Use typesafe version catalogs.
- [Git] Exclude buildSrc.
- [Build] Correctly disable deprecation message in the Windows build.
- [Build] Add deprecation message to C++ API.
- [Java] Include an explicit name to the NameResolver interface to aid logging/debugging.
- [Java] Use unique session-id when creating live log recording and replaying data from the Cluster to ensure that the old replay data is not being picked up upon restart/reset.
- [Java] Update ClusterTest.shouldCatchupFollowerWithSlowService to account for fragment limits and being unresponsive due to some OSs sleeping for 16ms rather than the requested 1ms.
- [Java] Poll for archive response when loading snapshot any time a break occurs.
- [Java] Support consensus module snapshot extension.
- Error Frames and User Image Invalidation (#1604)
- [C] Add REMOVE_BY_DESTINATION_ID to debug logging.
- [Java] Add REMOVE_BY_DESTINATION_ID to debug logging.
- [Java] Upgrade to Agrona 2.0.1.
- [Java] Upgrade to SBE 1.34.1.
- [Java] Upgrade to ByteBuddy 1.15.11.
- [Java] Upgrade to JUnit 5.11.4.
- [Java] Upgrade to Mockito 5.15.2.
- [Java] Upgrade to Checkstyle 10.21.1.
- [Java] Upgrade to Gradle 8.11.1.
- [Java] Upgrade to Shadow 8.3.5.
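
Several entries above mention using "subtract and compare to zero" so that deadline checks are wrapping safe. The sketch below shows why that form is preferred for nanosecond clocks such as System.nanoTime(), whose values may overflow. It is an illustration only: the class and method names are hypothetical and the code is not taken from the Aeron code base.

```java
// Hypothetical illustration of wrap-safe deadline checks; not Aeron code.
final class DeadlineCheckSketch
{
    // Direct comparison: gives the wrong answer once the deadline overflows.
    static boolean naiveExpired(final long nowNs, final long deadlineNs)
    {
        return nowNs >= deadlineNs;
    }

    // Subtract and compare to zero: correct as long as the two values are
    // less than 2^63 nanoseconds apart, even across numerical overflow.
    static boolean wrapSafeExpired(final long nowNs, final long deadlineNs)
    {
        return deadlineNs - nowNs <= 0;
    }

    public static void main(final String[] args)
    {
        final long nowNs = Long.MAX_VALUE - 10;
        final long deadlineNs = nowNs + 100; // overflows to a negative value

        System.out.println("naive expired: " + naiveExpired(nowNs, deadlineNs));
        System.out.println("wrap-safe expired: " + wrapSafeExpired(nowNs, deadlineNs));
        System.out.println("wrap-safe expired at deadline: " + wrapSafeExpired(nowNs + 100, deadlineNs));
    }
}
```

In this example the naive comparison reports the deadline as already expired, because the overflowed deadline is numerically smaller than the current time, while the subtraction form reports it as pending until 100 ns have elapsed.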