Overview
This PR improves the reliability and robustness of the locking mechanism. It makes lock acquisition atomic and periodically refreshes the lock's expiration time. These changes reduce the risk of race conditions in concurrent environments, leading to more stable certificate management.
Key Changes
1. Atomic Lock Acquisition Logic:
- In the previous implementation, checking for lock existence and creating the lock were separate steps, which could allow another process to acquire the lock in between.
- The new implementation uses DynamoDB's PutItem operation with a ConditionExpression to make the lock existence check and the lock creation a single atomic step.
- By combining the attribute_not_exists and ExpiresAt conditions, the lock is only acquired if it does not exist or if it exists but has expired.
- Furthermore, each lock is assigned a unique UUID (LockID), and the UpdateItem operation's ConditionExpression checks the LockID to ensure that a process can only update a lock it created itself.
2. Periodic Lock Expiration Refresh:
- The previous implementation had a fixed lock expiration time, which could lead to the lock expiring if the critical section took too long.
- The new implementation starts a goroutine within the Lock method to periodically refresh the lock's expiration time at LockRefreshInterval (defaulting to LockTimeout / 3).
- This ensures that the lock is held until the critical section's process is complete.
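As a rough illustration of the two conditions described above, the following Go sketch models them against an in-memory table instead of DynamoDB. The expression strings, the placeholder names (:now, :id), the choice of LockID as the attribute passed to attribute_not_exists, and the helper names (table, putIfFree, extendIfOwner) are assumptions made for this example and are not taken from the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Condition expressions in DynamoDB syntax. ExpiresAt and LockID are named in
// the PR description; the exact expressions and placeholders are assumed.
const (
	acquireCondition = "attribute_not_exists(LockID) OR ExpiresAt < :now"
	refreshCondition = "LockID = :id"
)

// ErrConditionFailed stands in for DynamoDB's conditional check failure.
var ErrConditionFailed = errors.New("conditional check failed")

type lockItem struct {
	LockID    string
	ExpiresAt int64 // Unix seconds
}

// table is a hypothetical in-memory stand-in for the DynamoDB table.
type table struct {
	mu    sync.Mutex
	items map[string]lockItem
}

func newTable() *table { return &table{items: make(map[string]lockItem)} }

// putIfFree models PutItem guarded by acquireCondition: the write succeeds
// only if no lock exists or the existing lock has expired. The check and the
// write happen under one mutex, mirroring the atomicity of a conditional put.
func (t *table) putIfFree(key, lockID string, now, ttl int64) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if cur, ok := t.items[key]; ok && cur.ExpiresAt >= now {
		return ErrConditionFailed
	}
	t.items[key] = lockItem{LockID: lockID, ExpiresAt: now + ttl}
	return nil
}

// extendIfOwner models UpdateItem guarded by refreshCondition: only the
// holder whose LockID matches may push ExpiresAt forward.
func (t *table) extendIfOwner(key, lockID string, now, ttl int64) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	cur, ok := t.items[key]
	if !ok || cur.LockID != lockID {
		return ErrConditionFailed
	}
	cur.ExpiresAt = now + ttl
	t.items[key] = cur
	return nil
}

func main() {
	tbl := newTable()
	fmt.Println("PutItem condition:   ", acquireCondition)
	fmt.Println("UpdateItem condition:", refreshCondition)
	fmt.Println(tbl.putIfFree("cert-lock", "id-a", 100, 30))     // <nil>: no lock existed
	fmt.Println(tbl.putIfFree("cert-lock", "id-b", 110, 30))     // conditional check failed: lock still valid
	fmt.Println(tbl.extendIfOwner("cert-lock", "id-b", 110, 30)) // conditional check failed: not the owner
	fmt.Println(tbl.extendIfOwner("cert-lock", "id-a", 120, 30)) // <nil>: owner extends expiry to 150
	fmt.Println(tbl.putIfFree("cert-lock", "id-b", 151, 30))     // <nil>: previous lock has expired
}
```

In the real table the same guarantees would come from DynamoDB evaluating the condition and applying the write as one operation, so no second process can slip in between the check and the write.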
Problems Solved and Benefits of the Changes
- Race Condition Prevention: Atomic lock acquisition prevents race conditions where multiple processes attempt to acquire the lock simultaneously.
- Critical Section Protection: Periodic lock refreshes keep the lock from expiring while the critical section is still running, so no other instance can enter the critical section at the same time. This prevents situations like:
  - An instance acquires the lock and enters the critical section.
  - The critical section's processing takes a long time, and the lock expires.
  - Another instance acquires the lock and enters the critical section simultaneously.
  - As a result, unexpected behavior occurs (for example, the certificate issuance process in the instance that first acquired the lock may stall).
- Improved Reliability: A more robust locking mechanism improves the reliability of certificate acquisition and renewal processes.
- Improved Stability in Clustered Environments: More stable behavior is expected, especially in clustered environments where multiple application instances share DynamoDB.
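The refresh behavior behind the critical section protection can be sketched roughly as follows. This is a minimal illustration, not the PR's code: refreshInterval, keepAlive, and holdLock are hypothetical names, and the refresh callback stands in for the conditional UpdateItem call. Only the default of LockTimeout / 3 is taken from the description above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// refreshInterval returns the configured interval, or timeout/3 when unset.
func refreshInterval(configured, timeout time.Duration) time.Duration {
	if configured > 0 {
		return configured
	}
	return timeout / 3
}

// keepAlive calls refresh on every tick until ctx is cancelled or refresh
// reports an error (for example, because the lock was lost). The returned
// channel is closed once the goroutine has exited. interval must be > 0.
func keepAlive(ctx context.Context, interval time.Duration, refresh func() error) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				if err := refresh(); err != nil {
					return
				}
			}
		}
	}()
	return done
}

// holdLock runs critical while a background goroutine keeps refreshing the
// (assumed already acquired) lock, then stops the goroutine and waits for it.
func holdLock(timeout time.Duration, refresh func() error, critical func()) {
	ctx, cancel := context.WithCancel(context.Background())
	done := keepAlive(ctx, refreshInterval(0, timeout), refresh)
	critical()
	cancel() // stop refreshing once the critical section is done
	<-done   // make sure no refresh is in flight before the lock is released
}

func main() {
	refreshes := 0
	// A 90ms timeout gives a 30ms refresh interval; the critical section
	// below outlives the timeout, so it relies on the refreshes.
	holdLock(90*time.Millisecond, func() error {
		refreshes++ // a real refresh would extend ExpiresAt here
		return nil
	}, func() {
		time.Sleep(100 * time.Millisecond)
	})
	fmt.Println("refreshes during critical section:", refreshes)
}
```

Refreshing at a third of the timeout leaves room for one or two failed or delayed refresh attempts before the lock would actually expire.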
Full Changelog: v3.0.11...v3.1.0-pre.0