Performance and Stability Improvements
- Enhanced Network Performance: Implemented batching and rate limiting of executions per scheduling cycle, ensuring all executions are processed while maintaining system stability when orchestrating jobs across hundreds of nodes
- Improved Message Handling: Extended publish and subscribe timeout windows for more graceful network communication
- Critical Bug Fixes:
- Resolved boltdb transaction panic issues during context cancellation
- Fixed execution re-approval logic in the scheduler
- Enhanced transaction handling for better stability
Observability Enhancements
- Comprehensive Metrics Coverage: Introduced OpenTelemetry metrics across multiple core components:
- Job store metrics for storage monitoring
- Scheduler metrics for job distribution insights
- Planner metrics for execution planning visibility
- NCL (Network Communication Layer) metrics
- Message handler metrics for network communication monitoring
Operational Improvements
- Development Environment:
- Enhanced devstack logging format
- Added support for node joining existing networks
- Improved configuration passing mechanisms
Full Changelog: v1.6.1...v1.6.2