Metaflow 2.0.4 Release Notes
- Improvements
  - Expose retry_count in Current
  - Mute superfluous ThrottleExceptions in AWS Batch job logs
- Bug Fixes
  - Set proper thresholds for retrying DescribeJobs API for AWS Batch
  - Explicitly override PYTHONNOUSERSITE for conda environments
  - Preempt AWS Batch job log collection when the job fails to get into a RUNNING state
The Metaflow 2.0.4 release is a minor patch release.
Improvements
Expose retry_count in Current
You can now use the current singleton to access the retry_count of your task. The first attempt of a task has a retry_count of 0, and each subsequent retry increments it. As an example:
```python
@retry
@step
def my_step(self):
    from metaflow import current
    print("retry_count: %s" % current.retry_count)
    self.next(self.a)
```

Mute superfluous ThrottleExceptions in AWS Batch job logs
The AWS Logs API for get_log_events has a global hard limit of 10 requests per second. While we have retry logic in place to respect this limit, some of the ThrottleExceptions would still end up in the job logs, confusing the end-user. This release addresses this issue (also documented in #184).
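For context, the pattern involved looks roughly like the sketch below: paging through get_log_events while pacing requests and retrying quietly on throttles instead of surfacing them. This is an illustration using boto3 with hypothetical names (fetch_log_events, min_interval), not Metaflow's actual implementation:

```python
import time

import boto3
from botocore.exceptions import ClientError

logs = boto3.client("logs")

def fetch_log_events(log_group, log_stream, min_interval=0.1):
    """Page through get_log_events, pacing calls to stay under the
    10 requests/sec limit and retrying quietly when throttled."""
    token = None
    while True:
        kwargs = {
            "logGroupName": log_group,
            "logStreamName": log_stream,
            "startFromHead": True,
        }
        if token:
            kwargs["nextToken"] = token
        try:
            resp = logs.get_log_events(**kwargs)
        except ClientError as e:
            # Back off without echoing the throttle into user-visible logs.
            if e.response["Error"]["Code"] == "ThrottlingException":
                time.sleep(min_interval)
                continue
            raise
        for event in resp["events"]:
            yield event["message"]
        next_token = resp.get("nextForwardToken")
        if next_token == token:
            return  # no new events; the caller may poll again later
        token = next_token
        time.sleep(min_interval)  # pace subsequent requests
```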
Bug Fixes
Set proper thresholds for retrying DescribeJobs API for AWS Batch
The AWS Batch API for describe_jobs throws ThrottleExceptions when managing a flow with a very wide for-each step. This release adds retry behavior with backoffs for proper resiliency (addresses #138).
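As a sketch of the pattern (again illustrative, not Metaflow's code), retrying describe_jobs with exponential backoff and jitter might look like the following. The hypothetical helper also chunks job ids, since describe_jobs accepts at most 100 per call:

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

batch = boto3.client("batch")

def describe_jobs_with_backoff(job_ids, max_attempts=8):
    """Describe AWS Batch jobs in chunks of 100, retrying throttled
    calls with exponential backoff plus jitter."""
    jobs = []
    for i in range(0, len(job_ids), 100):  # describe_jobs caps at 100 ids
        chunk = job_ids[i:i + 100]
        for attempt in range(max_attempts):
            try:
                jobs.extend(batch.describe_jobs(jobs=chunk)["jobs"])
                break
            except ClientError as e:
                code = e.response["Error"]["Code"]
                if code not in ("ThrottlingException", "TooManyRequestsException"):
                    raise
                # Backoff schedule: ~1s, 2s, 4s, ... plus random jitter.
                time.sleep(2 ** attempt + random.random())
        else:
            raise RuntimeError("describe_jobs kept throttling after retries")
    return jobs
```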
Explicitly override PYTHONNOUSERSITE for conda environments
In certain user environments, properly isolating conda environments requires explicitly overriding PYTHONNOUSERSITE rather than simply relying on python -s (addresses #178).
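For illustration, a minimal sketch of the technique: setting PYTHONNOUSERSITE in the environment of a launched interpreter excludes the user site-packages directory from sys.path, matching the effect of python -s, so packages installed with pip install --user cannot leak into the conda environment:

```python
import os
import subprocess

# PYTHONNOUSERSITE, when set, disables the user site-packages directory,
# just like invoking the interpreter with `python -s`.
env = dict(os.environ)
env["PYTHONNOUSERSITE"] = "1"

# The child interpreter reports user-site as disabled (prints False).
subprocess.run(
    ["python", "-c", "import site; print(site.ENABLE_USER_SITE)"],
    env=env,
    check=True,
)
```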
Preempt AWS Batch job log collection when the job fails to get into a RUNNING state
Fixes a bug where, if the AWS Batch job crashed before entering the RUNNING state (often due to incorrect IAM permissions), the previous log collection behavior would fail to print the correct error message, making the issue harder to debug (addresses #185). A sketch of such a guard follows.
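The sketch below uses a hypothetical wait_until_running helper; the job states it checks are AWS Batch's real lifecycle states (SUBMITTED, PENDING, RUNNABLE, STARTING, RUNNING, SUCCEEDED, FAILED). It is an illustration of the idea, not Metaflow's implementation:

```python
import time

import boto3

batch = boto3.client("batch")

def wait_until_running(job_id, poll_interval=5):
    """Poll a Batch job until it reaches RUNNING; if it fails first
    (e.g. due to missing IAM permissions), surface statusReason
    instead of trying to tail logs that were never written."""
    while True:
        job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
        status = job["status"]
        if status in ("RUNNING", "SUCCEEDED"):
            return  # the job started; log collection can proceed
        if status == "FAILED":
            raise RuntimeError(
                "Job %s failed before RUNNING: %s"
                % (job_id, job.get("statusReason", "unknown reason"))
            )
        # Still SUBMITTED / PENDING / RUNNABLE / STARTING; keep waiting.
        time.sleep(poll_interval)
```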