github Netflix/metaflow 2.0.4
2.0.4 (Apr 28th, 2020)

Metaflow 2.0.4 Release Notes

  • Improvements
    • Expose retry_count in Current
    • Mute superfluous ThrottleExceptions in AWS Batch job logs
  • Bug Fixes
    • Set proper thresholds for retrying DescribeJobs API for AWS Batch
    • Explicitly override PYTHONNOUSERSITE for conda environments
    • Preempt AWS Batch job log collection when the job fails to get into a RUNNING state

The Metaflow 2.0.4 release is a minor patch release.

Improvements

Expose retry_count in Current

You can now read the retry_count of the running task from the current singleton. retry_count is 0 on the first attempt of a task and increases by one with each retry. For example:

@retry
@step
def my_step(self):
    # retry_count is 0 on the first attempt and grows by 1 on each retry.
    from metaflow import current
    print("retry_count: %s" % current.retry_count)
    self.next(self.a)

Mute superfluous ThrottleExceptions in AWS Batch job logs

The AWS Logs API for get_log_events has a global hard limit of 10 requests per second. Although retry logic is in place to respect this limit, some ThrottleExceptions still ended up in the job logs, confusing end users. This release mutes them (also documented in #184).
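One way to keep throttling noise out of user-facing output is to catch the error inside the log-fetching loop, wait, and retry, recording it only at debug level. The following is a minimal sketch of that idea, not Metaflow's actual implementation; ThrottledError, fetch_page_quietly, fetch_page, and the logger name are hypothetical stand-ins for the real client error and the get_log_events call.

```python
import logging
import time

logger = logging.getLogger("log_tailer")  # hypothetical logger name


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error from the Logs API."""


def fetch_page_quietly(fetch_page, max_attempts=5, delay=0.1, sleep=time.sleep):
    """Return one page of log lines, retrying quietly on throttling.

    fetch_page is any zero-argument callable that returns a list of
    lines or raises ThrottledError.
    """
    for attempt in range(max_attempts):
        try:
            return fetch_page()
        except ThrottledError:
            # Record the event for debugging only; nothing is printed
            # to the user-facing job log.
            logger.debug("throttled on attempt %d, retrying", attempt)
            sleep(delay)
    # Out of attempts: return no lines and let the caller poll again later.
    return []
```

The sleep function is injected so the retry loop can be exercised without real waiting.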

Bug Fixes

Set proper thresholds for retrying DescribeJobs API for AWS Batch

The AWS Batch describe_jobs API throws ThrottleExceptions when managing a flow with a very wide for-each step. This release adds retries with backoff so that these calls recover from throttling (addresses #138).
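Retrying with backoff can be sketched as below. This is an illustration under assumptions, not the thresholds or code used by Metaflow: the base delay, growth factor, cap, and attempt count are made-up values, and ThrottledError, backoff_delays, and call_with_backoff are hypothetical names.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error from the Batch API."""


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6):
    """Yield capped exponential delays: base, base*factor, ... up to cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(fn, delays, sleep=time.sleep, rng=random.random):
    """Call fn, sleeping a jittered delay after each throttled attempt."""
    for delay in delays:
        try:
            return fn()
        except ThrottledError:
            # Full jitter: sleep a random fraction of the current delay so
            # many parallel tasks do not retry in lockstep.
            sleep(delay * rng())
    # One last attempt; a throttling error here propagates to the caller.
    return fn()
```

A caller would wrap the status lookup, for example call_with_backoff(lookup, backoff_delays()), where lookup is whatever function issues the describe_jobs request. Jitter matters most in the wide for-each case, where many tasks hit the same limit at once.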

Explicitly override PYTHONNOUSERSITE for conda environments

In certain user environments, properly isolating conda environments requires explicitly overriding PYTHONNOUSERSITE rather than relying on python -s alone (addresses #178).
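The effect of the variable can be checked with a short script. This is a sketch using only the standard library, not the code Metaflow uses to launch conda environments; run_isolated is a hypothetical helper name. Unlike the -s flag, an environment variable is inherited by any child interpreter the process starts.

```python
import os
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a child interpreter with user site disabled."""
    env = dict(os.environ)
    # Any non-empty value disables the per-user site-packages directory.
    env["PYTHONNOUSERSITE"] = "1"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# sys.flags.no_user_site is set by either the -s flag or the variable.
flag = run_isolated("import sys; print(sys.flags.no_user_site)")
print(flag)
```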

Preempt AWS Batch job log collection when the job fails to get into a RUNNING state

Fixes a bug where, if the AWS Batch job crashed before entering the RUNNING state (often due to incorrect IAM permissions), log collection failed to print the correct error message, making the issue harder to debug (addresses #185).
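The idea of stopping early can be sketched as a polling loop that gives up on log collection once the job leaves the pre-RUNNING states without ever running. This is an illustrative sketch, not Metaflow's implementation; wait_until_running and get_status are hypothetical names, and get_status stands in for whatever call returns the job's current AWS Batch status string.

```python
import time

# AWS Batch states a job passes through before it starts running.
PRE_RUNNING = {"SUBMITTED", "PENDING", "RUNNABLE", "STARTING"}


def wait_until_running(get_status, poll_interval=1.0, sleep=time.sleep):
    """Poll until the job is RUNNING.

    Returns True once the job is RUNNING, or False if it reaches a
    terminal state (such as FAILED) without ever running.
    """
    while True:
        status = get_status()
        if status == "RUNNING":
            return True
        if status not in PRE_RUNNING:
            # The job ended before it started, so there is no log stream
            # to tail; the caller should report the job's status reason.
            return False
        sleep(poll_interval)
```

When the function returns False, the caller can skip log tailing and surface the failure reason reported for the job instead.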
