What's Changed
🪄 Enhancements
- feat(artifacts): keep uncommitted uploads in separate staging area by @moredatarequired in #4505
- perf(sdk): improve file descriptor management by @dmitryduev in #4617
- feat(launch): default to using model-registry project for agent and launch_add by @KyleGoyette in #4613
- feat(sdk): add `exist_ok=False` to `file.download()` by @janosh in #4564
- feat(launch): auto create job artifacts from runs with required ingredients by @KyleGoyette in #4660
- feat(sdk): add generalized response injection pattern for tests by @kptkin in #4729
- perf(sdk): replace multiprocessing.Queue's with queue.Queue's by @dmitryduev in #4672
- feat(sdk): use transaction log to cap memory usage by @raubitsj in #4724
- feat(integrations): support system metrics for AWS Trainium by @dmitryduev in #4671
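
For readers curious about the queue change in #4672, here is a minimal sketch of the general pattern, not the SDK's actual code. It assumes the producer and consumer are threads inside a single process, which is the case where `queue.Queue` can stand in for `multiprocessing.Queue` without pickling or pipe overhead. The names `worker` and `run_demo` are hypothetical.

```python
import queue
import threading


def worker(inbox: queue.Queue, results: list) -> None:
    """Consume items until the None sentinel arrives (hypothetical helper)."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more work
            break
        results.append(item * 2)


def run_demo() -> list:
    """Push a few items through a thread-safe in-process queue."""
    inbox: queue.Queue = queue.Queue()
    results: list = []
    consumer = threading.Thread(target=worker, args=(inbox, results))
    consumer.start()
    for i in range(3):
        inbox.put(i)
    inbox.put(None)  # tell the worker to stop
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

With a single consumer the items are processed in FIFO order, so the results come back in the order they were queued.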
🔨 Fixes
- fix(sdk): correct the type hint for wandb.run by @edwag in #4585
- fix(sdk): resume collecting system metrics on object restart by @dmitryduev in #4572
- fix(launch): fix env handling and node_selector handling by @KyleGoyette in #4555
- fix(public-api): fix Job.call() using the wrong keyword (queue vs queue_name) when calling launch_add. by @TimH98 in #4625
- fix(sweeps): sweeps schedulers handles multi word parameters by @gtarpenning in #4640
- fix(launch): allow spaces in requirements file, remove duplicate wandb bootstrap file by @TimH98 in #4647
- fix(artifacts): correctly handle url-encoded local file references. by @moredatarequired in #4665
- fix(artifacts): get digest directly instead of from the manifests' manifest by @moredatarequired in #4681
- fix(artifacts): artifact.version should be the version index from the associated collection by @vwrj in #4486
- fix(sdk): remove duplicate generate_id functions, replace shortuuid with secrets by @moredatarequired in #4676
- fix(integrations): fix type check for jax.Array introduced in jax==0.4.1 by @dmitryduev in #4718
- fix(sdk): fix hang after failed wandb.init (add cancel) by @raubitsj in #4405
- fix(sdk): allow users to provide path to custom executables by @kptkin in #4604
- fix(sdk): fix TypeError when trying to slice a Paginator object by @janosh in #4575
- fix(integrations): add `AttributeError` to the list of handled exceptions when saving a keras model by @froody in #4732
- fix(launch): remove args from jobs by @KyleGoyette in #4750
📚 Docs
- docs(sweeps): fix typo in docs by @gtarpenning in #4627
- docs(sdk): fix typo in docstring for data_types.Objects3D by @ngrayluna in #4543
- docs(sdk): remove less than, greater than characters from docstrings… by @ngrayluna in #4687
- docs(sdk): update SECURITY.md by @dmitryduev in #4616
- docs(sdk): Update README.md by @ngrayluna in #4468
⚙️ Dev
- test(sdk): update t2_fix_error_cond_feature_importances to install scikit-learn by @dmitryduev in #4573
- chore(sdk): update base Docker images for nightly testing by @dmitryduev in #4566
- chore(sdk): change sklearn to scikit-learn in functional sacred test by @dmitryduev in #4577
- chore(launch): add error check for `--build` when resource=local-process by @gtarpenning in #4513
- chore(sweeps): update scheduler and agent resource handling to allow DRC override by @gtarpenning in #4480
- chore(sdk): require sdk-team review for adding or removing high-level… by @dmitryduev in #4594
- chore(launch): remove requirement to make target project match queue by @KyleGoyette in #4612
- chore(sdk): enhance nightly cloud testing process by @dmitryduev in #4602
- chore(sdk): update pull request template by @raubitsj in #4633
- chore(launch): return updated runSpec after pushToRunQueue query by @gtarpenning in #4516
- chore(launch): fix for run spec handling in sdk by @gtarpenning in #4636
- chore(sdk): remove test dependency on old fastparquet package by @raubitsj in #4656
- test(artifacts): fix dtype np.float (does not exist), set to python float by @moredatarequired in #4661
- chore(sdk): correct 'exclude' to 'ignore-paths' in .pylintrc by @moredatarequired in #4659
- chore(sdk): use pytest tmp_path so we can inspect failures by @raubitsj in #4664
- chore(launch): reset build command after building by @gtarpenning in #4626
- ci(sdk): rerun flaking tests in CI with pytest-rerunfailures by @dmitryduev in #4430
- chore(sdk): remove dead code from filesync logic by @speezepearson in #4638
- chore(sdk): remove unused fields from a filesync message by @speezepearson in #4662
- chore(sdk): refactor retry logic to use globals instead of dependency-injecting them by @speezepearson in #4588
- test(sdk): add unit tests for filesync.StepUpload by @speezepearson in #4652
- test(sdk): add tests for Api.upload_file_retry by @speezepearson in #4639
- chore(launch): remove fallback resource when not specified for a queue by @gtarpenning in #4637
- test(artifacts): improve storage handler test coverage by @moredatarequired in #4674
- test(integrations): fix import tests by @dmitryduev in #4690
- chore(sdk): make MetricsMonitor less verbose on errors by @dmitryduev in #4618
- test(sdk): address fixture server move from port 9003 to 9010 in local-testcontainer by @dmitryduev in #4716
- chore(sdk): vendor promise==2.3.0 to unequivocally rm six dependency by @dmitryduev in #4622
- chore(artifacts): allow setting artifact cache dir in wandb.init(...) by @dmitryduev in #3644
- test(sdk): temporary lower network buffer for testing by @raubitsj in #4737
- chore(sdk): add telemetry if the user running in pex environment by @kptkin in #4747
- chore(sdk): add more flow control telemetry by @raubitsj in #4739
- chore(sdk): add settings and debug for service startup issues (wait_for_ports) by @raubitsj in #4749
- test(sdk): fix AWS Trainium test by @dmitryduev in #4753
- chore(sdk): fix status checker thread issue when user process exits without finish() by @raubitsj in #4761
- chore(sdk): add telemetry for service disabled usage by @kptkin in #4762
💅 Cleanup
- style(sdk): use the same syntax whenever raising exceptions by @moredatarequired in #4559
- refactor(sdk): combine _safe_mkdirs with mkdir_exist_ok by @moredatarequired in #4650
- refactor(artifacts): use a pytest fixture for the artifact cache by @moredatarequired in #4648
- refactor(artifacts): use ArtifactEntry directly instead of subclassing by @moredatarequired in #4649
- refactor(artifacts): consolidate hash utilities into lib.hashutil by @moredatarequired in #4525
- style(public-api): format public file with proper formatting by @kptkin in #4697
- chore(sdk): install tox into proper env in dev env setup tool by @dmitryduev in #4318
- refactor(sdk): clean up the init and run logic by @kptkin in #4730
New Contributors
- @edwag made their first contribution in #4585
- @TimH98 made their first contribution in #4625
- @froody made their first contribution in #4732
Full Changelog: v0.13.7...v0.13.8