What's Changed
- CUDA optional deepspeed ops by @tjruwase in #2507
- Remove CI trigger for push to master by @mrwyattii in #2712
- [install] only add deepspeed pkg at install by @jeffra in #2714
- Fix nightly tests for new lm-eval release by @mrwyattii in #2713
- BF16 optimizer for BF16+ZeRO Stage 1 by @jomayeri in #2706
- Fix typo in diffusers transformer block by @mrwyattii in #2718
- Inference Refactor (replace_with_policy, model_implementations) by @awan-10 in #2554
- Change zero_grad() argument to match PyTorch by @loadams in #2741
- Automatic tensor parallelism v2 by @molly-smith in #2670
- Fixing Optimizer Sanity Check by @jomayeri in #2742
- [GatheredParameters] fix memory leak by @stas00 in #2665
- Abstract accelerator (step 3) by @delock in #2677
- Fix autotuning so that it records floating-point operations per second, not per microsecond by @dashstander in #2711
- fix a misspelled attribute by @stas00 in #2750
- [zero] remove misleading dtype log by @jeffra in #2732
- Fix softmax backward by @RezaYazdaniAminabadi in #2709
- Skip test_bias_gelu unit test if torch < 1.12 by @lekurile in #2754
- Conditionally Make Op Building More Verbose by @cmikeh2 in #2759
- Bing/formatting correction by @xiexbing in #2764
- Add links to new azureML examples by @cassieesvelt in #2756
- Fix hardcoded fp16 references in optimizer creation log messages so they report the correct dtype by @loadams in #2743
- Refactor/Pydantify monitoring config by @mrwyattii in #2640
- Pin minimum packaging requirement by @carmocca in #2771
- Fix for diffusers v0.12.0 by @mrwyattii in #2753
- Some fixes in flops_profiler by @lucasleesw in #2068
- fix upsample flops compute by skipping unused kwargs by @cli99 in #2773
- Fix broken kernel inject bug by @molly-smith in #2776
- Fix Checkpoint-loading with Meta-tensor by @RezaYazdaniAminabadi in #2781
- Add hjson support for user configs by @mrwyattii in #2783
- Reset KV-cache at the beginning of text-generation by @RezaYazdaniAminabadi in #2669
- Container param cleanup + remove qkv_merging by @lekurile in #2780
- Common location to install libaio-dev by @tjruwase in #2779
- Fixing broken link to azureml-examples recipes by @rtanase in #2795
- remove outdated comment by @stas00 in #2786
- Enable page-locked tensors without CUDA by @tjruwase in #2775
- Add container load checkpoint error reporting + refactor by @lekurile in #2792
- Add user defined launcher args for PDSH launcher by @loadams in #2804
- Fix Slurm launcher user args by @loadams in #2806
- Handle hung tests in CI by @mrwyattii in #2808
- Fix inference CI device error by @mrwyattii in #2824
- Fix permissions issue with pip upgrade by @mrwyattii in #2823
- Fix cpu-only CI hangs by @mrwyattii in #2825
- Fix Pipeline Parallel resize unit test by @mrwyattii in #2833
- Fix auto TP for duplicate modules with different gems by @molly-smith in #2784
- Refactor DS inference API. No longer need replace_method. by @awan-10 in #2831
- Port Reza's INT8-quantization fix to container architecture by @lekurile in #2725
- Fix GPT-NeoX rotary embedding implementation by @RezaYazdaniAminabadi in #2782
- Fix for CI failure on system upgrade by @mrwyattii in #2849
New Contributors
- @loadams made their first contribution in #2741
- @xiexbing made their first contribution in #2764
- @carmocca made their first contribution in #2771
- @lucasleesw made their first contribution in #2068
- @rtanase made their first contribution in #2795
Full Changelog: v0.8.0...v0.8.1