What's Changed
- Fix Bug #2319 by @jomayeri in #2438
- update pytorch pool operator function signature by @cli99 in #2443
- Fix build issues on Windows by @eltonzheng in #2428
- rollback ds config changes by @cli99 in #2395
- Use CUDA events for inference model profiling by @mrwyattii in #2371
- Fixing a config mismatch in unit test. by @jomayeri in #2447
- Reduction Kernel Utility by @cmikeh2 in #2436
- deepspeed/launcher/launch.py: add option enable_each_rank_log by @guoyejun in #2409
- Fixes for various CI problems by @mrwyattii in #2457
- Cache Allocation and Softmax Fixes by @cmikeh2 in #2433
- Fix checkpoint loading at inference-engine by @RezaYazdaniAminabadi in #2429
- Create a new folder structure to isolate model-specific code in DS by @awan-10 in #2464
- don't gather partitioned activations for mp size 1 by @guoyejun in #2454
- Updating autotune json default in docs. by @jomayeri in #2476
- Added MLFLOW environment variables for logging metrics within training… by @savitamittal1 in #2477
- fix accelerate link in README by @kyoto7250 in #2481
- Fix Stable-Diffusion: Add correct memory-allocation at DeepSpeed-Attention by @RezaYazdaniAminabadi in #2474
- Fix CI issues related to cupy install by @mrwyattii in #2483
- Add scale_attn_by_inverse_layer_idx feature by @hyunwoongko in #2486
- Stable Diffusion Enhancements by @cmikeh2 in #2491
- stage_1_and_2.py: no allreduce needed when mp size is 1 by @guoyejun in #2494
- Make bf16_optimizer work for non pipeline parallelism by @tjruwase in #2470
- Fix nightly CI tests by @mrwyattii in #2493
- Make data contiguous before the inplace reshape-copy_ function. by @lokoppakmsft in #2489
- Fix typos: deepseed -> deepspeed by @jinyouzhi in #2499
New Contributors
- @guoyejun made their first contribution in #2409
- @savitamittal1 made their first contribution in #2477
- @kyoto7250 made their first contribution in #2481
- @lokoppakmsft made their first contribution in #2489
- @jinyouzhi made their first contribution in #2499
Full Changelog: v0.7.4...v0.7.5