What's Changed
- Add blob storage to CI runners by @mrwyattii in #2260
- Update replace_module.py, test-gptj.py related fix by @molly-smith in #2269
- Fix OrderedDict import for python3.6 by @Dipet in #2267
- Ds inference/fix mp2 by @RezaYazdaniAminabadi in #2270
- Trajepl: nebula load fix by @trajepl in #2182
- Prevent torch ext folder mkdir at tmp by @jeffra in #2274
- Ds-inference Int8 support through ZeroQuant technology by @RezaYazdaniAminabadi in #2217
- Add a new unit test for CUDA ops by @awan-10 in #2278
- Addition to code owners file by @cmikeh2 in #2279
- Memory Access Utility by @cmikeh2 in #2276
- Fp32 accuracy bug fix by @RezaYazdaniAminabadi in #2285
- Refactor universal checkpointing and tensor fragments by @tjruwase in #2253
- [ds-inference] fix progress bar by @stas00 in #2286
- Offload all gradients to NVMe by @tjruwase in #2282
- Fused bias ReLU unit test by @molly-smith in #2297
- Fix for pytest picking up wrong deepspeed by @mrwyattii in #2299
- Fix for Zero3 when MP>1 by @Quentin-Anthony in #2289
- Unit test for bias add kernel by @mrwyattii in #2298
- Update relu.cu with mem_access_utils by @molly-smith in #2306
- Add tensor parallel inference unit tests by @mrwyattii in #2232
- Fix the residual add mp scaling for GPTNeoX by @arashb in #2310
- Add unit tests for residual_add kernel by @arashb in #2307
- Add inference eval scripts by @jeffra in #2303
- Upgrade P40 tests to torch 1.8 by @mrwyattii in #2316
- ZeRO-Inference blog by @tjruwase in #2271
- ZeRO-Inference blog - wrap up by @tjruwase in #2321
- ZeRO-Inference blog - Update README by @tjruwase in #2322
- Refactor relu bias add with mem_access utils by @mrwyattii in #2317
- Add quant unit test by @GuanhuaWang in #2315
- Only override forward if using cuda-graph by @jeffra in #2291
- Add more options to inference benchmark by @mrwyattii in #2325
New Contributors
- @molly-smith made their first contribution in #2269
Full Changelog: v0.7.2...v0.7.3