deepspeed 0.7.4 on Python PyPI

What's Changed

MOE residual matmult unit test by @samadejacobs in #2323
MOE matmult with memaccess by @samadejacobs in #2336
Refactor residual add kernels by @arashb in #2333
mem access for quantize kernel by @GuanhuaWang in #2331
increase min pre-commit versions by @jeffra in #2346
Extend scratch buffer for long prompts by @cmikeh2 in #2212
[docs] fix zero docs by @jeffra in #2350
Staging profile inference v1 (#2348) by @awan-10 in #2349
Kernel Data Conversion Utility by @cmikeh2 in #2327
Add Onebit Optimizers in init by @l4d2boomer in #2340
docs(mixture-of-experts-inference): fix typo in tuto by @jqueguiner in #2345
Use blob storage for datasets in unit tests by @mrwyattii in #2342
Refactor gptj_residual_add kernels for better readability by @arashb in #2358
Updated issue templates by @jeffra in #2363
fix cuda invalid config error in dequant kernel by @GuanhuaWang in #2362
Add missing pytest fixture scope by @arashb in #2353
Extend residual_add kernel tests to cover pre_attn_norm by @arashb in #2354
Refactor fused_bias_residual kernels for better readability by @arashb in #2356
Capture error message during sweep tests by @molly-smith in #2351
Fix an exception when auto-casting dicts to fp16 by @mjksmith in #2370
Refactor remaining distributed tests by @mrwyattii in #2216
Fix the MLP output tensor's shape by @arashb in #2380
add 11.8 to cuda_minor_mismatch_ok to allow building with current CUDA by @Thomas-MMJ in #2390
Pin Transformers test version by @mrwyattii in #2402
Change type to tuple in replace_wo_policy isinstance check by @lekurile in #2387
Checkpoint backwards-compatibility workaround by @tjruwase in #2384
Add Predicated Global Load to Memory Access Utils by @cmikeh2 in #2373
MII blog post by @jeffra in #2418
Fix figure reference by @awan-10 in #2419
Add SLURM Multinode Runner by @dashstander in #2404
Fix issue with corrupted output on long generation for GPT by @andrewchernyh in #2359
Fix GPT Neo-X multi-gpu inference by @andrewchernyh in #2401
CI fixes related to triton by @jeffra in #2422
[docs] update mii blog title by @jeffra in #2423
add SD injection policy by @jeffra in #2381
Fix checkpoint loading when it is a dictionary by @RezaYazdaniAminabadi in #2425
Make error regex more generic in collect_results.py by @molly-smith in #2415
fixes #2389 by @clumsy in #2411
Fix for inference gpt-j test by @mrwyattii in #2430
Fixing bug 2361 by @jomayeri in #2410
Universal checkpoint for zero stage 1 by @tjruwase in #2284
only add deps if extra is explicitly called by @jeffra in #2432
Add TestInjectionPolicy inference unittest class for testing custom injection policies by @lekurile in #2426
[memory estimators] new config args sync by @stas00 in #2431
parallelize writing of layer checkpoint files across data parallel instances by @adammoody in #1419
Fix broken link to DeepSpeed Megatron fork by @lekurile in #2440

New Contributors

@l4d2boomer made their first contribution in #2340
@jqueguiner made their first contribution in #2345
@mjksmith made their first contribution in #2370
@Thomas-MMJ made their first contribution in #2390
@lekurile made their first contribution in #2387
@dashstander made their first contribution in #2404
@andrewchernyh made their first contribution in #2359
@clumsy made their first contribution in #2411
@jomayeri made their first contribution in #2410

Full Changelog: v0.7.3...v0.7.4

deepspeed 0.7.4 v0.7.4: Patch release on Python PyPI
