What's Changed
- MOE residual matmult unit test by @samadejacobs in #2323
- MOE matmult with memaccess by @samadejacobs in #2336
- Refactor residual add kernels by @arashb in #2333
- mem access for quantize kernel by @GuanhuaWang in #2331
- increase min pre-commit versions by @jeffra in #2346
- Extend scratch buffer for long prompts by @cmikeh2 in #2212
- [docs] fix zero docs by @jeffra in #2350
- Staging profile inference v1 (#2348) by @awan-10 in #2349
- Kernel Data Conversion Utility by @cmikeh2 in #2327
- Add Onebit Optimizers in init by @l4d2boomer in #2340
- docs(mixture-of-experts-inference): fix typo in tuto by @jqueguiner in #2345
- Use blob storage for datasets in unit tests by @mrwyattii in #2342
- Refactor gptj_residual_add kernels for better readability by @arashb in #2358
- Updated issue templates by @jeffra in #2363
- fix cuda invalid config error in dequant kernel by @GuanhuaWang in #2362
- Add missing pytest fixture scope by @arashb in #2353
- Extend residual_add kernel tests to cover pre_attn_norm by @arashb in #2354
- Refactor fused_bias_residual kernels for better readability by @arashb in #2356
- Capture error message during sweep tests by @molly-smith in #2351
- Fix an exception when auto-casting dicts to fp16 by @mjksmith in #2370
- Refactor remaining distributed tests by @mrwyattii in #2216
- Fix the MLP output tensor's shape by @arashb in #2380
- add 11.8 to cuda_minor_mismatch_ok to allow building with current CUDA by @Thomas-MMJ in #2390
- Pin Transformers test version by @mrwyattii in #2402
- Change type to tuple in replace_wo_policy isinstance check by @lekurile in #2387
- Checkpoint backwards-compatibility workaround by @tjruwase in #2384
- Add Predicated Global Load to Memory Access Utils by @cmikeh2 in #2373
- MII blog post by @jeffra in #2418
- Fix figure reference by @awan-10 in #2419
- Add SLURM Multinode Runner by @dashstander in #2404
- Fix issue with corrupted output on long generation for GPT by @andrewchernyh in #2359
- Fix GPT Neo-X multi-gpu inference by @andrewchernyh in #2401
- CI fixes related to triton by @jeffra in #2422
- [docs] update mii blog title by @jeffra in #2423
- add SD injection policy by @jeffra in #2381
- Fix checkpoint loading when it is a dictionary by @RezaYazdaniAminabadi in #2425
- Make error regex more generic in collect_results.py by @molly-smith in #2415
- fixes #2389 by @clumsy in #2411
- Fix for inference gpt-j test by @mrwyattii in #2430
- Fixing bug 2361 by @jomayeri in #2410
- Universal checkpoint for zero stage 1 by @tjruwase in #2284
- only add deps if extra is explicitly called by @jeffra in #2432
- Add TestInjectionPolicy inference unittest class for testing custom injection policies by @lekurile in #2426
- [memory estimators] new config args sync by @stas00 in #2431
- parallelize writing of layer checkpoint files across data parallel instances by @adammoody in #1419
- Fix broken link to DeepSpeed Megatron fork by @lekurile in #2440
New Contributors
- @l4d2boomer made their first contribution in #2340
- @jqueguiner made their first contribution in #2345
- @mjksmith made their first contribution in #2370
- @Thomas-MMJ made their first contribution in #2390
- @lekurile made their first contribution in #2387
- @dashstander made their first contribution in #2404
- @andrewchernyh made their first contribution in #2359
- @clumsy made their first contribution in #2411
- @jomayeri made their first contribution in #2410
Full Changelog: v0.7.3...v0.7.4