What's Changed
- Update version.txt after 0.16.2 release by @loadams in #6893
- Allow to compile collective for PT>2.3 by @NirSonnenschein in #6899
- Zero2: avoid graph breaks in torch.compile by using param_idx by @nelyahu in #6803
- hpu_accelerator: use torch.use_deterministic_algorithms by @nelyahu in #6897
- Fix error caused by all_reduce call in domino by @hwchen2017 in #6880
- Update Gaudi2 jobs to latest 1.19 build by @raza-sikander in #6905
- Change compile for pipeline module torch.compile by @NirSonnenschein in #6478
- Stage3: Use new torch grad accumulation hooks API by @deepcharm in #6773
- Cleanup ops/transformer/inference tests by @loadams in #6830
- Fix `checkpointable_layers` Logic by @Quentin-Anthony in #6881
- [BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm by @hj-wei in #6909
- Add fp8_gemm fallback for non-triton systems by @oelayan7 in #6916
- Reduce the device bubble introduced by heavy loop synchronization in coalesced fetch/release(z3_leaf_module) by @inkcherry in #6694
- Cleanup ops/transformer/inference tests by @loadams in #6925
- Check transformers version in BLOOM for inference v1 by @lekurile in #6766
- inference: remove unused _validate_args function by @nelyahu in #5505
- Use `torch.log1p` by @kit1980 in #6930
- Update python version classifiers by @loadams in #6933
- Fix building on Windows with presence of Triton by @woct0rdho in #6749
- Fix windows blog examples by @loadams in #6934
- Add deepseek autotp by @Yejing-Lai in #6937
- Add position_ids arg to OPTEmbedding forward function by @lekurile in #6939
- Add information on security expectations with this software by @loadams in #6941
- Support pure meta model lm_head tp by @Yejing-Lai in #6812
- Remove op compilation flags due to perf issue by @NirSonnenschein in #6944
- Pin nv-a6000 workflow by @loadams in #6938
- [inf] Add config var to enable keeping module on host by @oelayan7 in #6846
- `warn` to `warning` by @qgallouedec in #6952
- Add extra_repr to Linear classes for debugging purpose by @Xia-Weiwen in #6954
- Update import for torchvision.transformers by @loadams in #6958
- Remove Duplicate Declaration of pandas in `Dockerfile` by @Zerohertz in #6959
- Add the missing view operations from sequence parallel(async). by @inkcherry in #6750
- Update `torch.norm` to `torch.linalg.norm` and `torch.linalg.vector_norm` by @loadams in #6931
- Using explicit GPU upcast for ZeRO-Offload by @xylian86 in #6962
New Contributors
- @hj-wei made their first contribution in #6909
- @kit1980 made their first contribution in #6930
- @woct0rdho made their first contribution in #6749
- @Xia-Weiwen made their first contribution in #6954
- @Zerohertz made their first contribution in #6959
Full Changelog: v0.16.2...v0.16.3