What's Changed
- Update version.txt after 0.16.4 release by @loadams in #7063
- fix an outdated doc wrt CUDA_VISIBLE_DEVICES by @stas00 in #7058
- Tecorigin sdaa accelerator by @siqi654321 in #6903
- Handle special case of libuv for Windows by @loadams in #7064
- Bug Fix for offload_states API by @U-rara in #7050
- Update README with info on newest accelerator by @loadams in #7065
- Fix TOCTOU issues, switch to fstat by @loadams in #7067
- config torch to avoid graph breaks caused by logger by @ShellyNR in #6999
- Fix meta load tensor incompatible issue by @Yejing-Lai in #7073
- Replace calls to `python setup.py sdist` with `python -m build --sdist` by @loadams in #7069
- Revert "Handle special case of libuv for Windows (#7064)" by @loadams in #7076
- Add DeepseekV3 AutoTP. by @Yejing-Lai in #7045
- Improve inference tutorial docs by @loadams in #7083
- Pin transformers version on tests that use latest. by @loadams in #7085
- Update README.md with ICS '23 MoE paper link by @siddharth9820 in #7087
- Update parallelism for nv-torch-latest/nightly tests due to more GPUs/runner by @loadams in #7086
- Remove workflows for very old torch versions by @loadams in #7090
- Use new dlpack api; Formatting fixes by @tjruwase in #7101
- Avoid graph breaks by disabling sourceless calls in instrument_w_nvtx by @deepcharm in #7081
- Avoid graph breaks in torch.compile caused by inner classes in the backward hooks by @deepcharm in #7062
- Only run pre-commit on the changes by @hwchen2017 in #7106
- Avoid graph break due to unsupported frozenset by @deepcharm in #7105
- Fix fused_qkv print model ValueError by @Yejing-Lai in #7109
- Update references to new X/Twitter handle by @loadams in #7110
- Update gaudi2 nightly, ci to latest 1.20.0 build by @raza-sikander in #7093
- fix keep_module_on_host by @inkcherry in #7112
- Add sequential pytest mark to TestNVMeCheckpointing to resolve pytest forked hangs by @loadams in #7131
- Training multiple models by @tjruwase in #7018
- Update CONTRIBUTING.md to reflect changes from CLA to DCO by @loadams in #7135
- Avoid missing attr error by @tjruwase in #7133
- Add conditional expression by @A-transformer in #7119
- Unpin transformers version for most workflows by @loadams in #7139
- Conditionally quote env vars by @saurabhkoshatwar in #7071
- Correct the BACKWARD_PREFETCH_SUBMIT mismatch by @A-transformer in #7120
- Enhance Gaudi2 CI/Nightly Coverage with Model Parallelism and Linear Tests by @raza-sikander in #7146
- Update container version that runs on A6000 tests. by @loadams in #7153
- hf tp+zero training doc. by @inkcherry in #7151
- Avoid graph break by removing redundant requires_grad attr change by @deepcharm in #7158
- Add destroy to tests to free memory by @tohtana in #7160
- [NFC] Typo fix in SP layer. by @c8ef in #7152
- Link AutoTP blog in the front page by @hwchen2017 in #7167
- fix `seq_parallel_communication_data_type` constant. by @stas00 in #7175
- Fix typos in GDS blog by @loadams in #7177
- Variable batch size and LR scheduler by @bm-synth in #7104
New Contributors
- @siqi654321 made their first contribution in #6903
- @A-transformer made their first contribution in #7119
- @saurabhkoshatwar made their first contribution in #7071
- @c8ef made their first contribution in #7152
Full Changelog: v0.16.4...v0.16.5