- Add multi-datacenter training support through N/S connection
- MoE
  - Features
    - Support DeepSeek-V3 fine-tuning (see the routing sketch after this list)
      - Aux-loss-free load balancing strategy
      - Node-limited routing and device-limited routing support
      - Tensor Parallelism support for MLA and Sequence Auxiliary Loss
      - MTP (with TP and PP support) is coming soon
    - Permutation/unpermutation fusion kernel from TransformerEngine
    - Uneven virtual pipeline parallel split support in the first and last PP stages (see the layer-split sketch after this list)
  - Bug fixes:
    - Fix the grad scale when TP != expert-TP and average_in_collective is enabled in DDP
    - Fix TEGroupedMLP distributed checkpoint (distckpt) compatibility issue with FP8 padding/unpadding
  - Known Issues:
    - When training a Dense+MoE hybrid model, the process will hang if any PP rank does not have expert params
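For readers unfamiliar with the two DeepSeek-V3 routing features listed above, the sketch below illustrates the general idea in plain PyTorch: a per-expert bias that is nudged by the observed load instead of an auxiliary loss term (aux-loss-free balancing), and group-limited top-k selection where a group stands for the experts on one node or device (node-limited/device-limited routing). This is a minimal sketch, not Megatron-Core's implementation; the class `GroupLimitedRouter` and its parameters (`num_groups`, `group_topk`, `bias_update_rate`) are hypothetical names.

```python
# Hedged sketch of aux-loss-free balancing + group-limited routing.
# Not Megatron-Core code; all names here are hypothetical.
import torch


class GroupLimitedRouter(torch.nn.Module):
    def __init__(self, hidden, num_experts, top_k, num_groups, group_topk,
                 bias_update_rate=1e-3):
        super().__init__()
        assert num_experts % num_groups == 0
        self.gate = torch.nn.Linear(hidden, num_experts, bias=False)
        self.top_k, self.num_groups, self.group_topk = top_k, num_groups, group_topk
        # Per-expert bias used only for expert *selection*; it is nudged by the
        # observed load instead of adding an auxiliary loss term.
        self.register_buffer("expert_bias", torch.zeros(num_experts))
        self.bias_update_rate = bias_update_rate

    def forward(self, x):                      # x: [tokens, hidden]
        scores = torch.sigmoid(self.gate(x))   # token-to-expert affinities
        biased = scores + self.expert_bias     # bias influences selection only

        # Group-limited routing: keep only the best `group_topk` groups per token,
        # where one group corresponds to the experts hosted on one node/device.
        tokens, num_experts = biased.shape
        grouped = biased.view(tokens, self.num_groups, -1)
        group_score = grouped.max(dim=-1).values                  # [tokens, groups]
        top_groups = group_score.topk(self.group_topk, dim=-1).indices
        mask = torch.zeros_like(group_score).scatter_(1, top_groups, 1.0)
        mask = mask.unsqueeze(-1).expand_as(grouped).reshape(tokens, num_experts)

        # Top-k experts restricted to the selected groups.
        masked = biased.masked_fill(mask == 0, float("-inf"))
        topk_idx = masked.topk(self.top_k, dim=-1).indices
        # Gate weights come from the *unbiased* scores, renormalized over the top-k,
        # so the bias steers which experts are chosen but not their contribution.
        gates = scores.gather(1, topk_idx)
        gates = gates / gates.sum(dim=-1, keepdim=True)

        # Aux-loss-free balancing: lower the bias of overloaded experts and raise
        # it for underloaded ones, based on how many tokens each expert received.
        with torch.no_grad():
            load = torch.zeros(num_experts, device=x.device)
            load.scatter_add_(0, topk_idx.reshape(-1),
                              torch.ones_like(topk_idx.reshape(-1), dtype=load.dtype))
            self.expert_bias -= self.bias_update_rate * torch.sign(load - load.mean())
        return topk_idx, gates
```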
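Similarly, the uneven virtual pipeline split can be pictured as assigning fewer transformer layers to the first and last pipeline stages, which also carry the embedding and output/loss layers, and spreading the rest evenly over the middle virtual stages. The helper below is only a sketch of that layer-count arithmetic; the function name and arguments (`first_stage_layers`, `last_stage_layers`) are assumptions, not the Megatron-Core API.

```python
# Hedged sketch of the layer arithmetic behind an uneven virtual pipeline split.
# Hypothetical helper, not Megatron-Core code.
def uneven_vpp_layer_split(num_layers, pp_size, vpp_size,
                           first_stage_layers, last_stage_layers):
    """Return per-virtual-stage layer counts; the first entry is the first stage
    on PP rank 0 and the last entry is the last stage on the last PP rank."""
    num_stages = pp_size * vpp_size
    middle_layers = num_layers - first_stage_layers - last_stage_layers
    assert middle_layers >= 0 and middle_layers % (num_stages - 2) == 0, \
        "middle layers must divide evenly over the non-edge stages"
    per_middle_stage = middle_layers // (num_stages - 2)
    split = [per_middle_stage] * num_stages
    split[0] = first_stage_layers   # smaller first stage (holds the embedding)
    split[-1] = last_stage_layers   # smaller last stage (holds the output/loss)
    return split


# Example: 61 layers, PP=8, VPP=2, with 2 layers on the first stage and 3 on the
# last, leaving 56 layers = 4 per middle virtual stage.
print(uneven_vpp_layer_split(61, 8, 2, 2, 3))
```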