NVIDIA Megatron Core 0.11.0

  • Add multi-datacenter training support through N/S connection
  • MoE
    • Features
      • Support DeepSeek-V3 fine-tuning
        • Aux-loss-free load balancing strategy (see the first sketch after this list)
        • Node-limited routing and device-limited routing support (see the second sketch after this list)
        • Tensor Parallelism support for MLA and Sequence Auxiliary Loss
        • MTP (with TP and PP support) is coming soon.
      • Permutation / Unpermutation fusion kernel from TransformerEngine.
      • Support for uneven virtual pipeline parallel split in the first and last PP stages.
    • Bug fixes:
      • Fix the grad scale when TP != expert-TP and average_in_collective is enabled in DDP.
      • Fix TEGroupedMLP distckpt compatibility issue with FP8 padding/unpadding.
    • Known Issues:
      • When training a Dense+MoE hybrid model, training will hang if any PP rank has no expert parameters.
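
The aux-loss-free load balancing strategy from DeepSeek-V3 replaces the auxiliary balancing loss with a per-expert bias that only steers which experts are selected. Below is a minimal PyTorch sketch of the idea, assuming illustrative names (`topk_with_expert_bias`, `update_expert_bias`) that are not Megatron Core's actual API:

```python
import torch

def topk_with_expert_bias(router_logits, expert_bias, k):
    # The bias affects only which experts are selected, not the routing
    # weights, so it never enters the gradient path.
    scores = torch.sigmoid(router_logits)                  # [num_tokens, num_experts]
    _, topk_idx = torch.topk(scores + expert_bias, k, dim=-1)
    topk_scores = torch.gather(scores, -1, topk_idx)        # weights from unbiased scores
    return topk_idx, topk_scores

def update_expert_bias(expert_bias, tokens_per_expert, update_rate=1e-3):
    # Raise the bias of under-loaded experts and lower it for over-loaded
    # ones (sign-of-error update, as in DeepSeek-V3's scheme).
    error = tokens_per_expert.float().mean() - tokens_per_expert.float()
    return expert_bias + update_rate * torch.sign(error)
```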
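Node-limited routing constrains each token to experts hosted on a bounded number of nodes to reduce cross-node traffic; device-limited routing applies the same idea at device granularity. A hedged sketch of group-limited top-k selection, with all names illustrative rather than Megatron Core's actual implementation:

```python
import torch

def node_limited_topk(scores, k, experts_per_node, max_nodes):
    num_tokens, num_experts = scores.shape
    num_nodes = num_experts // experts_per_node
    grouped = scores.reshape(num_tokens, num_nodes, experts_per_node)
    # Rank nodes by their best expert score and keep only the top `max_nodes`.
    node_scores = grouped.max(dim=-1).values                # [num_tokens, num_nodes]
    top_nodes = torch.topk(node_scores, max_nodes, dim=-1).indices
    keep = torch.zeros_like(node_scores, dtype=torch.bool)
    keep.scatter_(1, top_nodes, True)
    # Mask out experts on the remaining nodes, then do an ordinary top-k.
    masked = grouped.masked_fill(~keep.unsqueeze(-1), float("-inf"))
    topk_scores, topk_idx = torch.topk(masked.reshape(num_tokens, num_experts), k, dim=-1)
    return topk_idx, topk_scores
```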
