NVIDIA/Megatron-LM core_v0.13.0
NVIDIA Megatron Core 0.13.0

  • Support bf16 dtype for optimizer states to use the precision-aware optimizer in TransformerEngine (sketched below)
  • MoE
    • Features:
      • Flexible Asymmetric Virtual Pipeline Parallelism with Custom Pipeline Layout (--pipeline-model-parallel-layout, sketched below)
      • Add support for passing custom parallelism groups to MoE modules.
      • Add Hybrid Shard Data-Parallel support for MoE models (--num-distributed-optimizer-instances, sketched below)
      • Support EP + custom FSDP training for DeepSeek-V3
      • FP8 support for Multi-Token Prediction (MTP)
    • Memory Optimization
      • Fine-grained recomputation to reduce activation memory (--recompute-modules with --recompute-granularity selective, sketched below).
      • Memory-efficient token permutation by moving the probs multiplication from unpermutation into the activation function of GroupedMLP.
    • Performance Optimization
      • MLA RoPE fusion kernel and YaRN embedding cache.
      • FP8 padding optimization of MoE models by padding the routing map.
    • Bug fixes:
      • Fix the aux loss calculation when expert_bias or group-limited routing is used. This changes the reported load_balancing_loss values compared to the previous version.
      • Fix packed sequence support for MLA
    • Known Issues:
      • MTP is not compatible with the flexible pipeline layout; this will be fixed in !3594.
      • MTP convergence issue with TP2; this will be fixed in !3594.
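
The command-line sketches below illustrate the flags called out in these notes. They are illustrative, not verified configurations: any flag or value not named above is an assumption and should be checked against megatron/training/arguments.py in this release.

A minimal sketch for keeping optimizer states in bf16 via TransformerEngine's precision-aware optimizer; --use-precision-aware-optimizer, --exp-avg-dtype, and --exp-avg-sq-dtype are assumed flag names.

```bash
# Sketch only: keep Adam's moment states in bf16 through TE's precision-aware
# optimizer. Flag names other than --bf16 and --use-distributed-optimizer are
# assumptions; verify them in the core_v0.13.0 argument parser.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --bf16 \
    --use-distributed-optimizer \
    --use-precision-aware-optimizer \
    --exp-avg-dtype bf16 \
    --exp-avg-sq-dtype bf16 \
    "$@"   # remaining model, data, and parallelism flags
```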
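A sketch of the custom pipeline layout. The flag comes from the note above, but the layout string and its reading (E = embedding, t = one transformer layer, L = loss head, | = stage boundary) are assumptions about the format; consult the flag's help text for the actual grammar.

```bash
# Sketch: an asymmetric 4-stage layout where the first stage holds the
# embedding plus one transformer layer and the last stage holds only the loss.
# The layout string is illustrative; the accepted grammar may differ.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --pipeline-model-parallel-size 4 \
    --pipeline-model-parallel-layout "Et|tt|tt|L" \
    "$@"   # remaining model, data, and parallelism flags
```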
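A sketch of Hybrid Shard Data-Parallel for MoE training: running more than one distributed-optimizer instance shards optimizer state within each instance rather than across the whole data-parallel group. Only --num-distributed-optimizer-instances comes from the note; the other flags are pre-existing Megatron-LM options shown for context.

```bash
# Sketch: two distributed-optimizer instances, i.e. optimizer state is sharded
# inside each instance of the data-parallel group and replicated across them.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --use-distributed-optimizer \
    --overlap-grad-reduce \
    --overlap-param-gather \
    --num-distributed-optimizer-instances 2 \
    "$@"   # remaining model, data, and parallelism flags
```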
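A sketch of fine-grained selective recomputation. --recompute-granularity selective and --recompute-modules come from the note; the module names passed here are illustrative guesses, so consult the flag's help text for the supported set.

```bash
# Sketch: recompute only selected cheap-but-activation-heavy modules instead of
# full layers. The module names are placeholders; check the supported values.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --recompute-granularity selective \
    --recompute-modules moe_act layernorm \
    "$@"   # remaining model, data, and parallelism flags
```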
