NVIDIA Megatron Core 0.9.0 (tag core_r0.9.0)

  • Uneven pipeline parallelism
    • Enable pipeline parallelism where the first and last ranks have fewer transformer layers than the intermediate ranks (see the config sketch after this list)
  • Per-layer CUDA graph support for GPT training with Transformer Engine modules (illustrated after this list)
  • Enable different TP sizes for the vision encoder
  • Enable pipeline parallelism for T5 & Llava models
  • Support multi-tile multi-image input in Llava models
  • MoE
    • FP8 support
    • Runtime upcycling support
    • Dispatcher implementation optimizations
    • Shared expert support with overlapping optimizations (see the MoE config sketch after this list)
      • Qwen Model support
  • Mamba Hybrid
    • The main branch is no longer compatible with released checkpoints (use the ssm branch instead)
    • Add distributed checkpointing support (sketched after this list)
    • Fix bugs related to inference
    • Add unit tests
  • Known Issues
    • When sequence parallelism is enabled, dropout in the transformer block forward pass does not run under the appropriate RNG context (see the sketch after this list).
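
Uneven pipeline parallelism can be expressed through Megatron Core's `TransformerConfig`. A minimal sketch follows; the two `num_layers_in_*_pipeline_stage` field names are assumptions inferred from this release's description, so check `transformer_config.py` in your checkout for the exact spelling.

```python
from megatron.core.transformer.transformer_config import TransformerConfig

# 32 layers over 4 pipeline stages, with lighter first and last stages to
# leave headroom for the embedding and the output/loss computation that
# live on those ranks.
config = TransformerConfig(
    num_layers=32,
    hidden_size=4096,
    num_attention_heads=32,
    pipeline_model_parallel_size=4,
    num_layers_in_first_pipeline_stage=6,  # assumed field name
    num_layers_in_last_pipeline_stage=6,   # assumed field name
)
# The two intermediate stages then split the remaining 32 - 6 - 6 = 20
# layers evenly, i.e. 10 layers each.
```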
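
The per-layer CUDA graph feature captures each transformer layer once and replays it on subsequent steps to cut kernel launch overhead. The plain-PyTorch sketch below illustrates the capture/replay mechanic on a toy layer; it is not Megatron's implementation, which automates this per layer for Transformer Engine modules.

```python
import torch

layer = torch.nn.Linear(1024, 1024).cuda()
static_in = torch.randn(8, 1024, device="cuda")

# CUDA graphs require warm-up iterations on a side stream before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        layer(static_in)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass into a graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_out = layer(static_in)

# Replays rerun the captured kernels with near-zero launch overhead,
# reading from and writing to the same static tensors.
static_in.copy_(torch.randn(8, 1024, device="cuda"))
g.replay()  # static_out now holds the output for the new input
```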
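
A sketch of a `TransformerConfig` exercising the MoE items above. The two `moe_shared_expert_*` field names are assumptions inferred from the release notes, and the FP8 setting additionally requires Transformer Engine; treat this as illustrative rather than a verified 0.9.0 configuration.

```python
from megatron.core.transformer.transformer_config import TransformerConfig

moe_config = TransformerConfig(
    num_layers=24,
    hidden_size=2048,
    num_attention_heads=16,
    num_moe_experts=8,                         # routed experts per MoE layer
    moe_router_topk=2,                         # each token goes to its top-2 experts
    moe_token_dispatcher_type="alltoall",      # dispatcher path touched by the 0.9.0 optimizations
    moe_shared_expert_intermediate_size=4096,  # turn on shared experts (assumed field name)
    moe_shared_expert_overlap=True,            # overlap shared-expert compute with dispatch (assumed field name)
    fp8="hybrid",                              # FP8 training, extended to MoE in this release
)
```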
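
For the Mamba hybrid distributed checkpointing item, Megatron Core's general `dist_checkpointing` API looks like the sketch below. The model is assumed to expose `sharded_state_dict()` (as Megatron Core modules do), and the path is illustrative.

```python
from megatron.core import dist_checkpointing

CKPT_DIR = "/checkpoints/mamba_hybrid/iter_0001000"  # illustrative path

def save_checkpoint(model):
    # Each rank contributes only its own shards, producing one
    # parallelism-agnostic checkpoint directory.
    dist_checkpointing.save(model.sharded_state_dict(), CKPT_DIR)

def load_checkpoint(model):
    # Loading reshards automatically to the current parallel layout.
    state_dict = dist_checkpointing.load(model.sharded_state_dict(), CKPT_DIR)
    model.load_state_dict(state_dict)
```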
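
On the known issue: the intended behavior is for dropout in the sequence-parallel region to run under Megatron's seeded RNG tracker, so each tensor-parallel rank draws an independent mask for its shard of the sequence. A sketch of that pattern, not a patch from the repository:

```python
import torch
from megatron.core.tensor_parallel import get_cuda_rng_tracker

dropout = torch.nn.Dropout(p=0.1)

def sequence_parallel_dropout(hidden_states):
    # Fork into the tracker-managed RNG state rather than the default CUDA
    # generator; this is the context the 0.9.0 block forward misses.
    with get_cuda_rng_tracker().fork():
        return dropout(hidden_states)
```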
