NVIDIA Megatron Core 0.5.0


Key Features and Enhancements

Megatron Core documentation is now live!

Model Features

  • MoE (Mixture of Experts)
    • Support for Z-loss, load balancing, and Sinkhorn routing
    • Layer and communications refactor
    • Richer parallelism mappings: EP can be combined with other model-parallel techniques for larger MoE variants, e.g. EP + TP + DP + SP + PP
    • Token-dropless architecture with Top-K routing
    • Performance optimization with GroupedGEMM when the number of local experts is > 1
    • Distributed checkpointing
  • Interleaved rotary embedding
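To make the MoE routing bullets concrete, here is a minimal sketch of token-dropless Top-K routing: every token keeps its k highest-scoring experts and combines their outputs with softmax-normalized weights. This is a generic NumPy illustration of the technique, not Megatron Core's actual API or implementation.

```python
import numpy as np

def top_k_route(logits, k=2):
    """Dropless Top-K routing: every token is assigned to its k
    highest-scoring experts (no token is dropped).

    logits: (num_tokens, num_experts) raw router scores.
    Returns (indices, weights): per-token expert ids and the
    softmax-normalized combine weights over the selected experts.
    """
    # Pick the k highest-scoring experts per token (descending order).
    indices = np.argsort(logits, axis=-1)[:, ::-1][:, :k]
    top_logits = np.take_along_axis(logits, indices, axis=-1)
    # Softmax over only the selected experts gives the combine weights.
    exp = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    weights = exp / exp.sum(axis=-1, keepdims=True)
    return indices, weights

logits = np.array([[2.0, 0.5, 1.0],
                   [0.1, 3.0, 0.2]])
idx, w = top_k_route(logits, k=2)
# token 0 -> experts 0 and 2; token 1 -> experts 1 and 2
```

With EP, the tokens would then be dispatched to the ranks owning those experts; the combine weights are used to sum the expert outputs back per token.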

Datasets

  • Masked WordPiece datasets for BERT and T5
  • Raw and mock datasets

Parallelism

Performance

  • Activation offloading to CPU
  • Rope and Swiglu fusion
  • Sliding window attention (via Transformer Engine)
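For reference on the SwiGLU fusion bullet, the math being fused is the gated MLP activation SiLU(x·W_gate) ⊙ (x·W_up). The unfused form is sketched below in NumPy; a fused kernel computes both projections and the elementwise product in one pass, but the result is the same. This is an illustration of the operation, not Megatron Core's fused kernel.

```python
import numpy as np

def silu(x):
    # SiLU (swish) activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu(x, w_gate, w_up):
    """SwiGLU gated MLP projection: SiLU(x @ w_gate) * (x @ w_up).
    A fused implementation merges these three steps into one kernel."""
    return silu(x @ w_gate) * (x @ w_up)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w_gate = rng.standard_normal((8, 16))
w_up = rng.standard_normal((8, 16))
out = swiglu(x, w_gate, w_up)   # shape (4, 16)
```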

General Improvements

  • Timers
