- Multimodal
  - Added initial support for training vision language models using the LLaVA architecture (a conceptual sketch follows this list)
  - Added initial support for inference with multimodal inputs
  - An end-to-end multimodal example, from data collection through training to evaluation, is provided in examples/multimodal
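For orientation, below is a minimal sketch of the LLaVA-style wiring referenced above: features from a vision encoder are projected through a small MLP adapter into the language model's embedding space and prepended to the text token embeddings. Every class, dimension, and module name here is an illustrative placeholder, not the Megatron Core API; see examples/multimodal for the actual entry points.

```python
import torch
import torch.nn as nn

class LLaVAStyleModel(nn.Module):
    """Toy LLaVA-style model: vision features -> MLP projector -> language model.

    All components are stand-ins; a real setup uses a pretrained vision
    encoder (e.g. CLIP) and a full decoder-only transformer LM.
    """

    def __init__(self, vision_dim=256, lm_dim=512, vocab_size=1000):
        super().__init__()
        self.vision_encoder = nn.Linear(3 * 32 * 32, vision_dim)  # stand-in encoder
        self.projector = nn.Sequential(                           # modality adapter
            nn.Linear(vision_dim, lm_dim), nn.GELU(), nn.Linear(lm_dim, lm_dim))
        self.embed = nn.Embedding(vocab_size, lm_dim)
        layer = nn.TransformerEncoderLayer(lm_dim, nhead=8, batch_first=True)
        self.lm = nn.TransformerEncoder(layer, num_layers=2)      # stand-in LM
        self.lm_head = nn.Linear(lm_dim, vocab_size)

    def forward(self, images, input_ids):
        # Project image features into the LM embedding space, then prepend
        # them to the text embeddings as extra "image tokens".
        img_tokens = self.projector(self.vision_encoder(images)).unsqueeze(1)
        txt_tokens = self.embed(input_ids)
        hidden = self.lm(torch.cat([img_tokens, txt_tokens], dim=1))
        return self.lm_head(hidden)

model = LLaVAStyleModel()
logits = model(torch.randn(2, 3 * 32 * 32), torch.randint(0, 1000, (2, 16)))
assert logits.shape == (2, 17, 1000)  # 1 image token + 16 text tokens
```

The projector is the key design point in LLaVA-style training: the vision encoder and LM can be kept frozen while only this adapter is fit, before any full fine-tuning.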
- MoE
  - Context parallel support (see the sequence-sharding sketch below)
  - Distributed checkpoint support for grouped GEMM
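To illustrate what context parallelism buys: the sequence dimension of the activations is sharded across context parallel ranks, so each rank processes seq_len / cp_size tokens and activation memory shrinks accordingly. The sketch below is a deliberately simplified picture under assumed names and tensor layout, not Megatron Core code, and it ignores the chunk interleaving real implementations use to balance causal-attention work across ranks.

```python
import torch

def shard_sequence(hidden: torch.Tensor, cp_rank: int, cp_size: int) -> torch.Tensor:
    """Return this rank's contiguous slice of the sequence dimension.

    Assumes a Megatron-style [seq_len, batch, hidden] activation layout.
    Simplified: production context parallelism interleaves chunks so that
    causal attention work is balanced across ranks.
    """
    seq_len = hidden.size(0)
    assert seq_len % cp_size == 0, "seq_len must be divisible by cp_size"
    chunk = seq_len // cp_size
    return hidden[cp_rank * chunk : (cp_rank + 1) * chunk]

# Example: with 8 ranks, each keeps 512 of 4096 tokens (8x less activation memory).
local = shard_sequence(torch.randn(4096, 2, 64), cp_rank=3, cp_size=8)
assert local.shape == (512, 2, 64)
```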
- Mamba
  - Added initial support for training and inference of Mamba-2 models
  - Support for hybrid models consisting of Mamba-2, attention, and MLP layers (a toy builder sketch follows this list)
  - Examples are provided in examples/mamba
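To make the hybrid layouts concrete, here is a toy pattern-driven stack builder. The symbol convention assumed here ('M' for a Mamba-2 layer, '*' for self-attention, '-' for MLP) follows the hybrid pattern strings used in examples/mamba, but the modules below are runnable placeholders rather than Megatron Core's Mamba-2 implementation.

```python
import torch.nn as nn

class Mamba2Stub(nn.Module):
    """Runnable stand-in for a Mamba-2 (SSM) mixer; the real layer is not shown."""
    def __init__(self, dim):
        super().__init__()
        self.mix = nn.Linear(dim, dim)
    def forward(self, x):
        return self.mix(x)

def build_hybrid_stack(pattern: str, dim: int = 512) -> nn.ModuleList:
    """Build a layer stack from a pattern: 'M' = Mamba-2, '*' = attention, '-' = MLP."""
    layers = nn.ModuleList()
    for symbol in pattern:
        if symbol == "M":
            layers.append(Mamba2Stub(dim))
        elif symbol == "*":
            layers.append(nn.MultiheadAttention(dim, num_heads=8, batch_first=True))
        elif symbol == "-":
            layers.append(nn.Sequential(
                nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)))
        else:
            raise ValueError(f"unknown layer symbol: {symbol!r}")
    return layers

# Mostly Mamba-2 layers, with occasional attention and MLP layers mixed in:
stack = build_hybrid_stack("MM*M-MM*M-")
```

The pattern string makes the Mamba-to-attention ratio an explicit, tunable knob, which is the point of the hybrid design: most layers get Mamba-2's linear-time sequence mixing while a few attention layers are retained.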