github unslothai/unsloth August-2025-v2
Unsloth Flex Attention + Long context gpt-oss Training

We’re excited to introduce Unsloth Flex Attention support for OpenAI gpt-oss training. It enables >8× longer context lengths, >50% less VRAM usage, and >1.5× faster training than all other implementations, including those using Flash Attention 3 (FA3). Unsloth Flex Attention makes it possible to train with a 60K context length on just 80GB of VRAM for BF16 LoRA. Also:

  • You can now export/save your QLoRA fine-tuned gpt-oss model to llama.cpp, vLLM, or HF.
  • We fixed gpt-oss training losses going to infinity on float16 GPUs (such as the T4 on Colab).
  • We fixed gpt-oss implementation issues, most notably ensuring that swiglu_limit = 7.0 is properly applied during MXFP4 inference in transformers.
  • Unsloth Flex Attention scales with context: longer sequences yield bigger savings in both VRAM and training time.
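For context on the swiglu_limit fix: gpt-oss uses a clamped SwiGLU activation, where the gate half is clamped from above at the limit and the linear half is clamped to [-limit, limit] before the gated product. The sketch below is an illustrative NumPy rendering of that formulation, not Unsloth's or transformers' actual kernel; the interleaved gate/linear layout and the alpha = 1.702 constant follow the published gpt-oss reference but should be treated as assumptions here:

```python
import numpy as np

def clamped_swiglu(x, alpha=1.702, limit=7.0):
    """Clamped SwiGLU as used by gpt-oss (illustrative sketch).

    `x` interleaves gate and linear channels along the last axis;
    this split is an assumption for the demo, not a guaranteed layout.
    """
    x_glu, x_linear = x[..., ::2], x[..., 1::2]
    x_glu = np.minimum(x_glu, limit)             # gate: upper clamp only
    x_linear = np.clip(x_linear, -limit, limit)  # linear: two-sided clamp
    # out_glu = x * sigmoid(alpha * x), computed with the clamped gate
    out_glu = x_glu / (1.0 + np.exp(-alpha * x_glu))
    return out_glu * (x_linear + 1.0)
```

Without the clamp, large pre-activations blow up the gated product, which is one way float16 training can overflow and drive losses to infinity.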

Full details in our blog post: https://docs.unsloth.ai/basics/long-context-gpt-oss-training
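The "longer sequences yield bigger savings" claim follows from basic attention arithmetic: a naive implementation materializes a full L×L score matrix per head, while flash/flex-style blockwise kernels only ever hold a block×L tile of scores live, so the relative memory saving grows linearly with sequence length. A back-of-envelope sketch (head count, block size, and bf16 element size here are illustrative, not Unsloth's actual numbers):

```python
def naive_score_bytes(seq_len, n_heads=64, bytes_per_el=2):
    # a naive kernel materializes the full seq_len x seq_len
    # attention-score matrix for every head
    return n_heads * seq_len * seq_len * bytes_per_el

def blockwise_score_bytes(seq_len, n_heads=64, block=128, bytes_per_el=2):
    # blockwise (flash/flex-style) kernels keep only a
    # block x seq_len tile of scores live at any time
    return n_heads * block * seq_len * bytes_per_el

for L in (8_192, 65_536):
    ratio = naive_score_bytes(L) / blockwise_score_bytes(L)
    print(f"L={L}: naive needs {ratio:.0f}x more score memory")
```

The ratio is simply seq_len / block, so going from an 8K to a 64K context turns a 64× advantage into a 512× one, which is consistent with the savings scaling with context length.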

What's Changed

New Contributors

Full Changelog: August-2025...August-2025-v2
