We’re excited to introduce Unsloth Flex Attention support for OpenAI gpt-oss training, enabling >8× longer context lengths, >50% less VRAM usage, and >1.5× faster training than all other implementations, including those using Flash Attention 3 (FA3). Unsloth Flex Attention makes it possible to train with a 60K context length on just 80GB of VRAM for BF16 LoRA (a minimal usage sketch follows the list below). Also:
- You can now export/save your QLoRA fine-tuned gpt-oss model to llama.cpp, vLLM, or HF.
- We fixed gpt-oss training losses going to infinity on float16 GPUs (like T4 Colab)
- We fixed gpt-oss implementation issues, most notably ensuring that `swiglu_limit = 7.0` is properly applied during MXFP4 inference in transformers
- Unsloth Flex Attention scales with context: longer sequences yield bigger savings in both VRAM and training time
Full details in our blogpost: https://docs.unsloth.ai/basics/long-context-gpt-oss-training
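For orientation, here is a minimal sketch of what long-context gpt-oss LoRA fine-tuning and export looks like with Unsloth. It is illustrative only: the model id `unsloth/gpt-oss-20b`, the 60K `max_seq_length`, the LoRA hyperparameters, and the save calls are assumptions based on the standard Unsloth API, not lines taken from this release.

```python
# Minimal, hedged sketch of long-context gpt-oss LoRA training with Unsloth.
# Model name, sequence length, and save calls are illustrative assumptions.
from unsloth import FastLanguageModel

# Load gpt-oss; Unsloth's attention backend (including Flex Attention) is
# selected automatically when available.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "unsloth/gpt-oss-20b",  # assumed model id
    max_seq_length = 60_000,                 # long-context training target
    load_in_4bit   = True,                   # QLoRA; set False for BF16 LoRA
)

# Attach LoRA adapters to the usual attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing = "unsloth",  # further reduces VRAM at long context
)

# ... train with your usual TRL / transformers trainer ...

# Export the fine-tuned model for llama.cpp, vLLM, or the Hugging Face Hub.
model.save_pretrained_merged("gpt-oss-finetuned", tokenizer, save_method = "merged_16bit")
model.save_pretrained_gguf("gpt-oss-finetuned-gguf", tokenizer, quantization_method = "q8_0")
```

See the blogpost linked above for the full long-context training walkthrough and benchmark details.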
What's Changed
- Add Qwen3 Instruct / Thinking chat templates by @Etherll in #3110
- Add Qwen3 4B to mapper.py by @Etherll in #3120
- Nightly by @danielhanchen in #3148
- Fix GPT OSS by @danielhanchen in #3154
- Nightly by @danielhanchen in #3169
- Update Blackwell install instructions for latest vLLM release by @qingy1337 in #3175
- Fix potential generator exhaustion bug in model loading file detection by @rolandtannous in #3167
- Fix vision model GGUF quantization_method error type by @rolandtannous in #3173
- Replace back ticks with single quotes by @rnowling in #3157
- Fix original_push_to_hub fallback by @Thiraput01 in #3115
- Add support for QAT + LoRA by @andrewor14 in #2976
- Bug fixes by @danielhanchen in #3180
- Torch 2.8 by @danielhanchen in #3186
- Fix extras transformers typo in pyproject.toml by @parth2510 in #3187
- Bug fixes by @danielhanchen in #3195
- allow torch.float32 dtype in FastLanguageModel by @mmathew23 in #3204
- Fix is_causal for Qwen3 by @leizhenyuan in #3213
- Support `model.save_pretrained_torchao` by @jerryzh168 in #3111
- Fix gemma-3n by @mmathew23 in #3219
- Handle transformers move to dtype from torch_dtype by @mmathew23 in #3225
- chore: Fix Typos by @DefiWimar7 in #3224
New Contributors
- @rnowling made their first contribution in #3157
- @Thiraput01 made their first contribution in #3115
- @andrewor14 made their first contribution in #2976
- @parth2510 made their first contribution in #3187
- @jerryzh168 made their first contribution in #3111
- @DefiWimar7 made their first contribution in #3224
Full Changelog: August-2025...August-2025-v2