hiyouga/LlamaFactory v0.9.5: Qwen3.5/3.6, Gemma 4, Transformers v5

Added primary support for Qwen3.5/Qwen3.6/Gemma4 models and compatibility with Transformers v5.

What's Changed

[misc] set dev version by @hiyouga in #9703
fix(fp8): add Transformer Engine backend support by @sbhavani in #9705
[misc] Compatible with an empty architectures field in config.json by @tangefly in #9709
[model] support Youtu-LLM-2B by @isLinXu in #9707
[misc] lint by @hiyouga in #9710
Update pyproject.toml and requirements by @jiaqiw09 in #9714
[v1] add init plugin by @hiyouga in #9716
[misc] Add a PyTorch version warning for Conv3D. by @tangefly in #9715
[feature] add support for EAFT loss by @ymxyll in #9720
[v1] add cli sampler by @hiyouga in #9721
[v1] add renderer ut by @hiyouga in #9722
Update README.md by @tangefly in #9724
[CI] improve cuda ci cache by @frozenleaves in #9725
Add support for LiquidAI's LFM2.5 (Liquid Foundation Models) to LLaMA-Factory. by @vovanphuc in #9726
Add support for LiquidAI's LFM2.5-VL vision-language model by @vovanphuc in #9729
[misc] fix parser by @hiyouga in #9730
[refactor] rename lfm template to lfm2 and add LFM 2.5 to README by @vovanphuc in #9731
[fix] correct ktransformers example config paths and templates by @JimmyPeilinLi in #9732
[model] support for microsoft's Phi-4-mini by @ctx289 in #9734
[misc] fix fp8 by @hiyouga in #9742
[v1] add batch generator by @hiyouga in #9744
[deps] fix package by @hiyouga in #9745
[model] support HY-MT model by @isLinXu in #9746
[v1] upgrade batching by @hiyouga in #9751
[model] fix and add Hunyuan models by @isLinXu in #9750
[v1] add sft by @hiyouga in #9752
using mp to run kernel test by @frozenleaves in #9754
[v1] fix kernel moe patch by @jiaqiw09 in #9867
[misc] update mcore related docker and mca supported models by @Kuangdd01 in #10114
[feat] support all_exhausted_without_replacement in datasets.interleave_datasets by @Moenupa in #10112
chore: Update outdated GitHub Actions versions by @pgoslatara in #10123
[v1] support training with fsdp2 by @frozenleaves in #9773
[v0] Fix reward model training safetensors saving by @jiaqiw09 in #10137
Fix: add visual.pos_embed to Qwen3-VL visual model keys by @je1lee in #10139
[feature] support using ray.remote to start distributed training. by @xvxuopop in #10109
update peft, deepspeed, adapt transformers v5 by @frozenleaves in #10147
[model] support youtu-vl model by @isLinXu in #10152
Fix race condition in LoggerHandler during multi-GPU training by @yurekami in #10156
[assets] update readme by @hiyouga in #10159
[model] support MiniCPM-o-4.5 by @isLinXu in #10163
add DPO/KTO FSDP/FSDP2 support by @UsernameFull in #10127
[model] support GLM-4.7-Flash SFT by @Shanay-Mehta in #10173
[v1] init commit for v1 docs by @frozenleaves in #10145
[model] support GLM-OCR SFT by @Ataraxy33 in #10183
[model] add liger kernel support for Qwen3-Next by @Shanay-Mehta in #10176
[V1] Add v1 LoRA/Freeze support and merge workflow by @jiaqiw09 in #10157
Add ASFT by @susjunyou in #10174
[V1] support deepspeed by @frozenleaves in #10181
[v1] support quantization by @sunyi0505 in #10161
[v0/v1] fix ut huggingface hub 429 error when transformers>=5.0.0 by @jiaqiw09 in #10155
[mca] update supported models by @Kuangdd01 in #10196
fix: remove safe_serialization arg for transformers v5 compatibility by @Alm0stSurely in #10208
Add DeepSpeed Z3 leaf module for Qwen3-Next by @Shanay-Mehta in #10194
[model] Adapt Qwen3.5 by @frozenleaves in #10213
[model] update constants by @hiyouga in #10220
[model] support Aeva by @louzongzhi in #10214
upgrade to ROCm 7.2 base image, drop PyTorch reinstall by @mjkvaak-amd in #10223
[fix] register visual part for Qwen3.5 by @Kuangdd01 in #10227
[V1] add seed for training and fix gradient checkpointing by @jiaqiw09 in #10211
fix(vllm): support mixed multimodal payloads by @phiott in #10225
[misc] fix constants by @hiyouga in #10232
Add Trackio Integration for LlamaFactory by @ParagEkbote in #10165
[model] support Qwen3.5 all series models by @isLinXu in #10237
fix: qwen3.5 projector path by @LittleYanlin in #10242
fix: get ray head ip by @SnowCharmQ in #10252
[V1] Support meta loading for full and freeze by @jiaqiw09 in #10236
fix: Fix compatibility issue with HuggingFace Dataset Column when sav… by @pyxnpyx in #10254
docs: fix Python version requirement from 3.10 to >=3.11.0 by @ll0v0ll in #10259
fix: convert filter() to list in read_cloud_json to fix broken empty-check by @jnMetaCode in #10260
[mca] support qwen3.5 by @Kuangdd01 in #10265
fix(mm): fallback to audio_processor when feature_extractor is missing by @xxddccaa in #10267
update npu docker by @frozenleaves in #10268
fix(template): correct gpt_oss format_assistant by @RuijieH in #10269
fix: make position_id_per_seconds configurable for Qwen2OmniPlugin by @LincolnBurrows2017 in #10281
fix: unused keys in ray example by @SnowCharmQ in #10290
[v1] add qwen3 templates and fix rendering plugin. by @xvxuopop in #10212
fix: handle empty content list in system message by @LincolnBurrows2017 in #10291
fix(MiniCPMVPlugin): fix IndexError in process_messages when training with video by @xxddccaa in #10276
feat(data): add SGSC zero-hallucination B2B dataset (NOO-Protocol) by @robertglools in #10284
[fix] fit neat_packing & mrope model packing by @Kuangdd01 in #10283
chore: mca workflow compatible with qwen-vl series by @Kuangdd01 in #10303
[liger_kernel] support Qwen3.5. by @wyt2000 in #10313
fix: mimo-v2 tool call by @isLinXu in #10315
[v1] add callbacks by @jiaqiw09 in #10255
ci: add nginx cache config for Ascend NPU CI environment by @Goalina in #10323
[V1] add init on rank0 for fsdp2 by @jiaqiw09 in #10264
[v1] support ulysses cp for fsdp2 by @sunyi0505 in #10262
[feat] support LlamaFactory SFT training with the HyperParallel FSDP2 backend by @Cui-yshoho in #10289
fix moe by @frozenleaves in #10334
fix: qwen3vl timestamp by @Kuangdd01 in #10338
[model] gemma4 by @Kuangdd01 in #10346
fix: gemma4 mm_token_type_ids padding by @Kuangdd01 in #10359
fix: set mm_projectors for omni models by @Kuangdd01 in #10378
fix: projector lookup for gemma4 modules by @Kuangdd01 in #10382
fix(data): SeedToolUtils.tool_extractor returns content when no tool calls found by @kuishou68 in #10408
[V1] support resuming training from checkpoint by @frozenleaves in #10280
[v1] fix device mesh and clip_grad_norm for ulysses cp by @sunyi0505 in #10366
[v1] add deepspeed zero3 trigger for low memory usage weight loading by @jiaqiw09 in #10300
support qwen3.6 models by @frozenleaves in #10415
[v1] fix epoch and steps by @jiaqiw09 in #10422
[packing] add qwen35 patch for neat_packing by @Kuangdd01 in #10436
[data] support discard history cot for multiturn by @Kuangdd01 in #10435
[v1] fix init on meta in transformers v5 by @jiaqiw09 in #10414
[misc] code lint by @Kuangdd01 in #10439
feat(npu): add Qwen3.5 support with Partial RoPE and Hybrid Attention by @curnane-lab in #10421
fix: handle NotImplementedError in export_model for transformers>=5.0 (fixes #10410) by @octo-patch in #10438
[v1] fix device_mesh and sp for fsdp2 by @sunyi0505 in #10429
[fa2] fix IMA when training qwen3_5 by @Kuangdd01 in #10448
[model] support Hy3-Preview by @isLinXu in #10432
[misc] bump transformers version upperbound by @Kuangdd01 in #10446
Add KTransformers AMX MoE SFT support via Accelerate by @JimmyPeilinLi in #10430
[packing] fix gdn crash when encountering a dummy image by @Kuangdd01 in #10453
Optimize Qwen video token metadata preprocessing by @luca-888 in #10404
fix(docs): correct typo in examples/README_zh.md by @simulikeit in #10462
fix(data/converter): handle None tool_calls in OpenAI-style messages by @Anai-Guo in #10455
[fix] fix qwen3_6 template doc by @frozenleaves in #10470
[model] support MiniCPM-V-4.6 by @tsjyma in #10472
[fix] Fix MiniCPM-V-4.6 image preprocessing behavior by @tsjyma in #10478
[docker] update npu docker by @xvxuopop in #10479
Fix: add missing return statement in MiniCPMVPlugin.get_mm_inputs by @ZMXJJ in #10500
[V1] support reward training stage by @frozenleaves in #10431
add torch profiler callback by @frozenleaves in #10463
[V1] add cuda fused moe kernel, implementing with triton by @frozenleaves in #10481
[v1] support liger_kernel by @sunyi0505 in #10493
[v1] Add FlashAttention selection and implement normal / padding-free / dynamic batching by @jiaqiw09 in #10469
fix: use getattr for profiler attrs to support MCA TrainingArguments by @Copilot in #10506
[v1] Implement dynamic padding-free strategy for batching by @XuanyuChen-SEU in #10507
[v1] fix padding free with sp by @jiaqiw09 in #10513
[v0] fix non-packing batch (bsz>1) for Qwen3.5 with flash attention by @jiaqiw09 in #10529
[fix] Fix NPU FusedMoE and RMSNorm by @xvxuopop in #10512
[version] release v0.9.5 by @hiyouga in #10532

New Contributors

@sbhavani made their first contribution in #9705
@ymxyll made their first contribution in #9720
@vovanphuc made their first contribution in #9726
@ctx289 made their first contribution in #9734
@Moenupa made their first contribution in #10112
@pgoslatara made their first contribution in #10123
@je1lee made their first contribution in #10139
@yurekami made their first contribution in #10156
@Shanay-Mehta made their first contribution in #10173
@Ataraxy33 made their first contribution in #10183
@susjunyou made their first contribution in #10174
@Alm0stSurely made their first contribution in #10208
@louzongzhi made their first contribution in #10214
@mjkvaak-amd made their first contribution in #10223
@phiott made their first contribution in #10225
@ParagEkbote made their first contribution in #10165
@LittleYanlin made their first contribution in #10242
@SnowCharmQ made their first contribution in #10252
@pyxnpyx made their first contribution in #10254
@ll0v0ll made their first contribution in #10259
@jnMetaCode made their first contribution in #10260
@xxddccaa made their first contribution in #10267
@RuijieH made their first contribution in #10269
@LincolnBurrows2017 made their first contribution in #10281
@robertglools made their first contribution in #10284
@wyt2000 made their first contribution in #10313
@Goalina made their first contribution in #10323
@Cui-yshoho made their first contribution in #10289
@kuishou68 made their first contribution in #10408
@curnane-lab made their first contribution in #10421
@octo-patch made their first contribution in #10438
@luca-888 made their first contribution in #10404
@simulikeit made their first contribution in #10462
@Anai-Guo made their first contribution in #10455
@tsjyma made their first contribution in #10472
@XuanyuChen-SEU made their first contribution in #10507

Full Changelog: v0.9.4...v0.9.5
