github hiyouga/LLaMA-Factory v0.9.2
v0.9.2: MiniCPM-o, SwanLab, APOLLO

one day ago

This is the last version before LLaMA-Factory v1.0.0. We are working hard to improve its efficiency and availability.

We will attend the vLLM Beijing Meetup on Mar 16th! See you in Beijing 👋

New features

New models

  • Base models
    • GPT2 (0.1B/0.4B/0.8B/1.5B) 📄
    • Granite 3.0-3.1 (1B/2B/3B/8B) 📄
    • PaliGemma2 (3B/10B/28B) 📄🖼️
    • Moonlight (16B) 📄
    • DeepSeek V2-V2.5 Base (236B) 📄
    • DeepSeek V3 Base (671B) 📄
  • Instruct/Chat models
    • Granite 3.0-3.1 (1B/2B/3B/8B) by @Tuyohai in #5922 📄🤖
    • DeepSeek R1 (1.5B/7B/8B/14B/32B/70B/671B) by @Qwtdgh in #6767 📄🤖
    • TeleChat2 (3B/7B/12B/35B/115B) by @ge-xing in #6313 📄🤖
    • Qwen2.5-VL (3B/7B/72B) by @hiyouga in #6779 📄🤖🖼️
    • PaliGemma2-mix (3B/10B/28B) by @Kuangdd01 in #7060 📄🤖🖼️
    • Qwen2 Audio (7B) by @BUAADreamer in #6701 📄🤖🔈
    • MiniCPM-V/MiniCPM-o (8B) by @BUAADreamer in #6598 and #6631 📄🤖🖼️🔈
    • InternLM3-Instruct (8B) by @hhaAndroid in #6640 📄🤖
    • Marco-o1 (8B) 📄🤖
    • Skywork-o1 (8B) 📄🤖
    • Phi-4 (14B) 📄🤖
    • Moonlight Instruct (16B) 📄
    • Mistral Small (24B) 📄🤖
    • QwQ (32B) 📄🤖
    • Llama-3.3-Instruct (70B) 📄🤖
    • QvQ (72B) 📄🤖🖼️
    • DeepSeek V2-V2.5 (236B) 📄🤖
    • DeepSeek V3 (671B) 📄🤖

New datasets

  • Supervised fine-tuning datasets
    • OpenO1 (en) 📄
    • Open Thoughts (en) 📄
    • Open-R1-Math (en) 📄
    • Chinese-DeepSeek-R1-Distill (zh) 📄

Changes

Bug fix

Full Changelog: v0.9.1...v0.9.2
