If GitHub downloads are slow, you can use the cloud drive instead: https://pan.quark.cn/s/194b7eedf16e
🔧 Fixes
- Fixed ModelId update error when saving models
 → Corrected abnormal synchronization of the ModelId field when saving model configurations, ensuring model identifiers stay unique.
- Fixed issues with batch dataset evaluation (#576)
 → Added the ability to interrupt batch evaluation tasks, so ongoing evaluations can be terminated manually; also optimized the evaluation algorithm to speed up batch processing (a cancellation sketch follows this list).
- Fixed input interruption caused by dataset shortcuts (#578)
 → Adjusted the shortcut trigger logic to avoid conflicts with text input, so typing is no longer interrupted accidentally (an input-guard sketch follows this list).
- Fixed export failure when selecting a large number of datasets (#578)
 → Optimized the export task sharding mechanism to resolve memory overflow and connection timeouts caused by excessive data volume (a chunked-export sketch follows this list).
- Fixed ineffective balanced export (#561)
 → Corrected the sample distribution calculation in the balanced export logic so that categories are exported according to the preset ratios (a ratio-sampling sketch follows this list).
- Fixed errors when calling the Qwen3 model via Alibaba Cloud Bailian (#412, #482)
 → Adapted to the Qwen3 interface protocol and corrected request parameter formats and authentication logic so calls work normally.
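
The cancellation behaviour added for batch evaluation (#576) can be pictured as a loop that checks an abort signal between datasets. A minimal sketch, assuming a hypothetical `evaluateDataset` function and an `AbortSignal` raised by a "Stop" button in the UI; this is illustrative, not the code that shipped.

```typescript
// Illustrative sketch: run evaluations sequentially and stop when the signal aborts.
export interface EvalResult {
  datasetId: string;
  score: number;
}

export async function runBatchEvaluation(
  datasetIds: string[],
  evaluateDataset: (id: string) => Promise<EvalResult>, // hypothetical evaluator
  signal: AbortSignal                                    // raised by a "Stop" button
): Promise<EvalResult[]> {
  const results: EvalResult[] = [];
  for (const id of datasetIds) {
    if (signal.aborted) break;              // manual termination requested
    results.push(await evaluateDataset(id));
  }
  return results;                           // partial results are kept on interruption
}
```

The caller would create an `AbortController`, pass `controller.signal` in, and call `controller.abort()` when the user clicks stop.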
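For the shortcut/input conflict (#578), a common pattern is to ignore key events that originate from editable elements so list shortcuts never fire while the user is typing. A sketch under that assumption; the actual fix may differ.

```typescript
// Illustrative sketch: skip list shortcuts while the user is typing in an editable element.
function isEditableTarget(target: EventTarget | null): boolean {
  if (!(target instanceof HTMLElement)) return false;
  const tag = target.tagName;
  return tag === 'INPUT' || tag === 'TEXTAREA' || target.isContentEditable;
}

document.addEventListener('keydown', (event) => {
  if (isEditableTarget(event.target)) return; // let the text field handle the key
  if (event.key === 'Delete') {
    // handle the dataset-list shortcut here
  }
});
```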
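The export-failure fix (#578) mentions a sharding mechanism. One way to picture it: fetch and write the selected records in fixed-size chunks so only one chunk is in memory at a time. `fetchRecords` and `write` below are hypothetical placeholders for the data-access and output layers.

```typescript
// Illustrative sketch: export a large selection in fixed-size chunks instead of one huge query.
export async function exportInChunks(
  ids: string[],
  fetchRecords: (batch: string[]) => Promise<object[]>, // hypothetical data access
  write: (records: object[]) => Promise<void>,          // hypothetical sink (file or stream)
  chunkSize = 500
): Promise<void> {
  for (let start = 0; start < ids.length; start += chunkSize) {
    const batch = ids.slice(start, start + chunkSize);
    const records = await fetchRecords(batch); // only one chunk is held in memory
    await write(records);                      // flush before fetching the next chunk
  }
}
```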
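For balanced export (#561), the core of the calculation is allocating the export quota across categories by the preset ratios. A sketch with hypothetical types; it truncates to the available samples when a category is too small.

```typescript
// Illustrative sketch: allocate the export quota across categories by preset ratios.
export function balancedSelection<T>(
  byCategory: Map<string, T[]>, // records grouped by category
  ratios: Map<string, number>,  // preset share per category, summing to 1
  total: number                 // desired export size
): T[] {
  const selected: T[] = [];
  for (const [category, items] of byCategory) {
    const share = ratios.get(category) ?? 0;
    // Never take more than a category actually has.
    const count = Math.min(items.length, Math.round(total * share));
    selected.push(...items.slice(0, count));
  }
  return selected;
}
```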
⚡ Optimizations
- Improved stability of multi-turn dialogue dataset parsing
 → More tolerant parsing of multi-turn dialogue formats (e.g., ShareGPT), reducing failures caused by format variations (a format sketch follows this list).
- Asynchronous execution of single text block operations (#530, #494)
 → "Generate questions for a single text block" and "AI dataset optimization" now run as background asynchronous tasks and no longer block other front-end operations (a background-job sketch follows this list).
- Enhanced text block filtering (#541)
 → Text blocks can be filtered by keyword search and by word count range (e.g., 100-500 words) to quickly locate target text (a filter sketch follows this list).
- Model configuration supports Top parameter control (#517)
 → The model configuration page now exposes Top parameters (e.g., Top-K/Top-P) to adjust the diversity and determinism of generated content (a sampling-config sketch follows this list).
- Filter by text block name (#275)
 → Question and dataset lists can be filtered by the name of the associated text block (file), making it easier to locate data across modules.
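
The ShareGPT layout referenced above stores each sample as a `conversations` array of `{from, value}` turns, e.g. `{"conversations": [{"from": "human", "value": "Hi"}, {"from": "gpt", "value": "Hello"}]}`. Below is a sketch of the kind of tolerant normalization the optimization describes; field-name variants in the wild are exactly what such parsing has to absorb, and this is not the project's actual parser.

```typescript
// Illustrative sketch: normalize a ShareGPT-style record into role/content turns.
interface ShareGPTTurn { from: string; value: string }
interface ShareGPTRecord { conversations?: ShareGPTTurn[] }

type Turn = { role: 'user' | 'assistant'; content: string };

export function parseShareGPT(record: ShareGPTRecord): Turn[] {
  const turns = record.conversations ?? []; // tolerate a missing array
  return turns
    .filter((t) => typeof t.value === 'string' && t.value.trim().length > 0)
    .map((t): Turn => ({
      // "human"/"user" map to the user role; anything else ("gpt", "assistant") to assistant.
      role: t.from === 'human' || t.from === 'user' ? 'user' : 'assistant',
      content: t.value,
    }));
}
```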
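A minimal picture of moving a single-text-block operation (#530, #494) into a background job: the request registers a job, returns its record immediately, and the front end polls the status. The names and the in-memory job map are illustrative assumptions, not the project's task system.

```typescript
// Illustrative sketch: run a long operation as a tracked background job
// instead of blocking the request that started it.
type JobStatus = 'pending' | 'running' | 'done' | 'failed';

interface Job { id: string; status: JobStatus; error?: string }

const jobs = new Map<string, Job>();

export function startJob(id: string, work: () => Promise<void>): Job {
  const job: Job = { id, status: 'pending' };
  jobs.set(id, job);
  // Fire and forget: the caller gets the job record back immediately.
  void (async () => {
    job.status = 'running';
    try {
      await work();
      job.status = 'done';
    } catch (err) {
      job.status = 'failed';
      job.error = err instanceof Error ? err.message : String(err);
    }
  })();
  return job;
}

export const getJob = (id: string): Job | undefined => jobs.get(id);
```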
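The enhanced filtering (#541) combines a keyword match with a length range. A sketch counting whitespace-separated words; for Chinese text a character count would be the natural unit. Types and names are illustrative.

```typescript
// Illustrative sketch: filter text blocks by keyword and word-count range.
interface TextBlock { id: string; name: string; content: string }

export function filterBlocks(
  blocks: TextBlock[],
  keyword: string,
  minWords: number,
  maxWords: number
): TextBlock[] {
  const needle = keyword.trim().toLowerCase();
  return blocks.filter((block) => {
    const wordCount = block.content.trim().split(/\s+/).filter(Boolean).length;
    const matchesKeyword = needle === '' || block.content.toLowerCase().includes(needle);
    return matchesKeyword && wordCount >= minWords && wordCount <= maxWords;
  });
}
```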
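For the Top parameter support (#517), the settings typically map straight onto the sampling fields of an OpenAI-compatible chat request. A sketch of such a request body; `top_k` is not part of the core OpenAI schema, so it is only forwarded when the provider accepts it.

```typescript
// Illustrative sketch: sampling parameters forwarded to an OpenAI-compatible chat request.
interface SamplingConfig {
  temperature?: number; // randomness of sampling
  top_p?: number;       // nucleus sampling: keep the smallest token set whose mass >= top_p
  top_k?: number;       // keep only the k most likely tokens (provider-specific extension)
}

export function buildRequestBody(model: string, prompt: string, cfg: SamplingConfig) {
  return {
    model,
    messages: [{ role: 'user', content: prompt }],
    temperature: cfg.temperature ?? 0.7,
    top_p: cfg.top_p ?? 1,
    // top_k is not in the core OpenAI schema; only send it when the provider supports it.
    ...(cfg.top_k !== undefined ? { top_k: cfg.top_k } : {}),
  };
}
```

Lower `top_p`/`top_k` values make output more deterministic; higher values increase diversity.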
✨ New Features
- Fully automated dataset distillation background tasks (#432, #492, #495, #496)
 → The full pipeline, from triggering distillation to dataset generation, runs as background asynchronous tasks with no manual intervention and real-time progress tracking (a progress-polling sketch follows this list).
- Support for renaming distillation labels (#422)
 → Labels generated during distillation can be given custom names to match label management needs in different scenarios.
- Generate Visual Question Answering (VQA) datasets (#130, #483, #537)
 → Image files can be uploaded and image-related questions and answers generated automatically to build VQA datasets for vision-language model training (a sample-record sketch follows this list).
- Question template function
 → Multiple custom question types (e.g., "describe image content", "analyze text opinions") can be created and applied to all images or text blocks to generate questions in batches, improving the standardization and scenario fit of question generation (a template sketch follows this list).
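
One way a front end could track the fully automated distillation tasks (#432, #492, #495, #496) is by polling a status endpoint until the background job finishes. The `/api/distill/:id` route and the status shape below are purely hypothetical.

```typescript
// Illustrative sketch: poll a hypothetical status endpoint until distillation finishes.
interface DistillStatus {
  state: 'running' | 'done' | 'failed';
  completed: number; // samples generated so far
  total: number;
}

export async function waitForDistillation(
  taskId: string,
  intervalMs = 2000
): Promise<DistillStatus> {
  while (true) {
    const res = await fetch(`/api/distill/${taskId}`); // hypothetical endpoint
    const status = (await res.json()) as DistillStatus;
    if (status.state !== 'running') return status;     // done or failed
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```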
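A VQA sample generally pairs an image reference with a generated question and answer. A sketch of one plausible exported record; this is not necessarily the tool's actual export schema.

```typescript
// Illustrative sketch: one possible shape for an exported VQA sample.
interface VqaSample {
  image: string;    // path or URL of the uploaded image
  question: string; // generated question about the image
  answer: string;   // generated answer
}

const sample: VqaSample = {
  image: 'images/chart-01.png',
  question: 'What trend does the chart show between 2020 and 2023?',
  answer: 'A steady year-over-year increase.',
};

console.log(JSON.stringify(sample)); // one JSONL line of a VQA dataset
```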
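The question template feature can be thought of as a prompt with a content slot that is filled in for every image or text block. A sketch with a hypothetical `{content}` placeholder and `generate` LLM call; the real template syntax may differ.

```typescript
// Illustrative sketch: apply a question template to every text block in a project.
interface QuestionTemplate { name: string; prompt: string } // prompt contains a {content} slot
interface Block { id: string; content: string }
interface GeneratedQuestion { blockId: string; question: string }

export async function applyTemplate(
  template: QuestionTemplate,
  blocks: Block[],
  generate: (prompt: string) => Promise<string> // hypothetical LLM call
): Promise<GeneratedQuestion[]> {
  const results: GeneratedQuestion[] = [];
  for (const block of blocks) {
    const prompt = template.prompt.replace('{content}', block.content);
    results.push({ blockId: block.id, question: await generate(prompt) });
  }
  return results;
}
```

A template such as `{ name: 'Analyze text opinions', prompt: 'Ask one question about the opinions expressed in: {content}' }` would then be applied to every selected block in one batch.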