If downloads from GitHub are slow, you can download from the cloud drive instead: https://pan.quark.cn/s/194b7eedf16e
Documentation for GA pair dataset enhancement: https://docs.easy-dataset.com/jin-jie-shi-yong/mga-zeng-qiang-shu-ju-ji
🔧 Fixes
- Cross-origin issue when refreshing the model list after selecting a model → Fixed the cross-origin request error raised during the refresh, so model data loads correctly across domains.
- Timeout when processing uploaded DOCX files → Tuned the file-parsing thread pool configuration to resolve timeouts when handling large files.
- Original directory not deleted when removing literature → Corrected the file system logic so that the associated original directory is cleaned up together with the literature.
⚡ Optimizations
- Docker packaging script → Streamlined the image build process to remove redundant dependencies and speed up packaging.
- Question generation in data distillation tasks → Generated questions no longer include label indices (e.g., "Q1:", "A1:"), for compatibility with unstructured formats.
- Dataset details token display → Added a token count to the dataset details page, giving a direct view of text length (useful as a reference against model input limits).
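To illustrate the label-index change above, here is a minimal Python sketch of stripping prefixes such as "Q1:" or "A1:" from generated lines. It is not the project's actual implementation. The function names and the regular expression are hypothetical, and the pattern also treats plain numbered prefixes like "2." as labels, which is an assumption.

```python
import re

# Hypothetical pattern (an assumption, not taken from the project):
# a leading label such as "Q1:", "A12:", "1.", or "2)" plus trailing whitespace.
_LABEL_PREFIX = re.compile(r"^\s*(?:[QA]\d+\s*[::]|\d+\s*[.)、])\s*", re.IGNORECASE)


def strip_label_index(line: str) -> str:
    """Remove one leading label index from a generated line, if present."""
    return _LABEL_PREFIX.sub("", line, count=1).strip()


def clean_questions(raw_output: str) -> list[str]:
    """Split model output into lines, dropping label indices and blank lines."""
    cleaned = (strip_label_index(line) for line in raw_output.splitlines())
    return [question for question in cleaned if question]
```

A pattern like this would also strip the "3." from a question that legitimately begins with a number such as "3.5", so a real implementation may need stricter rules.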
✨ New Feature: GA (Generator-Audience) Pair Dataset Enhancement
Introduces "Generator-Audience" pairing to generate targeted content based on usage scenarios: