If downloads from GitHub are slow, you can use the netdisk mirror instead: https://pan.quark.cn/s/194b7eedf16e
✨ New Features
- Support for Local MinerU Deployment (#200, #245)
  → Allows configuration of a local MinerU service URL in task settings, enabling integration with locally deployed MinerU tools.
- Enhanced Dataset Management (#81)
  → Added dataset rating, custom tags, and notes, with support for filtering based on these attributes.
- Literature Content Cleaning (#516)
  → Supports preprocessing and cleaning of original literature content to improve the quality of subsequently generated datasets; allows custom data cleaning prompts for different scenarios.
- Extended Dataset Export Options
- Expanded Literature Format Support (#205)
  → Added support for uploading and analyzing .epub documents, broadening the range of literature that can be processed.
- Dataset Import Function (#498)
  → Supports importing existing datasets from local files for quick reuse of external data resources.
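The attribute-based filtering described under Enhanced Dataset Management (#81) can be pictured with a minimal sketch. This is not the project's actual code: `DatasetEntry`, `filter_entries`, and every field name below are hypothetical, chosen only to illustrate filtering on rating, tags, and notes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DatasetEntry:
    """Hypothetical dataset record; field names are assumptions, not the project's schema."""
    question: str
    rating: int = 0
    tags: frozenset[str] = frozenset()
    note: str = ""


def filter_entries(
    entries: Iterable[DatasetEntry],
    min_rating: int = 0,
    required_tags: Iterable[str] = (),
    note_contains: str = "",
) -> list[DatasetEntry]:
    """Return entries that satisfy all three criteria at once."""
    required = set(required_tags)
    needle = note_contains.lower()
    return [
        e
        for e in entries
        if e.rating >= min_rating          # rating threshold
        and required <= set(e.tags)        # every required tag is present
        and needle in e.note.lower()       # case-insensitive note match
    ]
```

Leaving a criterion at its default makes it match everything, so the same function covers filtering by a single attribute or by any combination.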
⚡ Optimizations
- Dataset Pagination Improvement (#497)
  → Automatically saves the selected state of Markdown tags during pagination to avoid repeated operations.
- Dataset List Filter Enhancement (#275)
  → Added a filter for whether an entry is a distilled dataset, to quickly locate specific data types.
🔧 Fixes
- Large Dataset Export Issue (#502)
  → Fixed freezing when exporting large-scale datasets; added a batch export mechanism to improve stability.
- Cross-Project Question Conflicts (#509)
  → Resolved conflict anomalies in question DIFF comparisons between different projects, ensuring cross-project data consistency.
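As an illustration of the batch export idea behind the fix for #502, the sketch below pages through records and writes them out as JSON Lines one batch at a time, so the full dataset is never held in memory. It is an assumption-laden sketch, not the project's implementation: `fetch_page`, `iter_batches`, `export_jsonl`, and the default batch size are all hypothetical.

```python
import json
from typing import Callable, Iterator, TextIO

Record = dict
# Hypothetical data source: takes (offset, limit) and returns up to `limit` records.
FetchPage = Callable[[int, int], list[Record]]


def iter_batches(fetch_page: FetchPage, batch_size: int = 1000) -> Iterator[list[Record]]:
    """Yield successive batches until the source returns an empty page."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            return
        yield page
        offset += len(page)


def export_jsonl(fetch_page: FetchPage, out: TextIO, batch_size: int = 1000) -> int:
    """Write every record as one JSON line; return the number of records written."""
    written = 0
    for batch in iter_batches(fetch_page, batch_size):
        for record in batch:
            # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the output
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Because only one batch is in memory at a time, peak memory depends on `batch_size` rather than on the total size of the dataset.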