🔧 Fixes
- Fixed unintended chain-of-thought (COT) output being generated during dataset optimization.
- Fixed an error on the text processing page where files removed before upload were still processed.
⚡ Optimizations
- Refactored local file storage to local database storage, significantly improving performance with large datasets.
- Added configurable option to randomly remove question marks from generated questions.
- Enhanced user experience across multiple functions.
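
The question-mark option can be pictured as a per-question coin flip. The following is only a minimal sketch: the function name, the `removal_probability` parameter, and the choice to strip just one trailing mark are assumptions made for illustration, not the tool's actual setting names or behavior.

```python
import random


def maybe_strip_question_mark(
    question: str, removal_probability: float, rng: random.Random
) -> str:
    """Drop one trailing question mark with the given probability.

    Hypothetical helper: handles both the ASCII "?" and the full-width "？".
    """
    if question.endswith(("?", "？")) and rng.random() < removal_probability:
        return question[:-1].rstrip()
    return question


rng = random.Random(0)
always = maybe_strip_question_mark("What is a domain tree?", 1.0, rng)
never = maybe_strip_question_mark("What is a domain tree?", 0.0, rng)
```

Passing the random generator in explicitly keeps the behavior reproducible when a seed is fixed.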
✨ New Features
- Flexible Domain Tree Management
  - Three modes for handling added or deleted documents:
    - Revise Mode: Only update the domain tree nodes related to the new or deleted documents, minimizing impact on the existing structure.
    - Rebuild Mode: Regenerate the domain tree from all document catalogs (the existing behavior).
    - Lock Mode: Freeze the domain tree; document changes trigger no updates.
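
To make the difference between the three modes concrete, here is a toy sketch. It assumes a heavily simplified model in which the "tree" is just a mapping from document name to its heading list; `TreeMode` and `update_domain_tree` are hypothetical names, and the real tool's tree generation is not shown here.

```python
from enum import Enum


class TreeMode(Enum):
    REVISE = "revise"
    REBUILD = "rebuild"
    LOCK = "lock"


def update_domain_tree(tree, catalogs, changed_docs, mode):
    """Return a new tree after documents were added or deleted.

    tree:         current mapping of document name -> list of node labels
    catalogs:     headings of every document that exists now
    changed_docs: names of documents that were just added or deleted
    """
    if mode is TreeMode.LOCK:
        # Frozen: document changes never touch the tree.
        return dict(tree)
    if mode is TreeMode.REBUILD:
        # Regenerate everything from the current catalogs.
        return {doc: list(labels) for doc, labels in catalogs.items()}
    # REVISE: touch only the entries for changed documents.
    revised = dict(tree)
    for doc in changed_docs:
        if doc in catalogs:
            revised[doc] = list(catalogs[doc])  # newly added document
        else:
            revised.pop(doc, None)  # deleted document
    return revised


tree = {"a.md": ["Intro (manually edited)"]}
catalogs = {"a.md": ["Intro"], "b.md": ["Setup"]}
revised = update_domain_tree(tree, catalogs, {"b.md"}, TreeMode.REVISE)
rebuilt = update_domain_tree(tree, catalogs, {"b.md"}, TreeMode.REBUILD)
locked = update_domain_tree(tree, catalogs, {"b.md"}, TreeMode.LOCK)
```

In this sketch, Revise Mode keeps the manual edit on `a.md` while adding `b.md`, Rebuild Mode discards the edit, and Lock Mode ignores the new document entirely.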
- Multiple Text Chunking Strategies
  - Markdown Chunking: Auto-split by document headings to preserve semantic integrity (for structured Markdown).
  - Recursive Delimiter Chunking: Try multi-level delimiters recursively (configurable); ideal for complex documents.
  - Fixed-Length Delimiter Chunking: Split by a specified delimiter (configurable) and combine the pieces into fixed-length chunks.
  - Token Chunking: Split based on token count (not character count) for model-friendly input.
  - Code Intelligence Chunking: Smart splitting by programming language syntax to avoid incomplete code segments.
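
As an illustration of the recursive delimiter idea, the sketch below tries the coarsest delimiter first and falls back to finer ones only for pieces that are still too long. It is a simplified stand-in rather than the tool's implementation: `recursive_chunk`, its parameters, and the character-based length limit are all assumptions.

```python
def recursive_chunk(text: str, delimiters: list[str], max_len: int) -> list[str]:
    """Split text into chunks of at most max_len characters.

    Delimiters are tried in order (coarse to fine). Adjacent pieces are
    merged back together while they still fit within max_len.
    """
    if len(text) <= max_len:
        return [text] if text else []
    if not delimiters:
        # No delimiter left to try: fall back to hard cuts.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    head, *rest = delimiters
    chunks: list[str] = []
    buffer = ""
    for piece in text.split(head):
        candidate = f"{buffer}{head}{piece}" if buffer else piece
        if len(candidate) <= max_len:
            buffer = candidate
            continue
        if buffer:
            chunks.append(buffer)
        if len(piece) <= max_len:
            buffer = piece
        else:
            # Piece is still too long: recurse with the finer delimiters.
            chunks.extend(recursive_chunk(piece, rest, max_len))
            buffer = ""
    if buffer:
        chunks.append(buffer)
    return chunks


chunks = recursive_chunk("aaa bbb\n\nccc ddd eee", ["\n\n", " "], 7)
```

Here the paragraph break is tried first, and only the second paragraph, which exceeds the limit, is split further on spaces. A token-based variant would follow the same shape with a tokenizer's count in place of `len`.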
- Visual Custom Chunking
  - Manually adjust chunk boundaries in a graphical interface with real-time preview.
- Client Tool Enhancements
  - Added local log storage, with one-click access to the log directory for troubleshooting.
  - Added a cache-clearing function that cleans up historical logs and database backup files.


