Installer + Settings
- A new Settings UI allows you to configure third-party LLM services like OpenAI and Anthropic. These services can be used throughout the app for tasks like judging or data generation.
- For debugging purposes, you can view all active engine tasks in Settings → Jobs.
- The Computer Tab now includes a scrollbar when you have more than 3 GPUs, so you can see all your GPUs (must be nice!).
- Fixed a bug where having a .python-version file in your home directory would interfere with uv's Python version selection.
- The new AI Providers tab in Settings lets you add API keys for OpenAI, Anthropic, and custom providers:
  - API keys now persist between app startups
  - Custom Model Provider APIs can be configured here and used for evaluations or generation
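If you want to test a custom provider outside the app, here is a minimal sketch of the pattern these provider settings typically wrap, assuming the provider exposes an OpenAI-compatible endpoint; the base URL and model name below are hypothetical placeholders, not Transformer Lab APIs.

```python
# Minimal sketch: calling an OpenAI-compatible custom provider.
# The base_url and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://my-provider.example.com/v1",  # hypothetical endpoint
    api_key="sk-...",  # the key you would paste into the AI Providers tab
)

response = client.chat.completions.create(
    model="my-custom-model",  # hypothetical model name
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```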
Datasets
- Improved how datasets are displayed, particularly those containing very large fields
- Datasets requiring a config_name now allow you to specify the config during download
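For context, this follows the Hugging Face datasets convention, where some datasets define multiple configs and loading one requires naming it explicitly:

```python
# Loading a dataset that requires a config name (Hugging Face datasets).
from datasets import load_dataset

# wikitext defines several configs; the second argument selects one.
ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
print(ds[0])
```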
Documents
- Fixed bugs affecting document deletion and updates for files stored in folders
- Added the ability to sort documents by name
- Improved the visual design of the Documents page
Evals
- Charting
  - Visualize individual evals using line, bar, or radar graphs
  - Compare multiple evals using interactive charts
  - Generate detailed comparative reports between evals
  - Enhanced chart visuals with improved legends and formatting
- Comparing Evals
  - Compare results across multiple eval runs
  - Generate detailed side-by-side comparisons for each test case, with downloadable CSV reports
  - View summary charts comparing all selected eval jobs
  - Display line graphs with flexible axis options: view by metric or by compared job
- New Eval Plugins:
  - Basic Evals Plugin: Perform fundamental checks on model outputs, including JSON validation and bullet list detection
    - Create custom evaluations using regex patterns or exact string matching (see the sketch after this list)
  - Red Teaming Plugin: Test LLM vulnerabilities using DeepEval's capabilities across multiple categories: Bias, Misinformation, PII Leakage, Personal Safety, Toxicity, Robustness, Unauthorized Access, Illegal Activity, Graphic Content, and Intellectual Property Infringement
  - Common EleutherAI Harness Plugin: Run key benchmarks through a streamlined version of the EleutherAI LM Evaluation Harness
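To illustrate the kind of checks the Basic Evals Plugin performs, here is a minimal, self-contained sketch of JSON validation, regex matching, and exact string matching; the function names are hypothetical, not the plugin's actual API.

```python
# Minimal sketch of basic eval checks; function names are hypothetical,
# not the Basic Evals Plugin's actual API.
import json
import re


def is_valid_json(output: str) -> bool:
    """Pass if the model output parses as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


def matches_regex(output: str, pattern: str) -> bool:
    """Pass if the output contains a match for the given regex pattern."""
    return re.search(pattern, output) is not None


def exact_match(output: str, expected: str) -> bool:
    """Pass only if the output equals the expected string (ignoring edge whitespace)."""
    return output.strip() == expected.strip()


print(is_valid_json('{"answer": 42}'))   # True
print(matches_regex("Total: 42", r"\d+"))  # True
print(exact_match("  yes ", "yes"))        # True
```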
Training
- Multi-GPU support
  - Multi-GPU training support for Llama Trainer via a dedicated plugin
  - GRPO training support with multi-GPU configuration in a separate plugin
- Pre-training
  - New pre-training plugin using Nanotron for LLMs, initially supporting the LlamaForCausalLM config, with planned expansion to MoE and Mamba (state-space) models
- Embedding Models: Fine-tune embedding models with comprehensive dataset types and loss functions
  - Supports all dataset types and loss functions documented at https://sbert.net/docs/sentence_transformer/loss_overview.html
  - Models are automatically stored locally and imported into Transformer Lab
  - Support for FP16 and BF16 options
  - Additional loss modifiers for model size and performance optimization
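For reference, here is a minimal sketch of the kind of sentence-transformers training run this feature wraps, assuming the v3 trainer API; the base model, toy data, and output path are placeholders.

```python
# Minimal sentence-transformers fine-tuning sketch; model, data, and
# output_dir are placeholders, not Transformer Lab defaults.
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Toy anchor/positive pairs in the format MultipleNegativesRankingLoss expects.
train_dataset = Dataset.from_dict({
    "anchor": ["What is the capital of France?", "Who wrote Hamlet?"],
    "positive": ["Paris is the capital of France.", "Hamlet was written by Shakespeare."],
})

# One of the loss functions from the sbert.net loss overview.
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="./finetuned-embeddings",  # placeholder path
    num_train_epochs=1,
    bf16=True,  # or fp16=True, matching the FP16/BF16 options above
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```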
Backend
- SQLAlchemy Integration
  - Started migrating database access to SQLAlchemy, enabling future ORM capabilities and Alembic migrations (see the sketch after this list)
- Testing Infrastructure
  - Implemented foundational unit tests using pytest
- Docker:
  - Updated CPU and GPU Dockerfiles in the main branch
  - Published v0.10.2 CPU and GPU images to the Transformer Lab Docker Hub
  - Updated the docker-compose file for direct execution
  - Verified CPU Docker images on Mac, Ubuntu, and Windows
  - Fixed a local model import issue
- Enhanced API Security
  - Hardened API endpoints in preparation for future security audits
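As an illustration of what the SQLAlchemy migration enables, here is a minimal sketch of a declarative ORM model and session in SQLAlchemy 2.0 style; the Job table and its fields are hypothetical, not Transformer Lab's actual schema.

```python
# Minimal SQLAlchemy 2.0-style sketch; the Job model is hypothetical,
# not Transformer Lab's actual schema.
from sqlalchemy import String, create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column


class Base(DeclarativeBase):
    pass


class Job(Base):
    __tablename__ = "jobs"

    id: Mapped[int] = mapped_column(primary_key=True)
    status: Mapped[str] = mapped_column(String(32), default="queued")


engine = create_engine("sqlite:///example.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Job(status="running"))
    session.commit()
    # Declarative models make queries like this straightforward, and
    # Alembic can autogenerate migrations from the same metadata.
    running = session.scalars(select(Job).where(Job.status == "running")).all()
    print([job.id for job in running])
```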