What's Changed
Features
- feat(webui): filter eval results by metric values with numeric operators (EQ, GT, LTE, etc.) by @will-holley in #6011
- feat(providers): 10-100x performance improvement for Python providers with persistent worker pools by @mldangelo in #5968
- feat(providers): add OpenAI Agents SDK integration with support for agents, tools, and handoffs by @mldangelo in #6009
- feat(providers): add function calling/tool support for Ollama by @mldangelo in #5977
- feat(providers): add support for Claude Haiku 4.5 by @jameshiester in #5937
- feat(redteam): add jailbreak:meta strategy with intelligent attack taxonomy learning by @MrFlounder in #6021
- feat(redteam): add COPPA plugin by @typpo in #5997
- feat(redteam): add GDPR preset mappings by @typpo in #5986
- feat(redteam): add modifiers support to iterative strategies by @MrFlounder in #5972
- feat(redteam): add authoritative markup injection strategy by @typpo in #5961
- feat(redteam): add wordplay plugin by @typpo in #5889
- feat(redteam): add Simba red team agent strategy by @sklein12 in #5795
- feat(redteam): add subcategory filtering to BeaverTails plugin by @typpo (a70372f)
- feat(redteam): include pluginId, strategyId, and sessionId in CSV exports by @sklein12 in #6016
- feat(webui): persist custom policy names by @will-holley in #5990
- feat(webui): show target responses for red team test cases by @will-holley in #5869
- feat(cli): log errors to file with console messages by @sklein12 in #5992
- feat(cli): show errors in eval progress bar by @sklein12 in #5942
- feat(cache): display latency measurements for cached responses by @mldangelo in #5978
Fixes
- fix(providers): restore runtime variable substitution in templates by @mldangelo (5423f80)
- fix(providers): improve Python provider reliability with automatic python3/python detection and better error handling by @mldangelo in #6034
- fix(providers): simulated-user and mischievous-user now respect system prompts in multi-turn conversations by @mldangelo in #6020
- fix(providers): improve MCP tool schema compatibility with OpenAI by @mldangelo in #5965
- fix(providers): properly store sessionId in metadata by @sklein12 in #6016
- fix(redteam): skip session management tests for stateless targets by @faizanminhas in #5989
- fix(redteam): improve Crescendo strategy accuracy by @jameshiester in #5964
- fix(redteam): reduce duplicate error messages for invalid strategy and plugin ids by @typpo in #5954
- fix(fetch): improve retry counter messages and error details by @LizzHale in #6017, #6019
- fix(webui): pass extensions config when running evals by @theLucasAntunes in #6006
- fix(webui): fix visibility of reset config button in red team setup by @will-holley in #5896
- fix(webui): sync selected plugins to global config by @will-holley in #5991
- fix(webui): fix HTTP test agent by @faizanminhas in #6033
- fix(webui): reset strategy config dialog when switching strategies by @sklein12 in #6035
Chores
- chore(webui): improve red team UI with disabled state indicators and better organization by @typpo, @faizanminhas, @will-holley in #5985, #5970, #5962, #5865
- chore(cli): add telemetry status to debug output by @typpo in #6015
- chore(redteam): improve GOAT and Crescendo error messages by @sklein12 in #6036
- chore(deps): update AWS SDK, Anthropic SDK, and other dependencies in #6008, #5996, #5975, #5945, #5944
Documentation
- docs(model-audit): improve ModelAudit documentation by @mldangelo in #6023
- docs(providers): add OpenAI Agents provider documentation by @mldangelo in #6009
- docs(providers): update AWS Bedrock and provider documentation by @mldangelo in #5953, #6018, #5941
- docs(blog): add RLVR blog post by @mldangelo in #5987
- docs(site): add export formats, inference configuration, and September release notes by @typpo, @vsauter, @ladyofcode in #5958, #5983, #5712
- docs(examples): add session id management example by @will-holley in #5940
Tests
- test: add comprehensive tests for providers, red team, and UI components by @will-holley, @mldangelo in #6031, #6026, #6020, #6009, #5981
New Contributors
- @LizzHale made their first contribution in #6019
- @theLucasAntunes made their first contribution in #6006
Full Changelog: 0.118.17...0.119.0