What's Changed
Features
- feat(webui): organize `EvalOutputPromptDialog` and convert it to a drawer, by @typpo in #5619
- feat(webui): add keyboard navigation to the web UI results table, by @jameshiester in #5591
- feat(webui): enable bulk deletion of eval results, by @will-holley in #5438
- feat(providers): add `azure:responses` provider alias for Azure Responses API, by @mldangelo in #5293
- feat(providers): support application inference profiles in Bedrock, by @faizanminhas in #5617
- feat(redteam): add "layer" strategy for combining multiple strategies, by @typpo in #5606
- feat(redteam): set severity on reusable custom policies, by @will-holley in #5539
- feat(redteam): display unencrypted attacks in the web UI results table, by @addelong in #5565
- feat(redteam): enable test generation for custom policies in the plugins view, by @typpo in #5587
- feat(redteam): allow uploading CSVs for custom policies, by @typpo in #5618
- feat(cli): add ability to pause and resume evals, by @typpo in #5570
Fixes
- fix(assertions): handle `threshold=0` correctly across all assertion types, by @mldangelo in #5581
- fix(cli): prevent accidental escaping of Python path override, by @typpo in #5589
- fix(cli): fix table display for `promptfoo list`, by @typpo in #5616
- fix(cli): temporarily disable SIGINT handler, by @typpo in #5620
- fix(internal): strip authentication headers in HTTP provider metadata, by @typpo in #5577
- fix(redteam): ensure custom policies skip the basic refusal check, by @typpo in #5614
- fix(server): hide non-critical `hasModelAuditBeenShared` error logging, by @mldangelo in #5607
- fix(webui): always show failure reasons in the results view when available, by @addelong in #5608
- fix(webui): improve filter component styling and layout, by @mldangelo in #5604
- fix(webui): prevent phantom strategy filter options for non-redteam evaluations, by @mldangelo in #5575
- fix(webui): fix undulating CSS header animation, by @typpo in #5571
Chores
- chore(examples): update model IDs to GPT-5 and latest models, by @mldangelo in #5593
- chore(providers): remove Lambda Labs provider due to API deprecation, by @mldangelo in #5599
- chore(providers): update Cloudflare AI models and remove deprecated ones, by @mldangelo in #5590
- chore(redteam): add MCP plugin preset, by @faizanminhas in #5557
- chore(redteam): add UI indicators and documentation for HuggingFace gated datasets in redteam web UI, by @mldangelo in #5545
- chore(internals): improve error logging on redteam test generation failures, by @will-holley in #5458
- chore(internals): reduce log level of global fetch logs, by @faizanminhas in #5588
- chore(server): add context to health check logging during startup, by @mldangelo in #5568
- chore(webui): hide trace timeline section when no traces are available, by @mldangelo in #5582
- chore(webui): improve delete confirmation dialog styling, by @mldangelo in #5610
- chore(webui): remove `React.FC` type annotations for React 19 compatibility, by @mldangelo in #5572
- ci: increase test timeout from 8 to 10 minutes, by @mldangelo in #5586
- ci: temporarily disable macOS Node 24.x tests due to flaky failures, by @mldangelo in #5579
- refactor: move `src/util/file.node.ts` path utilities, by @mldangelo in #5596
- refactor: standardize all directory import paths for ESM compatibility, by @mldangelo in #5603
- refactor: standardize directory import paths for ESM compatibility, by @mldangelo in #5605
- refactor: standardize import paths for ESM preparation, by @mldangelo in #5600
- refactor: standardize TypeScript import paths for ESM compatibility, by @mldangelo in #5597
- test: CoverBot: add tests for UI interaction utilities and components (`src/app`), by @Use-Tusk[bot] in #5611
- chore: update `act` import for React 19 compatibility, by @mldangelo in #5574
- chore(dependencies): bump `@aws-sdk/client-bedrock-runtime` from 3.886.0 to 3.887.0, by @dependabot[bot] in #5580
- chore(dependencies): bump `@aws-sdk/client-bedrock-runtime` from 3.887.0 to 3.888.0, by @dependabot[bot] in #5602
- chore(dependencies): bump `axios` from 1.11.0 to 1.12.0 in npm_and_yarn group across one directory, by @dependabot[bot] in #5569
- chore(dependencies): bump `openai` from 5.20.1 to 5.20.2, by @dependabot[bot] in #5601
- chore(dependencies): bump `openai` from 5.20.2 to 5.20.3, by @dependabot[bot] in #5624
- chore(dependencies): bump version to 0.118.5, by @mldangelo in #5626
Documentation
- docs(site): clarify llm-rubric pass/score/threshold semantics, by @mldangelo in #5623
New Contributors
- @jameshiester made their first contribution in #5591
Full Changelog: 0.118.4...0.118.5