github promptfoo/promptfoo 0.118.5

13 hours ago

What's Changed

Features

  • feat(webui): organize EvalOutputPromptDialog and convert it to a drawer, by @typpo in #5619
  • feat(webui): add keyboard navigation to the web UI results table, by @jameshiester in #5591
  • feat(webui): enable bulk deletion of eval results, by @will-holley in #5438
  • feat(providers): add azure:responses provider alias for Azure Responses API, by @mldangelo in #5293
  • feat(providers): support application inference profiles in Bedrock, by @faizanminhas in #5617
  • feat(redteam): add "layer" strategy for combining multiple strategies, by @typpo in #5606
  • feat(redteam): set severity on reusable custom policies, by @will-holley in #5539
  • feat(redteam): display unencrypted attacks in the web UI results table, by @addelong in #5565
  • feat(redteam): enable test generation for custom policies in the plugins view, by @typpo in #5587
  • feat(redteam): allow uploading CSVs for custom policies, by @typpo in #5618
  • feat(cli): add ability to pause and resume evals, by @typpo in #5570

Fixes

  • fix(assertions): handle threshold=0 correctly across all assertion types, by @mldangelo in #5581
  • fix(cli): prevent accidental escaping of Python path override, by @typpo in #5589
  • fix(cli): fix table display for promptfoo list, by @typpo in #5616
  • fix(cli): temporarily disable SIGINT handler, by @typpo in #5620
  • fix(internal): strip authentication headers in HTTP provider metadata, by @typpo in #5577
  • fix(redteam): ensure custom policies skip the basic refusal check, by @typpo in #5614
  • fix(server): hide non-critical hasModelAuditBeenShared error logging, by @mldangelo in #5607
  • fix(webui): always show failure reasons in the results view when available, by @addelong in #5608
  • fix(webui): improve filter component styling and layout, by @mldangelo in #5604
  • fix(webui): prevent phantom strategy filter options for non-redteam evaluations, by @mldangelo in #5575
  • fix(webui): fix undulating CSS header animation, by @typpo in #5571
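
The threshold=0 fix in #5581 concerns a class of bug that is easy to reproduce in isolation. The sketch below is not promptfoo's code: `passesNaive`, `passesFixed`, and the fallback threshold of 1 are hypothetical names and values chosen for illustration. It shows how a truthiness check can silently discard an explicit threshold of 0, and how a nullish check avoids that.

```typescript
// Hypothetical helpers for illustration only; not promptfoo's actual implementation.
// The fallback threshold of 1 is an assumption made for this sketch.

// Buggy pattern: a truthiness check treats threshold=0 as "no threshold set".
function passesNaive(score: number, threshold?: number): boolean {
  const effective = threshold ? threshold : 1; // 0 is falsy, so this falls back to 1
  return score >= effective;
}

// Safer pattern: fall back only when the threshold is undefined or null.
function passesFixed(score: number, threshold?: number): boolean {
  const effective = threshold ?? 1; // 0 is kept as an explicit threshold
  return score >= effective;
}

console.log(passesNaive(0.2, 0)); // false: the explicit 0 was replaced by 1
console.log(passesFixed(0.2, 0)); // true: 0.2 >= 0
```

With a nullish check, omitting the threshold still yields the fallback, while an explicit 0 is honored.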

Chores

  • chore(examples): update model IDs to GPT-5 and latest models, by @mldangelo in #5593
  • chore(providers): remove Lambda Labs provider due to API deprecation, by @mldangelo in #5599
  • chore(providers): update Cloudflare AI models and remove deprecated ones, by @mldangelo in #5590
  • chore(redteam): add MCP plugin preset, by @faizanminhas in #5557
  • chore(redteam): add UI indicators and documentation for HuggingFace gated datasets in redteam web UI, by @mldangelo in #5545
  • chore(internals): improve error logging on redteam test generation failures, by @will-holley in #5458
  • chore(internals): reduce log level of global fetch logs, by @faizanminhas in #5588
  • chore(server): add context to health check logging during startup, by @mldangelo in #5568
  • chore(webui): hide trace timeline section when no traces are available, by @mldangelo in #5582
  • chore(webui): improve delete confirmation dialog styling, by @mldangelo in #5610
  • chore(webui): remove React.FC type annotations for React 19 compatibility, by @mldangelo in #5572
  • ci: increase test timeout from 8 to 10 minutes, by @mldangelo in #5586
  • ci: temporarily disable macOS Node 24.x tests due to flaky failures, by @mldangelo in #5579
  • refactor: move src/util/file.node.ts path utilities, by @mldangelo in #5596
  • refactor: standardize all directory import paths for ESM compatibility, by @mldangelo in #5603
  • refactor: standardize directory import paths for ESM compatibility, by @mldangelo in #5605
  • refactor: standardize import paths for ESM preparation, by @mldangelo in #5600
  • refactor: standardize TypeScript import paths for ESM compatibility, by @mldangelo in #5597
  • test: CoverBot: add tests for UI interaction utilities and components (src/app), by @Use-Tusk[bot] in #5611
  • chore: update act import for React 19 compatibility, by @mldangelo in #5574
  • chore(dependencies): bump @aws-sdk/client-bedrock-runtime from 3.886.0 to 3.887.0, by @dependabot[bot] in #5580
  • chore(dependencies): bump @aws-sdk/client-bedrock-runtime from 3.887.0 to 3.888.0, by @dependabot[bot] in #5602
  • chore(dependencies): bump axios from 1.11.0 to 1.12.0 in npm_and_yarn group across one directory, by @dependabot[bot] in #5569
  • chore(dependencies): bump openai from 5.20.1 to 5.20.2, by @dependabot[bot] in #5601
  • chore(dependencies): bump openai from 5.20.2 to 5.20.3, by @dependabot[bot] in #5624
  • chore(dependencies): bump version to 0.118.5, by @mldangelo in #5626

Documentation

  • docs(site): clarify llm-rubric pass/score/threshold semantics, by @mldangelo in #5623

Full Changelog: 0.118.4...0.118.5
