github promptfoo/promptfoo 0.117.0

one month ago

What's Changed

Features

Fixes

  • fix(cli): --filter-failing not working with custom providers by @mldangelo in #4911
  • fix(google-sheets): replace hardcoded range with dynamic approach by @mldangelo in #4822
  • fix(internal): fixes filtering by metric keys which contain dots by @will-holley in #4964
  • fix(providers): add thinking token tracking for Google Gemini models by @mldangelo in #4944
  • fix(providers): esm provider loading by @will-holley in #4915
  • fix(providers): implement callEmbeddingApi for LiteLLM embedding provider by @mldangelo in #4952
  • fix(redteam): prevent redteam run from hanging when using an mcp client by @faizanminhas in #4924
  • fix(redteam): respect PROMPTFOO_DISABLE_REDTEAM_REMOTE_GENERATION for cloud users by @mldangelo in #4839
  • fix(redteam): set pluginId on eval results by @sklein12 in #4928
  • fix(redteam): test target in http provider setup with non-200 status codes by @faizanminhas in #4932
  • fix(webui): eval results table horizontal scrolling by @will-holley in #4826
  • fix(webui): fix hard-coded light mode colors in model audit interface by @mldangelo in #4907
  • fix(webui): handle null table.body in DownloadMenu disabled prop by @mldangelo in #4913
  • fix(webui): resolve pagination scrolling and layout issues in ResultsTable by @mldangelo in #4943
  • fix(webui): scrolling when tbody is outside of viewport by @will-holley in #4948

Chores

  • chore(knip): integrate knip for unused code detection and clean up codebase by @faizanminhas in #4464
  • chore(linting): migrate from ESLint + Prettier to Biome by @mldangelo in #4903
  • chore(assertions): additional checking on llm-rubric response by @sklein12 in #4954
  • chore(assertions): include reason in model-graded-closedqa pass reason by @typpo in #4931
  • chore(build): resolve build warnings and optimize bundle size by @mldangelo in #4895
  • chore(csv): improve __metadata warning message and test coverage by @mldangelo in #4842
  • chore(providers): improve guardrails handling in Azure providers by @will-holley in #4788
  • chore(redteam): add domain-specific risks section and reduce verbose descriptions by @typpo in #4879
  • chore(release): bump version 0.117.0 by @mldangelo in #4963
  • chore(server): check if server is already running before starting by @mldangelo in #4896
  • chore(server): log correct eval ID instead of description in WebSocket updates by @mldangelo in #4910
  • chore(telemetry): add telemetry logging when tracing is enabled by @typpo in #4925
  • chore(types): typings needed for enterprise by @sklein12 in #4955
  • chore(vscode): use Biome as default formatter of TS files in vscode by @will-holley in #4920
  • chore(webui): conditionally render metrics selector by @will-holley in #4936
  • chore(webui): display context values in eval results by @will-holley in #4856
  • chore(webui): improves eval results table spacing by @will-holley in #4965
  • chore(webui): revert eval view ui improvements by @sklein12 in #4967
  • chore(webui/eval): allow filtering results by >1 metrics simultaneously (disabled by default) by @will-holley in #4870

Dependencies

  • chore(deps): add overrides to fix build issues by @sklein12 in #4957
  • chore(deps): bump @aws-sdk/client-bedrock-runtime from 3.842.0 to 3.844.0 by @dependabot[bot] in #4850
  • chore(deps): bump aiohttp from 3.11.11 to 3.12.14 in /examples/redteam-langchain in the pip group across 1 directory by @dependabot[bot] in #4922
  • chore(deps): bump openai from 5.8.3 to 5.9.0 by @dependabot[bot] in #4863
  • chore(deps): bump openai from 5.9.2 to 5.10.1 by @dependabot[bot] in #4961
  • chore(deps): move knip to dev dependencies by @faizanminhas in #4958
  • chore(deps): npm audit fix by @mldangelo in #4962
  • chore(deps): test removing knip to resolve installation errors by @sklein12 in #4956
  • chore(deps): update all example dependencies to latest versions by @mldangelo in #4900
  • chore(deps): update dependencies to latest minor/patch versions by @mldangelo in #4899
  • chore(deps): update non-breaking dependencies by @mldangelo in #4935
  • chore(deps): update Jest to version 30 by @mldangelo in #4939

Refactors

Tests

  • test(core): coverBot: added tests for core UI components and user context hooks (src/app) by @Use-Tusk[bot] in #4929
  • test(EnterpriseBanner): add unit tests for EnterpriseBanner component by @Use-Tusk[bot] in #4919
  • test(redteam): add unit test for src/redteam/remoteGeneration.ts by @gru-agent[bot] in #4834
  • test(server): fix flaky server share tests by @mldangelo in #4942
  • test(server): fix flaky server tests by @mldangelo in #4968
  • test(server): mock database in server tests by @mldangelo in #4959
  • test(tusk): update Tusk test runner workflow - coverage script by @Use-Tusk[bot] in #4921

Docs

  • docs(analytics): add google tag manager by @typpo in #4904
  • docs(api): improves contextTransform documentation by @will-holley in #4854
  • docs(assertions): add missing deterministic assertions by @mldangelo in #4891
  • docs(azure): improve Azure provider documentation by @mldangelo in #4836
  • docs(blog): add blog image generation script by @mldangelo in #4945
  • docs(blog): add truncation markers to articles without them by @mldangelo in #4934
  • docs(blog): add truncation markers to blog posts by @mldangelo in #4906
  • docs(blog): mcp proxy blog by @sklein12 in #4860
  • docs(blog): revise article tags by @mldangelo in #4949
  • docs(blog): soc2 type ii and iso 27001 blog by @vsauter in #4880
  • docs(comparison): pyrit comparison by @typpo in #4679
  • docs(config): clarify PROMPTFOO_EVAL_TIMEOUT_MS and PROMPTFOO_MAX_EVAL_TIME_MS descriptions by @mldangelo in #4947
  • docs(enterprise): adaptive guardrails enterprise by @typpo in #4951
  • docs(events): blackhat landing page by @typpo in #4862
  • docs(events): defcon landing page by @typpo in #4864
  • docs(events): events banner by @typpo in #4867
  • docs(examples): add mischievous-user strategy to redteam multi-turn examples by @will-holley in #4837
  • docs(gemini): update experimental Gemini model IDs to stable versions by @mldangelo in #4894
  • docs(google): add examples for gemini URL context and code execution tools by @adelmuursepp in #4923
  • docs(guide): guide for evaluating CrewAI agents with Promptfoo by @Ayush7614 in #4861
  • docs(images): standardize CrewAI image filenames to kebab-case by @mldangelo in #4941
  • docs(integration): add n8n integration by @typpo in #4917
  • docs(litellm): fix example with modern model IDs and proper embedding config by @mldangelo in #4885
  • docs(mcp): add mcp testing guide by @typpo in #4846
  • docs(mcp): add mcp to sidebar by @typpo in #4852
  • docs(metrics): add similar to model graded metrics table by @mldangelo in #4830
  • docs(providers): update available databricks models by @mldangelo in #4887
  • docs(providers): update provider index with missing providers and latest 2025 model IDs by @mldangelo in #4888
  • docs(release): add monthly release notes by @mldangelo in #4358
  • docs(resources): add arsenal link by @typpo in #4878
  • docs(security): add soc2 badge by @typpo in #4877
  • docs(site): add OWASP top 10 tldr blog post by @ladyofcode in #4853
  • docs(site): expand June 2025 release notes with detailed feature documentation by @mldangelo in #4881
  • docs(site): improve Google AI and Vertex authentication documentation by @mldangelo in #4892
  • docs(site): improve NLP metric explanations and add SEO metadata by @mldangelo in #4890
  • docs(site): update python documentation for basePath config option by @mldangelo in #4819
  • docs(ui): better mobile wrap on homepage tabs by @typpo in #4884
  • docs(ui): colors by @typpo in #4875
  • docs(ui): contrast fixes by @typpo in #4901
  • docs(ui): fix button clickability issue on hero sections by @mldangelo in #4905
  • docs(ui): remove bouncing down arrow in mobile by @typpo in #4882
  • docs(ui): remove text shadow by @typpo in #4898

New Contributors

Full Changelog: 0.116.7...0.117.0
