github Arize-ai/phoenix arize-phoenix-v13.0.0
arize-phoenix: v13.0.0

13.0.0 (2026-02-13)

⚠ BREAKING CHANGES

  • dataset evaluators

Features

Bug Fixes

  • Add model and developer role mappings (#10814) (a9d0b01)
  • Add require all toggle to contains evaluator (#11274) (a47d9d2)
  • add SecretString type for generative credentials (#10761) (d6d1036)
  • add typed InputMapping model for DatasetEvaluators (#11223) (eed8afc)
  • app: upgrade @apollo/client to 4.1.3 for multipart subscription cancellation (#11155) (8130270)
  • cleanup projects when deleting dataset evaluators (#11058) (a0cbd33)
  • cleanup UI styles (#10908) (bb83132)
  • clients made public (27f7c9f)
  • Configure correct gen_ai client in playground (5aab7f7)
  • dataset experiments action button same size as siblings (22ef3d0)
  • dataset_helpers: normalize legacy function_call into tool_calls (#11372) (c78b886)
  • Ensure built in evaluator details pages load (#11273) (b00ee3d)
  • Ensure empty string is not set when blurring combobox (#10859) (4454ffa)
  • Ensure evaluators can run directly after creation (#10885) (631aa34)
  • ensure fields of polymorphic evaluator orm types are eagerly loaded (#10068) (4703e7e)
  • Ensure model config dialog scrolls when screen height is constrained (#11426) (0f9e44b)
  • Ensure mustache templates handle unicode characters (#10577) (570f4d7)
  • Ensure preview evaluators have the right name and output coloring (307e66b)
  • Ensure preview evaluators have the right name and output coloring (#11355) (5d0d252)
  • eslint errors (c41d0a1)
  • eval in same order as runs (#11271) (4f1f859)
  • Evaluation error handling (#10823) (4bc5455)
  • evaluators: add index on dataset_id in dataset_evaluators table (#11011) (21fdd0b)
  • evaluators: add telemetry for built-in evaluators (#11025) (02944c3)
  • evaluators: add validation for llm evaluator prompts (#10193) (99353b2)
  • evaluators: clean up evaluators rebase (fe8130f)
  • evaluators: coerce string types (#10743) (0889352)
  • evaluators: display traces for evaluation errors (#11214) (acc974b)
  • evaluators: don't display dataset evaluator projects in projects table (#11133) (088150f)
  • evaluators: enhance telemetry for llm evaluators (#11002) (310a348)
  • evaluators: ensure unique display names (#10882) (ebf73ea)
  • evaluators: evaluator bug fixes (#11127) (088d730)
  • evaluators: fix parse span output (#11339) (d009f4d)
  • evaluators: make evaluator names snake case (#11250) (3437f33)
  • evaluators: persist choices (#10076) (919e436)
  • evaluators: return annotation name in output config resolver (#10152) (2fb1244)
  • evaluators: run llm evaluators (#10480) (7be661b)
  • evaluators: support llm evaluator prompts with multipart content (#11113) (91606fa)
  • evaluators: trace llm evaluators (#10872) (5933125)
  • evaluators: use model specified in prompt when running evaluators (#10700) (1f4c37d)
  • evaluators: wire up invocation parameters and tool choice when executing llm evaluators (#10726) (b6b0a79)
  • experiments: parse Responses API output in experiment run results (#11349) (a70dddb)
  • fix broken build due to circular dependencies (#11026) (f6cb2da)
  • fix evaluator config dialog layout (#10366) (fcc0364)
  • Fix import error on evaluator page (#10185) (3714e2b)
  • fix playground scrolling and selection (#11200) (5e29a74)
  • fix regex to match composite IDs (#10459) (abe9c62)
  • gemini: model deprecation changes (#11381) (cde9188)
  • handle duplicate built-ins by using dataset evaluator ids (#11131) (a0065f5)
  • improve clarity of Test section in evaluator create/edit modal (#11164) (61c2cc7)
  • make graphql prompt label type honest to db model (runtime error) (#10849) (7062cca)
  • metrics: adjust chart label positioning and margins across multiple time series components (#11435) (20d3236)
  • pass tool_calls through directly instead of JSON-serializing (#10818) (bcc30ef)
  • playground: align openaiApiType default between UI and request builder (#11371) (00cea81)
  • playground: avoid contextvars in streaming generator cleanup (#11362) (fc1f4e6)
  • playground: generate unique tool call IDs for Gemini stream (#11376) (2bd4fd9)
  • playground: remove shift when scrollbar appears in playground dataset examples table (#11341) (2278e0e)
  • playground: render tool calls in dataset examples table with PlaygroundToolCall (#11366) (841e380)
  • playground: restore reasoning model client for chat completions path (#11433) (87f6d8e)
  • playground: route Google provider to correct streaming client by model (#11378) (5a21c4f)
  • playwright test failures (#11384) (ddc0c20)
  • preserve dataset tables' tab and selection state (#10928) (fe3e104)
  • preserve literal mode in evaluator forms when editing (#10935) (c2dc5ec)
  • progress bars should be empty for zero annotations (#11138) (d7a706e)
  • Raise on missing label (#11398) (075300a)
  • release DB session lock during evaluator HTTP calls in mutations (#11431) (785f04c)
  • Remove json.dumps coercion on evaluator context dictionaries (#10869) (0dc6357)
  • remove the ability to create a global evaluator (#10461) (705edb1)
  • respect template variables path for non-streaming mode (#11183) (4744dbc)
  • return undefined instead of empty braces for playground tool calls (#11170) (baba63e)
  • Show require_all on ContainsEvaluator details (#11406) (a99b653)
  • templates: simplify template formatters (#11410) (1f6df01)
  • use untemplated prompt view on evaluator details (#11308) (279bc9f)
  • whitespace parsing and explanation for ContainsEvaluator (#11387) (d649308)

Performance Improvements

Documentation
