## 13.0.0 (2026-02-13)

### ⚠ BREAKING CHANGES

- dataset evaluators

### Features
- Add autocomplete to LLM eval prompt editor (#11275) (2dce02c)
- add available tools to experiment output (#10857) (f17e359)
- add built-in LLM evaluator configs to graphql (#10609) (4c8a70f)
- add custom providers to model menu (#10766) (41fe933)
- Add dataset deep link after selection from playground (#10886) (06c9687)
- add default output config for llm evaluators (#10775) (ac3a54f)
- add descriptions to built-ins (ff4db05)
- add eval outputs to playground (#10263) (f32428c)
- add evaluator count to tab (#10665) (3df4dc3)
- add evaluator kind token (59fce2d)
- add evaluator label to all evaluator prompts (#10781) (79d2afc)
- Add evaluator preview mutation (#10651) (f5fe556)
- add EvaluatorKindToken (ba74e43)
- Add evaluators table to dataset evaluators page (#10157) (a98fcbd)
- Add examples route with examples table (#10123) (57a3e99)
- add explanation toggle to evaluator form (#10550) (609094d)
- Add input mapping support to built-in evaluators (#10355) (77bcd4d)
- Add json parse toggle to json distance builtin evaluator (#11321) (fda8554)
- add metadata to evaluator db table (#10139) (b2fde4a)
- add model search to the model menu (#10737) (f6bd4fe)
- add model to evaluator tables (#11436) (0022722)
- Add more builtin evaluator forms and improve flattening utility (#10834) (7b44ce0)
- Add more builtin evaluators (#10826) (6e9adbc)
- add OpenAI API type (Chat Completions vs Responses API) support (#11336) (1de0357)
- Add optional description field to new evaluator creation (#10132) (bd72866)
- Add output config display to built in evaluators (#11054) (5ffe5ab)
- Add pre-built LLM evaluators to the evaluator creation menu (#10642) (c9d46e9)
- add prompt info to evals table (#10486) (c2f5f54)
- Add SwitchableEvaluatorInput to enable customizable eval inputs (#10835) (c1420ec)
- add the ability to create examples in a chain (#10979) (77ed915)
- add tool response handling evaluator template (#11276) (0738cb9)
- add user id on the evaluators (#11016) (b187e41)
- append messages param in the playground (#10800) (8bf2d74)
- Builtin Evaluator Config Overrides (#10977) (3569107)
- Builtin evaluator table (#11094) (dee0a7c)
- bump openinference vercel (#11392) (8be1940)
- clean up the preview UI to show a full annotation value (#10776) (804de70)
- clear table state when dataset or splits change (#11060) (5d06822)
- Collect all json path segments when flattening example keys (#10075) (cd1ed2a)
- composite field for model + params (#10773) (618eef7)
- configurable vite port for running simultaneous instances (#11205) (ac964ed)
- consolidate dataset creation flows (#11375) (d55efc9)
- convert boto3 to aioboto3 for async Bedrock client (#10803) (b6b4047)
- Create distinct slideovers for evaluator use cases (#10303) (34bc75e)
- Create evaluator mutations with optional dataset_id (#10065) (daf76c4)
- Create zustand store for evaluator configuration (#10635) (6c9e389)
- Custom evaluator names (#10451) (d93365d)
- data model for custom providers (#10319) (2579658)
- dataset evaluators (1274b03)
- dataset table to have links to evaluators and playground (#10934) (52bfe99)
- dataset_helpers: deserialize JSON in message function/tool call arguments (#11299) (9af321f)
- db: add session id index for spans across sqlite and postgres (#11028) (34fbd58)
- delete button for custom providers (#10543) (22ccaff)
- delete dataset evaluator (#11033) (e945ef6)
- Disambiguate evaluator names (#10923) (7bbb97a)
- document annotation gql api (#11023) (cdd8066)
- edit button for custom providers (#10597) (5eb74ff)
- edit provider secrets (#10502) (78ec34f)
- enable creating evaluator with existing prompt (#10596) (195bb87)
- enable deleting dataset evaluator (#10354) (822e87b)
- enable editing description for builtin evaluators (#11238) (bc1e6ed)
- enable full mustache prompt templates in the server (#11229) (68ba0ed)
- enable inline annotation config creation from playground / span slideover (#10947) (f6ca4c2)
- enable tagging prompt from playground (#10982) (e067d35)
- Enhance evaluator config dialog for built in evaluators (#10482) (c39fbc8)
- evaluator details enhancements (#11225) (892c378)
- evaluator details view (#10734) (7cca42c)
- evaluator form validation (#11334) (6b2b5d8)
- evaluator prompt tagging UI improvements (#10523) (30c07b0)
- evaluator styling updates (#11201) (90e6405)
- evaluator traces UI (#11066) (fcd1e53)
- Evaluators creation page (#10054) (b31e429)
- evaluators table empty state (#11108) (824f439)
- evaluators: add annotation name to eval menu (#10156) (081e54e)
- evaluators: add evaluator select (#10063) (21a09e1)
- evaluators: assign evaluator to dataset UI (#10135) (533edef)
- evaluators: db migration for evaluator tables (#9960) (84769f5)
- evaluators: evaluators update and delete mutations (#10128) (d971ad8)
- evaluators: load in a default template for the evaluator that is useful (#10187) (2cb67f5)
- evaluators: mutations for playground evaluator selector (#10042) (945d05a)
- Export DiagLogLevel from phoenix-otel (#11402) (7456462)
- expose dataset metadata to playground, evaluators, and mapping editor (#11264) (b572e8e)
- Extend EvaluatorForm to render more kinds of evaluator (#10557) (0434ac5)
- filter built-in classification evaluators by label (#11416) (aa95a3f)
- Global evaluators table empty state (#11390) (d64e91b)
- gql: check secrets for credentialsSet field (#10503) (33ba072)
- graceful input shaping for evaluator name field (#11228) (284c5c5)
- Implement builtin evaluators (#10308) (cbf2cf1)
- Implement template variable autocomplete feature in Playground (#11184) (d0ab3e7)
- Improve rendering of dataset evals on playground (#10136) (8d9e695)
- improved prompt picker UI (#10900) (a9318fa)
- include builtin evaluators in global evaluator table (#11267) (2b418cb)
- is_latest flag on a version (#10464) (c97956a)
- make dataset split action be a part of the menu (#10961) (8b11e89)
- make the examples table resizable (#10975) (9b69375)
- Migrate AzureOpenAI to v1 API (#10755) (ef55c4a)
- migrate LDAP users to dedicated auth_method (#10993) (8204812)
- model menu (provider / model selection) (#10717) (abf97f5)
- move example details dialog loading to be inside the slideover (#11040) (48ada4d)
- move playground prompt tagging to prompt save modal (#11388) (2152e60)
- Multi-output evaluator support (#11259) (cfa553c)
- Parse and display optimization direction from evaluator on playground experiment runs (#10798) (410f0c5)
- persist append messages setting in local storage (#10973) (3f96151)
- persist tools with eval (#10220) (817346e)
- Playground cancellation (#11055) (1405152)
- playground eval select updates (#10163) (cac23a7)
- playground: add OpenAI Responses API tool definition schema (#11422) (e2bd8eb)
- preserve prompt id and version / tag info in the URL on playground (#11327) (1a0805d)
- progressive loading of evaluator annotations (#10879) (a795ee5)
- prompt template apply query (#10455) (2a33251)
- prompt template preview (#10453) (2dccbe5)
- Refactor DatasetEvaluator node (#10511) (4025bf1)
- Refactor evaluator form for usage in create and edit workflows (#10253) (1743d3d)
- Refactor evaluator input mapping in evaluator form (#10884) (f23cf06)
- regex builtin validation (#11307) (3f855a0)
- Rename "expected" to "reference" in Evaluators (#10860) (74c14ed)
- Render optimization direction on experiment runs (#11067) (9ce244d)
- Reorganize new evaluator form (#10081) (e20f32e)
- replaces TracesTable with SpansTable on evaluator details page (#11347) (0dfa95b)
- Shift select rows on dataset examples table (#10951) (b628a77)
- show cancelled state in the playground (#10959) (8c2268c)
- show code evaluators in the playground loading state (#10933) (5370629)
- show experiment cost / latency at the top of the experiment columns (#10802) (3bec487)
- show experiment evaluation summary at the end (8ff7832)
- show experiment summary in the header of experiment details / compare (1411394)
- Show experiment user (#10910) (7c6f850)
- show latest instead of version (#10468) (0793aa1)
- show markdown in experiment view (#10854) (9e22612)
- show used in dataset column on evaluators (#11050) (2065272)
- Support custom provider binding for LLM evaluators (#10971) (ef7622b)
- template format selector on LLM evaluator form (#10662) (4b4d82f)
- template variable path support for reference / metadata usage in prompt template variables (#10940) (bdc5128)
- test button for custom providers (#10544) (070de7f)
- track the progress of an experiment on the playground (#10774) (b2341e9)
- UI for custom provider creation (#10431) (0e4a6cd)
- ui: add evaluator icons and update menu (#10873) (490ba2a)
- update evaluator prompt tagging behavior (#10694) (7b10d60)
- update playground eval select (#10703) (b5d8ad3)
- update playground evaluator select menu (#10450) (8297f21)
- Use display name for playground annotation names (#10929) (862fa61)
- wire up builtin evaluators (#10574) (c48440c)
- wire up llm evaluators to playground (#10575) (b55d0e5)
### Bug Fixes
- Add model and developer role mappings (#10814) (a9d0b01)
- Add require all toggle to contains evaluator (#11274) (a47d9d2)
- add SecretString type for generative credentials (#10761) (d6d1036)
- add typed InputMapping model for DatasetEvaluators (#11223) (eed8afc)
- app: upgrade @apollo/client to 4.1.3 for multipart subscription cancellation (#11155) (8130270)
- cleanup projects when deleting dataset evaluators (#11058) (a0cbd33)
- cleanup UI styles (#10908) (bb83132)
- clients made public (27f7c9f)
- Configure correct gen_ai client in playground (5aab7f7)
- dataset experiments action button same size as siblings (22ef3d0)
- dataset_helpers: normalize legacy function_call into tool_calls (#11372) (c78b886)
- Ensure built in evaluator details pages load (#11273) (b00ee3d)
- Ensure empty string is not set when blurring combobox (#10859) (4454ffa)
- Ensure evaluators can run directly after creation (#10885) (631aa34)
- ensure fields of polymorphic evaluator orm types are eagerly loaded (#10068) (4703e7e)
- Ensure model config dialog scrolls when screen height is constrained (#11426) (0f9e44b)
- Ensure mustache templates handle unicode characters (#10577) (570f4d7)
- Ensure preview evaluators have the right name and output coloring (307e66b)
- Ensure preview evaluators have the right name and output coloring (#11355) (5d0d252)
- eslint errors (c41d0a1)
- eval in same order as runs (#11271) (4f1f859)
- Evaluation error handling (#10823) (4bc5455)
- evaluators: add index on dataset_id in dataset_evaluators table (#11011) (21fdd0b)
- evaluators: add telemetry for built-in evaluators (#11025) (02944c3)
- evaluators: add validation for llm evaluator prompts (#10193) (99353b2)
- evaluators: clean up evaluators rebase (fe8130f)
- evaluators: coerce string types (#10743) (0889352)
- evaluators: display traces for evaluation errors (#11214) (acc974b)
- evaluators: don't display dataset evaluator projects in projects table (#11133) (088150f)
- evaluators: enhance telemetry for llm evaluators (#11002) (310a348)
- evaluators: ensure unique display names (#10882) (ebf73ea)
- evaluators: evaluator bug fixes (#11127) (088d730)
- evaluators: fix parse span output (#11339) (d009f4d)
- evaluators: make evaluator names snake case (#11250) (3437f33)
- evaluators: persist choices (#10076) (919e436)
- evaluators: return annotation name in output config resolver (#10152) (2fb1244)
- evaluators: run llm evaluators (#10480) (7be661b)
- evaluators: support llm evaluator prompts with multipart content (#11113) (91606fa)
- evaluators: trace llm evaluators (#10872) (5933125)
- evaluators: use model specified in prompt when running evaluators (#10700) (1f4c37d)
- evaluators: wire up invocation parameters and tool choice when executing llm evaluators (#10726) (b6b0a79)
- experiments: parse Responses API output in experiment run results (#11349) (a70dddb)
- fix broken build due to circular dependencies (#11026) (f6cb2da)
- fix evaluator config dialog layout (#10366) (fcc0364)
- Fix import error on evaluator page (#10185) (3714e2b)
- fix playground scrolling and selection (#11200) (5e29a74)
- fix regex to match composite IDs (#10459) (abe9c62)
- gemini: model deprecation changes (#11381) (cde9188)
- handle duplicate built-ins by using dataset evaluator ids (#11131) (a0065f5)
- improve clarity of Test section in evaluator create/edit modal (#11164) (61c2cc7)
- make graphql prompt label type honest to db model (runtime error) (#10849) (7062cca)
- metrics: adjust chart label positioning and margins across multiple time series components (#11435) (20d3236)
- pass tool_calls through directly instead of JSON-serializing (#10818) (bcc30ef)
- playground: align openaiApiType default between UI and request builder (#11371) (00cea81)
- playground: avoid contextvars in streaming generator cleanup (#11362) (fc1f4e6)
- playground: generate unique tool call IDs for Gemini stream (#11376) (2bd4fd9)
- playground: remove shift when scrollbar appears in playground dataset examples table (#11341) (2278e0e)
- playground: render tool calls in dataset examples table with PlaygroundToolCall (#11366) (841e380)
- playground: restore reasoning model client for chat completions path (#11433) (87f6d8e)
- playground: route Google provider to correct streaming client by model (#11378) (5a21c4f)
- playwright test failures (#11384) (ddc0c20)
- preserve dataset tables' tab and selection state (#10928) (fe3e104)
- preserve literal mode in evaluator forms when editing (#10935) (c2dc5ec)
- progress bars should be empty for zero annotations (#11138) (d7a706e)
- Raise on missing label (#11398) (075300a)
- release DB session lock during evaluator HTTP calls in mutations (#11431) (785f04c)
- Remove json.dumps coercion on evaluator context dictionaries (#10869) (0dc6357)
- remove the ability to create a global evaluator (#10461) (705edb1)
- respect template variables path for non-streaming mode (#11183) (4744dbc)
- return undefined instead of empty braces for playground tool calls (#11170) (baba63e)
- Show require_all on ContainsEvaluator details (#11406) (a99b653)
- templates: simplify template formatters (#11410) (1f6df01)
- use untemplated prompt view on evaluator details (#11308) (279bc9f)
- whitespace parsing and explanation for ContainsEvaluator (#11387) (d649308)