## 13.0.0 (2026-02-13)

### ⚠ BREAKING CHANGES

- dataset evaluators

### Features
- Add autocomplete to LLM eval prompt editor (#11275) (2dce02c)
- add available tools to experiment output (#10857) (f17e359)
- add built-in LLM evaluator configs to graphql (#10609) (4c8a70f)
- add custom providers to model menu (#10766) (41fe933)
- Add dataset deep link after selection from playground (#10886) (06c9687)
- add default output config for llm evaluators (#10775) (ac3a54f)
- add descriptions to built-ins (ff4db05)
- add eval outputs to playground (#10263) (f32428c)
- add evaluator count to tab (#10665) (3df4dc3)
- add evaluator kind token (59fce2d)
- add evaluator label to all evaluator prompts (#10781) (79d2afc)
- Add evaluator preview mutation (#10651) (f5fe556)
- add EvaluatorKindToken (ba74e43)
- Add evaluators table to dataset evaluators page (#10157) (a98fcbd)
- Add examples route with examples table (#10123) (57a3e99)
- add explanation toggle to evaluator form (#10550) (609094d)
- Add input mapping support to built-in evaluators (#10355) (77bcd4d)
- Add json parse toggle to json distance builtin evaluator (#11321) (fda8554)
- add metadata to evaluator db table (#10139) (b2fde4a)
- add model search to the model menu (#10737) (f6bd4fe)
- add model to evaluator tables (#11436) (0022722)
- Add more builtin evaluator forms and improve flattening utility (#10834) (7b44ce0)
- Add more builtin evaluators (#10826) (6e9adbc)
- add OpenAI API type (Chat Completions vs Responses API) support (#11336) (1de0357)
- Add optional description field to new evaluator creation (#10132) (bd72866)
- Add output config display to built in evaluators (#11054) (5ffe5ab)
- Add pre-built LLM evaluators to the evaluator creation menu (#10642) (c9d46e9)
- add prompt info to evals table (#10486) (c2f5f54)
- Add SwitchableEvaluatorInput to enable customizable eval inputs (#10835) (c1420ec)
- add the ability to create examples in a chain (#10979) (77ed915)
- add tool response handling evaluator template (#11276) (0738cb9)
- add user id on the evaluators (#11016) (b187e41)
- append messages param in the playground (#10800) (8bf2d74)
- Builtin Evaluator Config Overrides (#10977) (3569107)
- Builtin evaluator table (#11094) (dee0a7c)
- bump openinference vercel (#11392) (8be1940)
- clean up the preview UI to show a full annotation value (#10776) (804de70)
- clear table state when dataset or splits change (#11060) (5d06822)
- Collect all json path segments when flattening example keys (#10075) (cd1ed2a)
- composite field for model + params (#10773) (618eef7)
- configurable vite port for running simultaneous instances (#11205) (ac964ed)
- consolidate dataset creation flows (#11375) (d55efc9)
- convert boto3 to aioboto3 for async Bedrock client (#10803) (b6b4047)
- Create distinct slideovers for evaluator use cases (#10303) (34bc75e)
- Create evaluator mutations with optional dataset_id (#10065) (daf76c4)
- Create zustand store for evaluator configuration (#10635) (6c9e389)
- Custom evaluator names (#10451) (d93365d)
- data model for custom providers (#10319) (2579658)
- dataset evaluators (1274b03)
- dataset table to have links to evaluators and playground (#10934) (52bfe99)
- dataset_helpers: deserialize JSON in message function/tool call arguments (#11299) (9af321f)
- db: add session id index for spans across sqlite and postgres (#11028) (34fbd58)
- delete button for custom providers (#10543) (22ccaff)
- delete dataset evaluator (#11033) (e945ef6)
- Disambiguate evaluator names (#10923) (7bbb97a)
- document annotation gql api (#11023) (cdd8066)
- edit button for custom providers (#10597) (5eb74ff)
- edit provider secrets (#10502) (78ec34f)
- enable creating evaluator with existing prompt (#10596) (195bb87)
- enable deleting dataset evaluator (#10354) (822e87b)
- enable editing description for builtin evaluators (#11238) (bc1e6ed)
- enable full mustache prompt templates in the server (#11229) (68ba0ed)
- enable inline annotation config creation from playground / span slideover (#10947) (f6ca4c2)
- enable tagging prompt from playground (#10982) (e067d35)
- Enhance evaluator config dialog for built in evaluators (#10482) (c39fbc8)
- evaluator details enhancements (#11225) (892c378)
- evaluator details view (#10734) (7cca42c)
- evaluator form validation (#11334) (6b2b5d8)
- evaluator prompt tagging UI improvements (#10523) (30c07b0)
- evaluator styling updates (#11201) (90e6405)
- evaluator traces UI (#11066) (fcd1e53)
- Evaluators creation page (#10054) (b31e429)
- evaluators table empty state (#11108) (824f439)
- evaluators: add annotation name to eval menu (#10156) (081e54e)
- evaluators: add evaluator select (#10063) (21a09e1)
- evaluators: assign evaluator to dataset UI (#10135) (533edef)
- evaluators: db migration for evaluator tables (#9960) (84769f5)
- evaluators: evaluators update and delete mutations (#10128) (d971ad8)
- evaluators: load in a default template for the evaluator that is useful (#10187) (2cb67f5)
- evaluators: mutations for playground evaluator selector (#10042) (945d05a)
- Export DiagLogLevel from phoenix-otel (#11402) (7456462)
- expose dataset metadata to playground, evaluators, and mapping editor (#11264) (b572e8e)
- Extend EvaluatorForm to render more kinds of evaluator (#10557) (0434ac5)
- filter built-in classification evaluators by label (#11416) (aa95a3f)
- Global evaluators table empty state (#11390) (d64e91b)
- gql: check secrets for credentialsSet field (#10503) (33ba072)
- graceful input shaping for evaluator name field (#11228) (284c5c5)
- Implement builtin evaluators (#10308) (cbf2cf1)
- Implement template variable autocomplete feature in Playground (#11184) (d0ab3e7)
- Improve rendering of dataset evals on playground (#10136) (8d9e695)
- improved prompt picker UI (#10900) (a9318fa)
- include builtin evaluators in global evaluator table (#11267) (2b418cb)
- is_latest flag on a version (#10464) (c97956a)
- make dataset split action be a part of the menu (#10961) (8b11e89)
- make the examples table resizable (#10975) (9b69375)
- Migrate AzureOpenAI to v1 API (#10755) (ef55c4a)
- migrate LDAP users to dedicated auth_method (#10993) (8204812)
- model menu (provider / model selection) (#10717) (abf97f5)
- move example details dialog loading to be inside the slideover (#11040) (48ada4d)
- move playground prompt tagging to prompt save modal (#11388) (2152e60)
- Multi-output evaluator support (#11259) (cfa553c)
- Parse and display optimization direction from evaluator on playground experiment runs (#10798) (410f0c5)
- persist append messages setting in local storage (#10973) (3f96151)
- persist tools with eval (#10220) (817346e)
- Playground cancellation (#11055) (1405152)
- playground eval select updates (#10163) (cac23a7)
- playground: add OpenAI Responses API tool definition schema (#11422) (e2bd8eb)
- preserve prompt id and version / tag info in the URL on playground (#11327) (1a0805d)
- progressive loading of evaluator annotations (#10879) (a795ee5)
- prompt template apply query (#10455) (2a33251)
- prompt template preview (#10453) (2dccbe5)
- Refactor DatasetEvaluator node (#10511) (4025bf1)
- Refactor evaluator form for usage in create and edit workflows (#10253) (1743d3d)
- Refactor evaluator input mapping in evaluator form (#10884) (f23cf06)
- regex builtin validation (#11307) (3f855a0)
- Rename "expected" to "reference" in Evaluators (#10860) (74c14ed)
- Render optimization direction on experiment runs (#11067) (9ce244d)
- Reorganize new evaluator form (#10081) (e20f32e)
- replaces TracesTable with SpansTable on evaluator details page (#11347) (0dfa95b)
- Shift select rows on dataset examples table (#10951) (b628a77)
- show cancelled state in the playground (#10959) (8c2268c)
- show code evaluators in the playground loading state (#10933) (5370629)
- show experiment cost / latency at the top of the experiment columns (#10802) (3bec487)
- show experiment evaluation summary at the end (8ff7832)
- show experiment summary in the header of experiment details / compare (1411394)
- Show experiment user (#10910) (7c6f850)
- show latest instead of version (#10468) (0793aa1)
- show markdown in experiment view (#10854) (9e22612)
- show used in dataset column on evaluators (#11050) (2065272)
- Support custom provider binding for LLM evaluators (#10971) (ef7622b)
- template format selector on LLM evaluator form (#10662) (4b4d82f)
- template variable path support for reference / metadata usage in prompt template variables (#10940) (bdc5128)
- test button for custom providers (#10544) (070de7f)
- track the progress of an experiment on the playground (#10774) (b2341e9)
- UI for custom provider creation (#10431) (0e4a6cd)
- ui: add evaluator icons and update menu (#10873) (490ba2a)
- update evaluator prompt tagging behavior (#10694) (7b10d60)
- update playground eval select (#10703) (b5d8ad3)
- update playground evaluator select menu (#10450) (8297f21)
- Use display name for playground annotation names (#10929) (862fa61)
- wire up builtin evaluators (#10574) (c48440c)
- wire up llm evaluators to playground (#10575) (b55d0e5)
### Bug Fixes
- Add model and developer role mappings (#10814) (a9d0b01)
- Add require all toggle to contains evaluator (#11274) (a47d9d2)
- add SecretString type for generative credentials (#10761) (d6d1036)
- add typed InputMapping model for DatasetEvaluators (#11223) (eed8afc)
- app: upgrade @apollo/client to 4.1.3 for multipart subscription cancellation (#11155) (8130270)
- cleanup projects when deleting dataset evaluators (#11058) (a0cbd33)
- cleanup UI styles (#10908) (bb83132)
- clients made public (27f7c9f)
- Configure correct gen_ai client in playground (5aab7f7)
- dataset experiments action button same size as siblings (22ef3d0)
- dataset_helpers: normalize legacy function_call into tool_calls (#11372) (c78b886)
- Ensure built in evaluator details pages load (#11273) (b00ee3d)
- Ensure empty string is not set when blurring combobox (#10859) (4454ffa)
- Ensure evaluators can run directly after creation (#10885) (631aa34)
- ensure fields of polymorphic evaluator orm types are eagerly loaded (#10068) (4703e7e)
- Ensure model config dialog scrolls when screen height is constrained (#11426) (0f9e44b)
- Ensure mustache templates handle unicode characters (#10577) (570f4d7)
- Ensure preview evaluators have the right name and output coloring (307e66b)
- Ensure preview evaluators have the right name and output coloring (#11355) (5d0d252)
- eslint errors (c41d0a1)
- eval in same order as runs (#11271) (4f1f859)
- Evaluation error handling (#10823) (4bc5455)
- evaluators: add index on dataset_id in dataset_evaluators table (#11011) (21fdd0b)
- evaluators: add telemetry for built-in evaluators (#11025) (02944c3)
- evaluators: add validation for llm evaluator prompts (#10193) (99353b2)
- evaluators: clean up evaluators rebase (fe8130f)
- evaluators: coerce string types (#10743) (0889352)
- evaluators: display traces for evaluation errors (#11214) (acc974b)
- evaluators: don't display dataset evaluator projects in projects table (#11133) (088150f)
- evaluators: enhance telemetry for llm evaluators (#11002) (310a348)
- evaluators: ensure unique display names (#10882) (ebf73ea)
- evaluators: evaluator bug fixes (#11127) (088d730)
- evaluators: fix parse span output (#11339) (d009f4d)
- evaluators: make evaluator names snake case (#11250) (3437f33)
- evaluators: persist choices (#10076) (919e436)
- evaluators: return annotation name in output config resolver (#10152) (2fb1244)
- evaluators: run llm evaluators (#10480) (7be661b)
- evaluators: support llm evaluator prompts with multipart content (#11113) (91606fa)
- evaluators: trace llm evaluators (#10872) (5933125)
- evaluators: use model specified in prompt when running evaluators (#10700) (1f4c37d)
- evaluators: wire up invocation parameters and tool choice when executing llm evaluators (#10726) (b6b0a79)
- experiments: parse Responses API output in experiment run results (#11349) (a70dddb)
- fix broken build due to circular dependencies (#11026) (f6cb2da)
- fix evaluator config dialog layout (#10366) (fcc0364)
- Fix import error on evaluator page (#10185) (3714e2b)
- fix playground scrolling and selection (#11200) (5e29a74)
- fix regex to match composite IDs (#10459) (abe9c62)
- gemini: model deprecation changes (#11381) (cde9188)
- handle duplicate built-ins by using dataset evaluator ids (#11131) (a0065f5)
- improve clarity of Test section in evaluator create/edit modal (#11164) (61c2cc7)
- make graphql prompt label type honest to db model (runtime error) (#10849) (7062cca)
- metrics: adjust chart label positioning and margins across multiple time series components (#11435) (20d3236)
- pass tool_calls through directly instead of JSON-serializing (#10818) (bcc30ef)
- playground: align openaiApiType default between UI and request builder (#11371) (00cea81)
- playground: avoid contextvars in streaming generator cleanup (#11362) (fc1f4e6)
- playground: generate unique tool call IDs for Gemini stream (#11376) (2bd4fd9)
- playground: remove shift when scrollbar appears in playground dataset examples table (#11341) (2278e0e)
- playground: render tool calls in dataset examples table with PlaygroundToolCall (#11366) (841e380)
- playground: restore reasoning model client for chat completions path (#11433) (87f6d8e)
- playground: route Google provider to correct streaming client by model (#11378) (5a21c4f)
- playwright test failures (#11384) (ddc0c20)
- preserve dataset tables' tab and selection state (#10928) (fe3e104)
- preserve literal mode in evaluator forms when editing (#10935) (c2dc5ec)
- progress bars should be empty for zero annotations (#11138) (d7a706e)
- Raise on missing label (#11398) (075300a)
- release DB session lock during evaluator HTTP calls in mutations (#11431) (785f04c)
- Remove json.dumps coercion on evaluator context dictionaries (#10869) (0dc6357)
- remove the ability to create a global evaluator (#10461) (705edb1)
- respect template variables path for non-streaming mode (#11183) (4744dbc)
- return undefined instead of empty braces for playground tool calls (#11170) (baba63e)
- Show require_all on ContainsEvaluator details (#11406) (a99b653)
- templates: simplify template formatters (#11410) (1f6df01)
- use untemplated prompt view on evaluator details (#11308) (279bc9f)
- whitespace parsing and explanation for ContainsEvaluator (#11387) (d649308)