github Arize-ai/phoenix arize-phoenix-v16.0.0
arize-phoenix: v16.0.0

16.0.0 (2026-05-21)

MIGRATION.md

⚠ BREAKING CHANGES

  • Sandboxing and Code Evaluators (#13290)

Features

Phoenix now lets you compose evaluation strategies in code.

Most eval tooling hands you a fixed menu of judge templates. Real evaluation is rarely that tidy.

Code Evaluators enable you to build evaluation criteria the way you want. You write a Python or TypeScript evaluate() function in the Phoenix UI — no SDK, no local runtime, no deploy step — and Phoenix runs it server-side, recording labels and scores as annotations on every experiment run.
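As a rough illustration of what such a function might look like, here is a minimal Python sketch. The parameter names (output, expected) and the returned dictionary shape are assumptions made for this example; these notes do not spell out the actual evaluate() contract, so check the Phoenix documentation before relying on it.

```python
# Hypothetical signature: the parameter names and the return shape are
# assumptions for illustration, not the documented Phoenix contract.
def evaluate(output: str, expected: str) -> dict:
    """Deterministic check: exact match, ignoring case and outer whitespace."""
    matched = output.strip().lower() == expected.strip().lower()
    return {
        "label": "match" if matched else "mismatch",
        "score": 1.0 if matched else 0.0,
    }
```

The sketch returns both a label and a score because the release notes say both are recorded as annotations on each experiment run.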

Because it's just code, you control the whole strategy:

• Composite scoring: blend sub-scores (LLM judgment + deterministic rules) into one weighted metric
• Embedding-based evaluation: cosine similarity over embeddings instead of brittle string matching
• LLM juries: poll multiple models and combine verdicts into a weighted consensus
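The three strategies above share simple arithmetic, sketched below with the Python standard library only. All function names here are hypothetical helpers, not Phoenix APIs. The embeddings and per-model verdicts are passed in as plain values, since how a Code Evaluator obtains them (and which packages the sandbox provides) is not described in these notes.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Blend named sub-scores into one metric using per-name weights."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0.0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def jury_consensus(verdicts: dict[str, bool], weights: dict[str, float]) -> float:
    """Weighted fraction of jurors (models) that voted 'pass'."""
    votes = {model: 1.0 if passed else 0.0 for model, passed in verdicts.items()}
    return weighted_average(votes, weights)


def composite_score(similarity: float, jury: float) -> float:
    """Example blend: equal weight on embedding similarity and jury consensus."""
    return weighted_average(
        {"similarity": similarity, "jury": jury},
        {"similarity": 0.5, "jury": 0.5},
    )
```

In a real evaluator, the vectors would come from an embedding model and the verdicts from separate LLM calls; the equal weights in composite_score are arbitrary and only illustrate the blending step.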

Sandboxed Code Evaluators also open the door to using agents as judges. We're excited about where this is heading.

  • agents: Enable provider-native web search / fetch when available (#13333) (41eb4fc)
  • Sandboxing and Code Evaluators (#13290) (e294d93)

Bug Fixes
