github Arize-ai/phoenix v0.0.24

This release expands Phoenix's cluster-based analysis, adding more metrics to help you assess model performance and data quality on your unstructured data.

✨ Cluster Performance Metrics

Clusters can now be analyzed for model performance degradation! This release adds accuracy_score as a model performance metric. Using accuracy as the base metric on the embedding projection lets you drill into clusters that map to bad predictions much more quickly. To find pockets of bad performance, pick the metric and sort the clusters from worst performing to best. If you are using Phoenix to identify production data that should be re-labeled and fed back into your training pipeline, this is the feature for you.

cluster_performance.mp4
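
To make the idea concrete, here is a minimal sketch of per-cluster accuracy followed by a worst-first sort. It is not Phoenix's implementation or API: the function name `accuracy_by_cluster`, the record layout, and the sample data are all hypothetical, and only the Python standard library is used.

```python
from collections import defaultdict


def accuracy_by_cluster(records):
    """Return (cluster_id, accuracy) pairs sorted from worst to best.

    Hypothetical helper for illustration; not part of Phoenix.
    Each record is a (cluster_id, predicted_label, actual_label) tuple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for cluster_id, predicted, actual in records:
        total[cluster_id] += 1
        correct[cluster_id] += int(predicted == actual)
    # Accuracy is the fraction of predictions that match the actual label.
    scores = {cluster_id: correct[cluster_id] / total[cluster_id] for cluster_id in total}
    # Ascending sort puts the worst-performing clusters first.
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up data: (cluster_id, predicted_label, actual_label)
records = [
    ("c0", "cat", "cat"),
    ("c0", "dog", "dog"),
    ("c1", "cat", "dog"),
    ("c1", "dog", "dog"),
    ("c2", "cat", "dog"),
    ("c2", "dog", "cat"),
]

ranked = accuracy_by_cluster(records)
for cluster_id, accuracy in ranked:
    print(f"{cluster_id}: accuracy={accuracy:.2f}")
```

With this toy data, cluster `c2` (every prediction wrong) sorts to the top, which is the kind of cluster you would inspect first for re-labeling.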

✨ Cluster Data Quality / Custom Metrics

Clusters can now be analyzed via ad-hoc metrics! You can calculate the average of any numeric feature, tag, prediction, or actual sent into Phoenix, which means you can find "low-quality" clusters via the heuristic of your choosing! Below is an example of how precision@k for document retrieval (from a vector store) is used to identify clusters of chatbot queries that are failing to provide a good answer. The neat thing about this feature is that you can use Phoenix to build your own EDA heuristic! Care about ROUGE score or LLM-assisted evaluations? Send them in as numeric values, then analyze your embeddings and discover anomalies by simply sorting your clusters! This feature gives you, the data scientist, a powerful tool to formulate bespoke heuristics for identifying clusters of low performance, quality, and/or drift. We hope you like it!

context_retrieval.mp4
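
As a rough illustration of the custom-metric workflow, the sketch below computes precision@k per query and then averages it within each cluster. It is a stand-alone approximation, not Phoenix's API: the helpers `precision_at_k` and `mean_metric_by_cluster`, the row layout, and the relevance flags are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def precision_at_k(relevance, k):
    """Fraction of the top-k retrieved documents that are relevant.

    `relevance` is a list of 0/1 flags in retrieval order (hypothetical input).
    If fewer than k documents were retrieved, the missing ones count as misses.
    """
    return sum(relevance[:k]) / k


def mean_metric_by_cluster(rows, metric_name):
    """Average a numeric column per cluster, sorted from lowest to highest."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cluster_id"]].append(row[metric_name])
    averages = {cluster_id: mean(values) for cluster_id, values in grouped.items()}
    return sorted(averages.items(), key=lambda item: item[1])


# Made-up chatbot queries: cluster assignment plus relevance of retrieved documents.
queries = [
    {"cluster_id": "c0", "relevance": [1, 1]},
    {"cluster_id": "c0", "relevance": [1, 0]},
    {"cluster_id": "c1", "relevance": [0, 0]},
    {"cluster_id": "c1", "relevance": [1, 0]},
]

# Attach the metric as a plain numeric column, then average it per cluster.
rows = [
    {"cluster_id": q["cluster_id"], "precision_at_2": precision_at_k(q["relevance"], k=2)}
    for q in queries
]

ranked = mean_metric_by_cluster(rows, "precision_at_2")
for cluster_id, score in ranked:
    print(f"{cluster_id}: mean precision@2={score:.2f}")
```

Any other numeric score, such as a ROUGE value or an LLM-assisted evaluation score, could be stored in a column and averaged the same way.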

What's Changed

Full Changelog: 0.0.23...v0.0.24
