This release expands Phoenix's capabilities for cluster-based analysis, providing more metrics to help you assess the performance and data quality of your unstructured data.
✨ Cluster Performance Metrics
Clusters can now be analyzed for model performance degradation! This release adds accuracy_score as a model performance metric. Using accuracy as the base metric on the embedding projection lets you drill into clusters that map to bad predictions more quickly than ever before. Finding pockets of bad performance is as simple as picking the metric and sorting the clusters from worst to best. If you are using Phoenix to identify production data that should be re-labeled and fed back into your training pipeline, this is the feature for you.
cluster_performance.mp4
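If you want to try this with your own data, the key requirement is that your dataset declares both prediction and actual labels. Below is a minimal sketch of that setup; the dataframe and column names ("prediction", "label", "embedding") are illustrative, not part of this release:

```python
# A minimal sketch, not taken from the release itself. The accuracy metric
# becomes available once the schema declares both a prediction label and an
# actual label alongside an embedding.
import pandas as pd
import phoenix as px

df = pd.DataFrame(
    {
        "prediction": ["fraud", "not_fraud", "fraud"],
        "label": ["not_fraud", "not_fraud", "fraud"],
        "embedding": [[0.1, 0.4], [0.8, 0.2], [0.3, 0.9]],
    }
)

schema = px.Schema(
    prediction_label_column_name="prediction",
    actual_label_column_name="label",
    embedding_feature_column_names={
        "embedding": px.EmbeddingColumnNames(vector_column_name="embedding"),
    },
)

# In the app, select accuracy_score on the embedding projection and sort
# the clusters from worst to best to surface pockets of bad performance.
px.launch_app(px.Dataset(df, schema))
```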
✨ Cluster Data Quality / Custom Metrics
Clusters can now be analyzed via ad-hoc metrics! You can now calculate the average of any numeric feature, tag, prediction, or actual sent into Phoenix, which means you can find "low-quality" clusters via the heuristic of your choosing. Below is an example in which precision@k for document retrieval (from a vector store) is used to identify clusters of chatbot queries that are failing to produce a good answer. The neat thing about this feature is that you can use Phoenix to build your own EDA heuristic: care about ROUGE scores or LLM-assisted evaluations? You can now use these to analyze your embeddings and discover anomalies simply by sorting your clusters. This feature gives you, the data scientist, a powerful tool for formulating bespoke heuristics that identify clusters of low performance, low quality, or drift. We hope you like it!
context_retrieval.mp4
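Any numeric column you log can serve as the heuristic. As an illustration, the sketch below logs a hypothetical precision_at_k column (computed upstream by your own retrieval evaluation) as a tag, so that its per-cluster average can be selected as a data quality metric:

```python
# A minimal sketch, assuming a hypothetical "precision_at_k" column that you
# compute yourself from your retrieval evaluation. Any numeric feature or
# tag works the same way (ROUGE scores, LLM-assisted evaluations, etc.).
import pandas as pd
import phoenix as px

df = pd.DataFrame(
    {
        "query": ["how do I reset my password?", "what is the refund policy?"],
        "query_embedding": [[0.12, 0.85, 0.33], [0.91, 0.05, 0.44]],
        "precision_at_k": [0.2, 0.8],
    }
)

schema = px.Schema(
    tag_column_names=["precision_at_k"],
    embedding_feature_column_names={
        "query_embedding": px.EmbeddingColumnNames(
            vector_column_name="query_embedding",
            raw_data_column_name="query",
        ),
    },
)

# Select the average of precision_at_k as the cluster metric and sort to
# find clusters of queries with poor retrieval.
px.launch_app(px.Dataset(df, schema))
```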
What's Changed
- docs: dolly vs. pythia by @axiomofjoy in #818
- feat: data quality metric by cluster by @RogerHYang in #804
- feat(dimensions): Add the ability to filter by data_type by @mikeldking in #822
- feat(embeddings): metric selector by @mikeldking in #821
- fix: nan bug for gql by @RogerHYang in #832
- feat: add stand-alone clusters endpoint for GraphQL query by @RogerHYang in #831
- feat(embeddings): cluster sorting by @mikeldking in #830
- chore: make placeholder text more obvious by @mikeldking in #833
- fix: change float16 to float32 as dtype for the nan series by @RogerHYang in #837
- fix: return nan on NotImplementedError (when binning on np.float16) by @RogerHYang in #838
- docs: sync 06-09-2023 by @mikeldking in #840
- feat(gql): add prediction id to event metadata by @RogerHYang in #843
- fix: coerce lists to arrays by @RogerHYang in #845
- feat: add performance metrics to each cluster by @RogerHYang in #828
- feat: accuracy timeseries by @RogerHYang in #842
- feat(embeddings): cluster data quality metrics by @mikeldking in #846
- docs: Update DEVELOPMENT.md with pypi publish changes. by @mikeldking in #849
- fix(embeddings): always place clusters with empty metrics at the bottom by @mikeldking in #850
- fix: show not found error when server is no longer running by @mikeldking in #853
- fix: guess whether a column contains any vector or all scalars by @RogerHYang in #854
- chore: camel-case metrics by @mikeldking in #856
- fix: skip empty interval bin with infinity endpoints (when all data are missing values) by @RogerHYang in #857
- feat(embeddings): cluster performance metrics by @mikeldking in #855
- fix(embeddings): force re-render clusters when opacity changes by @mikeldking in #858
- feat: show prediction id in selection details by @RogerHYang in #860
- fix: hide data quality metrics if empty by @mikeldking in #861
- fix: use random init when spectral init (the default) cannot be used by @RogerHYang in #862
- fix: replace NaT (Not a Time) with now (when dataset is empty) by @RogerHYang in #863
- fix(ui): cleanup event details for llm use-case by @mikeldking in #865
Full Changelog: v0.0.23...v0.0.24