Arize-ai/phoenix 0.0.30 on GitHub

Search And Retrieval Troubleshooting

This release adds troubleshooting workflows for search and retrieval! If you are building an LLM-powered application that uses RAG (retrieval-augmented generation), poor retrieval can be detrimental to the user experience of your app. Phoenix now supports passing in your knowledge base as a corpus dataset so that you can inspect how your retrieval system is querying for relevant documents in your vector store. Phoenix automatically computes the distance between your query embeddings and your document embeddings, helping you quickly identify slices of your data that represent user queries your vector store does not cover. Not only that, it overlays the retrieval connections on the point cloud so you can see which vector store clusters your retriever is pulling data from. For all the details, check out our notebooks that cover search and retrieval!
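To make the distance idea above concrete, here is a minimal sketch of a query-to-document distance computation in plain Python. It is not Phoenix's implementation: the function names `retrieval_distances` and `mean_retrieval_distance` and the toy vectors are hypothetical, and Euclidean distance is used because the changelog mentions a Euclidean distance retrieval metric.

```python
import math
from statistics import mean


def retrieval_distances(query_vec, doc_vecs):
    """Euclidean distance from one query embedding to each retrieved document embedding."""
    return [math.dist(query_vec, doc_vec) for doc_vec in doc_vecs]


def mean_retrieval_distance(query_vec, doc_vecs):
    """Average distance to the retrieved documents; NaN when nothing was retrieved."""
    distances = retrieval_distances(query_vec, doc_vecs)
    return mean(distances) if distances else float("nan")


# Toy 2-D embeddings (hypothetical data, for illustration only).
query = [0.0, 0.0]
retrieved = [[3.0, 4.0], [0.0, 1.0]]

print(retrieval_distances(query, retrieved))      # [5.0, 1.0]
print(mean_retrieval_distance(query, retrieved))  # 3.0
```

A query whose mean distance is large relative to the others is a candidate for a question the knowledge base does not answer well.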

phoenix_rag.mp4

What's Changed

chore: docs to main (#882) by @mikeldking in #883
feat: add corpus points to umap by @RogerHYang in #917
ci: bump hdbscan to deal with cython builds by @mikeldking in #936
feat(embeddings): relationships lines by @mikeldking in #909
fix: Only initialize the corpus model if it exists by @mikeldking in #946
fix: pin HDBSCAN to cython3 compatible version by @mikeldking in #944
fix(fixtures): fixture reference schema incorrectly falling back to p… by @mikeldking in #949
fix: restore truthiness by @RogerHYang in #952
feat: display corpus points as octahedrons by @mikeldking in #954
feat: primary to corpus ratio by @RogerHYang in #957
fix: add document text to event (if the event is a retrieved document record) by @RogerHYang in #961
feat: allow string only responses (i.e. without embedding) by @RogerHYang in #956
feat: show retrieval in the slide-over by @mikeldking in #955
feat(embeddings): show corpus percent query by @mikeldking in #968
fix: wrong ratio calculation for % query by @mikeldking in #973
feat: calculate euclidean distance retrieval metric time series against the corpus dataset by @RogerHYang in #972
feat(embeddings): show retrieval distance timeseries by @mikeldking in #974
fix: suppress numpy runtime warnings about empty inputs by @RogerHYang in #975
fix(embeddings): metric selector fix for retrieval metrics by @mikeldking in #977
feat: allow iso 8601 timestamps by @axiomofjoy in #962
feat(datasets): make corpus schema declaration more semantic by @mikeldking in #978
docs: docs sync to main, Jul 25, 2023 by @mikeldking in #979
feat: implement phoenix.Dataset.from_open_inference class method by @axiomofjoy in #965
docs: adjust notebook text for LLM analysis using GPT by @amank94 in #994
docs: Update langchain_pinecone_search_and_retrieval_tutorial.ipynb by @arizedatngo in #991
docs: LlamaIndex tutorial enhancements by @axiomofjoy in #971
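
The changelog above notes that ISO 8601 timestamps are now allowed. As a rough illustration of what such strings look like and how they can be normalized, here is a sketch using only the Python standard library. The helper `parse_iso8601` is hypothetical and is not part of Phoenix; treating timestamps without an offset as UTC is an assumption made for this example.

```python
from datetime import datetime, timezone


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp string and return a UTC-aware datetime."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption for this sketch: timestamps without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(parse_iso8601("2023-07-25T12:30:00Z").isoformat())       # 2023-07-25T12:30:00+00:00
print(parse_iso8601("2023-07-25T14:30:00+02:00").isoformat())  # 2023-07-25T12:30:00+00:00
```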

New Contributors

@amank94 made their first contribution in #994
@arizedatngo made their first contribution in #991

Full Changelog: v0.0.28...0.0.30