Search And Retrieval Troubleshooting
This release adds troubleshooting workflows for search and retrieval! If you are building an LLM-powered application that uses retrieval-augmented generation (RAG), poor retrieval can degrade your app's user experience. Phoenix now supports passing in your knowledge base as a corpus dataset so that you can inspect how your retrieval system queries your vector store for relevant documents. Phoenix automatically computes the distance between your query and document embeddings, helping you quickly identify slices of your data where user queries are not covered by the contents of your vector store. It also overlays the retrieval connections on the point cloud so you can see which vector store clusters your retriever is pulling data from. For all the details, check out our notebooks that cover search and retrieval!
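To make the query-to-document distance idea concrete, here is a minimal sketch in plain Python. It is not Phoenix's implementation or API: the function names, the dictionary layout, and the threshold are all hypothetical, chosen only to illustrate how a Euclidean distance between a query embedding and its retrieved document embeddings can surface queries that the corpus covers poorly.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_retrieval_distance(query: Vector, retrieved_docs: List[Vector]) -> float:
    """Average Euclidean distance from a query embedding to its retrieved documents."""
    if not retrieved_docs:
        return float("nan")  # nothing was retrieved for this query
    total = sum(math.dist(query, doc) for doc in retrieved_docs)
    return total / len(retrieved_docs)


def poorly_covered_queries(
    queries: Dict[str, Vector],
    corpus: Dict[str, Vector],
    retrievals: Dict[str, List[str]],
    threshold: float,
) -> List[str]:
    """Return ids of queries whose mean retrieval distance exceeds the threshold.

    `retrievals` maps each query id to the ids of the documents retrieved for it.
    All of these structures are illustrative assumptions, not a Phoenix schema.
    """
    flagged = []
    for query_id, query_vector in queries.items():
        docs = [corpus[doc_id] for doc_id in retrievals.get(query_id, [])]
        if mean_retrieval_distance(query_vector, docs) > threshold:
            flagged.append(query_id)
    return flagged


# Toy 2-D embeddings; real embeddings have hundreds of dimensions.
corpus = {"d1": [0.0, 0.0], "d2": [3.0, 4.0]}
queries = {"q1": [0.0, 0.0], "q2": [6.0, 8.0]}
retrievals = {"q1": ["d1"], "q2": ["d1", "d2"]}

flagged = poorly_covered_queries(queries, corpus, retrievals, threshold=5.0)
```

In this toy data, `q2` sits far from both documents it retrieved, so it is the kind of query worth inspecting. The threshold is arbitrary here; in practice it would depend on the embedding model and the distribution of distances in your own data.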
What's Changed
- chore: docs to main (#882) by @mikeldking in #883
- feat: add corpus points to umap by @RogerHYang in #917
- ci: bump hdbscan to deal with cython builds by @mikeldking in #936
- feat(embeddings): relationships lines by @mikeldking in #909
- fix: Only initialize the corpus model if it exists by @mikeldking in #946
- fix: pin HDBSCAN to cython3 compatible version by @mikeldking in #944
- fix(fixtures): fixture reference schema incorrectly falling back to p… by @mikeldking in #949
- fix: restore truthiness by @RogerHYang in #952
- feat: display corpus points as octahedrons by @mikeldking in #954
- feat: primary to corpus ratio by @RogerHYang in #957
- fix: add document text to event (if the event is a retrieved document record) by @RogerHYang in #961
- feat: allow string only responses (i.e. without embedding) by @RogerHYang in #956
- feat: show retrieval in the slide-over by @mikeldking in #955
- feat(embeddings): show corpus percent query by @mikeldking in #968
- fix: wrong ratio calculation for % query by @mikeldking in #973
- feat: calculate euclidean distance retrieval metric time series against the corpus dataset by @RogerHYang in #972
- feat(embeddings): show retrieval distance timeseries by @mikeldking in #974
- fix: suppress numpy runtime warnings about empty inputs by @RogerHYang in #975
- fix(embeddings): metric selector fix for retrieval metrics by @mikeldking in #977
- feat: allow iso 8601 timestamps by @axiomofjoy in #962
- feat(datasets): make corpus schema declaration more semantic by @mikeldking in #978
- docs: docs sync to main, Jul 25, 2023 by @mikeldking in #979
- feat: implement phoenix.Dataset.from_open_inference class method by @axiomofjoy in #965
- docs: adjust notebook text for LLM analysis using GPT by @amank94 in #994
- docs: Update langchain_pinecone_search_and_retrieval_tutorial.ipynb by @arizedatngo in #991
- docs: LlamaIndex tutorial enhancements by @axiomofjoy in #971
New Contributors
- @amank94 made their first contribution in #994
- @arizedatngo made their first contribution in #991
Full Changelog: v0.0.28...0.0.30