Added
- Embedders in the LLM xpack now have a get_embedding_dimension method that returns the number of dimensions used by the chosen embedder.
- pathway.stdlib.indexing.nearest_neighbors, with implementations of pathway.stdlib.indexing.data_index.InnerIndex based on k-NN via LSH (implemented in Pathway) and k-NN provided by the USearch library.
- pathway.stdlib.indexing.vector_document_index, with a few predefined instances of pathway.stdlib.indexing.data_index.DataIndex.
- pathway.stdlib.indexing.bm25, with implementations of pathway.stdlib.indexing.data_index.InnerIndex based on the BM25 index provided by Tantivy.
- pathway.stdlib.indexing.full_text_document_index, with a predefined instance of pathway.stdlib.indexing.data_index.DataIndex.
- Introduced the reranker module under llm.xpacks. It includes a few re-ranking strategies and utility functions for RAG applications.
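
To illustrate what get_embedding_dimension reports, here is a minimal sketch in plain Python. ToyEmbedder, its embed method, and the hash-based vectors are hypothetical stand-ins invented for this example and are not Pathway classes; the sketch only assumes that the dimension can be found by embedding a short probe string and measuring the result.

```python
import hashlib


class ToyEmbedder:
    """Hypothetical stand-in for an embedder; not part of Pathway."""

    def __init__(self, dimensions: int = 8):
        # A SHA-256 digest has 32 bytes, so this toy supports at most 32 dimensions.
        if not 1 <= dimensions <= 32:
            raise ValueError("dimensions must be between 1 and 32")
        self._dimensions = dimensions

    def embed(self, text: str) -> list:
        # Derive a deterministic pseudo-embedding from a hash of the text.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [digest[i] / 255.0 for i in range(self._dimensions)]

    def get_embedding_dimension(self) -> int:
        # Embed a short probe string and measure the length of the vector.
        return len(self.embed("."))
```

Knowing the dimension up front is useful when configuring a vector index, which typically needs the vector size before any data arrives.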
Changed
- BREAKING: windowby generates IDs of produced rows differently than in the previous version.
- BREAKING: pw.io.csv.write prints printable non-ASCII characters as regular text, not \u{xxxx}.
- BREAKING: Connector methods pw.io.elasticsearch.read, pw.io.debezium.read, pw.io.fs.read, pw.io.jsonlines.read, pw.io.kafka.read, pw.io.python.read, pw.io.redpanda.read, pw.io.s3.read now check the type of the input data. Previously, the type was not checked when the provided format was "json" or "jsonlines". If the data is inconsistent with the provided schema, the row is skipped and an error message is emitted.
- BREAKING: query and query_as_of_now methods of pathway.stdlib.indexing.data_index.DataIndex now return pathway.JoinResult, to allow resolving column name conflicts (between columns in the table with queries and the table with index data).
- BREAKING: DataIndex methods query and query_as_of_now now return the score in a column named _pw_index_reply_score (defined as the _SCORE variable in pathway.stdlib.indexing.colnames.py).
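
The new type check in the connectors can be pictured with the following plain-Python sketch. It is not Pathway's implementation: SCHEMA, filter_rows, and the error message format are assumptions made for illustration. It only mirrors the described behavior, where a row that does not match the schema is skipped and an error is recorded.

```python
# Hypothetical schema: column name -> expected Python type.
SCHEMA = {"id": int, "name": str}


def filter_rows(rows, schema):
    """Return (accepted_rows, error_messages) for the given schema."""
    accepted, errors = [], []
    for index, row in enumerate(rows):
        # Collect every column whose value has an unexpected type.
        # Note: in this simplified check, bool values pass as int.
        problems = [
            f"{column}: expected {expected.__name__}, "
            f"got {type(row.get(column)).__name__}"
            for column, expected in schema.items()
            if not isinstance(row.get(column), expected)
        ]
        if problems:
            # Skip the row and record why, instead of passing it through.
            errors.append(f"row {index} skipped ({'; '.join(problems)})")
        else:
            accepted.append(row)
    return accepted, errors
```

For example, given the rows {"id": 1, "name": "a"} and {"id": "2", "name": "b"}, only the first is accepted, because the second carries a string where the schema expects an integer.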
Removed
- BREAKING: pathway.stdlib.indexing.data_index.VectorDocumentIndex class; some predefined instances are now meant to be obtained via methods provided in pathway.stdlib.indexing.vector_document_index.
- BREAKING: with_distances parameter of query and query_as_of_now methods in pathway.stdlib.indexing.data_index.DataIndex. Instead of 'distance', we now operate with a more general term 'score' (higher = better). For distance-based indices, the score is usually defined as the negative distance. The score is now always included in the answer, as long as the underlying index returns something that indicates the quality of a match.
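
The "higher = better" convention can be sketched in plain Python as follows. This is an illustration only, not Pathway code: euclidean, rank_by_score, and the sample vectors are hypothetical, and the sketch assumes the score of a distance-based index is simply the negated Euclidean distance.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_score(query, documents, k=2):
    """Return the k best (name, score) pairs, best first.

    The score is the negative distance, so closer documents score higher.
    """
    scored = [(name, -euclidean(query, vector)) for name, vector in documents.items()]
    # Sort in descending order of score: higher = better.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With negated distances, the same "take the highest scores" logic works for distance-based indices and for indices such as BM25 that natively produce relevance scores.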