quickwit-oss/tantivy 0.20 on GitHub

What's Changed

Bugfixes

Fix phrase queries with slop (slop now supports transpositions; the algorithm now carries the slop consumed so far when the phrase has more than two terms) #2031 #2020 (@PSeitz)
Handle error for exists on MMapDirectory #1988 (@PSeitz)
Aggregation
- Fix min doc_count empty merge bug #2057 (@PSeitz)
- Fix sort order for term aggregations (sort order on key was inverted) #1858 (@PSeitz)

Features/Improvements

Add PhrasePrefixQuery #1842 (@trinity-1686a)
Add coerce option for text and number types (convert the value instead of returning an error during indexing) #1904 (@PSeitz)
Add regex tokenizer #1759 (@mkleen)
Move tokenizer API to a separate crate. Having a separate crate with a stable API will allow us to use tokenizers with different tantivy versions. #1767 (@PSeitz)
Columnar crate: New fast field handling #1806 #1809 (@fulmicoton @PSeitz)
- Support for fast fields with optional values. Previously tantivy supported only single-valued and multi-valued fast fields. The encoding of optional fast fields is now very compact.
- Fast field support for JSON (schemaless fast fields). Support multiple types on the same column. #1876 (@fulmicoton)
- Unified access for fast fields over different cardinalities.
- Unified storage for typed and untyped fields.
- Move fastfield codecs into columnar. #1782 (@fulmicoton)
- Sparse dense index for optional values #1716 (@PSeitz)
- Switch to nanosecond precision in DateTime fastfield #2016 (@PSeitz)
Aggregation
- Add date_histogram aggregation (only fixed_interval for now) #1900 (@PSeitz)
- Add percentiles aggregations #1984 (@PSeitz)
- [breaking] Drop JSON support on intermediate agg result (we use postcard as format in quickwit to send intermediate results) #1992 (@PSeitz)
- Set a memory limit in bytes for aggregations, after which they abort (previously there was only the bucket limit) #1942 #1957 (@PSeitz)
- Add support for u64, i64, f64 fields in term aggregation #1883 (@PSeitz)
- Add count, min, max, and sum aggregations #1794 (@guilload)
- Switch to Aggregation without serde_untagged for better deserialization errors. #2003 (@PSeitz)
- Use milliseconds in histogram for date type (Elasticsearch compatibility) #2045 (@PSeitz)
- Reduce term aggregation memory consumption #2013 (@PSeitz)
- Reduce agg memory consumption: Replace the generic aggregation collector (which has a high memory requirement per instance) in the aggregation tree with optimized versions behind a trait.
- Split term collection count and sub_agg (Faster term agg with less memory consumption for cases without sub-aggs) #1921 (@PSeitz)
- Schemaless aggregations: In combination with stacker, tantivy now supports schemaless aggregations via the JSON type.
  - Add aggregation support for JSON type #1888 (@PSeitz)
  - Mixed types support on JSON fields in aggs #1971 (@PSeitz)
- Perf: Fetch blocks of values in aggregation for all cardinalities #1950 (@PSeitz)
Searcher with disabled scoring via EnableScoring::Disabled #1780 (@shikhar)
Enable tokenizer on json fields #2053 (@PSeitz)
Enforce consistency between "NOT" and "-" queries in UserInputAst #1609 (@bazhenov)
Faster indexing
- Refactor tokenization pipeline to use GATs #1924 (@trinity-1686a)
- Faster term hash map #1940 (@PSeitz)
- Refactor vint #2010 (@PSeitz)
Faster search
- Work in batches of docs on the SegmentCollector (Only for cases without score for now) #1937 (@PSeitz)
- Faster fast field range queries using SIMD #1954 (@fulmicoton)
- Improve fast field range query performance #1864 (@PSeitz)
Make BM25 scoring more flexible #1855 (@alexcole)
Switch from fs2 to fs4, as fs2 is now unmaintained and does not support illumos #1944 (@Toasterson)
Make BooleanWeight and BoostWeight public #1991 (@fulmicoton)
Make index compatible with virtual drives on Windows #1843 (@gyk)
Automatically downgrade the index record option instead of returning a vint error #1857 (@PSeitz)
Enable range queries on fast fields for u64-compatible types #1762 (@PSeitz) [#1876]
sstable
- Isolate sstable and stacker in independent crates. #1718 (@fulmicoton)
- New sstable format #1943 #1953 (@trinity-1686a)
- Use DeltaReader directly to implement Dictionary::ord_to_term #1928 (@trinity-1686a)
- Use DeltaReader directly to implement Dictionary::term_ord #1925 (@trinity-1686a)
Add separate tokenizer manager for fast fields #2019 (@PSeitz)
Make construction of LevenshteinAutomatonBuilder for FuzzyTermQuery instances lazy. #1756 (@adamreichold)
Add support for madvise when opening an mmapped index #2036 (@fulmicoton)
Rename DatePrecision to DateTimePrecision #2051 (@guilload)
Query Parser
- Quotation mark can now be used for phrase queries. #2050 (@fulmicoton)
- PhrasePrefixQuery is supported in the query parser via: field:"phrase ter"* #2044 (@adamreichold)
Docs
- Update examples for literate docs #1880 (@PSeitz)
- Add ip field example #1775 (@PSeitz)
- Fix doc store cache documentation #1821 (@PSeitz)
- Fix BooleanQuery documentation #1999 (@RTEnzyme)
- Update comments in the faceted search example #1737 (@DawChihLiou)
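The query parser entry `field:"phrase ter"*` selects documents that meet two conditions. The phrase terms must appear at consecutive positions. The very next token must start with the given prefix. The following std-only Rust sketch models those matching semantics on a pre-tokenized document.

`matches_phrase_prefix` is a hypothetical helper written for illustration. It is not tantivy's `PhrasePrefixQuery` implementation, which operates on the index rather than on token slices.

```rust
/// Hypothetical helper (not part of tantivy): returns true if `tokens` contains
/// the `phrase` terms at consecutive positions, immediately followed by a token
/// that starts with `prefix`.
fn matches_phrase_prefix(tokens: &[&str], phrase: &[&str], prefix: &str) -> bool {
    let n = phrase.len();
    // Slide a window of n + 1 tokens: n exact phrase terms plus one prefix token.
    // `windows` yields nothing if the document is shorter than the window.
    tokens.windows(n + 1).any(|window| {
        // Leading terms must match exactly; the last token only by prefix.
        &window[..n] == phrase && window[n].starts_with(prefix)
    })
}

fn main() {
    // Tokens as a plain whitespace tokenizer would produce them.
    let doc: Vec<&str> = "the phrase term query".split_whitespace().collect();
    // Similar in spirit to the query-parser syntax field:"phrase ter"*
    println!("{}", matches_phrase_prefix(&doc, &["phrase"], "ter")); // true
    println!("{}", matches_phrase_prefix(&doc, &["term"], "que")); // true
    println!("{}", matches_phrase_prefix(&doc, &["the"], "ter")); // false
}
```

The sketch deliberately ignores slop and limits on how many terms a prefix may expand to. It is only meant to show the "exact terms, then a prefix" shape of the query.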

New Contributors

@mhlakhani made their first contribution in #1733
@pinkforest made their first contribution in #1746
@DawChihLiou made their first contribution in #1737
@mkleen made their first contribution in #1759
@lonre made their first contribution in #1803
@gyk made their first contribution in #1843
@alexcole made their first contribution in #1855
@Toasterson made their first contribution in #1944
@vsop-479 made their first contribution in #1970
@Tony-X made their first contribution in #1985
@RTEnzyme made their first contribution in #1999
@tottoto made their first contribution in #2018
@nyurik made their first contribution in #2038
@bazhenov made their first contribution in #1609
@lavrd made their first contribution in #1422
@tnxbutno made their first contribution in #2069

Full Changelog: https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md
