Breaking Changes
none
New Features
Faster Filtering through Bitmap Indexing (RoaringSet) for non-text props
This release introduces bitmap indexing for much faster filtering on any non-text property. In some extreme cases, the speedup is over 1,000x.
This change is non-breaking. Old datasets will continue to work using the old mechanism. There is also an option to perform a zero downtime migration, during which Weaviate will be in read-only mode.
- Native Roaring Bitmap support in LSM Store by @aliszka in #2515
- Migration of inverted index buckets of collection set to roaring set strategy by @aliszka in #2690
HNSW-PQ (Product Quantization for reduced memory footprint)
Optionally compress vectors using Product Quantization while still using HNSW. This combination brings the best of both worlds: depending on configuration, the overall memory footprint can drop by 25%, 50%, or even 75%, while still offering great performance and recall. Check this blogpost (coming soon) for details on the respective trade-offs.
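As a rough back-of-the-envelope sketch of where the savings come from, the following estimates per-vector storage before and after Product Quantization. It assumes float32 input vectors and one fixed-width code per segment; the function names, the segment count, and the centroid count are illustrative assumptions, not Weaviate defaults.

```python
def raw_vector_bytes(dims, bytes_per_float=4):
    """Bytes needed to store one uncompressed vector (float32 assumed)."""
    return dims * bytes_per_float


def pq_code_bytes(segments, centroids=256):
    """Bytes needed to store one PQ-encoded vector.

    Each segment stores the index of its nearest centroid, rounded up
    to whole bytes per segment (an assumption of this sketch).
    """
    bits = max(1, (centroids - 1).bit_length())
    return segments * ((bits + 7) // 8)


def vector_savings(dims, segments, centroids=256):
    """Fraction of vector storage saved by PQ, ignoring all other overhead."""
    return 1 - pq_code_bytes(segments, centroids) / raw_vector_bytes(dims)


# Hypothetical example: 768-dimensional vectors split into 96 segments.
savings = vector_savings(dims=768, segments=96)
```

This counts vector storage only. The overall footprint also includes the HNSW graph and the PQ codebook, which are not compressed in this estimate, so the total reduction is smaller than the vector-only figure.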
BM25 WAND Algorithm & Performance Improvements
BM25 scoring (originally introduced in v1.17) was reworked to use a Weak-AND (WAND) algorithm. In addition, further performance improvements were applied. BM25 scoring is now considerably faster than in v1.17, in some cases more than 10x faster.
- Add simple bm25 benchmark by @dirkkul in #2581
- Bm25 with WAND by @dirkkul in #2583
- Add Bm25 stopwords by @dirkkul in #2689
- Bm25: only add additional explanations when requested by @dirkkul in #2721
- Concurrently run BM25 search terms by @dirkkul in #2706
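For readers unfamiliar with why WAND-style algorithms can skip work, the sketch below shows the standard BM25 per-term score together with one valid (loose) per-term upper bound. Pruning algorithms such as WAND rely on bounds like this to skip documents that cannot reach the current top-k threshold. This is only the scoring formula, not a WAND implementation and not Weaviate's code; `k1=1.2` and `b=0.75` are commonly used values assumed here, and all function names are hypothetical.

```python
import math


def idf(total_docs, docs_with_term):
    """Inverse document frequency (the always-positive BM25 variant)."""
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, total_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """BM25 contribution of a single query term to a single document."""
    # Length normalisation: longer-than-average documents are penalised.
    norm = 1 - b + b * doc_len / avg_doc_len
    return idf(total_docs, docs_with_term) * tf * (k1 + 1) / (tf + k1 * norm)


def term_upper_bound(total_docs, docs_with_term, k1=1.2):
    """Limit of the term score as tf grows; no document can exceed it."""
    return idf(total_docs, docs_with_term) * (k1 + 1)
```

If the sum of upper bounds for the terms a document could still match is below the score of the current k-th best result, that document does not need to be scored at all.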
BM25 & Hybrid Search where filters
While v1.17 introduced both BM25 and Hybrid Search, one limitation was that where filters could not be combined with either of those options. Now they can.
Replication: Tunable Consistency
v1.17 introduced Replication support; however, not all write and read endpoints supported tunable consistency yet. They do now.
- Set tunable consistency for replicated writes by @redouan-rhazouani in #2542
- Extend checking for object existence with tunable consistency by @redouan-rhazouani in #2559
- Introduce getting batch objects with tunable consistency by @redouan-rhazouani in #2568
- Extend http write endpoints with consistency level by @parkerduckworth in #2597
- Additional tunable consistency support for objects PUT, HEAD by @parkerduckworth in #2674
- Return whenever data has been replicated depending on the specified consistency level by @redouan-rhazouani in #2679
- Change default consistency level from ALL to QUORUM by @redouan-rhazouani in #2726
- Log replication errors when writing or reading objects by @redouan-rhazouani in #2725
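The arithmetic behind consistency levels can be sketched as follows, assuming the levels ONE, QUORUM, and ALL and a simple majority definition of quorum. The helper names are hypothetical and this is not Weaviate's implementation.

```python
def required_acks(level, replicas):
    """Number of replica acknowledgements needed for a request to succeed."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # Strict majority of the replicas.
        return replicas // 2 + 1
    if level == "ALL":
        return replicas
    raise ValueError(f"unknown consistency level: {level}")


def request_succeeds(level, replicas, acks_received):
    """True once enough replicas have acknowledged the request."""
    return acks_received >= required_acks(level, replicas)
```

With three replicas, QUORUM tolerates one unavailable node while ALL tolerates none, which is the practical effect of changing the default from ALL to QUORUM.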
Replication: Automatic Repair on Read
If Weaviate detects an inconsistency between two or more replicas, it will now automatically repair those inconsistencies if possible. No user action is required other than querying the "broken" object with a high enough consistency level.
- Repair on read remote index client implementation by @parkerduckworth in #2574
- Introduce objects digest to optimize replicated object reads by @parkerduckworth in #2603
- Define client interface used during read repair strategy by @redouan-rhazouani in #2630
- Detection of deleted objects during read repair by @parkerduckworth in #2623
- Implement read repair mechanism when reading single as well as multiple objects by @redouan-rhazouani in #2643
- Read repair integration tests by @parkerduckworth in #2645
- Implement read repair mechanism when checking existence of single objects by @redouan-rhazouani in #2661
- Concurrent read repairs depending on the consistency level by @redouan-rhazouani in #2672
- Improve conflict detection when a target object has been already updated to most recent state by @redouan-rhazouani in #2685
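The general idea of repair on read can be sketched like this: query the replicas, pick the most recent version, and write it back to any replica that is stale or missing the object. The sketch assumes conflicts are resolved by a last-update timestamp and ignores deletions entirely; both are simplifications, and all names are hypothetical rather than taken from Weaviate's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    updated_at: int  # last-update timestamp used to pick a winner
    payload: str


class Replica:
    """In-memory stand-in for one replica node."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def get(self, obj_id) -> Optional[Version]:
        return self.store.get(obj_id)

    def put(self, obj_id, version):
        self.store[obj_id] = version


def read_with_repair(replicas, obj_id):
    """Return the newest version and overwrite stale or missing copies."""
    found = [(replica, replica.get(obj_id)) for replica in replicas]
    versions = [version for _, version in found if version is not None]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v.updated_at)
    for replica, version in found:
        # A missing object is treated as stale here; real deletes need tombstones.
        if version is None or version.updated_at < newest.updated_at:
            replica.put(obj_id, newest)
    return newest
```

In this model the repair only reaches replicas that took part in the read, which is why a high enough consistency level is needed for the inconsistency to be noticed.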
Export all objects using Cursor API
Previously, users would run into the QUERY_MAXIMUM_RESULTS limit when trying to export all objects out of Weaviate. Alternatively, they had to build their own paging workaround using where filters. With the new Cursor API it is easy to extract all objects. The cost per "page" is constant regardless of scale.
- Scroll through every object in a class using an id cursor by @antas-marcin in #2615
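The cursor pattern can be sketched as a loop that asks for the next page of objects after the last id seen. The in-memory store below is a hypothetical stand-in for the database, and `list_after` and `export_all` are illustrative names, not Weaviate client calls.

```python
import bisect


class ObjectStore:
    """Toy store that keeps object ids in sorted order."""

    def __init__(self, ids):
        self._ids = sorted(ids)

    def list_after(self, after, limit):
        # Seek directly to the first id greater than the cursor, so the cost
        # does not grow with the number of objects already exported.
        start = 0 if after is None else bisect.bisect_right(self._ids, after)
        return self._ids[start:start + limit]


def export_all(store, page_size):
    """Yield every id in the store, one page at a time."""
    cursor = None
    while True:
        page = store.list_after(cursor, page_size)
        if not page:
            return
        yield from page
        cursor = page[-1]  # the last id seen becomes the next cursor


store = ObjectStore(["c", "a", "e", "b", "d"])
exported = list(export_all(store, page_size=2))
```

Unlike offset-based paging, the loop never asks the store to skip over results it has already returned, which is what keeps each page equally cheap.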
New Module: Azure Backups Provider
In addition to GCS and AWS S3, Weaviate now supports Azure Cloud Storage as a backup destination.
- Create Azure backup module by @antas-marcin in #2711
Fixes
Security
- Bump github.com/opencontainers/runc from 1.0.2 to 1.1.2 by @dependabot in #2525
- Bump github.com/docker/distribution from 2.7.1+incompatible to 2.8.0+incompatible by @dependabot in #2523
- Bump github.com/containerd/containerd from 1.5.9 to 1.5.16 by @dependabot in #2524
- Dependabot fixes by @etiennedi in #2544
- Bump github.com/containerd/containerd from 1.5.16 to 1.5.18 by @dependabot in #2644
- Bump golang.org/x/net from 0.4.0 to 0.7.0 in /test/benchmark_bm25 by @dependabot in #2658
Other
- Readme refresh by @databyjp in #2538
- change go imports to new org name by @etiennedi in #2546
- Fix creating files inside new sub-directories of a shard during scaling by @redouan-rhazouani in #2545
- Update to new logo by @bobvanluijt in #2551
- Multishard scaleout e2e tests by @parkerduckworth in #2566
- Use different tokenisation for text and string by @donomii in #2552
- Move traverser.GetParams to types/dto.GetParams by @donomii in #2589
- Fix tests after merge to master by @dirkkul in #2640
- Reduce prometheus metrics by @trengrj in #2605
- Add missing calls to file.Close() by @alexandear in #2655
- backup-gcs: add BACKUP_GCS_USE_AUTH env var to allow other forms of authentication by @iamcaleberic in #2628
- Remove references to contextionary in response from root endpoint by @hsm207 in #2495
- Renaming semi => Weaviate by @dirkkul in #2664
- test: use t.Setenv to set env vars by @Juneezee in #2666
- Update: Auto-gen description with time stamp by @srini047 in #2624
- Prevents creating fake.wal by memtable tests by @aliszka in #2675
- API_KEY auth (can be combined with OIDC) by @etiennedi in #2681
- [Performance] Configure dynamic Memtable sizing for inverted index props by @etiennedi in #2669
- Switch queries filtered metrics to summaries by @trengrj in #2693
- Transfer backup files between S3, GCS, and Weaviate without loading their content into memory by @redouan-rhazouani in #2703
- Fix ignored hybrid search limit by @parkerduckworth in #2707
- Fix crash in bm25 by @dirkkul in #2713
- [WIP - GH #2573] Initialise release workflow for building binaries by @h4shk4t in #2650
- Fix results from more than 2 properties by @dirkkul in #2719
- Github Action workflow to add precompiled binaries to a release by @parkerduckworth in #2714
- Update REST documentation URLS for /v1/ endpoint by @dandv in #2698
- Create CITATION.cff by @bobvanluijt in #2687
New Contributors
- @alexandear made their first contribution in #2655
- @iamcaleberic made their first contribution in #2628
- @hsm207 made their first contribution in #2495
- @Juneezee made their first contribution in #2666
- @srini047 made their first contribution in #2624
- @abdelr made their first contribution in #2592
- @h4shk4t made their first contribution in #2650
Full Changelog: v1.17.5...v1.18.0