Breaking Changes
none
New Features
High Frequency Updates
Weaviate now handles high-frequency updates better, sustaining tens of millions of updates per day without degrading performance. This work optimized how object property updates and vector updates are handled. Updates that do not affect the vector index skip the unnecessary reindexing work, and updates that do change vectors are processed much more efficiently. These changes matter most in environments with heavy update loads.
- Skip vector reindex by @aliszka in #3963
- Skip re-vectorization of identical/similar objects in a batch by @aliszka in #4163
- Idempotent batch by @aliszka in #4058
- Locks: write-only sharded locks by @asdine in #4156
- Improved cleaning of tombstones by @abdelr in #4003
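To illustrate the idea behind skipping the vector reindex, here is a minimal sketch. It is not Weaviate's implementation: `ToyStore`, `put`, and `reindex_count` are hypothetical names invented for this example, and the "index" is just a counter. It only shows the principle that a property-only update need not touch the vector index.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical in-memory store; NOT Weaviate's actual implementation."""
    vectors: dict = field(default_factory=dict)     # object id -> vector
    properties: dict = field(default_factory=dict)  # object id -> properties
    reindex_count: int = 0                          # times the simulated index was touched

    def put(self, obj_id, props, vector):
        # Always store the latest properties.
        self.properties[obj_id] = props
        # Touch the simulated vector index only when the vector actually changed.
        if self.vectors.get(obj_id) != list(vector):
            self.vectors[obj_id] = list(vector)
            self.reindex_count += 1


store = ToyStore()
store.put("a", {"title": "v1"}, [0.1, 0.2])  # new object: indexed
store.put("a", {"title": "v2"}, [0.1, 0.2])  # property-only update: index untouched
store.put("a", {"title": "v2"}, [0.3, 0.4])  # vector changed: reindexed
print(store.reindex_count)
```

In this toy run the index is touched twice for three writes; the property-only update is stored without any vector work.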
HNSW Binary Quantization
This release adds binary quantization (BQ) support for the HNSW (Hierarchical Navigable Small World) vector index. BQ converts vectors into a compact binary format, drastically reducing the memory footprint of the vector index while maintaining high search accuracy. It is most useful for applications with large-scale datasets, where memory efficiency is paramount. With BQ you can get faster searches and lower memory usage, making it easier to scale applications without compromising on performance.
- HNSW+BQ by @abdelr in #3841
- Enabling BQ compression through update user config by @abdelr in #3875
- HNSW BQ dimension tracking by @trengrj in #4209
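The general technique can be sketched in a few lines. This is a conceptual example, not the code from the PRs above: it assumes the common scheme of keeping one sign bit per dimension and comparing codes by Hamming distance, and the function names `binary_quantize` and `hamming_distance` are made up for illustration. Storing one bit per dimension instead of a 32-bit float is where the memory saving comes from.

```python
def binary_quantize(vector):
    """Pack one bit per dimension into an int: 1 if the component is positive."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming_distance(a, b):
    """Count the bit positions where two quantized codes differ."""
    return bin(a ^ b).count("1")


# Toy 4-dimensional vectors (illustrative values only).
query = binary_quantize([0.9, -0.1, 0.4, -0.7])
near = binary_quantize([0.8, -0.3, 0.2, -0.5])   # same sign pattern as the query
far = binary_quantize([-0.6, 0.2, -0.9, 0.3])    # opposite sign pattern

print(hamming_distance(query, near), hamming_distance(query, far))
```

Vectors pointing in a similar direction share most sign bits, so their codes are close in Hamming distance, while the opposite vector differs in every bit.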
Support for Multiple Vectors per Class
Hey Weaviate community 👋 we heard you! The most upvoted feature on the public roadmap is here!
Support for multiple vectors per class makes Weaviate more flexible and applicable to complex data models. It allows more nuanced data representation and supports diverse, multifaceted search and machine learning use cases. With multiple vectors per class, Weaviate enables richer indexing and retrieval strategies, which can improve accuracy and efficiency in search queries and data analysis tasks.
- Introduce class multi-vector support by @antas-marcin in #4180
- Multiple Vectors: gRPC Batch API vectors by @antas-marcin in #4246
- Multiple Vectors: Validation of none vectorizer modules when multiple vectorizer configuration is present by @antas-marcin in #4248
- Return all named vectors if vectors is true by @dirkkul in #4253
- Aggregate with named vectors by @dirkkul in #4254
- Support for VectorConfig update by @aliszka in #4266
- Summed metrics for target vectors by @aliszka in #4264
- Target vectors marshalling by @aliszka in #4271
- Use Async Queue(s) and VectorIndex(es) only relevant for configured legacy vector or target vectors by @aliszka in #4279
- Hide legacy vector index config when target vectors configured by @aliszka in #4283
- Handle named vectors when checking certainty/distance in queries by @tsmith023 in #4285
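As a rough mental model of named (target) vectors, the sketch below stores two vectors per object and searches against one of them by name. It is a toy in plain Python, not the Weaviate API: the vector names `title` and `body`, the `objects` dict, and the `nearest` helper are all assumptions made for this example.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Each object carries several vectors, addressed by name (hypothetical names).
objects = {
    "doc1": {"title": [1.0, 0.0], "body": [0.0, 1.0]},
    "doc2": {"title": [0.0, 1.0], "body": [1.0, 0.0]},
}


def nearest(query, target_vector):
    """Return the id of the object whose named vector is closest to the query."""
    return min(
        objects,
        key=lambda oid: cosine_distance(query, objects[oid][target_vector]),
    )


print(nearest([1.0, 0.1], "title"))  # searches the "title" vectors
print(nearest([1.0, 0.1], "body"))   # same query, different target vector
```

The same query returns different objects depending on which named vector it targets, which is the point of keeping several representations per object.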
Japanese & Chinese Tokenizers
- Tokenizers for Japanese/Chinese by @donomii in #4028
- Include dictionary files in docker image, add test by @donomii in #4158
Durability Improvements
- Use remainder to ensure last batch is indexed by @amourao in #4063
- Prevent segment inconsistencies by @jeroiraz in #4129
- Backward-compatibility wal parser by @jeroiraz in #4225
- Delete already compacted segment by @jeroiraz in #4307
- Delete compacted segment metadata by @jeroiraz in #4309
Improved NotEqual Operator
- Improved NotEqual Operator by @parkerduckworth in #4122
- Truncate buffered doc IDs from Searcher by @parkerduckworth in #4275
Telemetry
- Minimal telemetry in OSS Weaviate by @parkerduckworth in #4097
- Telemetry: updates and safeguards by @parkerduckworth in #4294
- Telemetry: disable for development and CI pipeline by @moogacs in #4218
Other
- Generic backoff timer to limit any behavior, currently log messages by @parkerduckworth in #3912
- Extend HTTP backup & restore endpoints to allow for custom compression configuration by @redouan-rhazouani in #3367
- Add restore config object by @moogacs in #4026
- Cleanup scores, search results and DocID by @donomii in #3849
- Add Go package reference to README by @moogacs in #4111
- Setting default values for weaviate server by @avirlrma in #4106
- Add nil props to GRPC map by @dirkkul in #4135
- Switch hybrid fusion default to relative score fusion by @dirkkul in #3939
- Reduce logging message frequency for non-recovered objects by @amourao in #4242
- Ignore logging only, not the continue by @amourao in #4243
- Disable logging in inconsistent object bucket and search bucket entries (1.24) by @amourao in #4261
- Enable setting additional log levels by @amourao in #4215
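For readers unfamiliar with the relative score fusion mentioned in the list above, here is a hedged sketch of the general idea: normalize each result set's scores to the range 0 to 1, then combine them with a weight. This is an illustration under assumptions, not the code from #3939; the function names, the min-max normalization details, and the `alpha` weighting convention are choices made for this example.

```python
def normalize(scores):
    """Min-max scale a {id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every result as a top result.
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def relative_score_fusion(keyword_scores, vector_scores, alpha=0.5):
    """Weighted sum of normalized scores; alpha weights the vector side (assumption)."""
    kw = normalize(keyword_scores)
    vec = normalize(vector_scores)
    ids = set(kw) | set(vec)
    fused = {
        i: (1 - alpha) * kw.get(i, 0.0) + alpha * vec.get(i, 0.0) for i in ids
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores only; an id missing from one result set contributes 0 there.
keyword = {"a": 10.0, "b": 5.0, "c": 0.0}
vector = {"a": 0.2, "b": 0.9, "d": 0.5}
ranking = relative_score_fusion(keyword, vector)
print([doc_id for doc_id, _ in ranking])
```

Unlike a purely rank-based fusion, this approach keeps information about how far apart the original scores were within each result set.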
Fixes
- Resolve race condition in acceptance stress tests by @redouan-rhazouani in #3938
- Backup assign defaults on partials config by @moogacs in #4020
- Acceptance test command path and test acceptance on test all by @moogacs in #4043
- Typo in replication acceptance test in test/run.sh by @moogacs in #4044
- Remove unwanted +1 by @donomii in #4127
- Qna-transformer acceptance test runners by adding larger runners by @moogacs in #4138
- Compress at the right time by @abdelr in #4238
- Fix checkpoint detection in async mode by @asdine in #4277
- Batch Delete: Don't fail entire batch when id not found by @parkerduckworth in #4284
- Refactoring compressor by @abdelr in #3880
Performance Improvements
- Batch Vectorizer by @dirkkul in #4147
- Better BM25 benchmarking by @amourao in #4078
- Improve locking of shared pool by @dirkkul in #4258
Testing Improvements
- Cleanup gha caches after PR merge by @moogacs in #4162
- Move transformers acceptance tests to ubuntu-latest-4-cores by @moogacs in #4173
- Assert eventually gcs backup bucket creation by @moogacs in #4176
- Assert eventually cloud backup bucket creation by @moogacs in #4178
- Fix flaky test with named vectors return order by @dirkkul in #4255
- Add backup tests with named vectors by @dirkkul in #4281
- Improved class update tests by @aliszka in #4286
- Migration of compression tests to acceptance package by @aliszka in #3923
- Speed acceptance tests and gha by modularizing the running test and parallelize them by @moogacs in #4017
- Change module acceptance tests to run on ubuntu-latest by @antas-marcin in #4045
- Add debug weaviate build and related script to use remote delve by @reyreaud-l in #4051
New Contributors
Full Changelog: v1.23.10...v1.24.0