github weaviate/weaviate v1.12.0
v1.12.0 Improved String indexing, Performance and Reliability Fixes


Important: This release may introduce a new bug if you are upgrading from v1.11.0. Please use v1.12.1 instead, where this bug has been fixed.

Breaking Changes

none

New Features

  • Index full string field by @aliszka in #1862, #1821
    This new feature allows turning off tokenization for string fields, so that the whole field is indexed instead of being split and indexed at word boundaries. This makes it possible to match a string that includes spaces, and it avoids undesired partial string matches, such as returning "light grey" when the search was for "grey".

  • Make Inverted Index stopword lists fully configurable by @parkerduckworth in #1870
    This feature introduces a fully configurable stopword list to all inverted-index features. This is in anticipation of BM25 support (and mixed BM25/dense vector search) coming soon, but the feature also applies to exact matches on the inverted index.

  • Unlimited vector search by Certainty by @parkerduckworth in #1883
    Prior to this feature, a vector search with a specified certainty might have been cut off too early if the search hit the internal limit first. For example, if the search returned exactly 100 results, but the last result was still within the desired certainty range, there may have been more matches that were not returned. This is especially critical when doing a vector-search-based aggregation (coming soon). This feature allows returning all certainty matches, no matter how many. A global maximum can be configured to prevent a query that matches the whole DB from provoking an out-of-memory (OOM) situation, which would be a potential attack vector.

  • Shard API (Mark shard(s) as read-only) by @parkerduckworth in #1860
    This new feature exposes the status of the individual shards over the API and allows a shard that was previously marked read-only to be marked ready again. When a shard is marked read-only, all read queries can continue, but write queries are prohibited.

  • Feature/periodically scan disk by @parkerduckworth in #1861
    This new feature is the first to make use of the new shard-status API. There are two new configurable thresholds for disk pressure. If the disk usage exceeds a certain percentage (e.g. 80%) a warning is printed. If the disk pressure continues to rise and a second threshold (e.g. 90%) is crossed, all shards on that particular node will be automatically marked read-only.
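
To illustrate the full-string indexing entry above, here is a minimal sketch of a toy inverted index with two tokenization modes. It is not Weaviate's implementation: the mode names `word` and `field`, the helper functions, and the whitespace-only splitting rule are assumptions made for this example.

```python
from collections import defaultdict


def tokenize(value, mode):
    """Turn a string value into index terms (toy rules, assumed for this sketch)."""
    if mode == "word":
        # Split at whitespace: "light grey" -> ["light", "grey"]
        return value.split()
    if mode == "field":
        # Keep the whole value as a single term: "light grey" -> ["light grey"]
        return [value]
    raise ValueError(f"unknown tokenization mode: {mode}")


def build_index(docs, mode):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in tokenize(value, mode):
            index[term].add(doc_id)
    return dict(index)


def search(index, query, mode):
    """Return ids of documents that contain every term of the query."""
    terms = tokenize(query, mode)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


colors = {1: "grey", 2: "light grey"}
word_index = build_index(colors, "word")
field_index = build_index(colors, "field")

print(search(word_index, "grey", "word"))          # {1, 2}: partial match on "light grey"
print(search(field_index, "grey", "field"))        # {1}: whole-field match only
print(search(field_index, "light grey", "field"))  # {2}: match includes the space
```

With word-level terms, a search for "grey" also hits "light grey"; with the whole field as one term, it does not.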
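
For the configurable stopword list described above, the following sketch shows one way such a list could be resolved and applied before terms reach an inverted index. The tiny preset and the `preset`/`additions`/`removals` parameters are assumptions for illustration, not a statement of Weaviate's configuration schema.

```python
# Tiny stand-in for a language preset; a real preset would be much longer.
ENGLISH_PRESET = frozenset({"a", "an", "and", "is", "of", "the"})


def resolve_stopwords(preset, additions=(), removals=()):
    """Build the effective stopword set: preset plus additions, minus removals."""
    overlap = set(additions) & set(removals)
    if overlap:
        # A word cannot be both added and removed; reject ambiguous configs.
        raise ValueError(f"words both added and removed: {sorted(overlap)}")
    return (set(preset) | set(additions)) - set(removals)


def index_terms(text, stopwords):
    """Lowercase, split at whitespace, and drop stopwords."""
    return [word for word in text.lower().split() if word not in stopwords]


stopwords = resolve_stopwords(ENGLISH_PRESET, additions=["star"], removals=["the"])
print(index_terms("The star of the show", stopwords))  # ['the', 'the', 'show']
```

Because "the" was removed from the list, it is indexed and can be matched exactly, while the added word "star" is skipped.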
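
The cut-off problem from the certainty-search entry above can be sketched as follows. This is a simplified model, not Weaviate's algorithm: `top_k` stands in for a nearest-neighbour search, and `initial_limit` and `max_results` are hypothetical names for the internal limit and the global maximum.

```python
def top_k(scored, k):
    """Stand-in for a k-nearest-neighbour search: the best k items by certainty."""
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def search_by_certainty(scored, min_certainty, initial_limit=100, max_results=10_000):
    """Return every item with certainty >= min_certainty, capped at max_results.

    The limit is doubled until the last candidate falls below the threshold,
    the data is exhausted, or the global maximum is reached.
    """
    limit = initial_limit
    while True:
        candidates = top_k(scored, min(limit, max_results))
        matches = [c for c in candidates if c[1] >= min_certainty]
        exhausted = len(candidates) < limit            # nothing more to fetch
        below_threshold = len(matches) < len(candidates)  # tail no longer matches
        if exhausted or below_threshold or limit >= max_results:
            return matches
        limit *= 2


# 150 close matches followed by 100 distant ones, as (id, certainty) pairs.
scored = [(i, 0.9) for i in range(150)] + [(i, 0.5) for i in range(150, 250)]

fixed = [c for c in top_k(scored, 100) if c[1] >= 0.8]
print(len(fixed))                                             # 100: cut off at the limit
print(len(search_by_certainty(scored, 0.8)))                  # 150: all matches
print(len(search_by_certainty(scored, 0.8, max_results=120))) # 120: global cap applies
```

The cap bounds memory use even when a query would match the entire data set.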
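
The interaction between shard status and the two disk-pressure thresholds described above could look roughly like this. It is a sketch under assumptions: the `Shard` class, the status strings, and the `warn_at`/`readonly_at` parameters are invented for illustration; only the 80% and 90% example values come from the notes.

```python
import logging
from dataclasses import dataclass, field

READY = "READY"        # assumed status name
READONLY = "READONLY"  # assumed status name


@dataclass
class Shard:
    name: str
    status: str = READY
    objects: list = field(default_factory=list)

    def write(self, obj):
        # Writes are prohibited while the shard is read-only.
        if self.status == READONLY:
            raise PermissionError(f"shard {self.name} is read-only")
        self.objects.append(obj)

    def read(self):
        # Reads continue regardless of status.
        return list(self.objects)


def check_disk(used_percent, shards, warn_at=80.0, readonly_at=90.0):
    """Apply the two thresholds to all shards on a node and report the outcome."""
    if used_percent > readonly_at:
        for shard in shards:
            shard.status = READONLY
        return "readonly"
    if used_percent > warn_at:
        logging.warning("disk usage at %.1f%%, above %.1f%%", used_percent, warn_at)
        return "warning"
    return "ok"


# In a periodic scan, used_percent could come from the standard library, e.g.:
#   usage = shutil.disk_usage("/"); used_percent = usage.used / usage.total * 100
shards = [Shard("shard-a"), Shard("shard-b")]
print(check_disk(95.0, shards))      # readonly
print([s.status for s in shards])    # ['READONLY', 'READONLY']
shards[0].status = READY             # an operator marks the shard ready again
```

Once disk space has been freed, shards have to be marked ready again before writes resume; in this sketch that is a plain attribute assignment.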

Fixes

  • Improve import performance on many-core machines by @etiennedi in #1879
    tl;dr: With this improvement, we have seen 20% faster imports on machines with many cores (e.g. 60 cores), along with reduced memory spikes. The long version: Please see #1879 for the changes that were made internally. The main ones are limiting import workers to the number of available CPU cores and reducing the need for locking by copying more data into thread-local memory.

  • Fix HNSW commit log issue where the index would be too large after restart or crash by @etiennedi in #1871, #1868
    The compaction process for the HNSW index commit logs was losing some information, leading to a situation where the links inside the HNSW graph were appended indefinitely instead of being replaced. This led to massive index sizes after restarts, which degraded performance and led to unnecessarily large memory usage. This fix makes sure that all information is propagated correctly and that indices are identical whether they were initially built in memory or rebuilt from commit logs that were individually compacted.

  • Fix broken dynamic ef calculation by @etiennedi in #1880, #1878
    Version v1.9.0 introduced more control over setting ef at runtime. However, it did not work as expected. This commit fixes the values. Those who have never touched the ef setting and were using small limits will see an improvement in vector search quality with all default ef parameters due to this fix. If you had already set ef manually, this fix has no effect on you.

  • Several internal, non-user-facing fixes were also made, either to improve reliability or to improve the developer experience (DX) for Weaviate contributors (for example, fixing flaky tests). See the full changelog for the individual changes.
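
For the import-performance fix listed above, the two ideas it names (capping workers at the CPU core count and keeping work in worker-local memory to reduce locking) can be sketched as below. Weaviate itself is written in Go; this Python version only illustrates the structure, and `import_batch` and `index_fn` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def import_batch(objects, index_fn, max_workers=None):
    """Process objects in parallel with at most one worker per CPU core."""
    workers = max_workers or os.cpu_count() or 1
    # Never start more workers than there are objects (and always at least one).
    workers = max(1, min(workers, len(objects)))

    # Give each worker its own slice up front so no shared queue is needed.
    chunks = [objects[i::workers] for i in range(workers)]

    def run(chunk):
        # Results are collected in a list local to this worker, so no lock
        # is taken per object; merging happens once at the end.
        return [index_fn(obj) for obj in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(run, chunks))
    return [result for part in parts for result in part]


results = import_batch(list(range(10)), lambda x: x * x, max_workers=3)
print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Results arrive grouped by worker rather than in input order, which is why the example sorts them before printing.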
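
The HNSW commit log fix listed above concerns links being appended instead of replaced after compaction. The sketch below models that failure with a toy log of `("replace" | "append", node, targets)` entries; this format and the function names are assumptions for illustration and do not reflect Weaviate's actual commit log layout.

```python
def replay(log):
    """Rebuild node -> neighbour links by applying log entries in order."""
    links = {}
    for op, node, targets in log:
        if op == "replace":
            links[node] = list(targets)
        elif op == "append":
            links.setdefault(node, []).extend(targets)
        else:
            raise ValueError(f"unknown operation: {op}")
    return links


def compact(segment):
    """Collapse one log segment to a single entry per node.

    The entry keeps the 'replace' marker if the segment replaced the node's
    links, so earlier segments are still overwritten on replay.
    """
    state = {}
    for op, node, targets in segment:
        if op == "replace" or node not in state:
            state[node] = (op, list(targets))
        else:
            prev_op, prev_targets = state[node]
            state[node] = (prev_op, prev_targets + list(targets))
    return [(op, node, targets) for node, (op, targets) in state.items()]


def lossy_compact(segment):
    """Buggy variant for contrast: forgets that an entry was a replacement."""
    return [("append", node, list(targets)) for _, node, targets in segment]


segment_1 = [("replace", 1, [2, 3]), ("append", 1, [4])]
segment_2 = [("replace", 1, [5]), ("append", 2, [1])]

in_memory = replay(segment_1 + segment_2)
rebuilt = replay(compact(segment_1) + compact(segment_2))
bloated = replay(lossy_compact(segment_1) + lossy_compact(segment_2))

print(in_memory == rebuilt)  # True: compaction preserved the final state
print(bloated[1])            # [2, 3, 4, 5]: stale links pile up
```

Compacting each segment on its own is safe only if the replace information survives, which is the property the fix is described as restoring.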
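
For the dynamic ef fix listed above, a dynamic ef calculation can be sketched as scaling the query limit by a factor and clamping the result to a range. The parameter names and the default values below (`ef_min=100`, `ef_max=500`, `ef_factor=8`, and `-1` meaning "dynamic") are assumptions for this example rather than confirmed settings of this release.

```python
def effective_ef(limit, ef=-1, ef_min=100, ef_max=500, ef_factor=8):
    """Pick the ef value used for a vector search with the given result limit."""
    if ef > 0:
        # An explicitly configured ef is used as-is.
        return ef
    # Dynamic mode: scale with the limit, then clamp to [ef_min, ef_max].
    return max(ef_min, min(limit * ef_factor, ef_max))


print(effective_ef(10))          # 100: 10 * 8 = 80 is raised to the minimum
print(effective_ef(20))          # 160: 20 * 8 lies inside the range
print(effective_ef(100))         # 500: 100 * 8 = 800 is capped at the maximum
print(effective_ef(10, ef=64))   # 64: a manual setting bypasses the calculation
```

Under these assumptions, small limits benefit most from a correct lower bound, which fits the note that users with small limits and default settings see better search quality.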

New Contributors

Full Changelog: v1.11.0...v1.12.0
