weaviate/weaviate 0.23.0 on GitHub

Docker image/tag: semitechnologies/weaviate:0.23.0
See also: example docker-compose files in English, Dutch, German, Czech, Italian. If you need to configure additional settings, you can also generate a custom docker-compose.yml file using the documentation.

Breaking Changes

Weaviate Standalone - Removal of third-party database services
The major change in this release is the switch to Weaviate's own storage mechanism, which replaces all third-party database services required by previous versions. In practice, this means Weaviate no longer has a runtime dependency on Elasticsearch or etcd. Instead, all storage operations are handled by Weaviate's custom vector-first storage system, which relies on a pluggable vector index. The first (and currently only) supported vector index plugin is HNSW. Rather than relying on a third-party HNSW implementation, Weaviate provides its own HNSW implementation optimized for real-life database usage: it supports all CRUD operations, ensures every change is persisted using a Write-Ahead-Commit-Log, and performs various ongoing maintenance tasks under the hood to keep a long-running database system healthy. All inverted-index and object-storage operations use a custom Weaviate storage implementation that in turn relies on bolt/bbolt for disk operations.
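To illustrate the write-ahead principle mentioned above, here is a minimal, hypothetical Python sketch: every change is appended to a log file and fsync'ed before it is applied in memory, and the log is replayed on startup to rebuild state after a restart or crash. It is not Weaviate's implementation (Weaviate is written in Go, and its log format is not described in these notes); the class name WriteAheadLog and its JSON-lines record format are assumptions for illustration only.

```python
import json
import os


class WriteAheadLog:
    """Hypothetical minimal commit log (not Weaviate's): each change is
    appended and fsync'ed to disk *before* it is applied in memory."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        self._replay()  # rebuild in-memory state from the log on startup
        self._file = open(path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                # A production log would also detect torn records
                # (e.g. via checksums); that is omitted in this sketch.
                self._apply(json.loads(line))

    def _apply(self, entry):
        if entry["op"] == "put":
            self.state[entry["id"]] = entry["value"]
        elif entry["op"] == "delete":
            self.state.pop(entry["id"], None)

    def _log(self, entry):
        self._file.write(json.dumps(entry) + "\n")
        self._file.flush()
        os.fsync(self._file.fileno())  # durable on disk before applying
        self._apply(entry)

    def put(self, obj_id, value):
        self._log({"op": "put", "id": obj_id, "value": value})

    def delete(self, obj_id):
        self._log({"op": "delete", "id": obj_id})

    def close(self):
        self._file.close()
```

Because the log is written first, reopening the same file after a crash replays every acknowledged change, so no committed write is lost.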
As a result, Weaviate is now a vector-native search engine. All similarity-based search mechanisms (explore concepts queries, classifications, etc.) are considerably faster than before. 20-nearest-neighbor vector queries in under 50ms are possible on datasets of 1M to 100M objects. Weaviate relies on a number of caches, but does not require keeping all vectors in memory, so it can also run on machines whose available memory is smaller than the total size of all vectors. For an in-depth look at Weaviate's caching and memory/disk strategies, check out this video.
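An HNSW index answers k-nearest-neighbor queries approximately, without scanning every stored vector. For reference, the exact (brute-force) search below shows what a 20NN query computes; it is a sketch, not Weaviate's HNSW, and the use of cosine distance and the dict-of-vectors layout are assumptions made for illustration.

```python
import heapq
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn(query, vectors, k=20):
    """Exact k-NN by full scan: O(n) distance computations per query.

    Returns up to k (distance, object_id) pairs, closest first.
    """
    return heapq.nsmallest(
        k,
        ((cosine_distance(query, vec), obj_id) for obj_id, vec in vectors.items()),
    )
```

The full scan grows linearly with the number of objects, which is the cost an approximate graph index such as HNSW is designed to avoid on datasets of millions of objects.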
Upgrading from 0.22.x requires reimporting data
As outlined above, Weaviate now uses a completely different storage mechanism. Thus a live upgrade from 0.22.x is not possible. Instead, all data needs to be reimported into an instance running 0.23.0.
Deprecations removed
The removal of several deprecations was planned for 0.23.0. The following deprecated endpoints or features were removed or changed:
- /v1/c11y/words removed, use /v1/c11y/concepts instead
- ?meta=true on GET requests removed, use ?include=... instead
- meta property in object body removed, instead use the underscore fields directly, e.g. _classification
- meta field in cross-references removed, instead use the _classification field directly
- cardinality on properties already had no effect in previous releases, and now the field is removed as well
- keywords on classes and properties already had no effect in previous releases, and now the fields are removed as well
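Existing schema definitions that still carry the removed fields can be cleaned up before reimporting. The helper below is a hypothetical sketch: it assumes a class definition shaped as a dict with a "properties" list, and that the removed fields appear under the keys "keywords" (on classes and properties) and "cardinality" (on properties).

```python
import copy

# Fields removed in 0.23.0 according to these release notes
# (key names assumed to match the JSON schema definition).
REMOVED_CLASS_FIELDS = ("keywords",)
REMOVED_PROPERTY_FIELDS = ("keywords", "cardinality")


def strip_removed_fields(class_schema):
    """Return a copy of a class definition without the removed fields."""
    cleaned = copy.deepcopy(class_schema)  # leave the caller's dict untouched
    for field in REMOVED_CLASS_FIELDS:
        cleaned.pop(field, None)
    for prop in cleaned.get("properties", []):
        for field in REMOVED_PROPERTY_FIELDS:
            prop.pop(field, None)
    return cleaned
```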
No breaking API changes other than deprecations removal
Other than the above deprecations, whose removal had been planned for several versions, there are no breaking API changes in this release. However, future breaking changes are planned for the v1.0.0 release.
Explicitly defined behavior of Like operator
The Like operator was previously offloaded to one of the third-party dependencies, so its behavior was never explicitly defined. The current design is modelled after wildcards in Elasticsearch, so this change will most likely not break your usage. However, as a precaution we list it as a breaking change. The exact behavior of the Like operator is now defined here.
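As a rough illustration of Elasticsearch-style wildcards, the sketch below assumes that * matches any sequence of characters (including none), ? matches exactly one character, and every other character is literal, with the pattern matched against a whole value. These semantics are an assumption; the linked definition is authoritative, including how Like applies to tokenized text properties.

```python
import re


def like_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression.

    Assumed semantics: '*' = any run of characters (including none),
    '?' = exactly one character, everything else matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # '.', '+', etc. are literals here
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the wildcard pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For example, under these assumptions like("New York", "New*") matches, while like("cart", "c?t") does not, because ? stands for exactly one character.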
Explicitly defined behavior of multi-word queries in Equal operator on string and text properties
The behavior of the Equal operator for multi-word queries on string and text properties in where filters was previously offloaded to third-party dependencies and therefore not explicitly defined in Weaviate. It is now defined as follows: multi-word queries are broken up into single-word segments, and an object must contain all segments to match. How words are broken up depends on the data type. For string properties, only spaces define word boundaries. For text properties, all non-alphanumeric characters are considered word boundaries. E.g. the query my email is alice@example.com is split into ["my", "email", "is", "alice", "example", "com"] on a text property, whereas the same query string on a string property is split into ["my", "email", "is", "alice@example.com"]. No tf-idf weighting is performed on multi-word queries. Where filters are meant as pure filters (i.e. it is a binary decision whether an object is included or not); all sorting and ranking can be done with the vector-based explore sorters.
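The splitting rules above can be sketched in a few lines of Python. This is an illustration of the described behavior, not Weaviate's code: the function names tokenize and equal_matches are hypothetical, case handling is not specified in these notes and is left untouched here, and "alphanumeric" is approximated with Python's Unicode-aware letters and digits.

```python
import re


def tokenize(value, data_type):
    """Split a value into word segments according to its data type."""
    if data_type == "string":
        # string: only spaces are word boundaries
        return [t for t in value.split(" ") if t]
    if data_type == "text":
        # text: every non-alphanumeric character is a word boundary
        # ([^\W_] = word characters minus underscore, i.e. letters and digits)
        return re.findall(r"[^\W_]+", value)
    raise ValueError(f"unsupported data type: {data_type}")


def equal_matches(object_value, query, data_type):
    """Binary filter: match iff the object contains every query segment."""
    return set(tokenize(query, data_type)) <= set(tokenize(object_value, data_type))
```

With these rules, the query alice example matches the value my email is alice@example.com on a text property, but not on a string property, where alice@example.com remains a single segment.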
Not horizontally scalable yet
As of 0.23.0, Weaviate is not yet horizontally scalable. It therefore cannot yet be used as a distributed database or in HA settings. However, all internals are designed to support horizontal scalability, which will be made available in a future release.
