Docker image/tag: semitechnologies/weaviate:0.23.0
See also: example docker-compose files in English, Dutch, German, Czech, Italian. If you need to configure additional settings, you can also generate a custom docker-compose.yml file using the documentation.
Breaking Changes
-
Weaviate Standalone - Removal of third-party database services
The major change in this release is the switch to Weaviate's own storage mechanism, which replaces all third-party database services that were required in previous versions. In practice, this means Weaviate no longer has a runtime dependency on Elasticsearch and etcd. Instead, all storage operations are handled by Weaviate's custom vector-first storage system.

The storage system relies on a pluggable vector index. The first (and currently only) vector-storage plugin supported is HNSW. Weaviate does not rely on a third-party HNSW implementation, but instead provides a custom HNSW implementation optimized for real-life database usage. This means it supports all CRUD operations, persists every change using a Write-Ahead-Commit-Log, and performs various ongoing maintenance tasks under the hood to guarantee the health of a long-running database system. All inverted index and object storage operations use a custom Weaviate storage implementation that in turn relies on bolt/bbolt for disk operations.

As a result, Weaviate is now a vector-native search engine. All similarity-based search mechanisms (explore concepts query, classifications, etc.) are considerably faster than before. Sub-50ms 20-nearest-neighbor (20NN) vector queries are possible on datasets of 1-100M objects.

Weaviate relies on a number of caches, but does not require keeping all vectors in memory. Thus it is also possible to run Weaviate on machines where the available memory is smaller than the size of all vectors. For an in-depth look at Weaviate's caching and mem/disk strategies, check out this video.
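To get a feel for why it matters that not all vectors need to fit in memory, here is a back-of-the-envelope estimate of the raw vector size for a dataset. This is only a rough sketch: the 300 dimensions and 4 bytes per value (float32) are illustrative assumptions, not figures from these release notes, and the helper name is hypothetical. Index and cache overhead are ignored.

```python
def raw_vector_bytes(num_objects, dimensions, bytes_per_value=4):
    """Estimate the raw size of all vectors, ignoring index and cache overhead.

    bytes_per_value=4 assumes float32 values (an assumption for illustration).
    """
    return num_objects * dimensions * bytes_per_value


# Hypothetical dataset: 100M objects with 300-dimensional vectors.
size = raw_vector_bytes(100_000_000, 300)
print(f"{size / 1e9:.1f} GB of raw vector data")
```

Under these assumptions the raw vectors alone would exceed the memory of many single machines, which is the situation the caching strategy described above is meant to handle.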
-
Upgrading from 0.22.x requires reimporting data
As outlined above, Weaviate now uses a completely different storage mechanism. Thus, a live upgrade from 0.22.x is not possible. Instead, all data needs to be reimported into an instance running 0.23.0.
-
Deprecations removed
The removal of several deprecations was planned for 0.23.0. The following deprecated endpoints or features were removed or changed:
  - /v1/c11y/words removed, use /v1/c11y/concepts instead
  - ?meta=true on GET requests removed, use ?include=... instead
  - meta property in object body removed, instead use the underscore fields directly, e.g. _classification
  - meta field in cross-references removed, instead use the _classification field directly
  - cardinality on properties already no longer had an effect in previous releases, but now the field is also removed
  - keywords on classes and properties no longer had an effect in previous releases, but now the fields are also removed
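When preparing client code for the upgrade, it can help to scan requests for the removed items listed above. The following is a minimal sketch, not part of Weaviate or any client library: the function names are hypothetical, and the key check is deliberately naive (it flags any JSON key named meta, cardinality, or keywords, including user-defined properties that happen to share those names).

```python
from urllib.parse import urlsplit, parse_qs

# Body keys removed in 0.23.0 according to the list above.
REMOVED_KEYS = ("meta", "cardinality", "keywords")


def iter_keys(value):
    """Yield every dictionary key found anywhere in a nested JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield key
            yield from iter_keys(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_keys(child)


def find_removed_usage(url, body=None):
    """Return a list of notes about usage that 0.23.0 no longer supports."""
    notes = []
    parts = urlsplit(url)
    if parts.path.startswith("/v1/c11y/words"):
        notes.append("/v1/c11y/words was removed, use /v1/c11y/concepts")
    if parse_qs(parts.query).get("meta") == ["true"]:
        notes.append("?meta=true was removed, use ?include=...")
    found = set(iter_keys(body)) if body is not None else set()
    for key in REMOVED_KEYS:
        if key in found:
            notes.append(f"'{key}' field was removed")
    return notes
```

For example, `find_removed_usage("http://localhost:8080/v1/c11y/words/car")` returns a single note pointing at /v1/c11y/concepts. The host and port in that URL are placeholders.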
-
No breaking API changes other than deprecations removal
Other than the deprecations above, whose removal had been planned for several versions, there are no breaking API changes in this release. However, future breaking changes are planned for the v1.0.0 release.
-
Explicitly defined behavior of Like operator
The Like operator was previously offloaded to one of the third-party dependencies, therefore it never had an explicit definition of how it worked. The current design is modelled after wildcards in Elasticsearch, so most likely this change will not break your usage. However, as a matter of precaution we include this as a breaking change. The exact behavior of the Like operator is now defined here.
-
Explicitly defined behavior of multi-word queries in Equal operator on string and text properties
The behavior of the Equal operator on multi-word string and text properties in where filters was previously offloaded to third-party dependencies and therefore not explicitly defined in Weaviate. As of now, the behavior is defined as follows: Multi-word queries are broken up into single-word segments. An object must contain all segments. How words are broken up depends on the datatype. For string properties only spaces define word boundaries. For text properties all non-alphanumeric characters are considered word boundaries. E.g. for text: my email is alice@example.com is split into ["my", "email", "is", "alice", "example", "com"], whereas the same query string on a string property would be broken into ["my", "email", "is", "alice@example.com"]. No tf-idf weighting is performed on multi-word queries. Instead, where filters are meant as pure filters (i.e. it is a binary decision whether an object is included or not), whereas all sorting and ranking can be done with the vector-based explore sorters.
-
Not horizontally scalable yet
As of 0.23.0, Weaviate is not horizontally scalable yet. It therefore cannot be used as a distributed database or in HA settings yet. However, all internals are designed to support horizontal scalability later on. This feature will be made available in a future release.
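As an illustration of the multi-word Equal behavior described earlier in these notes, the word splitting and the "must contain all segments" rule can be sketched in a few lines. This is not Weaviate's implementation: the function names are hypothetical, the sketch assumes the stored property value is split the same way as the query, it treats only ASCII letters and digits as alphanumeric, and it compares words case-sensitively because case handling is not specified above.

```python
import re


def split_words(value, data_type):
    """Split a value into word segments according to the property datatype."""
    if data_type == "string":
        # Only spaces are word boundaries.
        return [word for word in value.split(" ") if word]
    if data_type == "text":
        # Every non-alphanumeric character is a word boundary (ASCII-only assumption).
        return [word for word in re.split(r"[^A-Za-z0-9]+", value) if word]
    raise ValueError(f"unsupported data type: {data_type}")


def equal_matches(query, property_value, data_type):
    """Binary filter decision: the value must contain every query segment."""
    value_words = set(split_words(property_value, data_type))
    return all(word in value_words for word in split_words(query, data_type))


value = "my email is alice@example.com"
print(split_words(value, "text"))
print(split_words(value, "string"))
print(equal_matches("alice example", value, "text"))
print(equal_matches("alice example", value, "string"))
```

With these assumptions, the query alice example matches the value on a text property but not on a string property, because alice@example.com stays a single segment there. The result is a plain yes/no decision with no ranking, in line with the note that where filters are pure filters.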