Docker image/tag: semitechnologies/weaviate:0.21.10
See also: example docker-compose files in English and Dutch.
Breaking Changes
New Features
- More user control over what is used in vectorization (#1062)
Motivation
Imagine a `Fruit` class with a property `name` of type `string`. Let's add an object with `name: Pineapple`. Prior to this release, the vector representing this object in the vector space was always formed from the class name, the property names and the property values. Thus, the vector representation of the above object would be formed from the text corpus `fruit name pineapple`. This helps in search, as a lot of context is added, but in some cases much of that information is redundant. Let's add another fruit with `name: tomato`. If both objects were part of a classification (or a deduplication process with the upcoming "Entity Merging" feature), their vector positions would be quite similar, as two out of the three words used to form the vector are identical: `fruit name pineapple` vs. `fruit name tomato`. Additionally, one can argue that `name` does not add any semantic value here. We have thus decided to provide the user with more control over vectorization.
How to use
Two new fields were added at the `schema/{things,actions}` level:

    class: Fruit
    vectorizeClassName: true             # <-- newly introduced in this release
    properties:
      - name: name
        dataType: ["string"]
        vectorizePropertyName: false     # <-- newly introduced in this release

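To make the effect of the two fields concrete, here is a minimal Python sketch of how the text corpus for an object could be assembled. It is an illustration only, not Weaviate's implementation: `ClassConfig`, `PropertyConfig` and `build_corpus` are hypothetical names, and the sketch simply lowercases and joins words, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PropertyConfig:
    name: str
    # Mirrors vectorizePropertyName; False is the default stated in these notes.
    vectorize_property_name: bool = False


@dataclass
class ClassConfig:
    name: str
    # Mirrors vectorizeClassName; True is the default stated in these notes.
    vectorize_class_name: bool = True
    properties: List[PropertyConfig] = field(default_factory=list)


def build_corpus(cls: ClassConfig, obj: Dict[str, str]) -> str:
    """Hypothetical helper: collect the words that would feed the vectorizer."""
    words = []
    if cls.vectorize_class_name:
        words.append(cls.name)
    for prop in cls.properties:
        if prop.name not in obj:
            continue  # the object does not set this property
        if prop.vectorize_property_name:
            words.append(prop.name)
        words.append(str(obj[prop.name]))  # property values are always included
    return " ".join(words).lower()


# Behaviour before this release: class name and property names always included.
legacy = ClassConfig("Fruit", True, [PropertyConfig("name", True)])
# Behaviour with the new flags left unset.
current = ClassConfig("Fruit", properties=[PropertyConfig("name")])
```

With these two configurations, `build_corpus(legacy, {"name": "Pineapple"})` yields `fruit name pineapple`, while `build_corpus(current, {"name": "Pineapple"})` yields `fruit pineapple`, so the redundant word `name` no longer pulls the pineapple and tomato vectors together.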
Default values
If not explicitly set, `vectorizeClassName` will default to `true`, whereas `vectorizePropertyName` will default to `false`. To understand the motivation behind these defaults, see the next section.
New/Updated Validation Requirements
- If you choose to vectorize a class name (default: true), the class name must be contextionary-valid. (Prior to this change the class name was always vectorized, so it was already required to be contextionary-valid; this is therefore not a breaking change.)
- If you choose not to vectorize a class name, it does not have to be contextionary-valid.
- If you choose to vectorize a property name (default: false), the property name must be contextionary-valid. (Prior to this change the property name was always vectorized, so it was already required to be contextionary-valid; this is therefore not a breaking change.)
- If you choose not to vectorize a property name, it does not have to be contextionary-valid.
- If you choose to vectorize neither the class name nor any property names, there is a chance that some imports will fail: Weaviate needs to be able to extract at least one contextionary-valid word from every object, so that it can build a vector position for this object. If you have a class which only has numerical props, or you only use non-contextionary-valid values in your string/text props, Weaviate must rely on the class/property names to extract at least one word. However, if you additionally choose to disable vectorization for both class and property names, Weaviate might end up not being able to extract any words. In this case importing will fail.
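The rules above can be sketched as two checks, one at schema time and one at import time. This is a rough Python illustration under stated assumptions, not Weaviate's actual validation code: `KNOWN_WORDS` is a tiny stand-in for the contextionary, `validate_schema` and `validate_object` are hypothetical helpers, and the property-name error text is invented for the sketch. The other two error strings are the ones quoted in these notes.

```python
# Tiny stand-in vocabulary (assumption); the real contextionary is far larger.
KNOWN_WORDS = {"fruit", "name", "apple", "pineapple", "tomato"}


def valid_words(text):
    """Return the words of `text` that the stand-in contextionary knows."""
    return [w for w in str(text).lower().split() if w in KNOWN_WORDS]


def validate_schema(class_name, vectorize_class_name, props):
    """`props` maps property name -> vectorizePropertyName flag.

    Returns an error string, or None if the class definition is acceptable.
    """
    if vectorize_class_name and not valid_words(class_name):
        return "class name is not contained in contextionary"
    for prop_name, vectorize_name in props.items():
        if vectorize_name and not valid_words(prop_name):
            # Hypothetical wording; the real message is not given in these notes.
            return "property name is not contained in contextionary"
    return None


def validate_object(class_name, vectorize_class_name, props, obj):
    """Check that at least one valid word can be extracted for the object."""
    corpus = []
    if vectorize_class_name:
        corpus.append(class_name)
    for prop_name, vectorize_name in props.items():
        if prop_name not in obj:
            continue
        if vectorize_name:
            corpus.append(prop_name)
        corpus.append(obj[prop_name])
    if not valid_words(" ".join(str(part) for part in corpus)):
        return "no-contextionary-valid words extracted from object corpus"
    return None
```

Names that are not vectorized are never checked, which is why disabling both flags shifts the whole burden onto the property values.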
Some example error cases and how to fix them
Example 1
class:

    class: aqfaowiefj     # random word, not contextionary-valid
    properties:
      - name: name
        dataType: ["string"]

Error: class name is not contained in contextionary
Solutions:
- Either use a contextionary-valid class name
- Or set `vectorizeClassName: false`
Example 2
class:

    class: Fruit                  # contextionary-valid word
    vectorizeClassName: false     # explicitly disable
    properties:
      - name: name
        dataType: ["string"]

Note: The class above is valid, but whether importing fails or succeeds depends on the object:
object:

    class: Fruit
    schema:
      name: aoiapwueog     # random, non-contextionary-valid word

Error: no-contextionary-valid words extracted from object corpus
Solutions:
- Since you are already using a contextionary-valid class name, you could set `vectorizeClassName: true`
- Alternatively, you need to make sure that the `name` field contains at least one contextionary-valid word, e.g. `name: aoiapwueog apple`
Dependencies
- Make sure that you are using at least contextionary version `...v0.4.5` to benefit from more precise error messages when vectorization fails due to incorrect input.
The docker-compose files linked at the top of the release notes have already been adapted to point to the correct version.