Docker image/tag: semitechnologies/weaviate:0.21.11
See also: example Docker Compose files in English and Dutch.
Breaking Changes
none
New Features
Entity Merging (#975)
Entity merging allows you to deduplicate results. If you have several objects that describe the same real-world entity, e.g. "Google Inc." and "Google Incorporated" (both describe the company "Google"), you can hide the duplicates or let Weaviate merge them into a single entity.
Usage
Usage is best described in the following three example screenshots.
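Since the screenshots carry the actual queries, a text sketch of the query shape may help. The snippet below is a minimal Python helper that builds a GraphQL `Get` query string with a grouping argument. The names `strategy` and `force` follow the wording of these notes. The `group` argument name, its placement on the class, the `Things` wrapper, the `Company` class and its `name` property are assumptions for illustration, not confirmed syntax. The helper `build_grouped_query` is hypothetical.

```python
def build_grouped_query(class_name, fields, strategy=None, force=None):
    """Return a GraphQL Get query string, optionally with a grouping argument.

    The shape of the `group` argument is an assumption based on the
    wording of the release notes ("strategy" and "force").
    """
    if strategy is None:
        args = ""  # no grouping/merging
    else:
        if strategy not in ("closest", "merge"):
            raise ValueError("strategy must be 'closest' or 'merge'")
        if force is None or not 0.0 <= force <= 1.0:
            raise ValueError("force must be a number between 0 and 1")
        args = f"(group: {{strategy: {strategy}, force: {force}}})"
    field_block = " ".join(fields)
    return f"{{ Get {{ Things {{ {class_name}{args} {{ {field_block} }} }} }} }}"


# Hypothetical class and property names.
plain_query = build_grouped_query("Company", ["name"])
closest_query = build_grouped_query("Company", ["name"], strategy="closest", force=0.1)
merge_query = build_grouped_query("Company", ["name"], strategy="merge", force=0.1)
print(closest_query)
```

The three calls correspond to the three cases covered in this section: no grouping, strategy `closest`, and strategy `merge`.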
No grouping/merging
First up is the behavior without any grouping or merging strategy. As you can see, there are a lot of duplicates:
Grouping strategy `closest`
With strategy `closest`, Weaviate tries to build groups based on your results. For each group it will show the results closest to your search query. Note that there is also a `force` field. The higher the force, the more likely Weaviate is to group two objects together. A `force: 1.0` would mean that every single item, no matter how different, should be grouped. A `force: 0` means that only exactly identical items should be grouped. The example below uses `force: 0.1`, as that yielded the best results. You can see that no more company names are duplicated:
Grouping strategy `merge`
The example above hides duplicates. This isn't an issue if every single field is identical. But what if you need to know the original values? Strategy `merge` will keep the contents of the original fields. String fields contain all original values as shown below, numerical fields display a mean, and reference fields contain all the references from all merged objects:
Best Practices
To get the best possible results, please keep the following things in mind:
- The grouping/merging is done internally based on vector distance. It is thus important that the items to be merged are as close to each other as possible. If your items use a lot of words which are not recognized by the contextionary, those words do not influence the vector position. In this case, consider extending the contextionary using the REST API (`/c11y/extensions`), so that it understands more words from your objects.
- You get the best possible results if noise is removed in vectorization. We thus strongly recommend setting `vectorizeClassName: false` and `vectorizePropertyName: false` for each property. Those settings were introduced in 0.21.10.
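The vectorization recommendation can be sketched as a class definition. The Python snippet below builds a hypothetical `Company` class with both settings disabled. Only `vectorizeClassName` and `vectorizePropertyName` come from these notes. The class name, the property names, the data types, and the surrounding key layout (`class`, `properties`, `name`, `dataType`) are assumptions about the schema format and should be checked against the schema documentation for your version.

```python
import json


def company_class_schema():
    """Build a class definition with name vectorization switched off.

    Everything except the two vectorize* keys is an illustrative assumption.
    """
    properties = [
        {"name": "name", "dataType": ["string"]},
        {"name": "employees", "dataType": ["int"]},
    ]
    # Disable property-name vectorization on every property, as recommended.
    for prop in properties:
        prop["vectorizePropertyName"] = False
    return {
        "class": "Company",
        "vectorizeClassName": False,  # keep the class name out of the vector
        "properties": properties,
    }


schema = company_class_schema()
print(json.dumps(schema, indent=2))
```

Setting the flag in a loop keeps the "for each property" part of the recommendation from being missed when new properties are added.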
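To make the field handling of the `merge` strategy concrete, here is a conceptual sketch in Python. It is not Weaviate's implementation. It only mirrors the rules stated in these notes: string fields keep all original values, numerical fields show a mean, and reference fields collect all references. Modeling references as lists of identifiers, returning strings as a list of distinct values, the function name `merge_group`, and the sample values (including the employee counts) are all assumptions made for illustration.

```python
from statistics import mean


def merge_group(objects):
    """Merge objects that were grouped as duplicates into one dict (illustrative only)."""
    # Collect field names in first-seen order.
    fields = list(dict.fromkeys(key for obj in objects for key in obj))
    merged = {}
    for field in fields:
        values = [obj[field] for obj in objects if field in obj]
        if all(isinstance(v, str) for v in values):
            # String fields: keep every distinct original value.
            merged[field] = list(dict.fromkeys(values))
        elif all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            # Numerical fields: display the mean.
            merged[field] = mean(values)
        elif all(isinstance(v, list) for v in values):
            # Reference fields (modeled as lists): collect all references.
            merged[field] = [ref for refs in values for ref in refs]
        else:
            raise TypeError(f"unsupported or mixed value types in field {field!r}")
    return merged


# Invented sample data for two objects describing the same company.
duplicates = [
    {"name": "Google Inc.", "employees": 100, "inCity": ["ref-to-city-a"]},
    {"name": "Google Incorporated", "employees": 200, "inCity": ["ref-to-city-b"]},
]
merged = merge_group(duplicates)
print(merged)
```

How Weaviate actually renders merged string values in a response may differ from the list used here.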
Fixes
none