github weaviate/weaviate 0.21.11
0.21.11 - Entity Merging

4 years ago

Docker image/tag: semitechnologies/weaviate:0.21.11
See also: example Docker Compose files in English and Dutch.

Breaking Changes

none

New Features

  • Entity Merging (#975)
    Entity merging allows you to deduplicate results. If you have several objects that describe the same real-world entity, e.g. "Google Inc." and "Google Incorporated" (both describe the company "Google"), you can hide duplicates or even let Weaviate merge duplicates into a single entity.

    Usage

    Usage is best described in the following three example screenshots.

    No grouping/merging
    First up is the behavior without any grouping or merging strategy. As you can see there are a lot of duplicates:

    [Screenshot: results without grouping or merging]

    Grouping strategy closest
    With strategy closest, Weaviate tries to build groups from your results and, for each group, shows the result closest to your search query. Note that there is also a force field: the higher the force, the more likely Weaviate is to group two objects together. A force: 1.0 means that every single item should be grouped, no matter how different. A force: 0 means that only exactly identical items should be grouped. The example below uses force: 0.1, as that yielded the best results. You can see that company names are no longer duplicated:

    [Screenshot: results with grouping strategy closest and force: 0.1]
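
    Since the query itself is only visible in the screenshot, here is a minimal sketch of how such a query string might be assembled. The strategy names and the force field come from the text above; the exact GraphQL shape (a group argument on the class inside Get { Things { ... } }) is an assumption about the 0.x API, and the class name Company and the field name are hypothetical.

```python
from typing import List


def build_group_query(class_name: str, fields: List[str],
                      strategy: str, force: float) -> str:
    """Build a GraphQL Get query with a group argument.

    Assumption: the group argument sits on the class and takes
    {type: <strategy>, force: <float>}. Check this against the
    documentation of your Weaviate version before relying on it.
    """
    if strategy not in ("closest", "merge"):
        raise ValueError("strategy must be 'closest' or 'merge'")
    if not 0.0 <= force <= 1.0:
        raise ValueError("force must be between 0 and 1")
    field_block = " ".join(fields)
    return (
        "{ Get { Things { "
        f"{class_name}(group: {{type: {strategy}, force: {force}}}) "
        f"{{ {field_block} }} }} }} }}"
    )


# Hypothetical class and field names, with the force value used in the example above.
query = build_group_query("Company", ["name"], "closest", 0.1)
print(query)
```

    Validating force up front mirrors the range described above, where 0 groups only identical items and 1.0 groups everything.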

    Grouping strategy merge
    The example above hides duplicates. This isn't an issue if every single field is identical, but what if you need to know the original values? Strategy merge keeps the contents of the original fields. String fields contain all original values as shown below, numerical fields display the mean, and reference fields contain all the references from all merged objects:

    [Screenshot: results with grouping strategy merge]
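
    To make the three merge rules concrete, the following toy function applies them to plain Python dictionaries. It is an illustration of the described behavior only, not Weaviate's implementation: the function name, the property names, and the choice to represent merged strings as a list of distinct values are all assumptions.

```python
from statistics import mean
from typing import Any, Dict, List


def merge_objects(objects: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Toy merge of duplicate objects following the rules described above."""
    # Collect every value seen for each property across all objects.
    merged: Dict[str, Any] = {}
    for obj in objects:
        for key, value in obj.items():
            merged.setdefault(key, []).append(value)

    for key, values in merged.items():
        is_number = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_number:
            # Numerical fields: display the mean.
            merged[key] = mean(values)
        elif all(isinstance(v, list) for v in values):
            # Reference fields: all references from all merged objects.
            refs: List[Any] = []
            for ref_list in values:
                for ref in ref_list:
                    if ref not in refs:
                        refs.append(ref)
            merged[key] = refs
        else:
            # String fields: keep the distinct original values, in order.
            distinct: List[Any] = []
            for v in values:
                if v not in distinct:
                    distinct.append(v)
            merged[key] = distinct
    return merged


# Hypothetical duplicate objects describing the same company.
duplicates = [
    {"name": "Google Inc.", "employees": 100, "offices": ["ref-a"]},
    {"name": "Google Incorporated", "employees": 200, "offices": ["ref-b"]},
]
print(merge_objects(duplicates))
```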

    Best Practices

    To get the best possible results, please keep the following things in mind:

    • Grouping and merging are done internally based on vector distance. It is thus important that the items to be merged are as close to each other as possible. If your items use a lot of words that the contextionary does not recognize, those words do not influence the vector position. In this case, consider extending the contextionary using the REST API (/c11y/extensions) so that it understands more words from your objects.
    • You get the best possible results if noise is removed during vectorization. We thus strongly recommend setting vectorizeClassName: false and vectorizePropertyName: false for each property. Those settings were introduced in 0.21.10.
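
    As a sketch of the second recommendation, the snippet below builds a class definition with both settings disabled. The two setting names come from the text above; their placement (vectorizeClassName on the class, vectorizePropertyName on each property), the surrounding schema keys, and the class and property names are assumptions to verify against the schema documentation for your version.

```python
import json
from typing import Any, Dict, List


def build_class_schema(class_name: str, property_names: List[str]) -> Dict[str, Any]:
    """Build a class definition that keeps names out of the vector.

    Assumption: vectorizeClassName is set on the class and
    vectorizePropertyName on every property. All properties are
    declared as strings here purely for illustration.
    """
    return {
        "class": class_name,
        "vectorizeClassName": False,
        "properties": [
            {
                "name": name,
                "dataType": ["string"],
                "vectorizePropertyName": False,
            }
            for name in property_names
        ],
    }


# Hypothetical class with two string properties.
schema = build_class_schema("Company", ["name", "description"])
print(json.dumps(schema, indent=2))
```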

Fixes

none
