weaviate/weaviate 0.22.12 on GitHub

Docker image/tag: semitechnologies/weaviate:0.22.12
See also: example docker compose files in English, German, Dutch, Italian and Czech.

Breaking Changes

none

New Features

New Underscore Prop for visualization: _featureProjection (#1178, #1139)
This release adds a new optional property ("underscore prop") to list results (REST GET /v1/{kinds}, GraphQL Get {}). The feature projection reduces the dimensionality of each object's vector to something that is easy to visualize, such as 2D or 3D. The underlying algorithm is exchangeable; the first algorithm provided is t-SNE.

The feature can be used without any parameters, in which case it tries to pick reasonable defaults. To do so, use the include parameter on REST (GET /v1/{kinds}/?include=_featureProjection) or the _featureProjection { vector } field in GraphQL, which appears alongside the schema-defined properties.
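As a minimal sketch of the two request forms described above, the following Python helpers only build the REST URL and the GraphQL query string; they do not send anything. The base URL, the kind `things`/`Things`, the class `Article` and the property `title` are assumptions for illustration and are not taken from this release.

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # assumption: a locally running Weaviate instance


def rest_list_url(kind: str, base_url: str = BASE_URL) -> str:
    """Build the list URL for a kind with the feature projection included."""
    query = urlencode({"include": "_featureProjection"})
    return f"{base_url}/v1/{kind}/?{query}"


def graphql_get_query(kind: str, class_name: str, props: List[str]) -> str:
    """Build a Get query that requests `_featureProjection { vector }` with defaults."""
    # The projection is requested alongside the schema-defined properties.
    fields = " ".join(list(props) + ["_featureProjection { vector }"])
    return f"{{ Get {{ {kind} {{ {class_name} {{ {fields} }} }} }} }}"


if __name__ == "__main__":
    # "things", "Things", "Article" and "title" are hypothetical names.
    print(rest_list_url("things"))
    print(graphql_get_query("Things", "Article", ["title"]))
```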

Optional Parameters

To tweak the feature projection, optional parameters (currently GraphQL-only) can be provided. The parameters and their defaults are:

Parameter	Type	Default	Implication
`dimensions`	`int`	`2`	Target dimensionality, usually `2` or `3`
`algorithm`	`string`	`tsne`	Algorithm to be used, currently supported: `tsne`
`perplexity`	`int`	`min(5, len(results)-1)`	The `t-SNE` perplexity value, must be smaller than `n-1`, where `n` is the number of results to be visualized
`learningRate`	`int`	`25`	The `t-SNE` learning rate
`iterations`	`int`	`100`	The number of iterations the `t-SNE` algorithm runs. Higher values lead to more stable results at the cost of a longer response time
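The parameters in the table could be passed as GraphQL field arguments. The sketch below assembles such a field and mirrors the default perplexity formula from the table. The helper names are hypothetical, and the `name: value` argument syntax is an assumption based on ordinary GraphQL field arguments rather than something these notes spell out.

```python
from typing import Optional


def default_perplexity(result_count: int) -> int:
    """Mirror the documented default: min(5, len(results)-1)."""
    return min(5, result_count - 1)


def feature_projection_field(
    dimensions: Optional[int] = None,
    algorithm: Optional[str] = None,
    perplexity: Optional[int] = None,
    learning_rate: Optional[int] = None,
    iterations: Optional[int] = None,
) -> str:
    """Build the `_featureProjection` field, emitting only explicitly set arguments."""
    args = []
    if dimensions is not None:
        args.append(f"dimensions: {dimensions}")
    if algorithm is not None:
        args.append(f'algorithm: "{algorithm}"')
    if perplexity is not None:
        args.append(f"perplexity: {perplexity}")
    if learning_rate is not None:
        args.append(f"learningRate: {learning_rate}")
    if iterations is not None:
        args.append(f"iterations: {iterations}")
    # Omitted arguments are left to the server-side defaults.
    arg_list = "(" + ", ".join(args) + ")" if args else ""
    return f"_featureProjection{arg_list} {{ vector }}"
```

Calling `feature_projection_field()` with no arguments yields the plain `_featureProjection { vector }` form, so the same helper covers both the default and the tuned case.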

Limitations and Restrictions

There is no request size limit specific to _featureProjection queries (other than the global 10,000-item request limit). However, due to the O(n^2) complexity of the t-SNE algorithm, response time grows quadratically with request size: doubling the number of items roughly quadruples the work. We recommend keeping the request size at or below 100 items, as we have noticed drastic increases in response time beyond that.
Feature Projection happens in real-time, per query. The dimensions returned have no meaning across queries.
Currently only root elements (not resolved cross-references) are taken into consideration for the _featureProjection.
Due to the relatively high cost of the underlying algorithm, we recommend limiting requests that include a _featureProjection in high-load situations where response time matters. Avoid parallel requests that include a _featureProjection, so that some threads stay available to serve other, time-critical requests.
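To make the limitations above concrete, here is a rough client-side sketch. `relative_cost` estimates how much more work a request implies compared to the recommended 100 items, assuming pure O(n^2) scaling, and `run_serialized` uses a semaphore so that only one projection request from this client is in flight at a time. Both helpers are hypothetical and not part of Weaviate; actual response times depend on the server and the data.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

RECOMMENDED_MAX_ITEMS = 100  # recommendation from the release notes


def relative_cost(n: int, baseline: int = RECOMMENDED_MAX_ITEMS) -> float:
    """Estimated work relative to `baseline` items, assuming O(n^2) scaling."""
    return (n / baseline) ** 2


# One slot: this client never issues projection requests in parallel.
_projection_slot = threading.BoundedSemaphore(1)


def run_serialized(send_query: Callable[[], T]) -> T:
    """Run a caller-supplied request function while holding the single slot."""
    with _projection_slot:
        return send_query()


if __name__ == "__main__":
    for n in (100, 200, 1000):
        print(n, relative_cost(n))
```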

Example

The screenshot below shows a visualization of a subset of the 20 Newsgroups dataset, with each article's main category used as the label. The chart was created in Python using matplotlib.pyplot's scatter function.
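A chart of this kind could be produced roughly as follows. This is a sketch, not the script used for the screenshot: it assumes the 2D projection vectors and their labels have already been extracted into `(label, [x, y])` pairs, and the sample labels are made up. matplotlib is imported only inside the plotting function.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[str, Sequence[float]]  # (label, [x, y]); hypothetical input shape


def group_by_label(points: Iterable[Point]) -> Dict[str, Tuple[List[float], List[float]]]:
    """Collect x and y coordinates per label so each label gets its own color."""
    groups = defaultdict(lambda: ([], []))
    for label, (x, y) in points:
        groups[label][0].append(x)
        groups[label][1].append(y)
    return dict(groups)


def plot_projection(points: Iterable[Point], path: str = "projection.png") -> None:
    """Draw one scatter series per label and save the chart to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, (xs, ys) in sorted(group_by_label(points).items()):
        ax.scatter(xs, ys, label=label, s=12)
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    # Made-up sample data standing in for projected vectors.
    sample = [("sci", [0.0, 1.0]), ("rec", [2.0, 3.0]), ("sci", [4.0, 5.0])]
    plot_projection(sample)
```

Because the returned dimensions have no meaning across queries, the axes are left unlabeled; only the relative positions within a single chart are informative.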

Fixes