github weaviate/weaviate 0.22.12
0.22.12 - Underscore Props: _featureProjection

3 years ago

Docker image/tag: semitechnologies/weaviate:0.22.12
See also: example docker compose files in English, German, Dutch, Italian and Czech.

Breaking Changes

none

New Features

  • New Underscore Prop for visualization: _featureProjection (#1178, #1139)
    This release adds a new optional property ("underscore prop") to list results (REST GET /v1/{kinds}, GraphQL Get {}). The feature projection reduces the dimensionality of an object's vector to something easy to visualize, such as 2D or 3D. The underlying algorithm is exchangeable; the first algorithm provided is t-SNE.

    The feature can be used without any parameters, in which case it tries to pick reasonable defaults. To do so, use the include parameter on REST (GET /v1/{kinds}/?include=_featureProjection) or the _featureProjection { vector } field in GraphQL, which appears alongside the schema-defined properties.
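As an illustration of the two entry points described above, here is a minimal Python sketch that only builds the request URL and the GraphQL query string. The host (localhost:8080), the class name Article, the property title, and the Get { Things { ... } } nesting are assumptions for illustration and are not taken from these release notes; adjust them to your own schema.

```python
from typing import Sequence
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # assumption: a local Weaviate instance


def rest_list_url(kind: str, base_url: str = BASE_URL) -> str:
    """Build the REST list URL that requests the _featureProjection underscore prop."""
    query = urlencode({"include": "_featureProjection"})
    return f"{base_url}/v1/{kind}/?{query}"


def graphql_get_query(kind: str, class_name: str, props: Sequence[str]) -> str:
    """Build a GraphQL Get query with _featureProjection next to the schema props.

    The Get { <Kind> { <Class> { ... } } } nesting is an assumption about the
    query layout; class_name and props are hypothetical schema names.
    """
    fields = " ".join(props)
    return (
        f"{{ Get {{ {kind} {{ {class_name} {{ "
        f"{fields} _featureProjection {{ vector }} "
        f"}} }} }} }}"
    )


print(rest_list_url("things"))
print(graphql_get_query("Things", "Article", ["title"]))
```

Both helpers only produce strings, so they can be passed to whatever HTTP client you already use.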

    Optional Parameters

    To tweak the feature projection, optional parameters (currently GraphQL-only) can be provided. The values and their defaults are:

    Parameter    | Type   | Default                  | Implication
    dimensions   | int    | 2                        | Target dimensionality, usually 2 or 3
    algorithm    | string | tsne                     | Algorithm to be used; currently supported: tsne
    perplexity   | int    | min(5, len(results)-1)   | The t-SNE perplexity value; must be smaller than n-1, where n is the number of results to be visualized
    learningRate | int    | 25                       | The t-SNE learning rate
    iterations   | int    | 100                      | The number of iterations the t-SNE algorithm runs. Higher values lead to more stable results at the cost of a longer response time
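To make the parameter list above concrete, the following sketch assembles the _featureProjection field with optional arguments and mirrors the documented perplexity default. The parenthesized GraphQL argument syntax and the helper names are assumptions for illustration; only the parameter names and defaults come from the list above.

```python
from typing import Optional


def default_perplexity(n_results: int) -> int:
    """Mirror the documented default: min(5, len(results) - 1)."""
    return min(5, n_results - 1)


def feature_projection_field(
    dimensions: Optional[int] = None,
    algorithm: Optional[str] = None,
    perplexity: Optional[int] = None,
    learning_rate: Optional[int] = None,
    iterations: Optional[int] = None,
) -> str:
    """Render the _featureProjection selection; omitted args fall back to server defaults."""
    args = []
    if dimensions is not None:
        args.append(f"dimensions: {dimensions}")
    if algorithm is not None:
        args.append(f'algorithm: "{algorithm}"')  # string-typed parameter
    if perplexity is not None:
        args.append(f"perplexity: {perplexity}")
    if learning_rate is not None:
        args.append(f"learningRate: {learning_rate}")
    if iterations is not None:
        args.append(f"iterations: {iterations}")
    joined = ", ".join(args)
    arg_str = f"({joined})" if args else ""
    return f"_featureProjection{arg_str} {{ vector }}"


print(feature_projection_field())
print(feature_projection_field(dimensions=3, iterations=200))
```

Leaving every argument unset yields the plain _featureProjection { vector } form, which lets the server pick its defaults.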

    Limitations and Restrictions

    • There is no dedicated request size limit for a _featureProjection query (other than the global 10,000 items request limit). However, due to the O(n^2) complexity of the t-SNE algorithm, response time grows quadratically with request size. We recommend keeping the request size at or below 100 items, as we have noticed drastic increases in response time beyond that.
    • Feature Projection happens in real-time, per query. The dimensions returned have no meaning across queries.
    • Currently, only root elements (not resolved cross-references) are taken into consideration for the _featureProjection.
    • Due to the relatively high cost of the underlying algorithm, we recommend limiting requests that include a _featureProjection in high-load situations where response time matters. Avoid parallel requests that include a _featureProjection, so that some threads stay available to serve other, time-critical requests.
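One possible way to follow the last two recommendations on the client side is sketched below: cap the requested item count at 100 and let only one projection request run at a time. This is a sketch under assumptions, not part of Weaviate; send is a hypothetical callable that performs the actual request for a given limit.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_PROJECTION_ITEMS = 100  # recommended ceiling from the notes above

# Allow a single in-flight _featureProjection request from this client.
_projection_slot = threading.BoundedSemaphore(1)


def clamp_limit(requested: int) -> int:
    """Cap the number of items requested together with a feature projection."""
    return min(requested, MAX_PROJECTION_ITEMS)


def run_projection_request(send: Callable[[int], T], limit: int) -> T:
    """Call send(limit) with a capped limit while holding the single slot.

    send is a hypothetical callable that issues the actual query.
    """
    with _projection_slot:
        return send(clamp_limit(limit))


# Usage with a stand-in for the real request function:
print(run_projection_request(lambda n: f"requested {n} items", 500))
```

The semaphore only serializes requests issued from one process; coordinating several clients would need a shared mechanism.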

    Example

    The screenshot below shows a visualization of a subset of the 20 Newsgroups dataset, with each article's main category used as the label. The chart was created in Python using matplotlib.pyplot's scatter function.

    image
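A chart like the one described can be produced roughly as follows. This is a minimal sketch, not the script used for the screenshot: the shape of the result items (a list of dicts, each with a _featureProjection.vector entry) and the label property name mainCategory are assumptions. matplotlib is a third-party package and is imported only when plotting.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_points(
    items: Sequence[dict], label_prop: str
) -> Dict[str, Tuple[List[float], List[float]]]:
    """Group 2D projection vectors by label so each label gets its own color."""
    groups = defaultdict(lambda: ([], []))
    for item in items:
        x, y = item["_featureProjection"]["vector"][:2]
        xs, ys = groups[item[label_prop]]
        xs.append(x)
        ys.append(y)
    return dict(groups)


def plot_projection(items: Sequence[dict], label_prop: str) -> None:
    """Draw one scatter series per label using matplotlib.pyplot.scatter."""
    import matplotlib.pyplot as plt  # third-party; imported lazily

    for label, (xs, ys) in group_points(items, label_prop).items():
        plt.scatter(xs, ys, label=label, s=10)
    plt.legend()
    plt.show()


# Hypothetical result items; real ones would come from the query response.
sample = [
    {"mainCategory": "sci", "_featureProjection": {"vector": [0.1, 0.2]}},
    {"mainCategory": "rec", "_featureProjection": {"vector": [1.0, 2.0]}},
    {"mainCategory": "sci", "_featureProjection": {"vector": [0.3, 0.4]}},
]
print(group_points(sample, "mainCategory"))
```

Calling plot_projection(sample, "mainCategory") opens the chart. Because the projected dimensions have no meaning across queries, all points in one chart should come from a single query.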

Fixes

none
