Jina 0.4.0

We are excited to release Jina 0.4.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include automatic fallback to CPU when no GPU is available, FaissIndexer on GPU, and switching indexers during querying.

Release 0.4.0

⬆️ Major Features and Improvements

Usability

Add a new value for the on_gpu field. Setting on_gpu: auto in the YAML config first checks whether a GPU device is available and falls back to the CPU when no GPU is found. #617
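The fallback logic can be pictured with a short Python sketch. This is not Jina's implementation: `resolve_on_gpu` is a hypothetical helper, and probing for the `nvidia-smi` binary is only an assumed stand-in for a real GPU check.

```python
import shutil


def resolve_on_gpu(on_gpu, gpu_available=None):
    """Turn an on_gpu setting (True, False or 'auto') into a plain boolean."""
    if on_gpu != 'auto':
        # Explicit settings are respected as they are.
        return bool(on_gpu)
    if gpu_available is None:
        # Assumption: finding the nvidia-smi binary is used as a cheap
        # proxy for "a GPU device is available".
        gpu_available = shutil.which('nvidia-smi') is not None
    return gpu_available


# 'auto' falls back to the CPU when the probe reports no GPU.
use_gpu = resolve_on_gpu('auto', gpu_available=False)
```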
Improve the accessibility of jina helloworld with a new CLI argument that enables downloading via a proxy. If you use a proxy to speed up your connection, try jina helloworld --download-proxy http://127.0.0.1:1087, replacing the IP and port with your own proxy settings. #595
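For readers curious what a proxied download looks like in code, here is a minimal sketch using only the Python standard library. It is not necessarily how jina helloworld implements the option, and `build_proxy_opener` is a hypothetical helper name.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Create a URL opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    return urllib.request.build_opener(handler)


# Building the opener does not touch the network; a later
# opener.open(some_url) call would go through the proxy.
opener = build_proxy_opener('http://127.0.0.1:1087')
```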
Support switching between different Indexers during querying. A new argument, ref_indexer, is added for this purpose. With the following YAML config, NumpyIndexer is used for indexing and AnnoyIndexer is used for querying. The supported Indexers are FaissIndexer, AnnoyIndexer, NGTIndexer, NmslibIndexer, SptagIndexer, and NumpyIndexer.
```
!AnnoyIndexer
with:
    ref_indexer:
        !NumpyIndexer
        with:
            index_filename: wrap-npidx
```
#599 #589
Add a new parameter skip-on-error for the Pods. This argument sets the level at which Jina skips errors. Check out more details at jina docs. #570
```
 !ImageReader
 with:
     skip-on-error: 'EXECUTOR'
```

Scalability

Multiple improvements have been made to speed up performance.
- Improve the performance of NumpyIndexer. The argsort function is replaced by argpartition, which avoids an unnecessary full sort and speeds up the querying process. #641
- Switch to zmqstream for the default message handler, which improves the performance of networking. #618
- Use uvloop from tornado to improve the event handling speed in the Pods. #615
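
The NumpyIndexer change can be illustrated with plain NumPy. The sketch below is not Jina's code: `top_k_argsort` and `top_k_argpartition` are hypothetical helper names, and the distance matrix is random data. It shows that partitioning first and sorting only the k selected entries gives the same top-k result as sorting every row in full.

```python
import numpy as np


def top_k_argsort(dist, k):
    """Baseline: fully sort every row, then keep the first k columns."""
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


def top_k_argpartition(dist, k):
    """Move the k smallest entries of each row to the front, then sort only those k."""
    part = np.argpartition(dist, k - 1, axis=1)[:, :k]
    part_dist = np.take_along_axis(dist, part, axis=1)
    order = np.argsort(part_dist, axis=1)
    idx = np.take_along_axis(part, order, axis=1)
    return idx, np.take_along_axis(dist, idx, axis=1)


rng = np.random.default_rng(0)
dist = rng.random((4, 1000))  # 4 queries against 1000 indexed vectors
idx_sorted, d_sorted = top_k_argsort(dist, 5)
idx_part, d_part = top_k_argpartition(dist, 5)
```

The partition step runs in linear time per row, so the saving grows with the number of indexed vectors when k stays small.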

New Executors

Add NGTIndexer. NGT provides high-speed ANN searches for a large volume of data in high dimensional vector data space. #533
```
 !NGTIndexer
 with:
     index_filename: index.gz
     num_threads: 2
     metric: 'l2'
     epsilon: 0.1
```
Add support for running FaissIndexer on GPU and a new argument nprobe for FaissIndexer. Check out more details of the usage in our examples. #636 #638
```
!FaissIndexer
with:
    index_filename: index.gz
    index_key: 'IVF10,PQ4'
    train_filepath: train.gz
    distance: 'l2'
    nprobe: 1
```
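In an IVF index such as the one named by index_key above, nprobe sets how many inverted-list cells are scanned per query. The following is a toy NumPy illustration of that idea, not Faiss or FaissIndexer code. `build_ivf` and `ivf_search` are hypothetical helpers, and the centroids are simply the first ten vectors rather than trained by k-means.

```python
import numpy as np


def build_ivf(vectors, centroids):
    """Assign every vector to its nearest centroid; return one id list per cell."""
    d = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d.argmin(axis=1)
    return [np.flatnonzero(assign == c) for c in range(len(centroids))]


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe cells whose centroids are closest to the query."""
    cell_d = ((centroids - query) ** 2).sum(axis=1)
    cells = np.argsort(cell_d)[:nprobe]
    cand = np.concatenate([lists[c] for c in cells])
    d = ((vectors[cand] - query) ** 2).sum(axis=1)
    return cand[np.argsort(d)[:k]]


rng = np.random.default_rng(0)
vectors = rng.random((200, 8))
centroids = vectors[:10].copy()  # assumption: untrained, illustrative centroids
lists = build_ivf(vectors, centroids)
query = rng.random(8)
fast = ivf_search(query, vectors, centroids, lists, k=5, nprobe=1)   # one cell only
full = ivf_search(query, vectors, centroids, lists, k=5, nprobe=10)  # every cell
```

A small nprobe scans fewer candidates and may miss true neighbours; probing every cell is equivalent to an exhaustive search.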
Add support for Milvus as a new Indexer. Now you can do indexing and querying with MilvusIndexer. [W.I.P] #651
Add CustomKerasImageEncoder so that you can use your own Keras model to encode images in Jina. The following YAML config loads the model from path/to/your/model and uses the output of the layer named awesome/encoding/layer as the embedding. #563
```
!CustomKerasImageEncoder
with:
    model_path: path/to/your/model
    layer_name: awesome/encoding/layer
```

Add an argument search_k for AnnoyIndexer. #642
```
!AnnoyIndexer
with:
    index_filename: index.gz
    metric: 'euclidean'
    n_trees: 10
    search_k: -1
```

Add FastICAEncoder for encoding. #590
```
!FastICAEncoder
with:
    output_dim: 32
    num_features: 128
    whiten: False
```

Documentation

Welcome our evangelist @alexcg1 from New Zealand! He has been working hard on improving document readability, Jina 101, the contribution guidelines, and README retouches. A new document has been added to guide new contributors. #566 #564 #558 #545

Unit tests

Add coverage testing. We are proud to report that Jina's current test coverage is 73.04%. #659

⚠️ Breaking Changes

Rename port_grpc to port_expose. Jina now supports both gRPC and RESTful APIs, so port_grpc no longer lives up to its name. port_grpc will be deprecated in a future version. #598
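If you keep arguments in a dictionary, the rename can be handled with a small shim like the one below. This is a sketch only: `migrate_port_args` is a hypothetical helper and not part of Jina.

```python
import warnings


def migrate_port_args(args):
    """Return a copy of args with the legacy port_grpc key renamed to port_expose."""
    migrated = dict(args)
    if 'port_grpc' in migrated:
        warnings.warn('port_grpc is deprecated, use port_expose instead',
                      DeprecationWarning, stacklevel=2)
        legacy = migrated.pop('port_grpc')
        # An explicitly given port_expose takes precedence over the legacy key.
        migrated.setdefault('port_expose', legacy)
    return migrated
```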
Refactor ImageReader to inherit from BaseDocCrafter rather than DocSegmenter. If you are using ImageReader, check out our examples for more details. #627
Refactor Ranker. The TopKFilterDriver is now used to filter out the chunks that do not belong to the top k documents. This driver is attached to Ranker by default. For DocPbIndexer and DataURIPbIndexer, TopKFilterDriver is removed from the default attachment. With k shards, this leads to n * k results being returned from the indexer when querying. #574
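The effect of sharding on result counts, and the filtering that follows, can be sketched as below. This is an illustration under assumptions, not the TopKFilterDriver code: `merge_shard_results` is a hypothetical helper, results are modelled as (doc_id, score) pairs, and a higher score is assumed to be better.

```python
import heapq


def merge_shard_results(shard_results, top_k):
    """Merge (doc_id, score) lists from several shards and keep the top_k best."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(top_k, all_hits, key=lambda hit: hit[1])


# Three shards each return two hits, so six candidates arrive before filtering.
shards = [
    [('a', 0.9), ('b', 0.2)],
    [('c', 0.8), ('d', 0.7)],
    [('e', 0.1), ('f', 0.95)],
]
best = merge_shard_results(shards, top_k=2)
```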
Remove the password_stdin argument for the jina hub CLI. #569

🐞 Bug Fixes and Other Changes

Flow

Fix the search_lines API for the Flow. #606

Executors

Add a new argument truncation_strategy in BaseTransformerEncoder to adapt to the latest Huggingface Transformers v3.0.0. #623
```
!TransformerTorchEncoder
with:
    pooling_strategy: cls
    model_name: distilbert-base-cased
    max_length: 96
    truncation_strategy: longest_first
```

Add size property for the indexers. #581

Drivers

Add a new driver, UnaryEncoderDriver, dedicated to testing and debugging. #635
Fix PublishDriver overwriting the original driver of the pod. PublishDriver is used to modify num_parts when a pod is connected to another via a PUB-SUB connection. #569
Remove the if clauses from the Drivers. #646

Protos

Add a tags field to the Chunk and Document protos. The tags field is a map of strings designed to store the values of other fields that will be used for filtering. #574
Add a location field for the Chunks. location is a list of integers. It can be used to mark the position in a string, the coordinates in an image, or the timestamp of an audio clip. #578
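
To show how the two new fields might be used together, here is a plain Python stand-in for the proto messages. The `Chunk` dataclass and `filter_by_tag` helper are hypothetical and are not the generated protobuf classes. Only the field names tags and location, and their types, come from the notes above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    text: str
    tags: Dict[str, str] = field(default_factory=dict)   # map of strings, for filtering
    location: List[int] = field(default_factory=list)    # e.g. [start, end] in a string


def filter_by_tag(chunks, key, value):
    """Keep only the chunks whose tags map has the given key set to value."""
    return [c for c in chunks if c.tags.get(key) == value]


chunks = [
    Chunk('hello', tags={'lang': 'en'}, location=[0, 5]),
    Chunk('bonjour', tags={'lang': 'fr'}, location=[6, 13]),
]
english = filter_by_tag(chunks, 'lang', 'en')
```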

Tests

Improve and fix the unit tests. #609 #612 #579 #628

🙏 Thanks to our Contributors

This release contains contributions from hanxiao, JoanFM, nan-wang, fhaase2, anish2197, alexcg1, BingHo1013, shivam-raj, Morriaty-The-Murderer, festeh, generall, emmaadesile, coolmian, JamesTang616, and YueLiu-jina.

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you Jina couldn't do what we do. Your support means a lot to us.

🤝 Work with Jina

Want to work with Jina full-time? Check out our openings on our website.
