github jina-ai/jina v0.7.0
🎉 release v0.7.0

3 years ago

Jina v0.7.0

We are excited to release Jina v0.7.0. Jina is an easier way to do neural search in the cloud. Highlights of this release include:

  • Flow evaluation support
  • Support for preventing duplicate Documents in the index
  • Flow visualization support

Release v0.7.0

⬆️ Major Features and Improvements

Completeness

  • Evaluation is fully supported by Jina. jina.executors.evaluators and jina.drivers.evaluate have been introduced to make this happen. Now you can use different metrics to evaluate the Flow. No matter whether you want to evaluate the whole Flow or just part of it, the evaluation can be done smoothly without stopping the running Flow. #1043, #1086, #1087, #1090, #1092, #1099, #1100, #1102, #1114, #1134
Example code (the Python script first, then the index-doc.yml and eval.yml files it loads):

from jina.flow import Flow
from jina.proto import jina_pb2
from jina.drivers.helper import array2pb
import numpy as np

def get_index_docs():
    doc0 = jina_pb2.Document()
    doc0.tags['id'] = '0'
    doc0.embedding.CopyFrom(array2pb(np.array([1, 1])))
    doc1 = jina_pb2.Document()
    doc1.tags['id'] = '1'
    doc1.embedding.CopyFrom(array2pb(np.array([1, -1])))
    return [doc0, doc1]

# index two docs
f_index = Flow().add(uses='index-doc.yml')
with f_index:
    f_index.index(input_fn=get_index_docs)


def get_eval_docs():
    doc = jina_pb2.Document()
    doc.embedding.CopyFrom(array2pb(np.array([1, 1])))
    groundtruth = jina_pb2.Document()
    match0 = groundtruth.matches.add()
    match0.tags['id'] = '0'
    match1 = groundtruth.matches.add()
    match1.tags['id'] = '2'
    return [(doc, groundtruth), ]

def validate(resp):
    # retrieved docs with id `0` and `1`
    # relevant docs with id `0` and `2`
    # Precision@2 = 0.5
    assert resp.docs[0].evaluations[0].value == 0.5

# evaluate Precision@2
f_eval = (Flow()
          .add(uses='index-doc.yml')
          .add(uses='eval.yml'))
with f_eval:
    f_eval.search(
        input_fn=get_eval_docs, 
        output_fn=validate, 
        callback_on_body=True)

# index-doc.yml
!CompoundIndexer
components:
  - !NumpyIndexer
    metas:
      name: vecidx
  - !BinaryPbIndexer
    metas:
      name: docidx
requests:
  on:
    IndexRequest:
      - !VectorIndexDriver
        with:
          executor: vecidx
          traversal_paths: ['r']
      - !KVIndexDriver
        with:
          executor: docidx
          traversal_paths: ['r']
    SearchRequest:
      - !VectorSearchDriver
        with:
          executor: vecidx
          traversal_paths: ['r']
      - !KVSearchDriver
        with:
          executor: docidx
          traversal_paths: ['m']

# eval.yml
!PrecisionEvaluator
with:
    eval_at: 2
    id_tag: 'id'
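
The Precision@2 arithmetic in the validate comments can be reproduced in plain Python. This is a minimal sketch, not Jina's PrecisionEvaluator: precision_at_k and its arguments are hypothetical names, and it assumes precision is the share of the top-k retrieved ids that also appear among the relevant ids.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved ids that are relevant (assumed definition)."""
    top_k = retrieved_ids[:k]
    if not top_k:
        # Nothing was retrieved, so report zero instead of dividing by zero.
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(top_k)


# Same ids as the example: '0' and '1' are retrieved, '0' and '2' are relevant.
score = precision_at_k(['0', '1'], ['0', '2'], k=2)
print(score)  # 0.5
```

Only id '0' is both retrieved and relevant, so one hit out of two retrieved documents gives 0.5, the value the validate callback asserts.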

  • To prevent duplicates in the index, UniquePbIndexer and UniqueVectorIndexer are introduced together with the corresponding drivers in jina.drivers.cache. Please refer to docs.jina.ai for more details. #1064, #1081, #1147
Example code:
from jina.flow import Flow
from jina.proto import jina_pb2

doc_0 = jina_pb2.Document()
doc_0.text = 'I am doc0'
doc_1 = jina_pb2.Document()
doc_1.text = 'I am doc1'


def assert_num_docs(rsp, num_docs):
    assert len(rsp.IndexRequest.docs) == num_docs

f = Flow().add(
    uses='NumpyIndexer', uses_before='_unique')

with f:
    f.index(
        [doc_0, doc_0, doc_1], 
        output_fn=lambda rsp: assert_num_docs(rsp, num_docs=2))
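
The idea behind dropping the repeated doc_0 can be illustrated without Jina. The sketch below is an assumption-laden illustration, not how UniquePbIndexer or the jina.drivers.cache drivers are implemented: content_id and drop_duplicates are hypothetical helpers, and using a truncated SHA-256 digest of the text as the id is an assumption made for this example.

```python
import hashlib


def content_id(text):
    """Derive a stable string id from the text (hypothetical scheme)."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()[:16]


def drop_duplicates(texts):
    """Keep the first occurrence of each text, preserving input order."""
    seen = set()
    unique = []
    for text in texts:
        doc_id = content_id(text)
        if doc_id not in seen:
            seen.add(doc_id)
            unique.append(text)
    return unique


unique_docs = drop_duplicates(['I am doc0', 'I am doc0', 'I am doc1'])
print(unique_docs)  # ['I am doc0', 'I am doc1']
```

Three inputs with one repeat leave two documents, which matches the num_docs=2 assertion in the Flow example.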

Usability

  • Add visualization for Flow. Calling a Flow's plot() method gives a clearer view of how the Flow is laid out. #1002, #1116
[Figure: flow_visualize]

⚠️ Breaking Changes

  • Document.id, Document.parent_id and Relevance.ref_id are now string types instead of int. Please refer to docs.jina.ai for more details. #1005, #1034, #1136 Accordingly, the following changes were made:

    • SortQL.field now uses dunder_get syntax rather than . expansion (e.g. a.b.c -> a__b__c, score.value -> score__value) and now supports dict and list access.
    • first_doc_id, random_doc_id and override_doc_id have been removed from CLI.
  • Refactor logger config into YAML. Add --log-config to the jina pea CLI; by default it points to logging.default.yml. --log-sse, --log-profile and --log-with-own-name are deprecated. #1031

How the loggers map to the resource files:

Filename            | Logger in the code
logging.default.yml | default_logger and any logger defined with JinaLogger()
logging.docker.yml  | logger used in the ContainerPea
logging.profile.yml | profile_logger
logging.remote.yml  | logger used in the RemotePea

  • Refactor the code for traversing recursive Documents. granularity_range, adjacency_range, recur_on and recursion_order are deprecated and replaced by traversal_paths, which lets us specify exactly where the traversal should happen. #995, #998, #1001, #1003, #1006, #1007, #1027, #1036, #1044

  • Protobuf request_id is now a string type. --first-request-id is removed from the client CLI. --query-uses and --index-uses in the hello-world CLI are renamed to --uses-query and --uses-index. #1049
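
The double-underscore lookup mentioned for SortQL.field above (a__b__c, score__value, with dict and list access) can be pictured with a small stand-alone helper. This is a hedged sketch, not Jina's own implementation: the dunder_get function below is written for illustration only, and treating a path segment as an integer index when the current value is a list is an assumption.

```python
def dunder_get(obj, path):
    """Resolve a double-underscore path such as 'score__value' (illustrative only)."""
    current = obj
    for key in path.split('__'):
        if isinstance(current, dict):
            current = current[key]          # dict access by key
        elif isinstance(current, (list, tuple)):
            current = current[int(key)]     # list access by position (assumed)
        else:
            current = getattr(current, key)  # plain attribute access
    return current


doc = {'score': {'value': 0.9}, 'tags': ['a', 'b']}
print(dunder_get(doc, 'score__value'))  # 0.9
print(dunder_get(doc, 'tags__1'))       # b
```

A single separator that cannot appear in a dotted attribute name lets one string address nested attributes, dict keys and list positions alike.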

🐞 Bug Fixes and Other Changes

Flow

  • Refactor log stream server with fluentd. Fluentd acts as a daemon that collects logs from different parts of Jina and forwards them to a specific output. Check out more details at docs.jina.ai. #1002, #999
  • Add ordinal_idx_arg for batching decorator to support passing ordinal index to indexers #1089
  • Refactor request_id to uuid #1049
  • Refactor logger wrapper #1029
  • Add SSH tunneling for Pod, so that SSH connection information can be specified #1018
  • Switch to hash function for generating ids #1005, #1034
  • Support using --uses-before and --uses-after when --parallel=1. Previously, both options only took effect when parallel > 1. _pass and _forward use RouteDriver by default. #1112
  • Rename replica_id to pea_id and fix the PeaRoleType #1015
  • Fix the bug in setting top_k #1133 #1138 #1145

Executors

  • Add checking for the existence of model paths #1077
  • Improve exception handling for the failure of loading pre-trained models #1065
  • Fix typing of indexers #1053
  • Fix the no attribute error for BaseOnnxEncoder #1107

Drivers

  • Fix bug in QueryDriver when passing dictionary argument. #1080

CLI

  • Improve the hubio module. jina hub login supports logging in with OAuth authentication. jina hub list lists the available pods in jina-hub. jina hub push supports building and pushing pod images via the Hubapi deployed on AWS API Gateway. #1022, #1041, #1118, #1120, #1135

  • Add update checking for the jina CLI #1117

Tests & CICD

  • Refactor test for Python client #1095
  • Add tests for including examples during ci #1088
  • Fix dependency conflicts in ci by replacing [match-py-ver] with [cicd] #1101
  • Improve PR review process by adding CODEOWNERS #1108
  • Refactor to pytest in testing request #1045
  • Add unit test for helper #1046
  • Fix io test #1052
  • Fix test coverage #1054, #1056
  • Use pytest fixture to remove tmp files #1021
  • Refactor the unit tests to pytest style in test_protobuf #1121
  • Add docker helper test #1115
  • Add test in the ci for testing examples #1142
  • Add test in the ci for testing hello-world in docker with no devel installed #1139

Documentation

  • Add Portuguese translation for README #1097
  • Add Ukrainian translation for README.md #1124
  • Fix Russian README #1057
  • Fix broken links in README #1033, #1037, #105
  • Fix links in CHANGELOG and CONTRIBUTING #1032
  • Improve the docstring for rank drivers #1143

Others

  • Fix duplicate lines in cookiecutter #1063
  • Fix conflicts between copyright adding action and typing #1023
  • Move numpy importing inside function #1019
  • Rename jina_cli to cli #1017
  • Fix typing error in mypy #1009
  • Fix line spaces in code #1105

🙏 Thanks to our Contributors

This release contains contributions from Alex C-G, Alex McKenzie, CatStark, Christopher Lennan, Deepankar Mahapatro, Fernanda Kawasaki, Han Xiao, Joan Fontanals Martinez, Ján Jendrušák, Maximilian Werk, Nan Wang, Oleh Yaroshchuk, Pratik Bhavsar, RenrakuRunrat, Rutuja Surve, Sai Sandeep Mutyala, Sergei Averkiev, Susana Guzman, Wang Bo, jancijen, pswu11

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you, Jina couldn't do what we do. Your support means a lot to us.

🀝 Work with Jina

Want to work with Jina full-time? Check out our openings on our website.
