Jina v0.7.0
We are excited to release Jina v0.7.0. Jina is an easier way to do neural search on the cloud. Highlights of this release include:
- Flow evaluation support
- Support for preventing duplicate Documents in the index
- Flow visualization support
Release v0.7.0
⬆️ Major Features and Improvements
Completeness
- Evaluation is fully supported by Jina. `jina.executors.evaluators` and `jina.drivers.evaluate` have been introduced to make this happen. You can now use different metrics to evaluate the Flow. Whether you want to evaluate the whole Flow or just part of it, the evaluation can be done without stopping the running Flow. #1043, #1086, #1087, #1090, #1092, #1099, #1100, #1102, #1114, #1134
Example code:
Python code:

    from jina.flow import Flow
    from jina.proto import jina_pb2
    from jina.drivers.helper import array2pb
    import numpy as np


    def get_index_docs():
        doc0 = jina_pb2.Document()
        doc0.tags['id'] = '0'
        doc0.embedding.CopyFrom(array2pb(np.array([1, 1])))
        doc1 = jina_pb2.Document()
        doc1.tags['id'] = '1'
        doc1.embedding.CopyFrom(array2pb(np.array([1, -1])))
        return [doc0, doc1]


    # index two docs
    f_index = (Flow().add(uses='index-doc.yml'))
    with f_index:
        f_index.index(input_fn=get_index_docs)


    def get_eval_docs():
        doc = jina_pb2.Document()
        doc.embedding.CopyFrom(array2pb(np.array([1, 1])))
        groundtruth = jina_pb2.Document()
        match0 = groundtruth.matches.add()
        match0.tags['id'] = '0'
        match1 = groundtruth.matches.add()
        match1.tags['id'] = '2'
        return [(doc, groundtruth), ]


    def validate(resp):
        # retrieved docs with id `0` and `1`
        # relevant docs with id `0` and `2`
        # Precision@2 = 0.5
        assert resp.docs[0].evaluations[0].value == 0.5


    # evaluate Precision@2
    f_eval = (Flow()
              .add(uses='index-doc.yml')
              .add(uses='eval.yml'))
    with f_eval:
        f_eval.search(
            input_fn=get_eval_docs,
            output_fn=validate,
            callback_on_body=True)

`index-doc.yml`:

    !CompoundIndexer
    components:
      - !NumpyIndexer
        metas:
          name: vecidx
      - !BinaryPbIndexer
        metas:
          name: docidx
    requests:
      on:
        IndexRequest:
          - !VectorIndexDriver
            with:
              executor: vecidx
              traversal_paths: ['r']
          - !KVIndexDriver
            with:
              executor: docidx
              traversal_paths: ['r']
        SearchRequest:
          - !VectorSearchDriver
            with:
              executor: vecidx
              traversal_paths: ['r']
          - !KVSearchDriver
            with:
              executor: docidx
              traversal_paths: ['m']

`eval.yml`:

    !PrecisionEvaluator
    with:
      eval_at: 2
      id_tag: 'id'
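The Precision@2 value asserted in the example above can be checked by hand. The following is a minimal sketch in plain Python, independent of Jina; `precision_at_k` is a hypothetical helper written for illustration, and dividing by `k` (rather than by the number of results actually returned) is an assumption about the metric's definition, not a statement about how `PrecisionEvaluator` is implemented.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Hypothetical helper: fraction of the top-k retrieved ids that are relevant."""
    if k <= 0:
        raise ValueError('k must be positive')
    top_k = retrieved_ids[:k]
    relevant = set(relevant_ids)
    # Count how many of the top-k results appear in the ground truth.
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k


# Same numbers as the example above: retrieved `0` and `1`, relevant `0` and `2`.
score = precision_at_k(['0', '1'], ['0', '2'], k=2)
print(score)  # 0.5
```

Only one of the two retrieved ids (`0`) appears in the ground truth, so the score is 1/2.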
- To prevent duplicates in the index, `UniquePbIndexer` and `UniqueVectorIndexer` are introduced together with the corresponding drivers in `jina.drivers.cache`. Please refer to docs.jina.ai for more details. #1064, #1081, #1147
Example code:

    from jina.flow import Flow
    from jina.proto import jina_pb2

    doc_0 = jina_pb2.Document()
    doc_0.text = 'I am doc0'
    doc_1 = jina_pb2.Document()
    doc_1.text = 'I am doc1'


    def assert_num_docs(rsp, num_docs):
        assert len(rsp.IndexRequest.docs) == num_docs


    # `doc_0` is sent twice, but only two distinct docs should be indexed
    f = Flow().add(
        uses='NumpyIndexer', uses_before='_unique')
    with f:
        f.index(
            [doc_0, doc_0, doc_1],
            output_fn=lambda rsp: assert_num_docs(rsp, num_docs=2))

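The general idea of filtering duplicates before indexing can be sketched in plain Python. This is an illustration only: `content_hash` and `drop_duplicates` are hypothetical helpers, and keying on a SHA-256 hash of the text is an assumption, not a description of how `UniquePbIndexer` or the `jina.drivers.cache` drivers work internally.

```python
import hashlib


def content_hash(text):
    """Hypothetical helper: derive a stable key from a document's text."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def drop_duplicates(texts):
    """Keep the first occurrence of each distinct text, preserving order."""
    seen = set()
    unique = []
    for text in texts:
        key = content_hash(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


# Mirrors the input above: the first document is sent twice.
docs = ['I am doc0', 'I am doc0', 'I am doc1']
unique_docs = drop_duplicates(docs)
print(len(unique_docs))  # 2
```

Tracking seen keys in a set keeps the check constant-time per document, which matters when the filter sits in front of every index request.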
Usability
- Add visualization for Flow. Calling the `plot()` function of `Flow` gives a better view of how the Flow looks. #1002, #1116
⚠️ Breaking Changes
- `Document.id`, `Document.parent_id` and `Relevance.ref_id` are now `string` types instead of `int`. Please refer to docs.jina.ai for more details. #1005, #1034, #1136
  Accordingly, the following changes are made:
  - `SortQL.field` now uses `dunder_get` syntax rather than `.` expansion (e.g. `a.b.c -> a__b__c`, `score.value -> score__value`) and now supports `dict` and `list` access.
  - `first_doc_id`, `random_doc_id` and `override_doc_id` have been removed from the CLI.
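To illustrate the `a.b.c -> a__b__c` field syntax mentioned above, here is a minimal sketch of double-underscore lookup over nested dicts, lists and attributes. `dunder_lookup` is a hypothetical stand-in written for this note; it is not Jina's `dunder_get`, whose exact behavior may differ.

```python
def dunder_lookup(obj, field):
    """Hypothetical helper: resolve 'a__b__c' against nested dicts, lists and attributes."""
    current = obj
    for part in field.split('__'):
        if isinstance(current, dict):
            current = current[part]          # dict access by key
        elif isinstance(current, (list, tuple)):
            current = current[int(part)]     # list access by integer index
        else:
            current = getattr(current, part)  # fall back to attribute access
    return current


doc = {'score': {'value': 0.9}, 'tags': ['news', 'sports']}
print(dunder_lookup(doc, 'score__value'))  # 0.9
print(dunder_lookup(doc, 'tags__1'))       # sports
```

Using `__` as the separator leaves `.` free to appear inside keys, and lets a single string address dict keys, list positions and attributes alike.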
- Refactor logger config into YAML. Add `--log-config` to the `jina pea` CLI; by default it points to `logging.default.yml`. `--log-sse`, `--log-profile` and `--log-with-own-name` are deprecated. #1031
The loggers are mapped to the resource files as follows:
Filename | Logger in the code |
---|---|
`logging.default.yml` | `default_logger` and any logger defined with `JinaLogger()` |
`logging.docker.yml` | logger used in the `ContainerPea` |
`logging.profile.yml` | `profile_logger` |
`logging.remote.yml` | logger used in the `RemotePea` |
- Refactor the code for traversing recursive Documents. `granularity_range`, `adjacency_range`, `recur_on` and `recursion_order` are deprecated and replaced by `traversal_paths`. This allows us to specify exactly where the traversal should happen. #995, #998, #1001, #1003, #1006, #1007, #1027, #1036, #1044
- Protobuf `request_id` is now of `string` type. `--first-request-id` is removed from the client CLI. `--query-uses` and `--index-uses` in the hello-world CLI are renamed to `--uses-query` and `--uses-index`. #1049
Bug Fixes and Other Changes
Flow
- Refactor log stream server with `fluentd`. Fluentd acts as a daemon collecting logs from different parts of Jina and forwarding them to a specific output. Check out more details at docs.jina.ai #1002, #999
- Add `ordinal_idx_arg` for the batching decorator to support passing the ordinal index to indexers #1089
- Refactor `request_id` to uuid #1049
- Refactor logger wrapper #1029
- Add ssh tunneling for Pod. You can specify ssh information #1018
- Switch to hash function for generating ids #1005, #1034
- Support `--uses-before` and `--uses-after` when `--parallel=1`. Previously, both options only took effect when `parallel > 1`. `_pass` and `_forward` now use `RouteDriver` by default. #1112
- Rename `replica_id` to `pea_id` and fix the `PeaRoleType` #1015
- Fix the bug in setting `top_k` #1133 #1138 #1145
Executors
- Add checking for the existence of model paths #1077
- Improve exception handling for the failure of loading pre-trained models #1065
- Fix typing of indexers #1053
- Fix the no attribute error for BaseOnnxEncoder #1107
Drivers
- Fix bug in `QueryDriver` when passing dictionary argument. #1080
CLI
- Improve the hubio module. `jina hub login` supports logging in with OAuth authentication. `jina hub list` lists the available pods in the jina-hub. `jina hub push` supports building and pushing pod images via the Hubapi deployed on AWS API Gateway #1022, #1041, #1118, #1120, #1135
- Add update checking for the jina cli #1117
Tests & CICD
- Refactor test for Python client #1095
- Add tests for including examples during ci #1088
- Fix dependency conflicts in ci by replacing `[match-py-ver]` with `[cicd]` #1101
- Improve PR review process by adding `CODEOWNERS` #1108
- Refactor to pytest in testing `request` #1045
- Add unit test for helper #1046
- Fix io test #1052
- Fix test coverage #1054, #1056
- Use `pytest` fixture to remove `tmp` files #1021
- Refactor the unit tests to `pytest` style in `test_protobuf` #1121
- Add docker helper test #1115
- Add test in the ci for testing examples #1142
- Add test in the ci for testing hello-world in docker with no devel installed #1139
Documentation
- Add Portuguese translation for `README` #1097
- Add Ukrainian translation for `README.md` #1124
- Fix Russian `README` #1057
- Fix broken links in `README` #1033, #1037, #105
- Fix links in `CHANGELOG` and `CONTRIBUTING` #1032
- Improve the docstring for rank drivers #1143
Others
- Fix duplicate lines in cookiecutter #1063
- Fix conflicts between copyright adding action and typing #1023
- Move `numpy` importing inside function #1019
- Rename `jina_cli` to `cli` #1017
- Fix typing error in `mypy` #1009
- Fix line spaces in code #1105
Thanks to our Contributors
This release contains contributions from Alex C-G, Alex McKenzie, CatStark, Christopher Lennan, Deepankar Mahapatro, Fernanda Kawasaki, Han Xiao, Joan Fontanals Martinez, Ján Jendrušák, Maximilian Werk, Nan Wang, Oleh Yaroshchuk, Pratik Bhavsar, RenrakuRunrat, Rutuja Surve, Sai Sandeep Mutyala, Sergei Averkiev, Susana Guzman, Wang Bo, jancijen, pswu11
Thanks to our Community
And thanks to all of you out there as well! Without you, Jina couldn't do what we do. Your support means a lot to us.
🤝 Work with Jina
Want to work with Jina full-time? Check out our openings on our website.