Jina 1.2
We are excited to release Jina 1.2. Jina is the easier way to do neural search in the cloud. Highlights of this release include:
- Improve the performance when handling sparse embeddings.
- Add support for the Hugging Face 🤗 API
- Add support for spell checking
Release 1.2
⬆️ Major Features and Improvements
Build your search system with sparse embeddings
Here at Jina, our primary goal is to develop a universal framework to support all your neural search use cases. From Jina 1.2 onwards, you can create a neural search application with s p a r s e embeddings (see what I did there?). This is especially handy for use cases like product catalogs, which are normally encoded in a one-hot vector format. If you are interested in deploying a sparse vector app, check out our documentation guide here. The related pull requests are #2207, #2233, #2239, #2240, #2271, #2296, #2297, #2309, #2316.
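To make the one-hot catalog case concrete, here is a minimal sketch of why sparse storage helps. It is plain Python and does not use Jina's API: the `SparseVector` alias, the helper names, and the toy vocabulary are all assumptions made up for illustration.

```python
from typing import Dict, List

# Assumption for this sketch: a sparse vector maps dimension index -> non-zero value.
SparseVector = Dict[int, float]


def one_hot_sparse(attributes: List[str], vocabulary: Dict[str, int]) -> SparseVector:
    """Encode known attributes as a sparse multi-hot vector; unknown ones are skipped."""
    return {vocabulary[a]: 1.0 for a in attributes if a in vocabulary}


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product that only visits the non-zero entries of the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b.get(index, 0.0) for index, value in a.items())


# Hypothetical catalog vocabulary; a real one could have many thousands of entries.
vocabulary = {"red": 0, "blue": 1, "shoe": 2, "shirt": 3, "cotton": 4}

red_shoe = one_hot_sparse(["red", "shoe"], vocabulary)
red_shirt = one_hot_sparse(["red", "shirt", "cotton"], vocabulary)

# For 0/1 vectors the dot product counts the attributes two products share.
overlap = sparse_dot(red_shoe, red_shirt)
```

Only the non-zero dimensions are stored and compared, so the cost grows with the number of attributes per product rather than with the size of the vocabulary.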
100x performance gain for encoding your data with Hugging Face's 🤗 API
Every machine learning engineer knows the pain of lying awake at night worrying whether their data is still slowly being encoded on their laptop nearby. Make this experience a distant memory (like those days when you could hug your friends), and check out Hugging Face's new Inference API! They've done some fascinating work on speeding up inference for models in the transformers library. You can now benefit from this performance gain by using the TransformerTorchEncoder hub module in your Jina Flow and plugging in your Hugging Face API key. Check out the details here.
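For a rough idea of what "plugging in your API key" involves at the HTTP level, the sketch below builds (but does not send) a request to the hosted Inference API. It is not the TransformerTorchEncoder implementation. The endpoint pattern, the example model id, and the `build_inference_request` helper are assumptions for illustration, so check the Hugging Face documentation for the current details.

```python
import json
from typing import List
from urllib.request import Request

# Assumption: endpoint pattern of the hosted Inference API; verify against current docs.
API_ROOT = "https://api-inference.huggingface.co/models"


def build_inference_request(model_id: str, api_key: str, texts: List[str]) -> Request:
    """Prepare a POST request that carries the texts and the API key (hypothetical helper)."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return Request(
        url=f"{API_ROOT}/{model_id}",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",  # the API key travels as a bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder model id and key; nothing is sent over the network here.
request = build_inference_request("distilbert-base-uncased", "YOUR_API_KEY", ["hello jina"])
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` with a real key; the heavy lifting then happens on the remote side instead of on your laptop.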
Handle misspelling in search queries
We all know that computers can be a little picky. We humans would know that Jan Solo is just a misspelling of the famous Star Wars character Han Solo. Handling misspelled queries like this is a complex topic, and this release includes a basic solution to the problem. You can now implement a crafter executor which will train a machine learning auto-correction model on your corpus to handle simple misspellings. Find out more at PySpellChecker and jina-hub/crafters/nlp/SpellChecker.
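The idea of learning corrections from your own corpus can be sketched in a few lines. The example below is a simplified frequency-based corrector that only considers candidates one edit away. It is an illustration under those assumptions, not the code of the SpellChecker crafter or of PySpellChecker, and all function names are made up.

```python
import re
from collections import Counter
from typing import Set

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def train(corpus: str) -> Counter:
    """Count how often each lowercase word appears in the corpus."""
    return Counter(re.findall(r"[a-z]+", corpus.lower()))


def edits1(word: str) -> Set[str]:
    """All strings one delete, transpose, replace or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str, counts: Counter) -> str:
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # nothing better found, leave the token as it is
    return max(candidates, key=counts.__getitem__)


def correct_query(query: str, counts: Counter) -> str:
    """Correct a whitespace-separated query token by token (output is lowercased)."""
    return " ".join(correct(token, counts) for token in query.split())


counts = train("Han Solo flies the Millennium Falcon. Han shot first.")
fixed = correct_query("Jan Solo", counts)
```

Because the vocabulary comes from the indexed corpus, the corrector favors words that actually occur in your data, which is why "jan" resolves to "han" here.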
⚠️ Breaking Changes
- Combine the `batching_multi_input` decorator into `batching`. #2269
- Make the IDs of Peas start at index `0`. #2243
- Improve the APIs on the executors level. #2313
Documentation
- Fix the typos in the script of README.md #2256 @carlosb1
- Reformat README.md #2244, #2264, #2270, #2314
- Add docstrings for `jina/enums.py`. #2274
- Improve docstrings for `BasePod`. #2282
- Improve the code snippets #2308, #2325
Bug Fixes and Other Changes
Flow
- Add `CANCEL` command to `ControlRequestProto` in order to remove the dealer from the router. #2257
- Add `reload` API to Flow. #2278, #2280, #2285
- Improve the flaky logging when creating a Flow. #2279 @mohamed--abdel-maksoud
- Fix the wrong assignment to `sibling`. #2300
- Add `reload` API to the RESTful APIs. #2301
- Fix the flushing issue when a Flow is interrupted by `KeyboardInterrupt`. #2353
Executors
- Refactor the evaluator's name #1570
- Remove the deprecated code related to training #2311
- Improve the usability of `CompoundPod`. #2329
- Add `PodFactory` for abstracting the Pod construction. #2346
- [Experimental] Split the indexer into dump indexers and query indexers. Introduce `BaseDBMSIndexer`, `BinaryPbDBMSIndexer`, `KeyValueDBMSIndexer` as dump indexers. Introduce `BaseQueryIndexer`, `NumpyQueryIndexer`, `BinaryPbQueryIndexer` as query indexers. #2260, #2310, #2312, #2307
- [Experimental] Introduce replicas. #2224, #2338
- [Experimental] Improve the APIs on the executors level. This refactoring greatly improves the usability of Jina for users who want to implement customized executors. See #2313, #2317, #2327, #2351 for more details.
Tests
- Adapt the unit tests to the latest RESTful APIs #2251
- Add unit tests for the `plot` function #2245
- Replace the `DocumentProto` with `Document` in `test_eval_collect_driver.py` #2276
- Add unit tests for `zmqlet` #2361 @winstonww
Others
- Remove unnecessary code in `BaseAggregateMatchesRankerDriver` #2258
- Improve the `get_public_ip` function by using multithreading #2267, #2272, #2277
- Add `WhooshDriver` for celebrating April Fools' Day. #2273
- Fix scipy version #2293
- Fix the parameter passing during CI #2328
- Add `EmbeddingClsType` for using different embedding types #2318
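As an illustration of the multithreading idea behind the `get_public_ip` item above, the sketch below runs several lookups concurrently and returns the first one that succeeds. It is not Jina's implementation: `first_successful` and the stand-in lookup functions are hypothetical, and a real version would query actual IP echo services instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout
from typing import Callable, Iterable, Optional


def first_successful(lookups: Iterable[Callable[[], str]], timeout: float = 3.0) -> Optional[str]:
    """Run all lookups in parallel threads and return the first result that does not fail."""
    lookups = list(lookups)
    if not lookups:
        return None
    pool = ThreadPoolExecutor(max_workers=len(lookups))
    try:
        futures = [pool.submit(lookup) for lookup in lookups]
        for future in as_completed(futures, timeout=timeout):
            try:
                return future.result()
            except Exception:
                continue  # this lookup failed, wait for the next one to finish
    except FuturesTimeout:
        pass  # no lookup finished within the time limit
    finally:
        pool.shutdown(wait=False)  # do not block on lookups that are still running
    return None


# Stand-ins for real network calls (hypothetical; 203.0.113.7 is a documentation address).
def slow_service() -> str:
    time.sleep(0.3)
    return "203.0.113.7"


def broken_service() -> str:
    raise ConnectionError("service unreachable")


def fast_service() -> str:
    return "203.0.113.7"


ip = first_successful([slow_service, broken_service, fast_service])
```

With sequential lookups, one slow or unreachable service delays the whole call; running them side by side means the answer arrives as soon as any single service responds.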
Thanks to our Contributors
This release contains contributions from @hanxiao @florian-hoenicke @alexcg1 @davidbp @Yongxuanzhang @bwanglzu @FionnD @Kelton8Z @nan-wang @JoanFM @cristianmtr @deepankarm @mohamed--abdel-maksoud @carlosb1 @winstonww
Thanks to our Community
And thanks to all of you out there as well! Without you Jina couldn't do what we do. Your support means a lot to us.
Work with Jina
Want to work with Jina full-time? Check out our openings on our website.