We are excited to release Jina 1.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include:
- Improve usability of CRUD operations by integrating with Flow APIs.
- Make it easier to use Jina in a distributed way.
- Support hyperparameter tuning out of the box by introducing `FlowOptimizer`.
Jina 1.0
⬆️ Major Features and Improvements
- Improve usability of CRUD operations: Both REST and gRPC APIs of Flows provide native CRUD support. Add support for an `asyncio` interface for CRUD. Check out how to use CRUD in README.md. To learn more, please refer to docs.jina.ai. #1461, #1650, #1810, #1835, #1841
- Make it easier to use Jina in a distributed way: `jinad` is integrated into `jina` as the `daemon` module. Check out how to distribute your Flow remotely in README.md. #1610, #1637
- Introduce `FlowOptimizer` to run Flows with different parameter sets: This allows out-of-the-box hyperparameter tuning in Jina. Check out more details at docs.jina.ai. #1459, #1726, #1751, #1776, #1764, #1789, #1800, #1851
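As a rough illustration of what running a Flow with different parameter sets involves, the sketch below enumerates every combination in a small search space and keeps the best-scoring one. It does not use `FlowOptimizer` or any Jina API: `parameter_sets`, `run_flow`, `best_parameters` and the parameter names `top_k` and `chunk_size` are hypothetical, the score is a toy function, and exhaustive grid search is only one possible strategy that may differ from what `FlowOptimizer` does internally.

```python
from itertools import product


def parameter_sets(space):
    """Yield one dict per combination of the candidate values in `space`."""
    names = sorted(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def run_flow(params):
    """Hypothetical stand-in for running a Flow and evaluating the result.

    Returns a toy score that peaks at top_k=10, chunk_size=32.
    """
    return -abs(params['top_k'] - 10) - abs(params['chunk_size'] - 32)


def best_parameters(space, evaluate):
    """Return the parameter set with the highest evaluation score."""
    return max(parameter_sets(space), key=evaluate)


# Hypothetical search space: 3 x 2 = 6 parameter sets in total.
space = {'top_k': [5, 10, 20], 'chunk_size': [16, 32]}
best = best_parameters(space, run_flow)
print(best)
```

In a real setup, `run_flow` would build and run a Flow with the given parameters and return an evaluation metric.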
⚠️ Breaking Changes
- Refactor `Segmenter` to a separate Executor. #1632

  v0.9.0:

      from jina.executors.crafters import BaseSegmenter

      class DummySentencizer(BaseSegmenter):
          def craft(self, text, *args, **kwargs):
              # collect one chunk per comma-separated piece of text
              results = []
              for t in text.split(','):
                  results.append({'text': t})
              return results

  v1.0.0:

      from jina.executors.segmenters import BaseSegmenter

      class DummySentencizer(BaseSegmenter):
          def segment(self, text, *args, **kwargs):
              # collect one chunk per comma-separated piece of text
              results = []
              for t in text.split(','):
                  results.append({'text': t})
              return results
- Remove `jina/logging/sse.py`. Correspondingly, the `--logserver` and `--logserver-config` options are removed from the CLI. Remove `skip-on-error` and replace it with `--on-error-strategy` for the CLI. To use log streaming, please refer to api.jina.ai. #1580, #1757, #1766
- Introduce `request_size` to distinguish the number of Documents per request from the batch size of Documents processed by each Executor. #1677, #1746

  v0.9.0:

      with Flow().add() as f:
          f.index([Document(text='hello, jina') for _ in range(10)],
                  batch_size=3)

  v1.0.0:

      with Flow().add() as f:
          f.index([Document(text='hello, jina') for _ in range(10)],
                  request_size=3)
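The distinction that `request_size` introduces can be pictured with plain Python, without any Jina code. In this minimal sketch, the helper `split` and the Executor-side batch size of 2 are assumptions made for illustration: the client first groups Documents into requests, and each request's Documents are then grouped again into processing batches.

```python
from itertools import islice


def split(items, size):
    """Yield consecutive lists holding at most `size` items each."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


docs = [f'doc-{i}' for i in range(10)]

# Client side: request_size=3 decides how many Documents go into one request.
requests = list(split(docs, 3))

# Executor side: a hypothetical batch size of 2 decides how the Documents
# inside a single request are grouped for processing.
batches_per_request = [list(split(request, 2)) for request in requests]

print([len(request) for request in requests])
print(batches_per_request[0])
```

With ten Documents and `request_size=3`, four requests are sent; the batch size only affects how each Executor walks through the Documents of one request.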
- Remove `separated_workspace`. Instead of using `pea_workspace`, `shard_workspace` is defined based on `workspace`, `name` and `pea_id`. Set the default of the `workspace` meta to `None`. Introduce `root_workspace` and `root_name` to resolve the workspace issue in `CompoundExecutor` and `ref_indexer`. Use `complete_path` to find the `uses_internal` resource to `bind` to `docker` when spinning up a `pea` in a `container`. Remove `pea_workspace` and replace `current_workspace` with `shard_workspace`. #1722, #1739

  v0.9.0:

      f = Flow().add(uses='NumpyIndexer', shards=2, separated_workspace=True)

  v1.0.0:

      f = Flow().add(uses='NumpyIndexer', shards=2)
      # saved files
      # ├── jina.executors.indexers.vector.NumpyIndexer-86342edc-1
      # │   ├── jina.executors.indexers.vector.NumpyIndexer-86342edc
      # │   └── jina.executors.indexers.vector.NumpyIndexer-86342edc.bin
      # └── jina.executors.indexers.vector.NumpyIndexer-c4f2ea6c-2
      #     └── jina.executors.indexers.vector.NumpyIndexer-c4f2ea6c
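Judging only from the saved-files listing above, each shard folder appears to be named `<name>-<pea_id>` inside the workspace. The sketch below reproduces that naming under this assumption; the function `shard_workspace` here is a hypothetical stand-alone helper and the `/tmp/ws` path is made up, so treat it as a reading aid rather than Jina's implementation.

```python
import posixpath


def shard_workspace(workspace, name, pea_id):
    """Build a per-shard folder path, assuming the layout <workspace>/<name>-<pea_id>."""
    return posixpath.join(workspace, f'{name}-{pea_id}')


# Two shards of the same Flow end up in two sibling folders.
first = shard_workspace('/tmp/ws', 'NumpyIndexer-86342edc', 1)
second = shard_workspace('/tmp/ws', 'NumpyIndexer-c4f2ea6c', 2)
print(first)
print(second)
```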
- Remove `MinRanker`. #1729

  v0.9.0:

      f = (Flow()
           .add(uses='NumpyIndexer')
           .add(uses='MinRanker'))
      with f:
          f.search([Document(embedding=np.random.rand(10))])

  v1.0.0:

      f = (Flow()
           .add(uses='NumpyIndexer')
           .add(uses='SimpleAggregateRanker',
                aggregate_function='min',
                is_reversed_score=True))
      with f:
          f.search([Document(embedding=np.random.rand(10))])
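For readers unfamiliar with aggregate ranking, the idea behind `aggregate_function='min'` can be sketched as collapsing several chunk-level match scores into one score per parent Document by taking the minimum. The code below is a plain-Python illustration under that assumption only: `aggregate_min` and the sample data are hypothetical, and the effect of `is_reversed_score` is deliberately not modelled.

```python
from collections import defaultdict


def aggregate_min(chunk_matches):
    """Collapse (parent_id, score) pairs into one score per parent using min."""
    best = defaultdict(lambda: float('inf'))
    for parent_id, score in chunk_matches:
        best[parent_id] = min(best[parent_id], score)
    return dict(best)


# Hypothetical chunk-level matches: two chunks matched for each parent Document.
matches = [('doc-a', 0.4), ('doc-b', 0.9), ('doc-a', 0.1), ('doc-b', 0.7)]
doc_scores = aggregate_min(matches)
print(doc_scores)
```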
- Deprecate `output_fn`, `callback` and `buffer` for the Flow API. #1730
- Add support for using async generators in both the inputs and outputs of `AsyncFlow`. #1816

  v0.9.0:

      async def input_fn():
          for _ in range(10):
              yield Document()
              await asyncio.sleep(0.1)

      with AsyncFlow().add() as f:
          await f.index(input_fn)

  v1.0.0:

      from jina import AsyncFlow

      async def input_fn():
          for _ in range(10):
              yield Document()
              await asyncio.sleep(0.1)

      with AsyncFlow().add() as f:
          async for resp in f.index(input_fn):
              print(resp)
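The async-generator pattern used for both inputs and outputs can be tried without Jina installed. In the sketch below, `fake_index` is a hypothetical stand-in for an async index call: it consumes an async generator of inputs and yields one response per input, so the caller iterates over responses with `async for` instead of awaiting a single result.

```python
import asyncio


async def input_fn():
    """Async generator producing inputs one at a time."""
    for i in range(3):
        yield f'doc-{i}'
        await asyncio.sleep(0)  # hand control back to the event loop


async def fake_index(inputs):
    """Hypothetical stand-in for an async index call: one response per input."""
    async for doc in inputs():
        yield f'indexed {doc}'


async def main():
    # Responses arrive as they are produced, so they are consumed with async for.
    return [resp async for resp in fake_index(input_fn)]


responses = asyncio.run(main())
print(responses)
```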
- Refactor `DeleteRequestProto` in `jina.proto`; the `delete` API of Flow accepts a sequence of Document IDs instead of Documents; add `docs` and `groundtruths` fields in `DeleteRequestProto`; replace the `to_response` method with `as_response` in the `Request` class; replace the `to_json` method with `json` in the `Request` class. #1823

  v0.9.0:

      from jina import Flow

      f = Flow().add(uses='_index')
      with f:
          f.delete([Document(id=_id) for _id in ['🐦', '🐲']])

  v1.0.0:

      from jina import Flow

      f = Flow().add(uses='_index')
      with f:
          f.delete(['🐦', '🐲'])
- Make `VectorIndexer` only use `str` as keys; replace `query_by_id` with `query_by_key`. #1829

  v0.9.0:

      vec_idx = array([0, 1])
      with NumpyIndexer.load(save_abspath) as indexer:
          for idx in vec_idx:
              retrieved_vec = indexer.query_by_id(idx)

  v1.0.0:

      vec_idx = array(['0', '1'], dtype=(np.str_, 16))
      with NumpyIndexer.load(save_abspath) as indexer:
          for key in vec_idx:
              retrieved_vec = indexer.query_by_key(key)
- Rename `DocIDCache` to `DocCache`. #1831, #1898
- Unify naming of protobuf objects to `.proto`; replace `as_pb_object` with `proto` for `Message`; replace `proto` in `BaseNdArray`, `NdArray`, `SparseNdArray` with `_pb_body` to avoid misunderstanding; replace `_querylang` with `_pb_body` in `QueryLang`; replace `_request` with `_pb_body` in `Request`; replace `_score` with `_pb_body` in `NamedScore`. #1847
- Remove `groundtruths` from `UpdatedRequestProto`. #1858
- Remove `UniqueId`. #1872
📗 Documentation
- Add reference check while building docs #1600
- Add links to Spanish README #1591 @PabloRN
- Fix typos in German README #1602
- Fix broken links in README and CONTRIBUTING #1594 @NouiliKh
- Fix broken links at docs.jina.ai #1614
- Fix missing import in README #1622
- Refactor examples section in README. #1652
- Add links to Binder in README. #1673
- Fix typo in README #1723, #1745
- Add Docker example for Flow API `uses` #1741
- Fix links to remote usage of Jina in README #1740 @ThePfarrer
- Add Greek translation for 101 #1765 @DimitrisPr
- Add Spanish translation for 101 #1794 @Immich
- Move documentation repo to https://github.com/jina-ai/docs. #1819, #1828, #1833
- Add [documentation for REST APIs](https://api.jina.ai/rest/index.html) #1846 #1849 #1856 #1860
- Improve docstrings #1749, #1867, #1869, #1889, #1874, #1876, #1895, #1896, #1899, #1900, #1901, #1903, #1904, #1906, #1907, #1908, #1909, #1910, #1912, #1913, #1914, #1915, #1918, #1919, #1920
- Fix typo in README #1882 @slettner
- Update the slogan in README #1921
🐞 Bug Fixes and Other Changes
Flow
- Fix log streaming from remote for `JinadRuntime` #1584
- Force set `separated_workspace=False` if `parallel==1` #1682
- Remove random identity assignment for Peas and HeadPea. #1755
- Add new property `workspace_id` for `BaseFlow` #1761
- Add new property `identity` for `BaseFlow` #1766
- Improve arguments of `index_lines` and `search_lines` and make them consistent. #1778
- Fix bug in terminating error Pods #1799
- Add `asyncio` interface for `input_fn` #1808
- Enable `--show-exc-info` by default #1818
- Fix bug when two remote Pods connected with local Pods #1809
- Use `uuid` for `Zmqlet` to avoid collision #1857
- Fix bug in `grpcRuntime` #1865
- Fix bug in deleting Peas when one fails to start #1863
- Fix bug when `timeout_ready` is negative #1879
- Add `line_format` argument for `search_lines` API to support both `json` and `csv` #1881
- Fix bug that blocks Flow when some Pods failed to start #1902
Executors
- Refactor Evaluator's name #1570
- Fix loading logic in loading dependency for customized Executors with `py_module` #1597
- Enable mapping volumes to arbitrary path in container for `ContainerRuntime` #1596
- Fix loading indexers with `ref_indexer` when using `ContainerRuntime` #1595
- Improve styling and docstrings for `JinadRuntime` #1599
- Improve error logging for `NoDriverForRequest` #1624
- Raise warnings when keys don't exist for `BaseCache` instead of exceptions #1628
- Add `WebSocketClient` and `AsyncWebSocketClient` to stream requests. #1608
- Fix get `pea_id=-1` bug #1657
- Remove `_check_on_gpu` to avoid false alarm when GPU device is available #1674 @tadejsv
- Fix bug when shard is empty #1689
- Pass arbitrary `kwargs` to Docker SDK. #1690
- Add built-in resources `_merge_matches_topk` for merging matches and keeping only `topk` #1686, #1775
- Filter out non-existing keys in `BaseIndexer`. #1694
- Fix workspace bug when reloading Executor from Docker container. #1756
- Fix `__init__` signature of `DocCache` #1842
- Fix issue with empty query handler #1844
- Refactor `BaseCache` and `BaseCacheDriver` #1853
- Change default value of `is_merge` in `KVSearchDriver` to `False` #1855, #1888
- Refactor `BaseIndexer`, `KeyValueIndexer` and `VectorIndexer` #1873
- Add `BaseEmbeddingEvaluator` and `BaseTextEvaluator`; remove `BaseCraftingEvaluator` and `BaseEncodingEvaluator` #1875
Drivers
- Add default value for `traversal_paths` in `KVSearchDriver` #1685
- Add `DeleteDriver` to extract Document IDs #1823
- Change default `traversal_paths` of QueryLang Drivers from `'c'` to `'r'` #1859
- Add warning when encoding returns `None` in `EncodeDriver` #1886
- Move checking of ID lengths to `BaseExecutableDriver` #1887
- Refactor `VectorSearchDriver` and `BaseCache` #1878
Types
- Fix dependency of `get_content_hash()` on chunks #1611 #1626
- Introduce `SearchRequest`, `TrainRequest`, `UpdateRequest`, `ControlRequest`, `DeleteRequest`, `IndexRequest` as new types of requests. #1823
- Add `__str__` and `__repr__` for primitive types #1847, #1852
- Enable building a Document from `dict` or `json` #1877
- Fix a bug in `extend()` of `DocumentSet` #1883
- Add `plot()` in Document for visualization #1884
Tests
- Refactor comparisons for `None` in unittest #1604
- Add unittests for `CosineEvaluator` and `EuclideanEvaluator` #1603
- Refactor test file structure. Set separate folders for tests in `jinahub` and `jinad` so they run in different CI stages. #1636
- Refactor CICD so `unit` and `integration` tests run in parallel to speed up CI #1633
- Add tests for local Flow and remote Pods #1583
- Fix `topk` in `tests/integration/crud/test_crud.py`. #1653
- Add more integration tests for `jinad`. #1654, #1665
- Move distributed tests to separate folder #1676
- Add more tests on different topology structures to improve distributed tests #1668
- Add multiple integration tests for CRUD operations. #1613, #1700, #1754, #1716
- Add more distributed tests. #1727
- Fix integration test on non-existing shards. #1719
- Add tests for deleting chunks. #1620
- Add tests for daemon in CI. #1770, #1771, #1773
- Change logging level to `SUCCESS` in unit tests of `test_logging.py`. #1774
- Add distributed tests for daemon. #1782
- Add more unit tests for `ContainerRuntime` #1787
- Adapt YAML configuration to `v1`. #1820
- Add `lookupnode` tests. #1824
- Add integration tests for running instances in parallel #1836
- Remove `*/all` API #1880
- Fix tests in `test_helloworld` and `test_flow` #1897
HubIO
- Remove warning for `jina_version` #1619
- Fix bug when pushing images #1631
- Refactor `hubio` to improve readability #1691
- Adapt to latest changes of hubapi. #1721
- Expose `jina-version` when running `jina hub list` #1890
Daemon
- Fix docstrings and log information for `jina/daemon` redocs. Remove `version` for `jina/daemon`. Move log configuration of `jina/daemon` to `jina/resources/logging.daemon.yml` #1642
- Add new API `/*/parameters` for `jinad` to fetch all parameters #1669
- Fix CORS error #1687
- Add more detailed error messages #1697
- Refactor daemon code. #1713, #1780
- Fix early drop issue on log streaming #1738
- Add new API `/logs/{workspace_id}/{log_id}` for log streaming. #1768
- Fix bug when uploading files with `pydantic`. #1793
- Fix bug that `upload_files` does not upload `use_internal` #1834
- Fix bug of setting `timeout_ready` #1864
Others
- Add `commit-lint` for CI #1627
- Leverage `show-exc-info` as a common argument. #1639
- Ignore parent id in content hashing. #1651
- Clean up deprecated fields `meta_info` and `CallUnary` in `jina.proto`. #1655
- Refactor code to avoid using `builtins`. #1658
- Refactor code to remove unused definitions #1660
- Make test a separate step in CI for `jinad`. #1664
- Tag Docker images with GitHub run ID to improve readability of CI #1679
- Avoid pushing test images to Docker registry #1680
- Add `armv6/v7` for building Docker images in CICD #1692
- Add support for printing JSON in logs. #1693
- Fix broken `hello-world`. #1710
- Fix log level and upload path #1744
- Improve CLI parser to optimize format and fix multiple help strings. #1747
- Add `--compress`, `--compress-min-bytes`, `--compress-min-ratio` for CLI to enable message compression #1766
- Replace `"` with single quotes in CLI help strings. #1790
- Add Python 3.9 support #1802
- Add `identity` property for Pea and add a warning when `JINA_FULL_CLI` is set in environment #1804
- Add warnings when invalid arguments are given to Executors. #1817
- Enable usage of customized long ids. #1666 @janandreschweiger
- Enable Document IDs to be arbitrary strings. #1837 #1840
- Refactor typing to use `Iterable` instead of `Iterator` #1843
- Fix YAML path check #1850
- Update copyright information #1854
- Align versioning of Jina Dashboard with Jina #1870, #1892
- Increase default value for `timeout-ready` and update autocompletion #1893
- Add `hello-world-chatbot` demo #1894
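The message compression options listed above (`--compress`, `--compress-min-bytes`, `--compress-min-ratio`) suggest a thresholded scheme. The sketch below shows one plausible reading, stated here as an assumption: a payload is compressed only if it is at least a minimum size and the achieved ratio (original size divided by compressed size) reaches a minimum. The function `maybe_compress` and its defaults are hypothetical, and `zlib` is used only because it ships with Python, not because Jina uses it.

```python
import zlib


def maybe_compress(payload: bytes, min_bytes: int = 1024, min_ratio: float = 1.1):
    """Return (data, was_compressed) under the assumed thresholds."""
    # Small messages are sent as-is: compression overhead is not worth it.
    if len(payload) < min_bytes:
        return payload, False
    compressed = zlib.compress(payload)
    ratio = len(payload) / len(compressed)
    # Keep the original if compression does not shrink it enough.
    if ratio < min_ratio:
        return payload, False
    return compressed, True


small, small_flag = maybe_compress(b'hi')
large, large_flag = maybe_compress(b'a' * 4096)
print(small_flag, large_flag, len(large))
```

A receiver would need the flag (or an equivalent header field) to know whether to decompress the payload.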
🙏 Thanks to our Contributors
This release contains contributions from @bwanglzu @florian-hoenicke @JoanFM @PabloRN @hanxiao @deepankarm @nan-wang @NouiliKh @cristianmtr @FionnD @maximilianwerk @bhavsarpratik @tadejsv @Yongxuanzhang @ThePfarrer @theUnkownName @DimitrisPr @Immich @davidbp @janandreschweiger @BingHo1013 @slettner @imsergiy @Roshanjossey @BastinJafari @rutujasurve94 @lusloher @aga11313 @alexcg1 @pswu11
🙏 Thanks to our Community
And thanks to all of you out there as well! Without you Jina couldn't do what we do. Your support means a lot to us.
Jina 1.x Features (Target release: monthly, 2021)
Jina will keep improving scalability and performance and become more reliable in production environments. 1.x will also support replicas and query-while-indexing.
🤝 Work with Jina
Want to work with Jina full-time? Check out our openings on our website.
Release Note (`1.0.0`)
Release time: 2021-02-10 09:24:16
🙇 We'd like to thank all contributors for this new release! In particular, Han Xiao, cristian, Maximilian Werk, Yongxuanzhang, Bing @jina AI, Jina Dev Bot, CatStark, Joan Fontanals, Deepankar Mahapatro 🙇
🐞 Bug fixes
- [`e0e1c13c`] - more useful default (#1917) (Maximilian Werk)
- [`573d67ba`] - resources: fix yaml config in resources (#1913) (Han Xiao)
📗 Documentation
- [`fb3d1db2`] - docstrings jaml (#1915) (cristian)
- [`5fb7ea8b`] - ndarray: update docstring for ndarray (#1920) (Yongxuanzhang)
- [`768c9a8b`] - types: update docstring for request (#1919) (Yongxuanzhang)
- [`406e8974`] - importer: update importer docstring (#1918) (Yongxuanzhang)
- [`3d4fd0e0`] - docstrings for 'clients' (#1912) (cristian)
- [`30ada30a`] - improve docstrings on drivers (#1914) (cristian)
- [`f6a856f2`] - helper: update helper docstrings (#1895) (Yongxuanzhang)
- [`3fae1ec0`] - docstring executors (#1904) (CatStark)
- [`ce11961e`] - docstring sets (#1899) (CatStark)
- [`fe7222b1`] - improve docstrings for some drivers (#1869) (Joan Fontanals)
- [`daf7b3db`] - rest: add jina logo to redoc (#1908) (Deepankar Mahapatro)
- [`a60135f2`] - docstrings for hub module (#1910) (cristian)
🏁 Unit Test and CICD
- [`70b9bc97`] - fix test in hello-world (Han Xiao)
🍹 Other Improvements
- [`2a0c4e1d`] - bump version to 1.0.0 (Han Xiao)
- [`dca0ffe2`] - update slogan (#1921) (Bing @jina AI)
- [`7b72120a`] - docs: update TOC (Jina Dev Bot)
- [`c5062fa5`] - contributor: update contributors (Jina Dev Bot)
- [`e42ce89e`] - style: reformatted by jina-dev-bot (Jina Dev Bot)
- [`935d9a10`] - version: the next version will be 0.9.34 (Jina Dev Bot)