github jina-ai/jina v1.0.0
🥂 Major Release v1.0


We are excited to release Jina 1.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include:

  • Improve usability of CRUD operations by integrating with Flow APIs.
  • Make it easier to use Jina in a distributed way.
  • Support hyperparameter tuning out of the box by introducing FlowOptimizer.


⬆️ Major Features and Improvements

  • Improve usability of CRUD operations: both the REST and gRPC APIs of Flows provide native CRUD support, and an asyncio interface for CRUD is added. See README.md for how to use CRUD, and docs.jina.ai for more details. #1461, #1650, #1810, #1835, #1841

  • Make it easier to use Jina in a distributed way: jinad is integrated into jina as the daemon module. See README.md for how to distribute your Flow remotely. #1610, #1637

  • Introduce FlowOptimizer to run Flows with different parameter sets: This allows out-of-the-box hyperparameter tuning in Jina. Check out more details at docs.jina.ai #1459, #1726, #1751, #1776, #1764, #1789, #1800, #1851
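FlowOptimizer's own API is described at docs.jina.ai. As a rough illustration of the underlying idea only, the plain-Python sketch below runs the same evaluation once per parameter set and keeps the best-scoring set. `grid_search`, `run_flow` and the parameters `top_k` and `threshold` are hypothetical stand-ins, not FlowOptimizer's interface, and an exhaustive grid is just one possible search strategy.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Params = Dict[str, float]


def grid_search(
    evaluate: Callable[[Params], float],
    space: Dict[str, Iterable[float]],
) -> Tuple[Params, float]:
    """Evaluate every combination in `space` and return the best one.

    Hypothetical helper: stands in for running a Flow once per
    parameter set and reading back an evaluation score.
    """
    names = list(space)
    best_params: Params = {}
    best_score = float('-inf')
    for values in product(*(list(space[n]) for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # keep the highest score seen so far
            best_params, best_score = params, score
    return best_params, best_score


def run_flow(params: Params) -> float:
    # Hypothetical evaluation that peaks at top_k=10, threshold=0.5.
    return -abs(params['top_k'] - 10) - abs(params['threshold'] - 0.5)


best, score = grid_search(run_flow, {'top_k': [5, 10, 20], 'threshold': [0.3, 0.5, 0.7]})
print(best, score)
```

In practice the evaluation would be an index-then-search run scored by an Evaluator, which is far more expensive than this toy function, so smarter search strategies than a full grid become attractive.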

⚠️ Breaking Changes

  • Refactor Segmenter to a separate Executor. #1632

v0.9.0:
from jina.executors.crafters import BaseSegmenter

class DummySentencizer(BaseSegmenter):
    def craft(self, text, *args, **kwargs):
        results = []  # list of chunk dicts
        for t in text.split(','):
            results.append({'text': t})
        return results

v1.0.0:
from jina.executors.segmenters import BaseSegmenter

class DummySentencizer(BaseSegmenter):
    # the method is now `segment` instead of `craft`
    def segment(self, text, *args, **kwargs):
        results = []  # list of chunk dicts
        for t in text.split(','):
            results.append({'text': t})
        return results

  • Remove jina/logging/sse.py; accordingly, the '--logserver' and '--logserver-config' CLI options are removed. Replace skip-on-error with --on-error-strategy in the CLI. For log streaming, please refer to api.jina.ai #1580, #1757, #1766

  • Introduce request_size to distinguish the number of Documents per request from the batch size of Documents processed by each Executor. #1677, #1746

v0.9.0:
with Flow().add() as f:
    f.index([Document(text='hello, jina') for _ in range(10)],
            batch_size=3)

v1.0.0:
with Flow().add() as f:
    # request_size: number of Documents sent in each request
    f.index([Document(text='hello, jina') for _ in range(10)],
            request_size=3)
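To make the distinction concrete, the plain-Python sketch below splits an input stream into requests of at most `request_size` Documents; how each Executor then batches the Documents inside a request is a separate setting. `split_into_requests` is a hypothetical helper, not Jina's internal code, and plain strings stand in for Documents.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar('T')


def split_into_requests(docs: Iterable[T], request_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `request_size` items (hypothetical helper)."""
    if request_size <= 0:
        raise ValueError('request_size must be positive')
    it = iter(docs)
    while True:
        chunk = list(islice(it, request_size))  # take the next request's worth
        if not chunk:
            return
        yield chunk


docs = [f'hello, jina {i}' for i in range(10)]
requests = list(split_into_requests(docs, request_size=3))
print([len(r) for r in requests])  # [3, 3, 3, 1]
```

With 10 Documents and request_size=3, the last request carries the single leftover Document.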

  • Remove separated_workspace and pea_workspace: shard_workspace is now defined based on workspace, name and pea_id, and replaces current_workspace. Set the default of the workspace meta to None. Introduce root_workspace and root_name to resolve workspace issues in CompoundExecutor and ref_indexer. Use complete_path to find the uses_internal resource to bind to Docker when spinning up a Pea in a container. #1722, #1739

v0.9.0:
f = Flow().add(uses='NumpyIndexer', shards=2, separated_workspace=True)

v1.0.0:
f = Flow().add(uses='NumpyIndexer', shards=2)

# saved files
# ├── jina.executors.indexers.vector.NumpyIndexer-86342edc-1
# │   ├── jina.executors.indexers.vector.NumpyIndexer-86342edc
# │   └── jina.executors.indexers.vector.NumpyIndexer-86342edc.bin
# └── jina.executors.indexers.vector.NumpyIndexer-c4f2ea6c-2
#     └── jina.executors.indexers.vector.NumpyIndexer-c4f2ea6c
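The note only says that shard_workspace is derived from workspace, name and pea_id. The sketch below shows one plausible way to compose such a path, modeled on folder names like NumpyIndexer-86342edc-1 in the tree above; `shard_workspace_path`, its `<name>-<pea_id>` layout and the fallback to the current directory are all assumptions, not Jina's code.

```python
import os
from typing import Optional


def shard_workspace_path(workspace: Optional[str], name: str, pea_id: int) -> str:
    """Compose a per-shard folder as <workspace>/<name>-<pea_id> (assumed layout)."""
    # The workspace meta defaults to None in v1.0; falling back to the
    # current directory here is an assumption made for this sketch.
    root = workspace if workspace is not None else os.getcwd()
    return os.path.join(root, f'{name}-{pea_id}')


print(shard_workspace_path('/tmp/ws', 'NumpyIndexer-86342edc', 1))
```

Because pea_id is part of the path, two shards of the same Executor never write into the same folder, which is what made the separated_workspace switch unnecessary.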

v0.9.0 (MinRanker):
import numpy as np

f = (Flow()
     .add(uses='NumpyIndexer')
     .add(uses='MinRanker'))
with f:
    f.search([Document(embedding=np.random.rand(10))])

v1.0.0 (SimpleAggregateRanker with aggregate_function='min'):
import numpy as np

f = (Flow()
     .add(uses='NumpyIndexer')
     .add(uses='SimpleAggregateRanker',
          aggregate_function='min',
          is_reversed_score=True))
with f:
    f.search([Document(embedding=np.random.rand(10))])
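Conceptually, the v1.0.0 configuration ranks candidate Documents by the smallest score among their matching chunks. The plain-Python sketch below shows that min-aggregation, assuming scores are distances where lower is better (our reading of `is_reversed_score=True`). `min_aggregate_rank` is hypothetical and is not SimpleAggregateRanker's implementation; it ignores Jina's Document and NamedScore types.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple


def min_aggregate_rank(
    chunk_matches: Iterable[Tuple[str, float]],
) -> List[Tuple[str, float]]:
    """Rank parent ids by their smallest chunk-match distance (hypothetical helper).

    `chunk_matches` holds (parent_id, distance) pairs; lower distance is better.
    """
    best: DefaultDict[str, float] = defaultdict(lambda: float('inf'))
    for parent_id, distance in chunk_matches:
        best[parent_id] = min(best[parent_id], distance)
    # Ascending order: the closest parent Document comes first.
    return sorted(best.items(), key=lambda item: item[1])


matches = [('doc1', 0.8), ('doc2', 0.3), ('doc1', 0.1), ('doc2', 0.5)]
print(min_aggregate_rank(matches))  # [('doc1', 0.1), ('doc2', 0.3)]
```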

  • Deprecate output_fn, callback and buffer for Flow API #1730
  • Add support to use async generator in both inputs and outputs of AsyncFlow #1816

v0.9.0:
import asyncio

async def input_fn():
    for _ in range(10):
        yield Document()
        await asyncio.sleep(0.1)

with AsyncFlow().add() as f:
    await f.index(input_fn)

v1.0.0:
import asyncio
from jina import AsyncFlow

async def input_fn():
    for _ in range(10):
        yield Document()
        await asyncio.sleep(0.1)

with AsyncFlow().add() as f:
    # responses are now consumed with `async for`
    async for resp in f.index(input_fn):
        print(resp)
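The v1.0.0 pattern, an async generator feeding a call whose responses are consumed with `async for`, can be tried without Jina. In this asyncio-only sketch, `fake_index` is a hypothetical stand-in for `AsyncFlow.index` that groups incoming items into responses of `request_size` items; strings stand in for Documents.

```python
import asyncio
from typing import AsyncIterator, List


async def input_fn(n: int = 10) -> AsyncIterator[str]:
    # Async generator input: yields one item at a time without blocking the loop.
    for i in range(n):
        yield f'doc-{i}'
        await asyncio.sleep(0)


async def fake_index(inputs: AsyncIterator[str], request_size: int = 4) -> AsyncIterator[List[str]]:
    # Hypothetical stand-in for AsyncFlow.index: itself an async generator of responses.
    batch: List[str] = []
    async for doc in inputs:
        batch.append(doc)
        if len(batch) == request_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly smaller, response


async def main() -> List[List[str]]:
    responses = []
    async for resp in fake_index(input_fn()):
        responses.append(resp)
    return responses


responses = asyncio.run(main())
print([len(r) for r in responses])  # [4, 4, 2]
```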

  • Refactor DeleteRequestProto in jina.proto: the delete API of Flow accepts a sequence of Document IDs instead of Documents; add docs and groundtruths fields to DeleteRequestProto; in the Request class, replace the to_response method with as_response and the to_json method with json. #1823

v0.9.0:
from jina import Flow
f = Flow().add(uses='_index')
with f:
    f.delete([Document(id=_id) for _id in ['🐦', '🐲']])

v1.0.0:
from jina import Flow
f = Flow().add(uses='_index')
with f:
    f.delete(['🐦', '🐲'])
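Because delete now takes IDs rather than full Documents, a storage backend only needs the keys. The toy in-memory store below illustrates these CRUD semantics; `InMemoryIndex` is hypothetical and not a Jina indexer, and its choice to silently ignore unknown IDs on update and delete is an assumption of this sketch.

```python
from typing import Dict, Iterable, Optional


class InMemoryIndex:
    """Toy key-value store keyed by Document ID (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def index(self, docs: Dict[str, str]) -> None:
        self._store.update(docs)

    def update(self, docs: Dict[str, str]) -> None:
        # Only touch IDs that already exist; unknown IDs are ignored.
        for doc_id, text in docs.items():
            if doc_id in self._store:
                self._store[doc_id] = text

    def delete(self, ids: Iterable[str]) -> None:
        # Deletion needs only the IDs, mirroring f.delete(['🐦', '🐲']).
        for doc_id in ids:
            self._store.pop(doc_id, None)

    def search(self, doc_id: str) -> Optional[str]:
        return self._store.get(doc_id)


idx = InMemoryIndex()
idx.index({'🐦': 'bird', '🐲': 'dragon', '🐱': 'cat'})
idx.update({'🐱': 'kitten', '🦊': 'fox'})
idx.delete(['🐦', '🐲'])
print(idx.search('🐦'), idx.search('🐱'))  # None kitten
```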

  • Make VectorIndexer only use str as keys; Replace query_by_id with query_by_key #1829

v0.9.0:
from numpy import array
from jina.executors.indexers.vector import NumpyIndexer

vec_idx = array([0, 1])
with NumpyIndexer.load(save_abspath) as indexer:
    for idx in vec_idx:
        retrieved_vec = indexer.query_by_id(idx)

v1.0.0:
import numpy as np
from jina.executors.indexers.vector import NumpyIndexer

# keys are now strings
vec_idx = np.array(['0', '1'], dtype=(np.str_, 16))
with NumpyIndexer.load(save_abspath) as indexer:
    for key in vec_idx:
        retrieved_vec = indexer.query_by_key(key)
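With str keys, an integer ID such as 1 and the key '1' are no longer interchangeable. The stdlib-only sketch below makes that explicit; `StrKeyVectorStore` is hypothetical and is not NumpyIndexer, and rejecting non-str keys with a TypeError is this sketch's own choice.

```python
from typing import Dict, List, Optional, Sequence


class StrKeyVectorStore:
    """Minimal vector store that accepts only str keys (hypothetical)."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, keys: Sequence[str], vectors: Sequence[Sequence[float]]) -> None:
        if len(keys) != len(vectors):
            raise ValueError('keys and vectors must have the same length')
        for key, vec in zip(keys, vectors):
            if not isinstance(key, str):
                raise TypeError(f'key must be str, got {type(key).__name__}')
            self._vectors[key] = list(vec)

    def query_by_key(self, key: str) -> Optional[List[float]]:
        return self._vectors.get(key)


store = StrKeyVectorStore()
store.add(['0', '1'], [[0.1, 0.2], [0.3, 0.4]])
print(store.query_by_key('1'))  # [0.3, 0.4]
print(store.query_by_key(1))    # None: the int 1 is not the key '1'
```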

  • Rename DocIDCache to DocCache #1831, #1898
  • Unify the naming of protobuf objects as .proto: replace as_pb_object with proto in Message; replace proto with _pb_body in BaseNdArray, NdArray and SparseNdArray to avoid confusion; replace _querylang with _pb_body in QueryLang, _request with _pb_body in Request, and _score with _pb_body in NamedScore #1847
  • Remove groundtruths from UpdateRequestProto #1858
  • Remove UniqueId #1872
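DocCache is used to skip Documents that have already been seen. The sketch below illustrates that idea under the assumption that the cache key is a hash of the Document's content with identity fields excluded (the Others section notes that the parent ID is ignored in content hashing). `content_hash`, `ContentHashCache` and the dict-based Documents are hypothetical, not Jina's DocCache API.

```python
import hashlib
import json
from typing import Any, Dict, Set


def content_hash(doc: Dict[str, Any]) -> str:
    """Hash a dict-based Document, ignoring identity fields (assumed behaviour)."""
    content = {k: v for k, v in doc.items() if k not in ('id', 'parent_id')}
    payload = json.dumps(content, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode('utf-8')).hexdigest()


class ContentHashCache:
    """Remembers content hashes and reports duplicates (hypothetical helper)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def is_new(self, doc: Dict[str, Any]) -> bool:
        key = content_hash(doc)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


cache = ContentHashCache()
docs = [
    {'id': 'a', 'text': 'hello, jina'},
    {'id': 'b', 'parent_id': 'x', 'text': 'hello, jina'},  # same content, different identity
    {'id': 'c', 'text': 'bye'},
]
print([cache.is_new(d) for d in docs])  # [True, False, True]
```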

📗 Documentation

🐞 Bug Fixes and Other Changes

Flow

  • Fix log streaming from remote for JinadRuntime #1584
  • Force set separated_workspace=False if parallel==1 #1682
  • Remove random identity assignment for Peas and HeadPea. #1755
  • Add new property workspace_id for BaseFlow #1761
  • Add new property identity for BaseFlow #1766
  • Improve arguments of index_lines and search_lines and make them consistent. #1778
  • Fix bug in terminating error Pods #1799
  • Add asyncio interface for input_fn #1808
  • Enable --show-exc-info by default #1818
  • Fix bug when two remote Pods connected with local Pods #1809
  • Use uuid for Zmqlet to avoid collision #1857
  • Fix bug in grpcRuntime #1865
  • Fix bug in deleting Peas when one fails to start #1863
  • Fix bug when timeout_ready is negative #1879
  • Add line_format argument for search_lines API to support both json and csv #1881
  • Fix bug that blocked the Flow when some Pods failed to start #1902
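The new line_format argument of search_lines accepts JSON or CSV input. The sketch below shows what parsing each format into dict records could look like; `read_lines` is a hypothetical helper, and the assumption that CSV input starts with a header row is ours, not Jina's.

```python
import csv
import json
from typing import Any, Dict, Iterable, Iterator


def read_lines(lines: Iterable[str], line_format: str = 'json') -> Iterator[Dict[str, Any]]:
    """Parse records from JSON-lines or CSV text (hypothetical helper)."""
    if line_format == 'json':
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
    elif line_format == 'csv':
        # Assumes the first line is a header naming the fields.
        yield from csv.DictReader(lines)
    else:
        raise ValueError(f'unsupported line_format: {line_format!r}')


json_lines = ['{"text": "hello"}', '{"text": "jina"}']
csv_lines = ['text,tag', 'hello,greeting', 'jina,name']
print(list(read_lines(json_lines, 'json')))
print(list(read_lines(csv_lines, 'csv')))
```

Since `read_lines` is a generator, an unsupported format only raises once the result is iterated.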

Executors

  • Refactor Evaluator's name #1570
  • Fix the logic for loading dependencies of customized Executors with py_module #1597
  • Enable mapping volumes to arbitrary path in container for ContainerRuntime #1596
  • Fix loading indexers with ref_indexer when using ContainerRuntime #1595
  • Improve styling and docstrings for JinadRuntime #1599
  • Improve error logging for NoDriverForRequest #1624
  • Raise warnings instead of exceptions in BaseCache when keys don't exist #1628
  • Add WebSocketClient and AsyncWebSocketClient to stream requests. #1608
  • Fix bug when getting pea_id=-1 #1657
  • Remove _check_on_gpu to avoid false alarms when a GPU device is available #1674 @tadejsv
  • Fix bug when shard is empty #1689
  • Pass arbitrary kwargs to Docker SDK. #1690
  • Add built-in resources _merge_matches_topk for merging matches and keeping only topk #1686, #1775
  • Filter out non-existing keys in BaseIndexer. #1694
  • Fix workspace bug when reloading Executor from Docker container. #1756
  • Fix __init__ signature of DocCache #1842
  • Fix issue with empty query handler #1844
  • Refactor BaseCache and BaseCacheDriver #1853
  • Change default value of is_merge in KVSearchDriver to False #1855, #1888
  • Refactor BaseIndexer, KeyValueIndexer and VectorIndexer #1873
  • Add BaseEmbeddingEvaluator and BaseTextEvaluator; remove BaseCraftingEvaluator and BaseEncodingEvaluator #1875
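The built-in _merge_matches_topk resource merges matches and keeps only the top k. Conceptually, when several shards each return their own matches, the merge step combines the lists and keeps the k best overall. The plain-Python sketch below assumes lower scores are better (distances) and does not deduplicate IDs across shards; `merge_topk` is hypothetical, not the resource's actual configuration.

```python
import heapq
from itertools import chain
from typing import List, Sequence, Tuple

Match = Tuple[str, float]  # (doc_id, distance); lower distance is better


def merge_topk(shard_results: Sequence[Sequence[Match]], top_k: int) -> List[Match]:
    """Merge per-shard match lists and keep the `top_k` closest (hypothetical)."""
    # nsmallest avoids fully sorting the merged list when top_k is small.
    return heapq.nsmallest(top_k, chain.from_iterable(shard_results), key=lambda m: m[1])


shard_1 = [('a', 0.2), ('b', 0.9)]
shard_2 = [('c', 0.1), ('d', 0.5)]
print(merge_topk([shard_1, shard_2], top_k=3))  # [('c', 0.1), ('a', 0.2), ('d', 0.5)]
```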

Drivers

  • Add default value for traversal_paths in KVSearchDriver #1685
  • Add DeleteDriver to extract Document IDs #1823
  • Change default traversal_paths of QueryLang Drivers from 'c' to 'r' #1859
  • Add warning when encoding returns None in EncodeDriver #1886
  • Move checking of ID lengths to BaseExecutableDriver #1887
  • Refactor VectorSearchDriver and BaseCache #1878
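traversal_paths tells a Driver which level of the Document tree to operate on, so changing the default from 'c' to 'r' moves QueryLang Drivers from chunks to root Documents. The sketch below assumes the convention that 'r' means the root, 'c' chunks and 'm' matches, with characters chained for deeper levels; the dict-based Documents and the `traverse` function are hypothetical, not Jina's implementation.

```python
from typing import Any, Dict, Iterable, List

Doc = Dict[str, Any]
_FIELDS = {'c': 'chunks', 'm': 'matches'}


def traverse(docs: Iterable[Doc], path: str) -> List[Doc]:
    """Collect the Documents reached by one traversal path such as 'r', 'c' or 'cm'."""
    current = list(docs)
    for step in path:
        if step == 'r':
            continue  # root level: stay on the current Documents
        field = _FIELDS[step]
        current = [child for d in current for child in d.get(field, [])]
    return current


doc = {'id': 'root', 'chunks': [{'id': 'c1', 'matches': [{'id': 'm1'}]}, {'id': 'c2'}]}
print([d['id'] for d in traverse([doc], 'r')])   # ['root']
print([d['id'] for d in traverse([doc], 'c')])   # ['c1', 'c2']
print([d['id'] for d in traverse([doc], 'cm')])  # ['m1']
```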

Types

  • Fix dependency of get_content_hash() on chunks #1611 #1626
  • Introduce SearchRequest, TrainRequest, UpdateRequest, ControlRequest, DeleteRequest, IndexRequest as new types of requests. #1823
  • Add __str__ and __repr__ for primitive types #1847, #1852
  • Enable building a Document from a dict or JSON #1877
  • Fix a bug in extend() of DocumentSet #1883
  • Add plot() in Document for visualization #1884
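Documents can now be built from a dict or a JSON string. The sketch below uses a hypothetical stand-in class to show the idea: both inputs end up filling the same fields, with JSON simply decoded into a dict first. `SimpleDocument`, its fields and its `build` method are illustrative only, not the jina.Document API.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class SimpleDocument:
    """Illustrative Document with a few fields (hypothetical, not jina.Document)."""
    id: str = ''
    text: str = ''
    tags: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def build(cls, source: Union[Dict[str, Any], str]) -> 'SimpleDocument':
        # Accept a dict directly, or decode a JSON string into a dict first.
        data = json.loads(source) if isinstance(source, str) else dict(source)
        unknown = set(data) - {'id', 'text', 'tags'}
        if unknown:
            raise ValueError(f'unknown fields: {sorted(unknown)}')
        return cls(**data)


d1 = SimpleDocument.build({'id': '1', 'text': 'hello, jina'})
d2 = SimpleDocument.build('{"id": "1", "text": "hello, jina"}')
print(d1 == d2)  # True
```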

Tests

  • Refactor comparisons for None in unittest #1604
  • Add unittests for CosineEvaluator and EuclideanEvaluator #1603
  • Refactor test file structure. Set separate folders for tests in jinahub and jinad so they run in different CI stages. #1636
  • Refactor CICD so unit and integration tests run in parallel to speed up CI #1633
  • Add tests for local Flow and remote Pods #1583
  • Fix topk in tests/integration/crud/test_crud.py. #1653
  • Add more integration tests for jinad. #1654, #1665
  • Move distributed tests to separate folder #1676
  • Add more tests on different topology structures to improve distributed tests #1668
  • Add multiple integration tests for CRUD operations. #1613, #1700, #1754, #1716
  • Add more distributed tests. #1727
  • Fix integration test on non-existing shards. #1719
  • Add tests for deleting chunks. #1620
  • Add tests for daemon in CI. #1770, #1771, #1773
  • Change logging level to SUCCESS in unit tests of test_logging.py. #1774
  • Add distributed tests for daemon. #1782
  • Add more unit tests for ContainerRuntime #1787
  • Adapt YAML configuration to v1. #1820
  • Add lookupnode tests. #1824
  • Add integration tests for running instances in parallel #1836
  • Remove */all API #1880
  • Fix tests in test_helloworld and test_flow #1897

HubIO

  • Remove warning for jina_version #1619
  • Fix bug when pushing images #1631
  • Refactor hubio to improve readability #1691
  • Adapt to latest changes of hubapi. #1721
  • Expose jina-version when running jina hub list #1890

Daemon

  • Fix docstrings and log information for jina/daemon redocs. Remove version for jina/daemon. Move log configuration of jina/daemon to jina/resources/logging.daemon.yml #1642
  • Add new API /*/parameters for jinad to fetch all parameters #1669
  • Fix CORS error #1687
  • Add more detailed error messages #1697
  • Refactor daemon codes. #1713, #1780
  • Fix early drop issue on log streaming #1738
  • Add new API /logs/{workspace_id}/{log_id} for log streaming. #1768
  • Fix bug when uploading files with pydantic. #1793
  • Fix bug that upload_files does not upload uses_internal #1834
  • Fix bug of setting timeout_ready #1864

Others

  • Add commit-lint for CI #1627
  • Leverage show-exc-info as a common argument. #1639
  • Ignore parent id in content hashing. #1651
  • Clean up deprecated fields meta_info and CallUnary in jina.proto. #1655
  • Refactor code to avoid using builtins. #1658
  • Refactor code to remove unused definitions #1660
  • Make test a separate step in CI for jinad. #1664
  • Tag Docker images with GitHub run ID to improve readability of CI #1679
  • Avoid pushing test images to Docker registry #1680
  • Add armv6/v7 for building Docker images in CICD #1692
  • Add support for printing JSON in logs. #1693
  • Fix broken hello-world. #1710
  • Fix log level and upload path #1744
  • Improve CLI parser to optimize format and fix multiple help strings. #1747
  • Add --compress, --compress-min-bytes, --compress-min-ratio for CLI to enable message compression #1766
  • Replace double quotes with single quotes in CLI help strings. #1790
  • Add Python 3.9 support #1802
  • Add identity property for Pea and add a warning when JINA_FULL_CLI is set in environment #1804
  • Add warnings when invalid arguments are given to Executors. #1817
  • Enable usage of customized long ids. #1666 @janandreschweiger
  • Enable Document IDs to be arbitrary strings. #1837 #1840
  • Refactor typing to use Iterable instead of Iterator #1843
  • Fix YAML path check #1850
  • Update copyright information #1854
  • Align versioning of Jina Dashboard with Jina #1870, #1892
  • Increase default value for timeout-ready and update autocompletion #1893
  • Add hello-world-chatbot demo #1894
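The --compress, --compress-min-bytes and --compress-min-ratio flags suggest a simple policy: compress only messages above a size threshold, and keep the compressed form only when it shrinks the message enough. The sketch below implements that policy with zlib; `maybe_compress`, its default thresholds and the use of zlib are assumptions, not Jina's actual codec or defaults.

```python
import zlib
from typing import Tuple


def maybe_compress(payload: bytes, min_bytes: int = 1024, min_ratio: float = 1.1) -> Tuple[bytes, bool]:
    """Return (data, compressed?) under a size-and-ratio policy (hypothetical).

    - Messages smaller than `min_bytes` are sent as-is.
    - Otherwise compress, but keep the result only if
      original_size / compressed_size >= `min_ratio`.
    """
    if len(payload) < min_bytes:
        return payload, False
    compressed = zlib.compress(payload)
    if len(payload) / len(compressed) >= min_ratio:
        return compressed, True
    return payload, False


small, small_flag = maybe_compress(b'hello')
big, big_flag = maybe_compress(b'hello, jina ' * 500)
print(small_flag, big_flag)  # False True
```

The ratio check matters for already-compressed payloads such as images, where compression can even grow the message.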

🙏 Thanks to our Contributors

This release contains contributions from @bwanglzu @florian-hoenicke @JoanFM @PabloRN @hanxiao @deepankarm @nan-wang @NouiliKh @cristianmtr @FionnD @maximilianwerk @bhavsarpratik @tadejsv @Yongxuanzhang @ThePfarrer @theUnkownName @DimitrisPr @Immich @davidbp @janandreschweiger @BingHo1013 @slettner @imsergiy @Roshanjossey @BastinJafari @rutujasurve94 @lusloher @aga11313 @alexcg1 @pswu11

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you Jina couldn't do what we do. Your support means a lot to us.

Jina 1.x Features (Target release: monthly, 2021)

Jina will continue to improve scalability and performance and to make Jina reliable in production environments. 1.x will also support replicas and query-while-indexing.

🤝 Work with Jina

Want to work with Jina full-time? Check out our openings on our website.

Release Note (1.0.0)

Release time: 2021-02-10 09:24:16

🙇 We'd like to thank all contributors for this new release! In particular,
Han Xiao, cristian, Maximilian Werk, Yongxuanzhang, Bing @jina AI, Jina Dev Bot, CatStark, Joan Fontanals, Deepankar Mahapatro 🙇

🐞 Bug fixes

  • [e0e1c13c] - more useful default (#1917) (Maximilian Werk)
  • [573d67ba] - resources: fix yaml config in resources (#1913) (Han Xiao)

📗 Documentation

🏁 Unit Test and CICD

  • [70b9bc97] - fix test in hello-world (Han Xiao)

🍹 Other Improvements

  • [2a0c4e1d] - bump version to 1.0.0 (Han Xiao)
  • [dca0ffe2] - update slogan (#1921) (Bing @jina AI)
  • [7b72120a] - docs: update TOC (Jina Dev Bot)
  • [c5062fa5] - contributor: update contributors (Jina Dev Bot)
  • [e42ce89e] - style: reformatted by jina-dev-bot (Jina Dev Bot)
  • [935d9a10] - version: the next version will be 0.9.34 (Jina Dev Bot)
