jina-ai/serve v3.14.0 on GitHub

Release Note (`3.14.0`)

Release time: 2023-02-20 09:15:47

This release contains 11 new features, 6 refactors, 12 bug fixes and 10 documentation improvements.

🆕 Features

Reshaping Executors as standalone services with the Deployment layer (#5563, #5590, #5628, #5672 and #5673)

In this release we aim to unlock more use cases, mainly building highly performant and scalable services. With its built-in layers of abstraction, Jina lets users build scalable, containerized, cloud-native components which we call Executors. Executors have always been services, but they were mostly used in Flows to form pipelines.

Now you can deploy an Executor on its own, without needing a Flow. Whether it's for model inference, prediction, embedding, generation or search, an Executor can wrap your business logic, and you get a gRPC microservice with Jina's cloud-native features (shards, replicas, dynamic batching, etc.).

To do this we offer the Deployment layer to deploy an Executor. Just like a Flow groups and orchestrates many Executors, a Deployment orchestrates just one Executor.

A Deployment can be used with both the Python API and YAML. For instance, after you define an Executor, use the Deployment class to serve it:

from jina import Deployment

with Deployment(uses=MyExecutor, port=12345, replicas=2) as dep:
    dep.block() # serve forever

─────────────────────── 🎉 Deployment is ready to serve! ───────────────────────
╭────────────── 🔗 Endpoint ────────────────╮
│  ⛓     Protocol                    GRPC  │
│  🏠       Local           0.0.0.0:12345   │
│  🔒     Private     192.168.3.147:12345   │
│  🌍      Public    87.191.159.105:12345   │
╰───────────────────────────────────────────╯

Or implement a Deployment in YAML and run it from the CLI:

jtype: Deployment
with:
  port: 12345
  replicas: 2
  uses: MyExecutor
  py_modules:
    - my_executor.py

jina deployment --uses deployment.yml

The Deployment class offers the same interface as a Flow, so it can be used as a client too:

from jina import Deployment, DocumentArray

with Deployment(uses=MyExecutor, port=12345, replicas=2) as dep:
    # post() returns the processed documents
    docs = dep.post(on='/foo', inputs=DocumentArray.empty(1))
    print(docs.texts)

Furthermore, you can use the Deployment to create Kubernetes and Docker Compose YAML configurations of a single Executor deployment. So, to export to Kubernetes with the Python API:

from jina import Deployment

dep = Deployment(uses='jinaai+docker://jina-ai/DummyHubExecutor', port_expose=8080, replicas=3)
dep.to_kubernetes_yaml('/tmp/config_out_folder', k8s_namespace='my-namespace')

And exporting to Kubernetes with the CLI is just as straightforward:

jina export kubernetes deployment.yml output_path

As is exporting to Docker Compose with the Python API:

from jina import Deployment

dep = Deployment(uses='jinaai+docker://jina-ai/DummyHubExecutor', port_expose=8080, replicas=3)

dep.to_docker_compose_yaml(
    output_path='/tmp/docker-compose.yml',
)

And of course, you can also export to Docker Compose with the CLI:

jina export docker-compose deployment.yml output_path

(Beta) Support DocArray v2 (#5603)

As the DocArray refactoring is shaping up nicely, we've decided to integrate initial support. Although this support is still experimental, we believe DocArray v2 offers nice abstractions to clearly define the data of your services, especially with the single Executor deployment that we introduce in this release.

With this new experimental feature, you can define your input and output schemas with DocArray v2 and use type hints to define schemas of each endpoint:

from jina import Executor, requests
from docarray import BaseDocument, DocumentArray
from docarray.typing import AnyTensor, ImageUrl

class InputDoc(BaseDocument):
    img: ImageUrl

class OutputDoc(BaseDocument):
    embedding: AnyTensor

class MyExec(Executor):
    @requests(on='/bar')
    def bar(
        self, docs: DocumentArray[InputDoc], **kwargs
    ) -> DocumentArray[OutputDoc]:
        return_docs = DocumentArray[OutputDoc](
            [OutputDoc(embedding=embed(doc.img)) for doc in docs]
        )
        return return_docs

Read more about the integration in the DocArray v2 section of our docs.

Communicate with individual Executors in Custom Gateways (#5558)

Custom Gateways can now make separate calls to specific Executors without respecting the Flow's topology.

With this feature, we target a different set of use cases, where the task does not necessarily have to be defined by a DAG pipeline. Rather, you define processing order using explicit calls to Executors and implement any use case where there's a central service (Gateway) communicating with remote services (Executors).

For instance, you can implement a Gateway like so:

from jina.serve.runtimes.gateway.http.fastapi import FastAPIBaseGateway
from jina import Document, DocumentArray, Flow, Executor, requests
from fastapi import FastAPI

class MyGateway(FastAPIBaseGateway):
    @property
    def app(self):
        app = FastAPI()

        @app.get("/endpoint")
        async def get(text: str):
            doc1 = await self.executor['executor1'].post(on='/', inputs=DocumentArray([Document(text=text)]))
            doc2 = await self.executor['executor2'].post(on='/', inputs=DocumentArray([Document(text=text)]))
            return {'result': doc1.texts + doc2.texts}

        return app

# Add the Gateway and Executors to a Flow
flow = Flow() \
    .config_gateway(uses=MyGateway, protocol='http', port=12345) \
    .add(uses=FirstExec, name='executor1') \
    .add(uses=SecondExec, name='executor2')

Read more about calling individual Executors.

Add secrets to Jina on Kubernetes (#5557)

To support building secure apps, we've added support for secrets on Kubernetes in Jina. Mainly, you can create environment variables whose sources are Kubernetes Secrets.

Add the secret using the env_from_secret parameter in either the Python API or YAML:

from jina import Flow

f = (
    Flow().add(
        uses='jinaai+docker://jina-ai/DummyHubExecutor',
        env_from_secret={
            'SECRET_USERNAME': {'name': 'mysecret', 'key': 'username'},
            'SECRET_PASSWORD': {'name': 'mysecret', 'key': 'password'},
        },
    )
)

f.to_kubernetes_yaml('./k8s_flow', k8s_namespace='custom-namespace')
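Inside the Executor's container, the values then show up as ordinary environment variables. Here is a minimal sketch of reading them with the standard library only. The helper name read_credentials and the simulated values are illustrative assumptions, not part of Jina:

```python
import os


def read_credentials(environ=os.environ):
    """Return (username, password) taken from the environment.

    SECRET_USERNAME and SECRET_PASSWORD are the variable names used in the
    `env_from_secret` mapping above. `read_credentials` itself is a
    hypothetical helper, not a Jina API.
    """
    try:
        return environ['SECRET_USERNAME'], environ['SECRET_PASSWORD']
    except KeyError as missing:
        # Fail early with a clear message if the Secret was not injected.
        raise RuntimeError(f'missing environment variable: {missing}') from None


# Simulate what Kubernetes would inject from the `mysecret` Secret.
fake_env = {'SECRET_USERNAME': 'alice', 'SECRET_PASSWORD': 's3cret'}
username, password = read_credentials(fake_env)
print(username)
```

Failing early on a missing variable tends to be easier to debug than an authentication error deeper inside the Executor.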

Add `GatewayStreamer.stream()` to yield response and Executor errors (#5650)

If you're implementing a custom Gateway, you can use the GatewayStreamer.stream() method to catch errors raised in Executors. Catching such errors wasn't possible with the GatewayStreamer.stream_docs() method.

async for docs, error in self.streamer.stream(
    docs=my_da,
    exec_endpoint='/',
):
    if error:
        # an Executor raised: handle or report the error here
        ...
    else:
        # process the returned docs here
        ...
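To show the consumption pattern end to end without a running Flow, here is a small sketch using a stand-in async generator. fake_stream and collect are hypothetical names, plain strings stand in for documents and errors, and the error is assumed to be falsy on success, as the `if error:` check above suggests:

```python
import asyncio


async def fake_stream(batches):
    """Hypothetical stand-in for `GatewayStreamer.stream()`.

    Yields (docs, error) pairs; `error` is None when the call succeeded.
    """
    for docs, error in batches:
        await asyncio.sleep(0)  # give control back to the event loop
        yield docs, error


async def collect(stream):
    """Separate successful results from Executor errors."""
    results, errors = [], []
    async for docs, error in stream:
        if error:
            errors.append(error)
        else:
            results.extend(docs)
    return results, errors


batches = [
    (['doc-1', 'doc-2'], None),
    (None, 'executor1 failed'),
    (['doc-3'], None),
]
results, errors = asyncio.run(collect(fake_stream(batches)))
print(results, errors)
```

Collecting errors instead of raising on the first one lets a Gateway return partial results alongside a failure report. Whether that suits your use case is a design choice.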

Add argument `suppress_root_logging` to remove or preserve root logging handlers (#5635)

In this release, we've added the argument suppress_root_logging to (you guessed it) suppress root logger messages. By default, root logs are suppressed.

Kudos to our community member @Jake-00 for the contribution!
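As an illustration of what removing or preserving root logging handlers means in plain Python, here is a standard-library sketch. The helper drop_handlers is hypothetical and does not reproduce Jina's internal logic. The demo uses a standalone logger so the real root logger stays untouched:

```python
import logging


def drop_handlers(logger=None, suppress=True):
    """Remove (suppress=True) or preserve (suppress=False) a logger's handlers.

    Defaults to the root logger. Returns the handlers still attached.
    """
    logger = logging.getLogger() if logger is None else logger
    if suppress:
        # Iterate over a copy, since removeHandler mutates logger.handlers.
        for handler in list(logger.handlers):
            logger.removeHandler(handler)
    return list(logger.handlers)


demo = logging.Logger('demo')  # standalone logger used only for this demo
demo.addHandler(logging.NullHandler())

kept = drop_handlers(demo, suppress=False)    # handlers preserved
removed = drop_handlers(demo, suppress=True)  # handlers removed
print(len(kept), len(removed))
```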

Add gRPC streaming endpoint to Worker and Head runtimes (#5614)

To empower Executors we've added a gRPC streaming endpoint to both the Worker and Head runtimes. This means that an Executor or Head gRPC server exposes the same interface as a Jina gRPC Gateway. Therefore, you can use Jina's gRPC Client with each of those entities.

Add `prefetch` argument to client post method (#5607)

A prefetch argument has been added to the Client.post() method. Previously, this argument was only available to the Gateway and it controlled how many requests a Gateway could send to Executors at a time.

However, it was not possible to control how many requests a Gateway (or Executor in case of a single Executor Deployment) could receive at a time.

Therefore, we've added the argument to the Client.post() to give you better control over your requests.
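The idea behind capping in-flight requests can be sketched with a semaphore. This is a conceptual illustration only: send_all, slow_echo and stats are hypothetical names, and the code does not show how Jina implements prefetch internally:

```python
import asyncio


async def send_all(requests, handler, prefetch):
    """Send requests while keeping at most `prefetch` of them in flight."""
    limiter = asyncio.Semaphore(prefetch)

    async def send_one(request):
        async with limiter:  # waits once `prefetch` requests are in flight
            return await handler(request)

    # gather() returns responses in the same order as the requests.
    return await asyncio.gather(*(send_one(r) for r in requests))


stats = {'in_flight': 0, 'peak': 0}


async def slow_echo(request):
    """Pretend server that records how many requests overlap."""
    stats['in_flight'] += 1
    stats['peak'] = max(stats['peak'], stats['in_flight'])
    await asyncio.sleep(0.01)
    stats['in_flight'] -= 1
    return request


responses = asyncio.run(send_all(range(10), slow_echo, prefetch=3))
print(responses)
print(stats['peak'])
```

In this sketch the pretend server never sees more than three overlapping requests, which is the kind of back-pressure a client-side limit provides.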

Run warmup on runtimes and Executor (#5579)

On startup, all Jina entities that hold gRPC connections and stubs to other entities (Head and Gateway) now start warming up before the services become ready. This ensures lower latencies on first requests submitted to the Flow.

Make gRPC Client thread safe (#5533)

Previously, as gRPC asyncio clients offer limited support for multi-threading, using the Jina gRPC Client in multiple threads would print errors.

Therefore, in this release, we make the gRPC Client thread-safe in the sense that a thread can re-use it multiple times without another thread using it simultaneously. This means you can use the gRPC Client with multi-threading, while being sure only asyncio tasks belonging to the same thread have access to the gRPC stub at the same time.
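The guarantee described above (one thread at a time, each with its own asyncio tasks) can be pictured with a lock around the call. This is a rough sketch under that assumption. SerializedClient is a hypothetical class, not Jina's Client, and the sleep stands in for a gRPC round trip:

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class SerializedClient:
    """Hypothetical client that lets one thread at a time use the shared stub."""

    def __init__(self):
        self._lock = threading.Lock()
        self.users = 0      # threads currently inside post()
        self.max_users = 0  # highest value `users` ever reached

    async def _send(self, payload):
        await asyncio.sleep(0.01)  # stands in for a gRPC round trip
        return payload.upper()

    def post(self, payload):
        with self._lock:  # other threads wait here until the call finishes
            self.users += 1
            self.max_users = max(self.max_users, self.users)
            try:
                # Each call runs its own event loop in the calling thread.
                return asyncio.run(self._send(payload))
            finally:
                self.users -= 1


client = SerializedClient()
with ThreadPoolExecutor(max_workers=4) as pool:
    replies = list(pool.map(client.post, ['a', 'b', 'c', 'd']))
print(replies, client.max_users)
```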

Add user survey (#5667)

When running a Flow, a message now shows up in the terminal with a survey link. Feel free to fill in our survey to help us improve Jina. Your feedback is much appreciated!

⚙ Refactoring

Use single Gateway streamer for multiprotocol Gateway (#5598)

When we released multiprotocol Gateways, the implementation relied on a separate set of gRPC connections and stubs for each protocol. As this turned out to be unnecessary, this release re-uses the same connections and stubs across protocols.

Remove manual deletion of channel resources (#5633)

This release refactors how we handle channel resources. Mainly, deletion of channel resources is no longer handled manually and is left to the garbage collector.

No need to run summary in thread (#5632)

Getting and printing Flow summary information is no longer executed in a separate thread, and is now handled in the main thread.

Refactor GRPCConnectionPool implementation into a package (#5623)

All gRPC connection pool logic has been refactored into a separate package, rather than having a GRPCConnectionPool class.

Remove reversing request order (#5580)

Reversing request order has been removed from runtime logic.

Simplify `get_docs_from_request` helper function (#5567)

The get_docs_from_request helper function in the request handler module has been simplified and no longer accepts unneeded parameters.

🐞 Bug Fixes

Relax protobuf version (#5591)

To better support different environments, we've relaxed Jina's protobuf version so it no longer conflicts with Google Colab's pre-installed version (which may result in breaking some installed dependencies).

Fix loading Gateway arguments from YAML (#5664 and #5678)

Prior to this release, loading Gateway configurations from YAML had a few bugs. Mainly, some parameters were not passed correctly to the Gateway runtime when configs were loaded. Also, other default runtime Gateway arguments would always override arguments from YAML configs.

Fix usage with `cuda_visible_devices` (#5654)

Prior to this release, using replicas on multiple GPUs would fail if the CUDA_VISIBLE_DEVICES environment variable was passed with the env parameter, rather than actually being set in the environment variables.

Properly use logging configuration in Executor, Gateway and Client (#5638)

This release unifies the logging configuration in Executors, Gateways and Clients and exposes configuration parameters properly. You can now pass log configuration to your Clients and Flows and expect consistent logging behavior:

from jina import Client
client = Client(log_config='./logging.json.yml')

# or with a Flow object

from jina import Flow
f = Flow(log_config='./logging.json.yml')
with f:
    # the implicit client automatically uses the log_config from the Flow for consistency
    f.post('/')

Pass extra arguments to the Gateway runtime in case of containerized Gateways (#5631)

Prior to this release, if a containerized Custom Gateway was started, Jina wouldn't pass some arguments to the container entrypoint. This could break some behavior; for instance, the runtime would not know which port to use for serving the Gateway. The issue is fixed in this release.

Clean up OpenTelemetry resources in the Flow context manager exit procedure (#5619)

This release adds proper clean up to the OpenTelemetry resources. The clean up logic has been added to the Client class and is called automatically in the Flow's context manager.

If you're using the Client with OpenTelemetry enabled, call client.teardown_instrumentation() when you're done so that the client's spans are reported correctly.

Improve error messages for gRPC `NOT_FOUND` errors (#5617)

When an external Executor/Flow is behind an API Gateway (which is the case for JCloud), but is down, then DNS resolution succeeds but a "resource" (the Executor/Flow) cannot be found, resulting in a gRPC error with NOT_FOUND code.

This error case wasn't properly handled before: the output did not include information about which part of the Flow failed.

After this release, the affected deployment and its address are displayed:

ERROR  gateway/rep-0/GatewayRuntime@123711 Error while       [01/23/23 12:35:15]
       getting responses from deployments: no Route matched                     
       with those values                                                        
       trailing_metadata=Metadata((('date', 'Mon, 23 Jan                        
       2023 11:35:15 GMT'), ('content-length', '0'),                            
       ('x-kong-response-latency', '0'), ('server',                             
       'kong/3.0.2')))                                                          
       trailing_metadata=Metadata((('date', 'Mon, 23 Jan                        
       2023 11:35:15 GMT'), ('content-length', '0'),                            
       ('x-kong-response-latency', '0'), ('server',                             
       'kong/3.0.2')))                                                          
       |Gateway: Connection error with deployment                               
       `executor0` at address(es) {'blah.wolf.jina.ai'}.                        
       Connection with {'blah.wolf.jina.ai'} succeeded, but                     
       `executor0` was not found. Possibly `executor0` is                       
       behind an API gateway but not reachable.                                 
       trailing_metadata=Metadata((('date', 'Mon, 23 Jan                        
       2023 11:35:15 GMT'), ('content-length', '0'),                            
       ('x-kong-response-latency', '0'), ('server',                             
       'kong/3.0.2')))

Use `mixin_hub_pull_options_parser` from Hubble (#5586)

Some Hub parameters were implemented in both Jina and jina-hubble-sdk. This meant that if jina-hubble-sdk updated some parameters, there would be a mismatch and potentially bugs. This release removes these parameters from Jina to rely entirely on jina-hubble-sdk for Hubble-specific parameters.

Disable timeout for Liveness Probe in Kubernetes and keep only Kubernetes timeout (#5594)

When an Executor is deployed to Kubernetes, a Kubernetes Liveness Probe is configured. The liveness probe uses the jina ping command under the hood to check the Executor's health. However, this health check was subject to both the Kubernetes Liveness Probe timeout and the jina ping command timeout. This release relaxes the jina ping command timeout (it now respects timeout_ready), so that only one configurable timeout applies and you can deploy Executors that are slow to load.

Enable timeout for pinging Executor and Gateway (#5600)

The jina ping CLI can submit ping requests to Gateways and Executors. However, this command previously accepted a timeout parameter that was not respected. This release fixes this behavior and specifying the timeout parameter now makes the command fail if the ping requests are not successful after the timeout is exceeded.
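The behavior described above (keep pinging, then fail once the timeout is exceeded) can be modeled with a simple deadline loop. This is a sketch of the idea, not the CLI's actual implementation. ping_until, its interval parameter and the callables passed to it are hypothetical:

```python
import time


def ping_until(ping, timeout, interval=0.01):
    """Retry `ping` until it succeeds or `timeout` seconds have elapsed.

    Returns True on the first successful ping, False once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        if ping():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# A service that becomes healthy on the third attempt.
attempts = iter([False, False, True])
became_ready = ping_until(lambda: next(attempts), timeout=1.0)

# A service that never answers: the call gives up after the timeout.
never_ready = ping_until(lambda: False, timeout=0.05)
print(became_ready, never_ready)
```

time.monotonic() is used for the deadline so that system clock adjustments do not lengthen or shorten the wait.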

Edit terminal profile file even if it does not exist (#5597)

When Jina is installed with pip, the installation script attempts to configure the user's terminal profile (.bashrc, .zshrc, .fish files) to add configuration needed for the jina command. However, this would be ignored if a user's terminal profile didn't exist.

With this release, the installation script now identifies the required terminal profile file depending on the user's environment and writes a new one if it does not exist already.

Multi-protocol gateway supports monitoring (#5570)

Prior to this release, using multiple protocols in the Gateway along with monitoring would raise an error. In this release, there are no issues when using multiple protocols along with monitoring in your Gateway.

📗 Documentation improvements

Document tracing support in JCloud (#5688)
Add survey banner (#5649)
Refactor example code for experimenting with OpenTelemetry (#5656)
Document arm64 architecture support in jina push command (#5644)
Add JCloud Executor availability parameters (#5624)
Use one GPU in JCloud deployment example (#5620)
Add caution about exceptions inside Flow context (#5615)
Document Flow update, restart, pause and resume on JCloud (#5577)
Document ephemeral storage type in JCloud (#5583)
Document Executor data retention with retain parameter in JCloud (#5572)

🤟 Contributors

We would like to thank all contributors to this release:

Yanlong Wang (@nomagick)
tarrantro (@tarrantro)
samsja (@samsja)
Alaeddine Abdessalem (@alaeddine-13)
Girish Chandrashekar (@girishc13)
Subba Reddy Veeramreddy (@subbuv26)
Nan Wang (@nan-wang)
Alex Cureton-Griffiths (@alexcg1)
Jake-00 (@Jake-00)
Anne Yang (@AnneYang720)
Joan Fontanals (@JoanFM)
Nikolas Pitsillos (@npitsillos)
Johannes Messner (@JohannesMessner)
Jackmin801 (@Jackmin801)
