Dapr 1.16.12
This update includes bug fixes:
- Security: Fixes gRPC authorization bypass - CVE-2026-33186
- Pulsar pub/sub JSON and Avro schema validation fixes
- Scheduler cluster stalls for up to 20 minutes after a pod restart when workflows are running
Security: gRPC authorization bypass
Problem
An upstream dependency (google.golang.org/grpc) used by Dapr introduced a vulnerability that could allow gRPC authorization bypass under certain conditions (CVE-2026-33186).
Impact
Users running affected versions could be exposed to unauthorized gRPC requests.
Root Cause
The issue originated in an upstream library.
Solution
This release upgrades the affected dependency to a version that resolves CVE-2026-33186.
Users are strongly encouraged to upgrade to this release.
Pulsar pub/sub JSON and Avro schema validation fixes
Problem
The Pulsar pub/sub component had several issues with JSON and Avro schema handling:
- Avro subscribe path broken: When a Pulsar topic has an Avro schema configured, messages are delivered to the subscriber as raw Avro binary bytes. The Dapr runtime attempts json.Unmarshal on those bytes and fails with: "error deserializing cloud event in pubsub <component> and topic <topic>: invalid character '\x06' looking for beginning of value". Every message on an Avro-enforced topic gets stuck in a permanent retry loop and is never delivered to the application.
- Avro schema/wire format mismatch: When CloudEvents wrapping is enabled (the default), the wire format is a CloudEvents envelope, but the Avro schema registered with the Pulsar Schema Registry was the inner domain event schema, not the envelope. This causes a mismatch between the schema in the registry and the actual messages stored in the topic.
- JSON schema not validated: The JSON schema path only checked that the payload was valid JSON (json.Unmarshal) without validating it against the actual schema definition. Invalid payloads that did not conform to the schema were accepted and published.
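The Avro subscribe failure above can be reproduced with the standard library alone: Go's encoding/json rejects the first byte of a typical Avro binary payload. A minimal sketch (the payload bytes here are illustrative, not a real Avro message):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeAsJSON mimics the runtime's attempt to json.Unmarshal an
// Avro binary payload straight off the wire.
func decodeAsJSON(payload []byte) error {
	var v map[string]any
	return json.Unmarshal(payload, &v)
}

func main() {
	// Avro binary often begins with a small zig-zag-encoded length
	// byte such as 0x06, which is not a valid start of a JSON value.
	avroLike := []byte{0x06, 'f', 'o', 'o'}
	fmt.Println(decodeAsJSON(avroLike))
	// invalid character '\x06' looking for beginning of value
}
```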
Impact
- Applications subscribing to Avro-enforced Pulsar topics cannot receive messages — they accumulate in the topic and are never delivered.
- Schema-aware Pulsar consumers (including non-Dapr consumers) that rely on the broker's schema registry receive a schema that does not match the actual CloudEvents message format.
- Applications relying on Pulsar's JSON schema enforcement could publish structurally invalid messages that violate the schema contract, leading to downstream consumer failures.
Affected versions: v1.16.0 through v1.16.11.
Solution
- Avro subscribe decode: When an Avro schema is registered for the incoming topic, the subscriber now decodes the binary payload to native Go types using the cached codec (NativeFromBinary), then re-encodes it as JSON (TextualFromNative) before passing it to the handler. Topics without an Avro schema are unaffected.
- CloudEvents envelope schema: When CloudEvents wrapping is enabled, Dapr now wraps the user-provided Avro schema inside a CloudEvents envelope Avro schema before registering it with the broker, following the CloudEvents Avro format spec. A new topic-level <topic-name>.rawSchema metadata option skips envelope wrapping for topics dedicated to raw payloads. Publishing with rawPayload=true to a CloudEvents-wrapped topic is now rejected with a clear error.
- JSON schema validation: JSON schema topics now compile a goavro codec at init time (since Pulsar JSON schemas use Avro schema definitions) and validate payloads using NativeFromTextual at publish time for full structural validation. Invalid schemas fail fast at startup. CloudEvents envelope schema generation and rawPayload guards match the Avro path behavior.
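For illustration, the envelope wrapping looks roughly like the following Avro schema: the user-provided record becomes the data field of a CloudEvents-shaped outer record. This is a simplified sketch, not the exact schema Dapr registers (the real envelope follows the CloudEvents Avro format spec), and the inner Order record is a hypothetical user schema:

```json
{
  "type": "record",
  "name": "CloudEvent",
  "namespace": "io.cloudevents",
  "fields": [
    {
      "name": "attribute",
      "type": { "type": "map", "values": ["null", "boolean", "int", "string", "bytes"] }
    },
    {
      "name": "data",
      "type": {
        "type": "record",
        "name": "Order",
        "fields": [
          { "name": "orderId", "type": "string" },
          { "name": "amount", "type": "double" }
        ]
      }
    }
  ]
}
```

Schema-aware consumers reading the topic then see an envelope that matches the CloudEvents messages actually stored, rather than only the inner domain schema.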
Scheduler cluster stalls for up to 20 minutes after pod restart when workflows are running
Problem
After a Scheduler pod restart (due to a rolling update, node maintenance, pod eviction, or an OOM kill), the entire Scheduler cluster can stall for up to 20 minutes.
During this time, no workflows execute, no scheduled jobs fire, and no actor reminders are delivered.
The Scheduler pods remain running and pass health checks.
Logs show "fetched initial leadership, waiting for quorum for partition total" on one or more instances, followed by approximately 20 minutes of silence before "leadership quorum reached" appears and normal operation resumes.
Impact
Any Dapr deployment using Scheduler in HA mode (3 instances) with active workflow or actor reminder workloads is affected.
The stall occurs when a Scheduler pod restarts while jobs are actively being triggered and delivered to daprd sidecars.
Root Cause
When a Scheduler pod restarts, the remaining instances detect a leadership partition change (e.g., 3 -> 2) and attempt to restart their cron engines.
The engine shutdown sequence closes all internal job counter loops, each of which must complete any in-flight trigger before shutting down.
The trigger delivery path calls Pool.Trigger, which enqueues the job to the connection pool and blocks waiting for the daprd sidecar to respond with a success/failure result.
The ctx parameter, which is cancelled during engine shutdown, is accepted but never checked.
If a daprd sidecar is slow to respond, has crashed, has disconnected, or is itself restarting (common during a rolling update), the response never arrives and Pool.Trigger blocks indefinitely.
This prevents the engine from shutting down, which prevents the cron module from calling Reelect to update its leadership key with the new partition total.
The other Scheduler instances see this stale key and cannot reach quorum agreement. The cluster is stuck until the stalled instance's etcd lease expires.
Solution
Pool.Trigger now uses a select on both the response channel and ctx.Done().
When the engine context is cancelled during a quorum change, all in-flight triggers return UNDELIVERABLE immediately. The response channel is drained in the background so that late callbacks from stream shutdown do not corrupt subsequent trigger calls. The engine shuts down within milliseconds, Reelect updates the leadership key, and the cluster converges on the new partition total. The undelivered jobs are automatically retried on the next engine cycle.
Additionally, a failed DeliverablePrefixes call during a new daprd connection no longer kills the entire connection pool. The failed connection is cancelled individually, allowing the daprd to reconnect without disrupting other healthy connections.
This reduces cluster recovery time after a Scheduler pod restart from up to 20 minutes to under 5 seconds.
Note: This issue does not affect Dapr v1.17 or later. The v1.17 release redesigned the trigger delivery path from a synchronous blocking pattern (v1.16's Pool.Trigger waiting on a response channel) to an asynchronous callback pattern where the trigger function returns immediately and the response arrives via a callback. With no blocking call, there is nothing to stall the engine shutdown during quorum changes.