Dapr 1.16.12
This update includes bug fixes:
- Security: Fixes gRPC authorization bypass - CVE-2026-33186
- Pulsar pub/sub JSON and Avro schema validation fixes
- Scheduler cluster stalls for up to 20 minutes after a pod restart when workflows are running
Security: gRPC authorization bypass
Problem
An upstream dependency (google.golang.org/grpc) used by Dapr introduced a vulnerability that could allow gRPC authorization bypass under certain conditions (CVE-2026-33186).
Impact
Users running affected versions could be exposed to unauthorized gRPC requests.
Root Cause
The issue originated in an upstream library.
Solution
This release upgrades the affected dependency to a version that resolves CVE-2026-33186.
Users are strongly encouraged to upgrade to this release.
Pulsar pub/sub JSON and Avro schema validation fixes
Problem
The Pulsar pub/sub component had several issues with JSON and Avro schema handling:
- Avro subscribe path broken: When a Pulsar topic has an Avro schema configured, messages are delivered to the subscriber as raw Avro binary bytes. The Dapr runtime attempts json.Unmarshal on those bytes and fails with: "error deserializing cloud event in pubsub <component> and topic <topic>: invalid character '\x06' looking for beginning of value". Every message on an Avro-enforced topic gets stuck in a permanent retry loop and is never delivered to the application.
- Avro schema/wire format mismatch: When CloudEvents wrapping is enabled (the default), the wire format is a CloudEvents envelope, but the Avro schema registered with the Pulsar Schema Registry was the inner domain event schema, not the envelope. This causes a mismatch between the schema in the registry and the actual messages stored in the topic.
- JSON schema not validated: The JSON schema path only checked that the payload was valid JSON (json.Unmarshal) without validating it against the actual schema definition. Invalid payloads that did not conform to the schema were accepted and published.
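The Avro subscribe failure above can be reproduced with the standard library alone: Go's encoding/json rejects the first byte of a typical Avro binary payload. A minimal sketch (the payload bytes here are illustrative, not a real Avro message):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeAsJSON mimics the runtime's attempt to json.Unmarshal an
// Avro binary payload straight off the wire.
func decodeAsJSON(payload []byte) error {
	var v map[string]any
	return json.Unmarshal(payload, &v)
}

func main() {
	// Avro binary often begins with a small zig-zag-encoded length
	// byte such as 0x06, which is not a valid start of a JSON value.
	avroLike := []byte{0x06, 'f', 'o', 'o'}
	fmt.Println(decodeAsJSON(avroLike))
	// invalid character '\x06' looking for beginning of value
}
```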
Impact
- Applications subscribing to Avro-enforced Pulsar topics cannot receive messages — they accumulate in the topic and are never delivered.
- Schema-aware Pulsar consumers (including non-Dapr consumers) that rely on the broker's schema registry receive a schema that does not match the actual CloudEvents message format.
- Applications relying on Pulsar's JSON schema enforcement could publish structurally invalid messages that violate the schema contract, leading to downstream consumer failures.
Affected versions: v1.16.0 through v1.16.11.
Solution
- Avro subscribe decode: When an Avro schema is registered for the incoming topic, the subscriber now decodes the binary payload to native Go types using the cached codec (NativeFromBinary), then re-encodes it as JSON (TextualFromNative) before passing it to the handler. Topics without an Avro schema are unaffected.
- CloudEvents envelope schema: When CloudEvents wrapping is enabled, Dapr now wraps the user-provided Avro schema inside a CloudEvents envelope Avro schema before registering it with the broker, following the CloudEvents Avro format spec. A new topic-level <topic-name>.rawSchema metadata option skips envelope wrapping for topics dedicated to raw payloads. Publishing with rawPayload=true to a CloudEvents-wrapped topic is now rejected with a clear error.
- JSON schema validation: JSON schema topics now compile a goavro codec at init time (since Pulsar JSON schemas use Avro schema definitions) and validate payloads using NativeFromTextual at publish time for full structural validation. Invalid schemas fail fast at startup. CloudEvents envelope schema generation and rawPayload guards match the Avro path behavior.
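For illustration, the envelope wrapping looks roughly like the following Avro schema: the user-provided record becomes the data field of a CloudEvents-shaped outer record. This is a simplified sketch, not the exact schema Dapr registers (the real envelope follows the CloudEvents Avro format spec), and the inner Order record is a hypothetical user schema:

```json
{
  "type": "record",
  "name": "CloudEvent",
  "namespace": "io.cloudevents",
  "fields": [
    {
      "name": "attribute",
      "type": { "type": "map", "values": ["null", "boolean", "int", "string", "bytes"] }
    },
    {
      "name": "data",
      "type": {
        "type": "record",
        "name": "Order",
        "fields": [
          { "name": "orderId", "type": "string" },
          { "name": "amount", "type": "double" }
        ]
      }
    }
  ]
}
```

Schema-aware consumers reading the topic then see an envelope that matches the CloudEvents messages actually stored, rather than only the inner domain schema.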
Scheduler cluster stalls for up to 20 minutes after pod restart when workflows are running
Problem
After a Scheduler pod restart (due to a rolling update, node maintenance, pod eviction, or an OOM kill), the entire Scheduler cluster can stall for up to 20 minutes.
During this time, no workflows execute, no scheduled jobs fire, and no actor reminders are delivered.
The Scheduler pods remain running and pass health checks.
Logs show "fetched initial leadership, waiting for quorum for partition total" on one or more instances, followed by approximately 20 minutes of silence before "leadership quorum reached" appears and normal operation resumes.
Impact
Any Dapr deployment using Scheduler in HA mode (3 instances) with active workflow or actor reminder workloads is affected.
The stall occurs when a Scheduler pod restarts while jobs are actively being triggered and delivered to daprd sidecars.
Root Cause
When a Scheduler pod restarts, the remaining instances detect a leadership partition change (e.g., 3 -> 2) and attempt to restart their cron engines.
The engine shutdown sequence closes all internal job counter loops, each of which must complete any in-flight trigger before shutting down.
The trigger delivery path calls Pool.Trigger, which enqueues the job to the connection pool and blocks waiting for the daprd sidecar to respond with a success/failure result.
The ctx parameter, which is cancelled during engine shutdown, is accepted but never checked.
If a daprd sidecar is slow to respond, has crashed, has disconnected, or is itself restarting (common during a rolling update), the response never arrives and Pool.Trigger blocks indefinitely.
This prevents the engine from shutting down, which prevents the cron module from calling Reelect to update its leadership key with the new partition total.
The other Scheduler instances see this stale key and cannot reach quorum agreement. The cluster is stuck until the stalled instance's etcd lease expires.
Solution
Pool.Trigger now uses a select on both the response channel and ctx.Done().
When the engine context is cancelled during a quorum change, all in-flight triggers return UNDELIVERABLE immediately. The response channel is drained in the background so that late callbacks from stream shutdown do not corrupt subsequent trigger calls. The engine shuts down within milliseconds, Reelect updates the leadership key, and the cluster converges on the new partition total. The undelivered jobs are automatically retried on the next engine cycle.
Additionally, a failed DeliverablePrefixes call during a new daprd connection no longer kills the entire connection pool. The failed connection is cancelled individually, allowing the daprd to reconnect without disrupting other healthy connections.
This reduces cluster recovery time after a Scheduler pod restart from up to 20 minutes to under 5 seconds.
Note: This issue does not affect Dapr v1.17 or later. The v1.17 release redesigned the trigger delivery path from a synchronous blocking pattern (v1.16's Pool.Trigger waiting on a response channel) to an asynchronous callback pattern where the trigger function returns immediately and the response arrives via a callback. With no blocking call, there is nothing to stall the engine shutdown during quorum changes.