This is the thirty-seventh release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release. It is expected to be fully source- and binary-compatible with the final version of 3.5.0, but there are no guarantees of such.
As with all release candidates, we are not aware of any bugs or issues preventing production use. We are making this release because the changes in this version are significant enough to benefit from broader testing and experimentation across the ecosystem before we incorporate them into a stable release. If you have the time, please do take a moment to try this version in your library or service and see how things work!
Major Changes
Despite the deceptively short list of merged pull requests, this release contains an unusually large number of significant changes in runtime semantics. The changes in `async` cancelation (and particularly the implications on `async_`) are definitely expected to have user-facing impact, potentially breaking existing code in subtle ways. If you have any code which uses `async_` (or `async`) directly, you should read this section very carefully and potentially make the corresponding changes.
`async` Cancelation Semantics
The `IO.async` (and correspondingly, `Async#async`) constructor takes a function which returns a value of type `IO[Option[IO[Unit]]]`, with the `Some` case indicating the finalizer which should be invoked if the fiber is canceled while asynchronously suspended at this precise point, and `None` indicating that there is no finalizer for the current asynchronous suspension. This mechanism is most commonly used for "unregister" functions. For example, consider the following reimplementation of the `sleep` constructor:
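```scala
import java.util.concurrent.{ScheduledExecutorService, TimeUnit}
import scala.concurrent.duration.FiniteDuration
import cats.effect.IO

def sleep(time: FiniteDuration, executor: ScheduledExecutorService): IO[Unit] =
  IO.async[Unit] { cb =>
    IO {
      // Typed as Runnable so the overloaded `schedule` resolves unambiguously
      val task: Runnable = () => cb(Right(()))
      val f = executor.schedule(task, time.toNanos, TimeUnit.NANOSECONDS)
      // Finalizer: unschedule the task if the fiber is canceled while waiting
      Some(IO(f.cancel(false)).void)
    }
  }
```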
In the above, the `IO` returned from `sleep` will suspend for `time`. If its fiber is canceled, the `cancel` method on the `ScheduledFuture` will be invoked, which in turn removes the `Runnable` from the `ScheduledExecutorService`, avoiding a memory leak. If we had instead returned `None` from the registration effect, there would have been no finalizer and no way for fiber cancelation to clean up the stray `ScheduledFuture`.
The entirety of Cats Effect's design is prescriptively oriented around safe cancelation. If Cats Effect cannot guarantee that a resource is safely released, it will prevent cancelation from short-circuiting until execution proceeds to a point at which all finalization is safe. This design does have some tradeoffs (it can lead to deadlocks in poorly behaved programs), but it has the helpful outcome of strictly avoiding resource leaks, either due to incorrect finalization or circumvented backpressure.
...except in `IO.async`. Prior to 3.5.0, defining an `async` effect without a finalizer (i.e. producing `None`) resulted in an effect which could be canceled unconditionally, without the invocation of any finalizer. This was most seriously felt in the `async_` convenience constructor, which always returns `None`. Unfortunately, this semantic is very much the wrong default. It makes the assumption that the normal case for `async` is that the callback just cleans itself up (somehow) and no unregistration is possible or necessary. In almost all cases, the opposite is true.
It is exceptionally rare, in fact, for an `async` effect to not have an obvious finalizer. By defining the default in this fashion, Cats Effect made it very easy to engineer resource leaks and backpressure loss. This loophole is now closed, both in the `IO` implementation and in the laws which govern its behavior.
As of 3.5.0, the following is now considered to be uncancelable:
```scala
IO.async[A] { cb =>
  IO {
    // ...
    None // we aren't returning a finalizer
  }
}
```
Previously, the above was cancelable without any caveats. Notably, this applies to all uses of the `async_` constructor!
In practice, we expect that usage of the `async` constructor which was already well behaved will be unaffected by this change. However, any use which is (possibly unintentionally) relying on the old semantic will break, potentially resulting in deadlock, since a cancelation which was previously observed immediately will now be suppressed until the `async` completes. For this reason, users are advised to carefully audit their use of `async` to ensure that they always return `Some(...)` with the appropriate finalizer that unregisters their callback.
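As a rough sketch of what such an audit might produce, the following wraps a listener-style API in `async` and returns a finalizer that removes the listener again. `EventSource` and all of its methods are hypothetical, invented purely for this illustration; substitute whatever registration and unregistration calls your own callback API provides.

```scala
import cats.effect.IO
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap

// Hypothetical callback-based API, invented for this illustration.
final class EventSource {
  private val listeners = TrieMap.empty[Long, String => Unit]
  private val nextId = new AtomicLong(0L)

  def register(listener: String => Unit): Long = {
    val id = nextId.incrementAndGet()
    listeners.put(id, listener)
    id
  }

  def unregister(id: Long): Unit = {
    listeners.remove(id)
    ()
  }

  def listenerCount: Int = listeners.size

  def emit(event: String): Unit =
    listeners.values.foreach(listener => listener(event))
}

// Suspends until the next event arrives. The finalizer runs only if the
// fiber is canceled while suspended, and removes the pending listener.
def nextEvent(source: EventSource): IO[String] =
  IO.async[String] { cb =>
    IO {
      val id = source.register(event => cb(Right(event)))
      Some(IO(source.unregister(id)))
    }
  }
```

With this shape, something like `nextEvent(source).timeoutTo(1.second, IO.pure("timed out"))` can give up on a silent source without leaving a listener behind. Note that in this sketch the listener is only removed on cancelation; a real wrapper would likely also unregister once the event has been delivered.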
In the event that you need to restore the previous semantics, they can be approximated by producing `Some(IO.unit)` from the registration. This is a very rare situation, but it does arise in some cases. For example, the definition of `IO.never` had to be adjusted to the following:
```scala
def never: IO[Nothing] =
  IO.async(_ => IO.pure(Some(IO.unit))) // was previously IO.pure(None)
```
This change can result in some very subtle consequences. If you find unexpected effects in your application after upgrading to 3.5.0, you should start your investigation with this change! Note that this change also affects third-party libraries using `async`, even if they have themselves not yet updated to 3.5.0 or higher.
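If an audit turns up an `async_` call site that genuinely has nothing to unregister and must remain cancelable, one possible stopgap is a small helper that applies the `Some(IO.unit)` approximation described above. This is only a sketch: `legacyAsync_` is a hypothetical name, not part of Cats Effect, and the no-op finalizer means nothing is cleaned up on cancelation.

```scala
import cats.effect.IO

// Hypothetical helper name; not part of Cats Effect.
// Approximates the pre-3.5.0 behavior of async_ by always supplying
// a no-op finalizer, which keeps the suspended effect cancelable.
def legacyAsync_[A](register: (Either[Throwable, A] => Unit) => Unit): IO[A] =
  IO.async[A] { cb =>
    IO {
      register(cb)
      // Nothing is unregistered here: a canceled fiber simply walks away.
      Some(IO.unit)
    }
  }
```

Prefer a real finalizer wherever the underlying API offers a way to unregister; this helper only makes sense when there is truly nothing to release.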
Integrated Timers
From the very beginning, Cats Effect and applications built on top of it have managed timers (i.e. `IO.sleep` and everything built on top of it) on the JVM by using a separate thread pool, specifically a `ScheduledExecutorService`. This is an extremely standard approach used prolifically by almost all JVM applications. Unfortunately, it is also fundamentally suboptimal.
The problem stems from the fact that `ScheduledExecutorService` isn't magic. It works by maintaining one or more event dispatch threads which interrogate a data structure containing all active timers. If any timers have passed their expiry, the thread invokes their `Runnable`. If no timers are expired, the thread blocks for the minimum time until the next timer becomes available. In its default configuration, the Cats Effect runtime provisions exactly one event dispatch thread for this purpose.
This isn't so bad when an application makes very little use of timers, since the thread in question will spend almost all of its time blocked, doing nothing. This affects timeslice granularity within the OS kernel and adds an additional GC root, but both effects are small enough that they are usually unnoticed. The bigger problem comes when an application is using a lot of timers and the thread is constantly busy reading that data structure and dispatching the next set of `Runnable`s (all of which complete `async`s and immediately shift back into the Cats Effect compute pool).
Unfortunately, this situation where a lot of timers are in use is exactly what happens in every network application, since each and every active socket must have at least one `IO.sleep` associated with it to time out handling if the remote side stops responding (in most cases, such as HTTP, even more than one timer is needed). In other words, the fact that `IO.sleep` is relatively inefficient when a lot of concurrent `sleep`s are scheduled is particularly egregious, since this is precisely the situation that describes most real-world usage of Cats Effect.
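To make that workload shape concrete, here is a minimal sketch of many concurrent requests, each guarded by its own timeout. `handleRequest` and `serveAll` are hypothetical stand-ins, not taken from any real server; the point is only that every in-flight request carries at least one timer.

```scala
import cats.effect.IO
import cats.syntax.all._
import scala.concurrent.duration._

// Hypothetical stand-in for reading and answering one request on a socket.
def handleRequest(id: Int): IO[Int] =
  IO.sleep(5.millis).as(id)

// Each in-flight request is guarded by its own timeout, so n concurrent
// requests mean at least n timers scheduled at the same time.
def serveAll(n: Int, limit: FiniteDuration): IO[List[Int]] =
  (1 to n).toList.parTraverse(id => handleRequest(id).timeout(limit))
```

In the common case every one of these timeouts is scheduled and then canceled without ever firing, which is why the cost of registering and removing timers matters as much as the cost of dispatching them.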
So we made this better! Cats Effect 3.5.0 introduces a new implementation of timers based on cooperative polling, which is basically the idea that timers can be dispatched and handled entirely by the same threads which handle compute work. Every time a compute worker thread runs out of work to do (and has nothing to steal), rather than just parking and waiting for more work, it first checks to see if there are any outstanding timers. If there are some which are ready to run, it runs them. Otherwise, if there are timers which aren't yet completed, the worker parks for that period of time (or until awakened by new work), ensuring the timer fires on schedule. In the event that a worker has not had the opportunity to park in some number of iterations, it proactively checks on its timers just to see if any have expired while it has been busy doing CPU-bound work.
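The loop described above can be sketched as a deliberately simplified toy model: a single worker, a simulated clock, and no work stealing. This is not the actual Cats Effect implementation. `ToyWorker`, `Timer`, and `checkEvery` are hypothetical names, and "parking" is simulated by advancing the clock to the earliest deadline.

```scala
import scala.collection.mutable

final case class Timer(deadline: Long, callback: () => Unit)

// Toy model only: one worker owning both a task queue and a timer heap.
final class ToyWorker(checkEvery: Int = 4) {
  var now: Long = 0L // simulated clock, in arbitrary time units

  private val tasks = mutable.Queue.empty[() => Unit]
  // Reversed ordering turns the max-heap into a min-heap on deadline.
  private val timers =
    mutable.PriorityQueue.empty[Timer](Ordering.by[Timer, Long](_.deadline).reverse)

  def submit(task: () => Unit): Unit = tasks.enqueue(task)

  def sleep(delay: Long)(callback: () => Unit): Unit =
    timers.enqueue(Timer(now + delay, callback))

  // Move the callback of every expired timer onto the task queue.
  private def fireExpired(): Unit =
    while (timers.nonEmpty && timers.head.deadline <= now)
      tasks.enqueue(timers.dequeue().callback)

  def run(): Unit = {
    var sinceLastCheck = 0
    while (tasks.nonEmpty || timers.nonEmpty) {
      if (tasks.nonEmpty) {
        tasks.dequeue()()
        now += 1 // each task costs one simulated time unit
        sinceLastCheck += 1
        // Busy worker: proactively poll timers every few tasks.
        if (sinceLastCheck >= checkEvery) {
          fireExpired()
          sinceLastCheck = 0
        }
      } else {
        // Out of work: "park" until the earliest timer is due, then fire it.
        now = math.max(now, timers.head.deadline)
        fireExpired()
        sinceLastCheck = 0
      }
    }
  }
}
```

In this model a timer callback is never dispatched from a separate thread: it becomes an ordinary task on the same queue, which mirrors the observation that timers had to shift back to the compute pool anyway.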
This technique works extremely well in Cats Effect precisely because every timer had to shift back to the compute pool anyway, meaning that it was already impossible for any timer to have a granularity which was finer than that of the compute worker thread task queue. Thus, having that same task queue manage the dispatching of the timers themselves ensures that at worst those timers run with the same precision as previously, and at best we are able to avoid a considerable amount of overhead both in the form of OS kernel scheduler contention (since we are removing a whole thread from the application!) and the expense of a round-trip context shift and passage through the external work queue.
And, as mentioned, this optimization applies specifically to a scenario which is present in almost all real-world Cats Effect applications! To that end, we tested the performance of a relatively simple Http4s Ember server while under heavy load generated using the `hey` benchmark tool. The result was a roughly 15-25% improvement in sustained maximum requests per second, and a roughly 15% improvement in the 99th percentile latencies (P99). In practical terms, this means that this one change makes standard microservice applications around 15% more efficient with no other adjustments.
Obviously, you should do your own benchmarking to measure the impact of this optimization, but we expect the results to be very visible in production top-line metrics.
User-Facing Pull Requests
- #3409 – Even faster async mutex (@armanbilge)
- #3408 – Add 'flatModify', 'flatModifyFull' and corresponding 'State' methods (@seigert)
- #3387 – Thread blocking detection (@TimWSpence)
- #3374 – Add `fromFutureCancelable` and friends (@armanbilge)
- #3346 – Optimize `Mutex` & `AtomicCell` (@BalmungSan)
- #3347 – `ConcurrentAtomicCell` (@BalmungSan)
- #3302 – Add 'IOLocal.lens' method to produce lens 'A <=> B' (@seigert)
- #3388 – Protect timers against `Long` overflow (@durban)
- #3219 – Integrated timers (@djspiewak, @vasilmkd)
- #3225 – Introduce a `BatchingMacrotaskExecutor` (@armanbilge)
- #3328 – Use `asyncCheckAttempt` in `IODeferred#get` (@armanbilge)
- #3304 – Add `IO#supervise`, `IO#toResource`, `IO#metered` (@kamilkloch)
- #3299 – Add `IO#voidError` (@armanbilge)
- #3205 – Change `async_` to be uncancelable (@djspiewak)
- #3264 – `Defer` instance for `Resource` without `Sync` requirement (@Odomontois)
- #3091 – Add `Async#asyncCheckAttempt` for #3087 (@seigert)
- #3002, #3390 – Documentation fixes and improvements (@danicheg, @davidabrahams)
Thank you!