typelevel/cats-effect v3.3.2 on GitHub

This is the seventeenth major release in the Cats Effect 3.x lineage. It is fully binary compatible with every 3.x release, and fully source-compatible with every 3.3.x release. Note that source compatibility has been broken with 3.2.x in some minor areas. Scalafixes are available and should be automatically applied by Scala Steward if relevant.

This patch release focuses primarily on performance improvements in two major areas: blocking/interruptible and suspended fiber tracking.

In the former area, the Cats Effect fiber runtime has long had support for the scala.concurrent.blocking construct within any code which is scheduled on its worker threads. When such a block is hit, the runtime takes it as a signal that it is about to lose a functioning worker thread and thus spawns a new one, seamlessly putting it into rotation to ensure the pool is not starved by the current worker thread being blocked. This trick works very well, but wasn't particularly recommended in user code because the performance was worse than the native IO.blocking operation.

In this release, Vasil has changed the behavior of the pool to seamlessly shift worker state when a blocking section is hit, effectively morphing another thread into exactly the same state as the now-blocked thread. Additionally, spare threads constructed when blocking operations are hit are now cached for one minute before being cleaned up if still idle, ensuring that they're still around if a subsequent blocking operation is hit in short order.

These improvements, taken together, mean that scala.concurrent.blocking inside of delay is actually faster than the IO.blocking operation by a significant margin. We can therefore reap immediate performance benefits by converting IO.blocking and IO.interruptible to use this native mechanism rather than an ancillary thread pool.

Please note that the benchmark results are plotted on a log scale to make it easier to see the relative differences in each scenario. For reference, the improvements in the "fine grained" benchmark represent the test running 141x faster! (That is a multiplier, not a percentage.) Blocking is still bad for throughput, but it's a lot less bad now. You can find all of these benchmarks in the repository.
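To make the two styles compared above concrete, here is a minimal sketch that runs the same blocking call once through IO.blocking and once through scala.concurrent.blocking inside of delay. The object name BlockingStyles and the helper slowLookup are hypothetical and exist only for illustration; the sketch assumes cats-effect 3.3.x is on the classpath and makes no performance claims of its own.

```scala
import cats.effect.{IO, IOApp}
import scala.concurrent.blocking

object BlockingStyles extends IOApp.Simple {

  // Hypothetical stand-in for a blocking call (for example JDBC or file I/O).
  def slowLookup(key: String): Int = {
    Thread.sleep(50) // blocks the calling thread
    key.length
  }

  // Style 1: the dedicated operation for blocking work.
  val viaBlocking: IO[Int] = IO.blocking(slowLookup("cats"))

  // Style 2: the scala.concurrent.blocking hint inside an ordinary delay.
  // The runtime sees the hint when the thunk runs on one of its worker threads.
  val viaHint: IO[Int] = IO.delay(blocking(slowLookup("effect")))

  def run: IO[Unit] =
    for {
      a <- viaBlocking
      b <- viaHint
      _ <- IO.println(s"IO.blocking: $a, delay + blocking: $b")
    } yield ()
}
```

In user code, IO.blocking remains the clearer way to mark blocking work; the second form is shown only because it is the mechanism these notes describe.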

As if that weren't enough, we've reimplemented the tracking mechanism for suspended fibers which underlies the new fiber dump feature introduced in 3.3.0. This feature was and is implemented using a thread-local, set-like data structure which maintains weak references to any suspended fiber. The weak references are necessary for two reasons. First, they ensure that any fiber which is suspended and whose callback is then "lost" can still be garbage collected normally. Second, they allow us to avoid the extra memory barriers associated with backtracking to the suspending thread when the fiber is resumed, making the whole mechanism significantly faster.

Unfortunately, this comes with a cost: these weak references must be examined and ultimately cleaned by the garbage collector, which means that we're effectively taking synchronous work out of the main code path and moving it asynchronously into the garbage collector. This in turn can mean that certain types of workflows which already put significant pressure on the GC may have seen diminished performance with the update to 3.3.0.

This release significantly reduces the GC overhead by simplifying and specializing the data structure to reduce the number of weak references and allocations involved in the tracking itself. The results should be unnoticeable in most optimized workloads, but for applications which are creating a significant number of short-lived objects within their hot path, these changes should produce a substantial speed-up relative to 3.3.1.
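For readers unfamiliar with the idea, the weak-reference tracking described above can be sketched with nothing but the standard library. This is an illustration of the general technique, not the data structure used inside Cats Effect: the class name WeakBagSketch and its methods are hypothetical, and it assumes Scala 2.13 or later and single-threaded use (as a thread-local structure would have).

```scala
import java.lang.ref.WeakReference
import scala.collection.mutable.ArrayBuffer

// Illustrative only: a set-like bag that tracks values without keeping them alive.
final class WeakBagSketch[A <: AnyRef] {
  private val refs = ArrayBuffer.empty[WeakReference[A]]

  // Register a value; the bag holds it only through a weak reference.
  def insert(a: A): Unit = {
    refs += new WeakReference(a)
    ()
  }

  // Drop entries whose referent has been collected, then return the live ones.
  def snapshot(): List[A] = {
    refs.filterInPlace(_.get() != null)
    // A referent may still be cleared between the filter and this read,
    // so wrap each get() in Option to skip any nulls.
    refs.iterator.flatMap(r => Option(r.get())).toList
  }
}

object WeakBagDemo {
  def main(args: Array[String]): Unit = {
    val bag  = new WeakBagSketch[String]
    val kept = new String("suspended-fiber") // strongly referenced here
    bag.insert(kept)
    println(bag.snapshot().contains(kept))
  }
}
```

Every entry costs one WeakReference allocation that the garbage collector must later examine, which is the kind of overhead this release works to reduce.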

User-Facing Pull Requests

#2699 – Removed interruptible dependency on an explicit ExecutionContext (@djspiewak)
#2687 – Blocking mechanism with cached threads (@vasilmkd)
#2673 – Cross platform weak bag implementation (@vasilmkd)

Thank you so much!