github typelevel/cats-effect v3.0.0-M4

Cats Effect 3.0.0-M4 is the fourth milestone release in the 3.0.0 line. We expect to follow this milestone with subsequent ones as necessary to further refine functionality and features, prior to some number of release candidates leading to a final release. The designation "M4" is meant to indicate our relative confidence in the functionality and API of the library, but it by no means indicates completeness or final compatibility. Neither binary compatibility nor source compatibility with anything is assured prior to 3.0.0, though major breakage is unlikely at this point.

What follows are the changes from M3 to M4. For a more complete summary of the changes generally included in Cats Effect 3.0.0, please see the M1, M2, and M3 release notes.

Major Changes

Support for ARM

With the release of Apple's M1 desktop processor (based on the ARM architecture), as well as the continued push towards the Amazon Graviton architecture within AWS, ARM support has become very much a non-optional feature of any major runtime. Unfortunately, despite running on the JVM, Cats Effect is sufficiently low-level that it doesn't just get this "for free".

In particular, Cats Effect takes advantage of a number of memory-related tricks to drastically improve performance within the implementation of IO. Unfortunately, x86_64 and ARM64 have significantly different memory models, with x86 providing much stricter guarantees than ARM. As it turned out, Cats Effect 3 was accidentally exploiting these stricter guarantees, meaning that programs written using IO and run on ARM would nondeterministically deadlock!

As you can imagine, this was a particularly insane bug to track down. Originally identified by @vasilmkd, it resulted in very long nights agonizing over various EC2 instances, as well as a lot of backchannel discussions with experts within the industry to try to narrow down exactly what was going on. There's a long and interesting story here which will eventually become a conference talk and maybe a series of blog posts.

Long story short… Very, very special thanks to @RaasAhsan, who spent a long night devoting his full attention to minimizing the issue from "the entire IO implementation" all the way down to just ~80 lines of Java and two threads. For a nondeterministic memory-related bug which is also CPU architecture-specific, this has to stand as one of the most impressive bug minimization efforts I've seen in my entire career.

Once the bug was minimized, @simonis and @mo-beck very graciously and thoroughly explained exactly what was happening under the surface, complete with snippets of assembly and discussions of various semantic guarantees. In the end, the fix was a single line change swapping compareAndSwap for getAndSet, a brilliant conception for which the credit is entirely owed to @viktorklang.

At the end of the day, the code is now faster (on both x86 and ARM!) and deterministic on all compliant ARM JVMs.
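
For readers unfamiliar with the two operations, the sketch below contrasts them on a plain java.util.concurrent.atomic.AtomicBoolean, where compare-and-swap is exposed as compareAndSet. It is not the actual patch, which these notes do not reproduce: the object and method names are hypothetical, and the example only shows how the two calls differ in behavior, not the memory-ordering details that mattered on ARM.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical illustration only; not the code that was changed in Cats Effect.
object AtomicOpsSketch {
  // compareAndSet writes `true` only if the current value is `false`,
  // and returns whether the write happened.
  def claimWithCas(flag: AtomicBoolean): Boolean =
    flag.compareAndSet(false, true)

  // getAndSet writes `true` unconditionally and returns the previous value,
  // so the caller "won" exactly when the previous value was `false`.
  def claimWithSwap(flag: AtomicBoolean): Boolean =
    !flag.getAndSet(true)
}
```

For a flag that only ever moves from false to true, both helpers give the same answer to "was I first?", which is the kind of situation where one can stand in for the other.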

(As an aside, GitHub Actions really needs to hurry up and add support for ephemeral self-hosted runners so that we can add ARM jobs to our CI matrix.)

Clock No Longer Extends Applicative

This was a rather annoying foot-gun in the API which reared its ugly head in code like this:

def foo[F[_]: Monad: Clock](f: F[Int]) = 
  f.map(_ + 2)    // error!

The above would not compile due to ambiguities between Monad (which provides map) and Clock (which also provided map). This is a similar situation to Monad and Traverse within the Cats library, though in this case, it is possible to provide a resolution.

Clock no longer extends Applicative, meaning that it no longer implicitly materializes an Applicative instance whenever it is in scope. However, it still contains an Applicative[F] instance within its definition, meaning that it requires Applicative without materializing it. This is the same trick used by Cats MTL, originally explored by @aloiscochard as part of the Scato project.
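
To make the trick concrete, here is a minimal sketch using simplified stand-ins for the type classes. These are not the real Cats or Cats Effect definitions; ClockSketch, MapOps, and monotonicNanos are hypothetical names invented for the example. Because Clock holds an Applicative as a member rather than extending it, only the Monad evidence is a candidate when map needs an Applicative[F], so the call resolves without ambiguity.

```scala
object ClockSketch {
  // Simplified stand-ins for the Cats hierarchy (assumption: not the real traits).
  trait Applicative[F[_]] {
    def pure[A](a: A): F[A]
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  trait Monad[F[_]] extends Applicative[F] {
    def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
    def map[A, B](fa: F[A])(f: A => B): F[B] = flatMap(fa)(a => pure(f(a)))
  }

  // Clock *requires* an Applicative (as a member) without *being* one,
  // so an implicit Clock[F] never satisfies a search for Applicative[F].
  trait Clock[F[_]] {
    def applicative: Applicative[F]
    def monotonicNanos: F[Long] // hypothetical method, for illustration only
  }

  // Syntax enrichment: `map` needs an implicit Applicative[F].
  implicit class MapOps[F[_], A](fa: F[A]) {
    def map[B](f: A => B)(implicit F: Applicative[F]): F[B] = F.map(fa)(f)
  }

  // Compiles: the Monad evidence is the only Applicative[F] in implicit scope.
  def foo[F[_]: Monad: Clock](f: F[Int]): F[Int] =
    f.map(_ + 2)

  implicit val optionMonad: Monad[Option] = new Monad[Option] {
    def pure[A](a: A): Option[A] = Some(a)
    def flatMap[A, B](fa: Option[A])(f: A => Option[B]): Option[B] = fa.flatMap(f)
  }

  implicit val optionClock: Clock[Option] = new Clock[Option] {
    def applicative: Applicative[Option] = optionMonad
    def monotonicNanos: Option[Long] = Some(System.nanoTime())
  }

  val demo: Option[Int] = foo(Option(40)) // Some(42)
}
```

If Clock in this sketch were changed to extend Applicative, both context bounds on foo would supply an Applicative[F] and the call to map would fail to compile, which is the foot-gun described above.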

In practice, this should have relatively little impact on user code aside from removing a source of ambiguity and frustration.

Standard Library Intensifies...

This release significantly expanded the standard library. New additions include:

  • CountDownLatch
  • Deque
  • CyclicBarrier
  • parTraverseN/parSequenceN

Much of this work was done by the tireless @TimWSpence, who is also responsible for the comprehensive support for inductive monad transformer instances up and down the hierarchy! We are continuing to improve and enhance the std module leading up to the 3.0 release, and we expect to continue adding enhancements even after 3.0 is finalized.

Pull Requests

You're all amazing, thank you!
