Release 24.6
We are excited to announce the release of MAX 24.6, featuring a preview of MAX GPU! At the heart of this release is MAX GPU, the first vertically integrated Generative AI serving stack that eliminates the dependency on vendor-specific computation libraries like NVIDIA's CUDA.
MAX GPU is built on two groundbreaking technologies. The first is MAX Engine, a high-performance AI model compiler and runtime built with innovative Mojo GPU kernels for NVIDIA GPUs, free from CUDA or ROCm dependencies. The second is MAX Serve, a sophisticated Python-native serving layer engineered specifically for LLM applications. MAX Serve handles complex request batching and scheduling, delivering consistent and reliable performance even under heavy workloads.
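To give a feel for what serving an LLM through this stack looks like from the client side, here is a minimal sketch of sending a chat request to a locally running MAX Serve instance. It assumes MAX Serve exposes an OpenAI-compatible endpoint; the port, path, and model name below are illustrative assumptions, not confirmed values from this release note.

```python
# Minimal client sketch for a locally running MAX Serve instance.
# Assumption: MAX Serve exposes an OpenAI-compatible chat completions API
# at the base URL below; port, path, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local MAX Serve address
    api_key="EMPTY",                      # placeholder; no key needed for a local server
)

response = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",  # example model name (assumption)
    messages=[{"role": "user", "content": "Summarize what MAX Serve does."}],
)
print(response.choices[0].message.content)
```

Because the serving layer speaks an OpenAI-compatible protocol, existing client code and tooling can typically be pointed at it by changing only the base URL.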
For additional details, check out the changelog and the release announcement.