Announcing MAX 25.2, featuring significant enhancements for large-scale AI deployment and GPU optimization. This release adds comprehensive NVIDIA Hopper support with high-performance kernels, multi-GPU tensor parallelism for large models such as Llama-3.3-70B, and expanded model support (Phi3, Olmo, Granite). Key additions include GPTQ quantization for memory efficiency, advanced long-context optimizations (in-flight batching, chunked prefill, copy-on-write), and improved kernel caching that reduces compilation times by up to 28%. New Mojo GPU APIs give developers greater control and performance.
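To get a feel for why GPTQ quantization matters for a model of this size, here is a back-of-envelope memory estimate comparing 16-bit weights with 4-bit GPTQ weights. This is a generic illustration, not MAX code; the small per-weight metadata allowance for GPTQ scales and zero-points is a rough assumption, not a measured figure.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage footprint in GiB at a given precision."""
    return num_params * bits_per_weight / 8 / 2**30

params = 70e9  # a 70B-parameter model such as Llama-3.3-70B

fp16 = weight_memory_gib(params, 16)
# GPTQ packs weights into 4 bits plus per-group scales/zero-points;
# ~0.25 extra bits per weight is a rough allowance for that metadata.
gptq4 = weight_memory_gib(params, 4 + 0.25)

print(f"fp16 weights: ~{fp16:.0f} GiB")   # roughly 130 GiB
print(f"4-bit GPTQ:   ~{gptq4:.0f} GiB")  # roughly 35 GiB
```

At 16 bits the weights alone exceed a single GPU's memory, which is also why this release pairs quantization with multi-GPU tensor parallelism; at roughly 4 bits the same weights fit in about a quarter of the space.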
For additional details, check out the changelog.