Announcing MAX 25.2, featuring significant enhancements for large-scale AI deployment and GPU optimization. This release adds comprehensive NVIDIA Hopper support with high-performance kernels, multi-GPU tensor parallelism for large models such as Llama-3.3-70B, and expanded model support (Phi3, Olmo, Granite). Key additions include GPTQ quantization for memory efficiency, advanced long-context optimizations (in-flight batching, chunked prefill, copy-on-write), and improved kernel caching that reduces compilation times by up to 28%. New Mojo GPU APIs give developers greater control and performance.
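To get a feel for why GPTQ quantization matters for a model of this size, here is a back-of-envelope memory estimate comparing 16-bit weights with 4-bit GPTQ weights. This is a generic illustration, not MAX code; the small per-weight metadata allowance for GPTQ scales and zero-points is a rough assumption, not a measured figure.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage footprint in GiB at a given precision."""
    return num_params * bits_per_weight / 8 / 2**30

params = 70e9  # a 70B-parameter model such as Llama-3.3-70B

fp16 = weight_memory_gib(params, 16)
# GPTQ packs weights into 4 bits plus per-group scales/zero-points;
# ~0.25 extra bits per weight is a rough allowance for that metadata.
gptq4 = weight_memory_gib(params, 4 + 0.25)

print(f"fp16 weights: ~{fp16:.0f} GiB")   # roughly 130 GiB
print(f"4-bit GPTQ:   ~{gptq4:.0f} GiB")  # roughly 35 GiB
```

At 16 bits the weights alone exceed a single GPU's memory, which is also why this release pairs quantization with multi-GPU tensor parallelism; at roughly 4 bits the same weights fit in about a quarter of the space.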
For additional details, check out the changelog.