Intel® Extension for PyTorch* v2.6.10+xpu Release Notes

We are excited to announce the release of Intel® Extension for PyTorch* v2.6.10+xpu. This release supports Intel® GPU platforms (Intel® Data Center GPU Max Series, Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors, and Intel® Data Center GPU Flex Series) and is based on PyTorch* 2.6.0.

Highlights

  • Intel® oneDNN v3.7 integration

  • Intel® oneAPI Base Toolkit 2025.0.1 compatibility

  • Official PyTorch 2.6 prebuilt binaries support

    Starting with this release, Intel® Extension for PyTorch* supports the official PyTorch* prebuilt binaries. Since PyTorch* 2.6, the official binaries are built with _GLIBCXX_USE_CXX11_ABI=1, making them ABI compatible with the Intel® Extension for PyTorch* prebuilt binaries, which have always been built with _GLIBCXX_USE_CXX11_ABI=1.

  • Large Language Model (LLM) optimization

    Intel® Extension for PyTorch* provides a variety of custom kernels, including commonly used kernel fusions such as rms_norm and rotary_embedding, attention-related kernels such as paged_attention and chunked_prefill, and the punica kernel for serving multiple LoRA-finetuned LLMs. It also provides MoE (Mixture of Experts) custom kernels, including topk_softmax, moe_gemm, moe_scatter, and moe_gather. These optimizations improve the execution of key operations and thereby enhance the functionality and efficiency of the ecosystem on Intel® GPU platforms.

    Besides that, Intel® Extension for PyTorch* optimizes more LLM models for inference and finetuning, such as Phi3-vision-128k, Phi3-small-128k, and Llama3.2-11B-vision. A full list of optimized models can be found in the LLM Optimizations Overview.
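As an illustration of what one of the fused kernels listed above computes, the following is a minimal pure-Python sketch of the RMSNorm math (normalize by the root mean square, then apply a learned per-element weight). This is a reference for the formula only, not the actual rms_norm kernel, which operates on tensors in a single fused pass on the GPU.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm: out[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i].

    Pure-Python sketch of the math the fused rms_norm kernel computes;
    the real kernel fuses these steps into one pass over the tensor.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

# Example: a 4-element hidden state with unit weights.
out = rms_norm([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
```

After normalization the output has a root mean square of (approximately) 1, which is what makes the subsequent matmul numerically well behaved.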

  • Serving framework support

    Intel® Extension for PyTorch* offers extensive support for serving frameworks, including vLLM and TGI, with the goal of enhancing performance and flexibility for LLM workloads on Intel® GPU platforms (intensively verified on Intel® Data Center GPU Max Series and Intel® Arc™ B-Series graphics on Linux). vLLM/TGI features such as chunked prefill and MoE (Mixture of Experts) are supported by the backend kernels provided in Intel® Extension for PyTorch*. Support for low precision, such as Weight-Only Quantization (WOQ) INT4, is also enhanced in this release:

    • The performance of the INT4 GEMM kernel based on the Generalized Post-Training Quantization (GPTQ) algorithm has been improved by approximately 1.3× compared with the previous release. During the prefill stage it achieves performance similar to FP16, while in the decode stage it outperforms FP16 by approximately 1.5×.
    • Support for the Activation-aware Weight Quantization (AWQ) algorithm has been added; its performance is on par with GPTQ without g_idx.
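To make the WOQ INT4 scheme concrete, here is a simplified pure-Python sketch of group-wise symmetric INT4 quantization and dequantization. The group size and the rounding are illustrative assumptions: real GPTQ additionally performs error-compensating rounding order and per-column updates, and the production kernel packs two 4-bit values per byte and dequantizes inside the fused GEMM.

```python
def quantize_int4(weights, group_size=4):
    """Group-wise symmetric INT4 quantization (simplified sketch).

    Each group shares one FP scale chosen so the largest magnitude
    maps near the edge of the signed 4-bit range [-8, 7].
    """
    q, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7.0 or 1.0  # avoid 0 scale
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales

def dequantize_int4(q, scales, group_size=4):
    """Recover approximate FP weights: q * per-group scale."""
    return [q[i] * scales[i // group_size] for i in range(len(q))]

w = [0.12, -0.53, 0.31, 0.07, 1.4, -0.9, 0.2, 0.05]
q, s = quantize_int4(w)
w_hat = dequantize_int4(w_hat_q := q, s)
```

The reconstruction error per weight is bounded by half the group's scale, which is why smaller groups (at the cost of more scale storage) give better accuracy.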
  • [Prototype] NF4 QLoRA finetuning using BitsAndBytes

    Intel® Extension for PyTorch* now supports QLoRA finetuning with BitsAndBytes on Intel® GPU platforms. It enables efficient adaptation of LLMs using NF4 4-bit quantization with LoRA, reducing memory usage while maintaining accuracy.
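The NF4 format mentioned above quantizes each block of weights against 16 fixed code values derived from the quantiles of a normal distribution. The sketch below is a simplified pure-Python illustration of that idea (the level values are the rounded NF4 table from the QLoRA work as used by bitsandbytes; real bitsandbytes operates on 64-element blocks with packed 4-bit storage and a second-level quantization of the absmax values).

```python
# The 16 NF4 code values (rounded; normalized quantiles of a standard
# normal, as defined for NF4 in the QLoRA work / bitsandbytes).
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]

def nf4_quantize(block):
    """Scale one block by its absmax, then snap each value to the
    nearest of the 16 fixed NF4 levels. Simplified sketch."""
    absmax = max(abs(v) for v in block) or 1.0
    codes = [min(range(16), key=lambda i: abs(v / absmax - NF4_LEVELS[i]))
             for v in block]
    return codes, absmax

def nf4_dequantize(codes, absmax):
    """Recover approximate weights: code value times block absmax."""
    return [NF4_LEVELS[c] * absmax for c in codes]

block = [0.5, -1.2, 0.05, 0.9]
codes, absmax = nf4_quantize(block)
recon = nf4_dequantize(codes, absmax)
```

Because the levels are denser near zero, NF4 spends its 16 codes where normally distributed weights actually concentrate, which is what lets 4-bit storage preserve accuracy during QLoRA finetuning.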

  • [Beta] Intel® Core™ Ultra Series 2 Mobile Processors support on Windows

    Intel® Extension for PyTorch* provides beta-quality support for Intel® Core™ Ultra Series 2 Mobile Processors (codename Arrow Lake-H) on Windows in this release, based on redistributed PyTorch* 2.6 prebuilt binaries that add an AOT compilation target for Arrow Lake-H, available from the download server.

  • Hybrid ATen operator implementation

    Intel® Extension for PyTorch* uses the ATen operators available in Torch XPU Operators wherever possible and overrides only a limited set of operators for better performance and broader data type support.

Breaking Changes

  • Intel® Data Center GPU Flex Series support is being deprecated and will no longer be available starting from the release after v2.6.10+xpu.
  • Channels Last 1D support on XPU is being deprecated and will no longer be available starting from the release after v2.6.10+xpu.

Known Issues

Please refer to the Known Issues webpage.
