2.5.10+xpu
We are excited to announce the release of Intel® Extension for PyTorch* v2.5.10+xpu. This release is based on PyTorch* 2.5.1 and supports the following Intel® GPU platforms: Intel® Data Center GPU Max Series, Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, and Intel® Data Center GPU Flex Series.
Highlights
- Intel® oneDNN v3.6 integration
- Intel® oneAPI Base Toolkit 2025.0.1 compatibility
- Intel® Arc™ B-series Graphics support on Windows (prototype)
- Large Language Model (LLM) optimization
Intel® Extension for PyTorch* enhances KV Cache management to cover both the Dynamic Cache and Static Cache methods defined by Hugging Face, which reduces computation time and improves response rates across generative tasks. It also supports several new LLM features. Speculative decoding optimizes inference by making educated guesses about future tokens while generating the current token. Sliding window attention uses a fixed-size window to limit the attention span of each token, which significantly improves processing speed and efficiency for long documents. Multi-round conversation support enables natural human conversation in which information is exchanged over multiple turns.
In addition, Intel® Extension for PyTorch* optimizes more LLMs for inference and fine-tuning. A full list of optimized models can be found at LLM Optimizations Overview.
- Serving framework support
Typical LLM serving frameworks, including vLLM and TGI, can work with Intel® Extension for PyTorch* on Intel® GPU platforms on Linux (intensively verified on Intel® Data Center GPU Max Series). This release also enhances support for low precision, such as INT4 Weight Only Quantization, which is based on the Generalized Post-Training Quantization (GPTQ) algorithm.
- Beta support of full fine-tuning and LoRA PEFT with mixed precision
Intel® Extension for PyTorch* enhances this feature to optimize typical LLMs and brings it to Beta quality.
- Kineto Profiler Support
Intel® Extension for PyTorch* removes this feature, which is now redundant because PTI-based Kineto Profiler support on Intel® GPU platforms is available in PyTorch* 2.5.
- Hybrid ATen operator implementation
Intel® Extension for PyTorch* uses the ATen operators available in Torch XPU Operators wherever possible and overrides only a small number of operators for better performance and broader data type support.
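As a rough illustration of the sliding window attention mentioned under LLM optimization above, the following pure-Python sketch builds the mask that limits each token to a fixed-size window and counts how many query/key pairs remain to be scored. It is not how Intel® Extension for PyTorch* implements the feature. The function names `sliding_window_causal_mask` and `attended_pairs` are hypothetical, and the sketch assumes a causal window that covers the current token plus the `window - 1` tokens before it.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumption for this sketch: position i attends only to itself and the
    (window - 1) tokens immediately before it.
    """
    if seq_len < 0 or window < 1:
        raise ValueError("seq_len must be >= 0 and window must be >= 1")
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the (query, key) pairs that attention has to score."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    mask = sliding_window_causal_mask(seq_len=6, window=3)
    # Print one row per query position; "x" marks a key position it may attend to.
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    # A window as long as the sequence is equivalent to plain causal attention.
    full = attended_pairs(sliding_window_causal_mask(seq_len=6, window=6))
    print(attended_pairs(mask), "pairs with the window,", full, "without")
```

With a fixed window, the number of scored pairs grows linearly with sequence length instead of quadratically, which is where the benefit for long documents comes from.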
Breaking Changes
- Block format support: oneDNN Block format integration support has been removed as of v2.5.10+xpu.
Known Issues
Please refer to the Known Issues webpage.