Major Features and Improvements
Intel® Extension for TensorFlow* extends official TensorFlow capability to run TensorFlow workloads on Intel® Data Center Max GPU and Intel® Data Center GPU Flex Series. This release contains the following major features and improvements:
-
Intel® Extension for TensorFlow* v1.2.0 now supports TensorFlow 2.12, the latest TensorFlow release from Google. Because of a breaking change in protobuf in TensorFlow 2.12, this release of Intel® Extension for TensorFlow* is binary compatible only with TensorFlow 2.12.
-
Adopted PJRT, a uniform device API, as the supported device plugin mechanism to implement the Intel GPU backend for experimental OpenXLA support. Users can build Intel® Extension for TensorFlow* from source and run JAX front-end APIs with OpenXLA. Refer to OpenXLA Support GPU for more details.
-
Updated oneDNN to v3.1, which includes multiple functional and performance improvements for CPU and GPU implementations.
-
Added support for the generative AI model Stable Diffusion and optimized the model for better performance. Get started in Stable Diffusion Inference for Text2Image on Intel GPU.
-
Added XPUAutoShard to Intel® Extension for TensorFlow* as an experimental feature. Given a set of homogeneous XPU devices (e.g. 2 GPU tiles), XPUAutoShard automatically shards the input data and the TensorFlow graph and places the data/graph shards on different GPU devices to maximize hardware usage. Refer to XPUAutoShard on GPU for more details.
-
Provided the Python API itex.experimental_ops_override() to automatically override some TensorFlow operators with Customized Operators under the itex.ops namespace while staying compatible with existing trained parameters. More in usage details.
-
Added operators performance optimization
- Optimized ResizeNearestNeighborGrad/All/Any/Slice/SpaceToBatchND/BatchToSpaceND/BiasAddGrad operators.
- Optimized math functions (e.g. tanh, rsqrt) with small shapes (e.g. size=8192) on Intel® Data Center GPU Flex Series by vectorization optimization.
- Optimized reduction series ops by improving thread and memory utilization for Col/Row reduction separately.
-
Added support for AOT (ahead-of-time) compilation on Intel® Data Center Max GPU, Intel® Data Center GPU Flex Series and Intel® Arc™ A-Series GPUs in the Intel® Extension for TensorFlow* package on the PyPI channel. You can also specify the hardware platform type when configuring your system for a source code build.
-
This release continued to provide experimental support for second generation Intel® Xeon® Scalable Processors and newer (such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids) and Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
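As an illustration of the itex.experimental_ops_override() API listed above, the following is a minimal sketch rather than official usage. It assumes the extension is importable as `intel_extension_for_tensorflow` (conventionally aliased as `itex`) and that the override should be enabled before the model is built. The helper name `enable_itex_ops_override` is hypothetical and exists only for this example.

```python
import importlib.util


def enable_itex_ops_override() -> bool:
    """Enable ITEX operator overrides when the extension is importable.

    Returns True if itex.experimental_ops_override() was called, and
    False if the extension is not installed in this environment.
    """
    # Assumption: the extension is installed as the module
    # `intel_extension_for_tensorflow`.
    if importlib.util.find_spec("intel_extension_for_tensorflow") is None:
        return False

    import intel_extension_for_tensorflow as itex

    # Assumption: calling this once, before the model is constructed,
    # lets the overridden operators take effect for that model.
    itex.experimental_ops_override()
    return True


if __name__ == "__main__":
    print("ops override enabled:", enable_itex_ops_override())
```

Guarding the import keeps the same script usable on machines with only stock TensorFlow installed, where it simply reports that no override was applied.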
Bug Fixes and Other Changes
- Upgraded pybind11 version to support Python 3.11 source build.
- Initialized environment variables for Intel® oneAPI Base Toolkit in docker container by default.
Known Issues
- FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with the following exception:
'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.
- TensorBoard cannot work with stock TensorFlow 2.12 because of two issues: tensorflow/tensorflow#60262 and tensorflow/profiler#602.
- A GLIBC++ version mismatch may cause the workload to exit with the exception 'Can not found any devices. To check runtime environment on your host, please run itex/tools/env_check.sh.' Please try env_check.sh for assistance.
Documents
-
Provided new guide documentation to developers for How to write custom op.
-
Distributed training is supported by Intel® Optimization for Horovod*.