This is a technical preview for oneDNN Graph API based on oneDNN v2.5.
Functionality
- Introduced bf16 inference support.
- Introduced multi-head attention (MHA) fusion supported by oneDNN Graph compiler with optimized code generation (experimental).
- Updated API to comply with oneDNN Graph API specification v0.9.
Known Issues and Limitations
- Some subgraphs might not be recognized as partitions even if they match the general pattern description, due to the internal implementation.
- The weight’s opaque layout can be queried only from a compiled partition, so tensor shapes must be known at compilation time.
- MHA fusion is not activated on machines without AVX-512 support, because the oneDNN Graph compiler generates only AVX-512 and newer instructions.
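
To tell in advance whether the MHA fusion limitation above applies to a given machine, an application can probe the CPU for AVX-512 before relying on the fused path. The following is a minimal sketch, not part of oneDNN Graph: it assumes a GCC or Clang toolchain on x86 (it uses the `__builtin_cpu_supports` compiler builtin), and the helper names `cpu_has_avx512f` and `mha_fusion_expected` are hypothetical. It checks only the AVX-512 Foundation feature, which may be a weaker condition than what the library checks internally.

```cpp
#include <iostream>

// Hypothetical helper (not a oneDNN Graph API): reports whether the CPU
// advertises the AVX-512 Foundation feature.
// Assumption: GCC/Clang on x86. Other toolchains or architectures
// conservatively report false.
bool cpu_has_avx512f() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // initialize the compiler's CPU feature data
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;
#endif
}

// Hypothetical helper: mirrors the limitation stated in the release notes.
// MHA fusion is expected to be considered only when AVX-512 is available.
bool mha_fusion_expected(bool has_avx512) {
    return has_avx512;
}

// Prints a short diagnostic that an application could log at startup.
void report_mha_fusion_availability() {
    const bool avx512 = cpu_has_avx512f();
    std::cout << "AVX-512F available: " << (avx512 ? "yes" : "no") << '\n';
    std::cout << "MHA fusion expected: "
              << (mha_fusion_expected(avx512) ? "yes" : "no") << '\n';
}
```

Calling `report_mha_fusion_availability()` once at startup gives a quick hint as to why MHA subgraphs may be left unfused on a particular host. The actual decision is made inside the library.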
Thanks to the Contributors
This release contains contributions from the project core teams, as well as Jiong Gong, Chunyuan Wu, Sanchit Jain, Yiqiang Li, Yunfei Mao, Kiefer Kuah, and others.