Performance Optimizations
- DGEMM, DTRSM, DGEMV, ZGEMM, DTRSV, DCOPYV on Zen4/5
- DSCALV, DDOTV on Zen3
- Benchmark support for ASUMV
- Minor bug fixes
AOCL-GEMM Add-on Module Updates
- AOCL_ENABLE_INSTRUCTIONS support
- batch_gemm support for all data types
- New Output Datatype for Integer APIs
- BF16 Support on AVX2 Platforms
- WOQ with/without Group Quantization
- Threading Framework Optimizations
- Reference Kernels for all Reorder APIs
- Performance Optimizations for all APIs
- Additional APIs and Post-Ops support, alongside improved performance for the existing APIs