New Features
Table Batched Embedding (TBE) enhancements:
- TBE performance optimizations (#1224, #1279, #1292, #1293, #1294, #1295, #1300, #1332, #1334, #1335, #1338, #1339, #1340, #1341, #1353, #1365)
- Added FP16 weight type and output_dtype support for Dense TBE (#1343, #1348, #1370)
- Direct Mapped UVM Cache (#1298)
AMD Support (beta) (#1102, #1193)
- FBGEMM previously supported only NVIDIA accelerators; starting with 0.3.0, FBGEMM supports AMD GPUs in collaboration with AMD. Although AMD support is still in beta (e.g., there is no stable release build for AMD GPUs yet), the AMD GPU implementation covers almost all FBGEMM operators supported on NVIDIA GPUs. AMD GPU support is tested in CI on AMD MI250 GPUs.
Quantized Communication Primitives (#1219, #1337)
Sparse kernel enhancements
- New kernel: invert_permute (#1403)
- New kernel: truncate_jagged_1d (#1345)
- New kernel: jagged_index_select (#1157)
- Jagged Tensor optimization for inference use cases (#1236)
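The jagged-tensor kernels above operate on a flat values buffer plus an offsets array describing variable-length rows. As a rough illustration of that representation (plain Python; the function names here are hypothetical and are not the FBGEMM operator signatures), row gathering in the style of jagged_index_select can be sketched as:

```python
# Illustrative sketch only (not the FBGEMM API): a 1-D jagged tensor is
# stored as a flat `values` list plus an `offsets` list, where row i
# spans values[offsets[i]:offsets[i + 1]].

def jagged_rows(values, offsets):
    """Materialize the variable-length rows of a 1-D jagged tensor."""
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

def jagged_index_select_1d(values, offsets, indices):
    """Gather whole jagged rows by index, returning new (values, offsets)."""
    out_values, out_offsets = [], [0]
    for i in indices:
        out_values.extend(values[offsets[i]:offsets[i + 1]])
        out_offsets.append(len(out_values))
    return out_values, out_offsets

# Three rows: [10, 11], [], [12, 13, 14]
values = [10, 11, 12, 13, 14]
offsets = [0, 2, 2, 5]
print(jagged_rows(values, offsets))                     # [[10, 11], [], [12, 13, 14]]
print(jagged_index_select_1d(values, offsets, [2, 0]))  # ([12, 13, 14, 10, 11], [0, 3, 5])
```

The key point is that selection copies whole variable-length rows and rebuilds the offsets, rather than indexing into a padded dense tensor; the actual FBGEMM kernels do this on GPU over torch tensors.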
Improved documentation for Jagged Tensors and SplitTableBatchedEmbeddingBagsCodegen
Optimized 2x2 kernel for AVX2 (#1280)
Full Changelog: https://github.com/pytorch/FBGEMM/commits/v0.3.0