New Features
Table Batched Embedding (TBE) enhancements:
- TBE performance optimizations (#1224, #1279, #1292, #1293, #1294, #1295, #1300, #1332, #1334, #1335, #1338, #1339, #1340, #1341, #1353, #1365)
- Added FP16 weight type and output_dtype support for Dense TBE (#1343, #1348, #1370)
- Direct Mapped UVM Cache (#1298)
AMD Support (beta) (#1102, #1193)
- FBGEMM previously supported only NVIDIA accelerators; starting with 0.3.0, FBGEMM supports AMD GPUs in collaboration with AMD. Although AMD support is still in beta (e.g., there is no stable release build for AMD GPUs yet), the AMD GPU implementation covers almost all FBGEMM operators supported on NVIDIA GPUs. AMD GPU support is tested in CI on AMD MI250 GPUs.
Quantized Communication Primitives (#1219, #1337)
Sparse kernel enhancements
- New kernel: invert_permute (#1403)
- New kernel: truncate_jagged_1d (#1345)
- New kernel: jagged_index_select (#1157)
- Jagged Tensor optimization for inference use cases (#1236)
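The jagged-tensor kernels above operate on a flat values buffer plus an offsets array describing variable-length rows. As a rough illustration of that representation (plain Python; the function names here are hypothetical and are not the FBGEMM operator signatures), row gathering in the style of jagged_index_select can be sketched as:

```python
# Illustrative sketch only (not the FBGEMM API): a 1-D jagged tensor is
# stored as a flat `values` list plus an `offsets` list, where row i
# spans values[offsets[i]:offsets[i + 1]].

def jagged_rows(values, offsets):
    """Materialize the variable-length rows of a 1-D jagged tensor."""
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

def jagged_index_select_1d(values, offsets, indices):
    """Gather whole jagged rows by index, returning new (values, offsets)."""
    out_values, out_offsets = [], [0]
    for i in indices:
        out_values.extend(values[offsets[i]:offsets[i + 1]])
        out_offsets.append(len(out_values))
    return out_values, out_offsets

# Three rows: [10, 11], [], [12, 13, 14]
values = [10, 11, 12, 13, 14]
offsets = [0, 2, 2, 5]
print(jagged_rows(values, offsets))                     # [[10, 11], [], [12, 13, 14]]
print(jagged_index_select_1d(values, offsets, [2, 0]))  # ([12, 13, 14, 10, 11], [0, 3, 5])
```

The key point is that selection copies whole variable-length rows and rebuilds the offsets, rather than indexing into a padded dense tensor; the actual FBGEMM kernels do this on GPU over torch tensors.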
Improved documentation for Jagged Tensors and SplitTableBatchedEmbeddingBagsCodegen
Optimized 2x2 kernel for AVX2 (#1280)
Full Changelog: https://github.com/pytorch/FBGEMM/commits/v0.3.0