Features
CUDA Graphs
Enable CUDA Graphs for TRT (#19184)
CUDA graphs support (#19142)
CUDA 11 Support
Update CUB and include it only for CUDA < 11 #18799 (#18975)
Add new CI pipeline for building and testing with CUDA 11.0. (#19149)
TensorRT
TensorRT: add int8 with calibration (#19011)
Add TRT verbose mode (#19100)
Backporting TensorRT-Gluon Partition API (and TensorRT 7 support) (#18916)
oneDNN
Upgrade to oneDNN v1.6.3 (#19153) (#19161)
Update oneDNN to official v1.6 release (#18867)
Upgrade to oneDNN v1.6 (#18822)
IntGemm
Backport of intgemm #17559 (#19099)
Subgraph API
Backport Fix for duplicate subgraph inputs/outputs (#16131) (#19112)
Extensions
Backport #19103 (#19117)
Backporting #19016 (#19069)
Backport: Change Partition API's options_map to std::unordered_map #18929 (#18964)
Backporting #18779 to v1.x (#18894)
ONNX
Update onnx support to work with onnx 1.7.0 with most CV models (#19017)
Large Tensor
Fix linalg_potri and linalg_potrf operators for large tensor. (#18752)
Add forward, backward test for linalg.gemm2 (#18784)
Add large matrix tests for linalg ops: det, inverse, trsm, trmm (#18744)
Add Large Tensor Test for linalg_syrk (#18782)
Add Large Dim Checks for linalg Operators (#18816)
Add forward & backward linalg.gemm test for large size (#18825)
Adding error message when attempting to use Large tensor with linalg_syevd (#18807)
Website Improvements
v1.8 website patch (#19212)
Documentation
Fix mxnet.test_utils.check_numeric_gradient documentation (#19060)
Update windows_setup.md (#18874)
License
Stop packaging GPL libquadmath.so (#19055)
Remove mention of nightly in pypi (#18635) (#18884)
Mkldnn header fix v1x for nightly binaries (#18797)
CI Improvements
Upgrade unix gpu toolchain (#18186) (#18785)
Fix CI in v1.x branch (#18907)
Bug Fixes
Delete executor before reallocating its memory (#19222)
Nightly Large Tensor test cherrypicks (#19194) (#19215)
Tweaking syntax to be closer to other tests (#19186) (#19206)
ElementWiseSum fix for oneDNN (#18777) (#19200)
Fix flaky intgemm test in v1.8.x too (#19204)
Revert "Fix memory leaks in Gluon (#18328) (#18359)" (#19181)
Improve environment variable handling in unittests (#18424) (#19173)
Unittest tolerance handling improvements (#18694). Also test seeding (#18762). (#19148)
Fix the error of gradient of np.pad (#19044) (#19167)
Add cmake flag USE_FATBIN_COMPRESSION, ON by default (#19123) (#19158)
SymbolBlock.imports ignore_extra & allow_missing (#19156)
Fix race condition in NaiveEngine::PushAsync (#19108) (#19122)
Fix issue where an empty list cannot be cleared (#14882)
Update base_module.py (#19096)
Fix block.export (#17970) (#19075)
Support for fp16 in SpM x DnsM on GPU (#18930) (#19074)
Fix LeakyRelu behaviour on empty input (#18934) (#19009)
Get rid of monkey patching in LossScaler overflow handling (#18959) (#18973)
Remove upper bound (#18857) (#18910)
Fix gelu to use erf based algorithm (#18827) (#18946)
Fix for Clojure failure on #18883 (#18945)
Backward inference from 2.x #18348 and #18378 (#18895)
Invoke mkldnn and cudnn BatchNorm when axis != 1 to v1.7.x (#18676) (#18890)
Bump version to 1.8.0 (#18899)
Fixing ONNX spatial export for batchnorm (#17711) (#18846)
Fix softmax, logsoftmax failed on empty ndarray (#18602) (#18708)
Add unit tests for potri and potrf backward and check output shape in unit tests. (#18803)
Add syrk test shape check (#18812)
Backport optimization to broadcast_axis to MXNet 1.x (#18773)
Fix crash when accessing already destructed static variables (#18768) (#18778)
Jetpack fixes #18677 #18713 (#18742)
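
Note on the GELU fix (#18827) listed above: the entry refers to an "erf based algorithm". As a minimal sketch of what that usually means, assuming the standard definition GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2))), the pure-Python snippet below computes it with the standard library. The function names gelu_erf and gelu_tanh are hypothetical and are not MXNet APIs; the tanh approximation is included only for comparison, and this changelog does not state which form the operator used before the fix.

```python
import math


def gelu_erf(x: float) -> float:
    """Exact GELU via the Gauss error function (illustrative, not MXNet code)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh-based approximation of GELU, shown for comparison only."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Print both variants side by side; they agree closely but not exactly.
    for value in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(value, gelu_erf(value), gelu_tanh(value))
```

The two forms differ slightly for most inputs, so tests that compare against a reference implementation may need to know which variant is in use.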