Features
CUDA Graphs
CUDA 11 Support
- Update CUB and include it only for CUDA < 11 #18799 (#18975)
- Add new CI pipeline for building and testing with CUDA 11.0. (#19149)
- Enable CUDA 11.0 on nightly development builds (#19314)
TensorRT
- TensorRT: add int8 with calibration (#19011)
- Add TRT verbose mode (#19100)
- Backporting TensorRT-Gluon Partition API (and TensorRT 7 support) (#18916)
- Backport TRT test update #19296 (#19298)
oneDNN
- Upgrade to oneDNN v1.6.3 (#19153) (#19161)
- Update oneDNN to official v1.6 release (#18867)
- Upgrade to oneDNN v1.6 (#18822)
- Bump version to v1.6.5 (#19437)
- Upgrade to oneDNN v1.7 (#19560)
IntGemm
Subgraph API
Extensions
- Backport #19103 (#19117)
- Backporting #19016 (#19069)
- Backport: Change Partition API's options_map to std::unordered_map #18929 (#18964)
- Backporting #18779 to v1.x (#18894)
- Backport extension bug fixes to v1.8.x (#19469) (#19504)
- Fix for MX_ERROR_MSG namespace (#19756)
ONNX
- Update onnx support to work with onnx 1.7.0 with most CV models (#19017)
Large Tensor
- Fix linalg_potri and linalg_potrf operators for large tensor. (#18752)
- Add forward, backward test for linalg.gemm2 (#18784)
- Add large matrix tests for linalg ops: det, inverse, trsm, trmm (#18744)
- Add Large Tensor Test for linalg_syrk (#18782)
- Add Large Dim Checks for linalg Operators (#18816)
- Add forward & backward linalg.gemm test for large size (#18825)
- Adding error message when attempting to use Large tensor with linalg_syevd (#18807)
Website Improvements
Documentation
License
- Stop packaging GPL libquadmath.so (#19055)
- Remove mention of nightly in pypi (#18635) (#18884)
- Mkldnn header fix v1x for nightly binaries (#18797)
- Update LICENSE for all submodules. (#19440)
- LICENSE update (#19443)
- Update LICENSE (#19704) (#19707)
CI Improvements
- Upgrade unix gpu toolchain (#18186) (#18785)
- Fix CI in v1.x branch (#18907)
- Remove extra --build-arg causing docker command to fail. (#19412)
- Fix CI builds failing due to invalid GPG keys. (#19377) (#19388)
Bug Fixes
- Backport #19656 - fix R builds (#19658)
- Remove cleanup on side threads (#19557)
- Don't use namespace for pow() function, since it is built into the CUDA math library, and cast the second argument so that an acceptable form is found. (#19533)
- Remove temporary fix for RNN (#19451)
- Backport #19393 to v1.8.x (#19398)
- Fix SoftReLU fused operator numerical stability (#17849) (#19390)
- Temporary fix for RNN with oneDNN seg faults/core dumps (#19308)
- Fix MKLDNN BatchNorm with even number of channels (#19150) #19299 #19425 (#19428)
- Relaxing type requirements for broadcast_like (#17977) (#19448)
- Backporting: Fixed setting attributes in reviewSubgraph (#19278)
- Include oneDNN gemm fix (#19251)
- Fix for breaking change introduced in #17123 when batch_axis=0 (#19283)
- Backport PR #19272 to v1.8.x (#19273)
- Backport PRs in v1.7.x missing from v1.x to v1.8.x (#19262)
- Delete executor before reallocating its memory (#19222)
- Nightly Large Tensor test cherrypicks (#19194) (#19215)
- Tweaking syntax to be closer to other tests (#19186) (#19206)
- ElementWiseSum fix for oneDNN (#18777) (#19200)
- Fix flaky intgemm test in v1.8.x too (#19204)
- Revert "Fix memory leaks in Gluon (#18328) (#18359)" (#19181)
- Improve environment variable handling in unittests (#18424) (#19173)
- Backport Unittest tolerance handling improvements (#18694). Also test seeding (#18762). (#19148)
- Fix the error in the gradient of np.pad (#19044) (#19167)
- Backport Add cmake flag USE_FATBIN_COMPRESSION, ON by default (#19123) (#19158)
- SymbolBlock.imports ignore_extra & allow_missing (#19156)
- Fix race condition in NaiveEngine::PushAsync (#19108) (#19122)
- Fix issue where an empty list cannot be cleared. (#14882)
- Update base_module.py (#19096)
- Fix block.export (#17970) (#19075)
- Support for fp16 in SpM x DnsM on GPU (#18930) (#19074)
- Backport of Fix LeakyRelu behaviour on empty input (#18934) (#19009)
- Get rid of monkey patching in LossScaler overflow handling (#18959) (#18973)
- Remove upper bound (#18857) (#18910)
- Fix gelu to use erf based algorithm (#18827) (#18946)
- Cherry-pick #18635 to v1.7.x (#18935) (#18945)
- Backporting backward inference from 2.x #18348 and #18378 (#18895)
- Backport Invoke mkldnn and cudnn BatchNorm when axis != 1 to v1.7.x (#18676) (#18890)
- Bump version to 1.8.0 (#18899)
- Fixing ONNX spatial export for batchnorm (#17711) (#18846)
- Fix softmax and logsoftmax failing on empty ndarray (#18602) (#18708)
- Add unit tests for potri and potrf backward and check output shape in unit tests. (#18803)
- Add syrk test shape check (#18812)
- Backport optimization to broadcast_axis to MXNet 1.x (#18773)
- Fix crash when accessing already destructed static variables (#18768) (#18778)
- Cherrypick #18677 #18713 (#18742)