Highlights:
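- Bug fixes
  - Fix compilation crash when there is a container statement after an unconditional continue (#1299) (by xumingkuan)
- CUDA backend
  - Fix on-demand memory pool on certain GPUs (#1314) (by Yuanming Hu)
- Intermediate representation
  - Replace "OffsetAndExtractBitsStmt" with "BitExtractStmt" (#1306) (by xumingkuan)
- Language and syntax
  - Support sep and end in print() (#1311) (by Ye Kuang)
  - Support Python-scope scalar functions / matrix operations, e.g. ti.sqrt(2) (#1188) (by 彭于斌)
- Miscellaneous
  - Postpone backend detection to prevent possible compatibility issues (#1273) (by 彭于斌)
- IR optimization passes
  - Move unreachable code elimination to a separate pass (#1315) (by xumingkuan)
  - Constant folding for BitExtractStmt (#1307) (by xumingkuan)
  - Remove exceptions from lower_access pass (#1292) (by Xuanda Yang)
- Performance improvements
  - Thread local storage for range-for reductions on CPUs (#1296) (by Yuanming Hu)
- Standard library
  - Add ti.rsqrt() and ti.Vector.norm_inv() (#1293) (by 彭于斌)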
Full changelog:
- [cuda] Support numpy and torch tensors with zeros in shapes (e.g., (5, 0, 5)) (#1305) (by Yuanming Hu)
- [refactor] Rename the file created in #1315 (#1316) (by xumingkuan)
- [Opt] [refactor] Move unreachable code elimination to a separate pass (#1315) (by xumingkuan)
- [CUDA] Fix on-demand memory pool on certain GPUs (#1314) (by Yuanming Hu)
- [Opt] Constant folding for BitExtractStmt (#1307) (by xumingkuan)
- [lang] [test] Improve code coverage in SNode (#1214) (by 彭于斌)
- [Lang] Support sep and end in print() (#1311) (by Ye Kuang)
- [metal] Add kernel side util to support print() (#1301) (by Ye Kuang)
- [Misc] Postpone backend detection to prevent possible compatibility issues (#1273) (by 彭于斌)
- [IR] [refactor] Replace "OffsetAndExtractBitsStmt" with "BitExtractStmt" (#1306) (by xumingkuan)
- [Bug] [opt] Fix compilation crash when there is a container statement after an unconditional continue (#1299) (by xumingkuan)
- [ir] [refactor] Simplify the "re_id" pass (#1304) (by xumingkuan)
- [Perf] Thread local storage for range-for reductions on CPUs (#1296) (by Yuanming Hu)
- [bug] [std] Fix matrix print shape in Taichi-scope (#1300) (by 彭于斌)
- [metal] [autodiff] Fix StackLoadTopStmt codegen in Metal (#1298) (by Ye Kuang)
- [Lang] [refactor] Support Python-scope scalar functions / matrix operations, e.g. ti.sqrt(2) (#1188) (by 彭于斌)
- [Std] [lang] Add ti.rsqrt() and ti.Vector.norm_inv() (#1293) (by 彭于斌)
- [Opt] [ir] [refactor] Remove exceptions from lower_access pass (#1292) (by Xuanda Yang)
- [misc] Show LLVM version on startup (#1294) (by FantasyVR)
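
For readers unfamiliar with the two standard-library additions in #1293, the sketch below illustrates what they are generally understood to compute: `ti.rsqrt(x)` as the reciprocal square root `1 / sqrt(x)`, and `ti.Vector.norm_inv()` as the reciprocal of a vector's Euclidean norm. It is a plain-Python stand-in, not Taichi code. The helper names `rsqrt`, `norm_inv`, and `normalized` are hypothetical and exist only for illustration. The actual Taichi signatures and numerical behavior may differ.

```python
import math


def rsqrt(x):
    """Reciprocal square root, 1 / sqrt(x) (illustrative stand-in for ti.rsqrt)."""
    return 1.0 / math.sqrt(x)


def norm_inv(v):
    """Reciprocal of the Euclidean norm of v (illustrative stand-in for ti.Vector.norm_inv)."""
    return rsqrt(sum(c * c for c in v))


def normalized(v):
    """Scale v to unit length with one reciprocal and multiplications instead of divisions."""
    inv = norm_inv(v)
    return [c * inv for c in v]


print(rsqrt(4.0))            # 0.5
print(norm_inv([3.0, 4.0]))  # 0.2
```

A typical use of an inverse norm is normalization: computing the reciprocal once and multiplying each component by it avoids one division per component.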