Highlights:
- CUDA backend
    - Upgrade to PTX 6.3 and add a few CUDA intrinsics (#1548) (by Yuanming Hu)
- Performance improvements
    - Improve dynamic listgen and access performance (#1547) (by Yuanming Hu)
- Refactor
    - 'ti.Matrix(n, m, dt, shape)' is deprecated, use 'ti.Matrix.var(n, m, dt, shape)' instead (#1531) (by 彭于斌)
Full changelog:
- [cc] The C backend is now capable of running mpm128 (#1553) (by 彭于斌)
- [bug] Update mpm_lagrangian_force and fix Matrix constructor (#1545) (by Ye Kuang)
- [opengl] [refactor] Replace KernelParallelAttribs with ParallelSize and virtual methods to pave the way for grid-stride loops (#1540) (by 彭于斌)
- [opengl] Fix reversed nested for loops error on OpenGL (#1554) (by 彭于斌)
- [perf] Improve dynamic listgen and access performance (#1547) (by Yuanming Hu)
- [cuda] [llvm] Report a broken LLVM module with TI_WARN instead of TI_ERROR (#1557) (by 彭于斌)
- [linux] Fix LLVM symbol leakage in release mode by using RTLD_GLOBAL (#1544) (by 彭于斌)
- [cuda] Upgrade to PTX 6.3 and add a few CUDA intrinsics (#1548) (by Yuanming Hu)
- [ir] Move struct-for demotion pass after offload pass (#1541) (by Ye Kuang)
- [cc] Support "range for" and "while" statements on the C backend (#1536) (by 彭于斌)
- [refactor] Better import order by using __all__ (#1510) (by 彭于斌)
- [misc] Add is_path_all_dense to SNode (#1538) (by Ye Kuang)
- [refactor] 'ti.Matrix(n, m, dt, shape)' is deprecated, use 'ti.Matrix.var(n, m, dt, shape)' instead (#1531) (by 彭于斌)
- [lang] [refactor] Set up a multipass AST transformer (#1467) (by 彭于斌)