taichi-dev/taichi v0.6.21 on GitHub

Highlights:

CUDA backend
- Upgrade to PTX 6.3 and add a few CUDA intrinsics (#1548) (by Yuanming Hu)
Performance improvements
- Improve dynamic listgen and access performance (#1547) (by Yuanming Hu)
Refactor
- 'ti.Matrix(n, m, dt, shape)' is deprecated; use 'ti.Matrix.var(n, m, dt, shape)' instead (#1531) (by 彭于斌)
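
For the deprecation in #1531, the migration amounts to swapping the constructor call for the `ti.Matrix.var` classmethod with the same four arguments. Below is a minimal sketch of that change. The helper name `declare_matrix` is hypothetical and not part of Taichi, and the `ti` module is passed in as a parameter so the snippet does not depend on a particular installed version. It reflects only the spelling named in this release; other Taichi versions may expose a different API.

```python
def declare_matrix(ti, n, m, dt, shape):
    """Declare an n x m matrix tensor using the spelling recommended in v0.6.21.

    `ti` is the imported taichi module. This wrapper is a hypothetical
    convenience that keeps the call in one place; it is not a Taichi API.
    """
    # Deprecated as of this release (#1531):
    #     ti.Matrix(n, m, dt, shape)
    # Recommended replacement:
    return ti.Matrix.var(n, m, dt, shape)
```

With Taichi v0.6.21 installed, a call site might look like `F = declare_matrix(ti, 2, 2, ti.f32, (128, 128))` after `import taichi as ti` and `ti.init()`. Calling `ti.Matrix.var(...)` directly works equally well; the wrapper only makes any later rename a one-line change.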

Full changelog:

[cc] The C backend is now capable of running mpm128 (#1553) (by 彭于斌)
[bug] Update mpm_lagrangian_force and fix Matrix constructor (#1545) (by Ye Kuang)
[opengl] [refactor] KernelParallelAttribs -> ParallelSize + virtual methods to make way for grid-stride loops (#1540) (by 彭于斌)
[opengl] Fix reversed nested for loops error on OpenGL (#1554) (by 彭于斌)
[Perf] Improve dynamic listgen and access performance (#1547) (by Yuanming Hu)
[cuda] [llvm] Report a broken module with TI_WARN instead of TI_ERROR (#1557) (by 彭于斌)
[linux] Fix LLVM symbol leakage in release mode by using RTLD_GLOBAL (#1544) (by 彭于斌)
[CUDA] Upgrade to PTX 6.3 and add a few CUDA intrinsics (#1548) (by Yuanming Hu)
[ir] Move struct-for demotion pass after offload pass (#1541) (by Ye Kuang)
[cc] Support "range for" and "while" statement on C backend (#1536) (by 彭于斌)
[refactor] Better import order by using __all__ (#1510) (by 彭于斌)
[misc] Add is_path_all_dense to SNode (#1538) (by Ye Kuang)
[Refactor] 'ti.Matrix(n, m, dt, shape)' is deprecated; use 'ti.Matrix.var(n, m, dt, shape)' instead (#1531) (by 彭于斌)
[lang] [refactor] Set up a multipass AST transformer (#1467) (by 彭于斌)