Highlights:
- Bug fixes
    - Fix some missing operator overrides like rlshift (#1978) (by 彭于斌)
- CUDA backend
    - Support NVIDIA RTX 3000 series GPUs (#1983) (by Yuanming Hu)
- Language and syntax
    - Add ti.loop_unique(val) to improve atomics demotion (#1961) (by xumingkuan)
- Type system
    - Implement is_primitive and refactor primitive type equality check (#1975) (by Xuanda Yang)
Full changelog:
- [misc] Fix compatibility with pybind11 2.6 (#1984) (by Yuanming Hu)
- [CUDA] Support NVIDIA RTX 3000 series GPUs (#1983) (by Yuanming Hu)
- [metal] Create helper methods for TLS codegen (#1982) (by Ye Kuang)
- [Bug] [lang] Fix some missing operator overrides like rlshift (#1978) (by 彭于斌)
- [type] Add CustomIntType/BitStructType and corresponding SNodes (#1968) (by Yuanming Hu)
- [Type] [refactor] Implement is_primitive and refactor primitive type equality check (#1975) (by Xuanda Yang)
- [async] Add allocator async state (#1973) (by Ye Kuang)
- [Lang] [opt] Add ti.loop_unique(val) to improve atomics demotion (#1961) (by xumingkuan)