Changelog:
- [bug] Disable test_dense_dynamic for CUDA (#2179) (by Ye Kuang)
- [type] Apply quant_opt_atomic_demotion when storing all components (#2176) (by xumingkuan)
- [type] Improve bit struct store fusion and atomic demotion (#2175) (by xumingkuan)
- [type] Atomic demotion for bit struct stores (#2174) (by xumingkuan)
- [refactor] Move LLVM CustomType-related functions to codegen_llvm_quant.cpp (#2173) (by Taichi Gardener)
- [type] Use a single atomicCAS for BitStructStoreStmt (#2171) (by xumingkuan)
- [async] Add option async_opt_fusion_max_iter (#2170) (by Yuanming Hu)
- [async] Set the default value of async_flush_every to 50 (#2169) (by xumingkuan)
- [async] Add config flag async_max_fuse_per_task (#2165) (by Ye Kuang)
- [type] Support reading bit_struct as its physical type (#2166) (by Yuanming Hu)
- [infra] Support Timelines as a multithreading profiler (#2164) (by Yuanming Hu)
- [async] [lang] [opt] Add ti.loop_unique(covers=...) to improve task dependence analysis (#2163) (by xumingkuan)
- [misc] Add experimental Python 3.9 support (#2157) (by 彭于斌)
- [lang] Expose SNode ID to Python (#2162) (by Ye Kuang)
- [cuda] Add argument "gpu_max_reg" to ti.init (#2161) (by Yuanming Hu)
- [opt] [async] Improve full_simplify and optimize_dead_store (#2160) (by xumingkuan)
- [misc] Enable PyPI upload for macOS (#2159) (by Ye Kuang)
- [misc] Delete 3 debug messages in codegen_cc.cpp (#2158) (by Jiasheng Zhang)
- [sparse] Make memory allocator more robust (#2156) (by Yuanming Hu)
- [misc] Fix SNode max_num_elements to use int64 (#2154) (by Ye Kuang)
- [opt] Simplify bit_cast of bit_cast (#2152) (by xumingkuan)
- [async] [bug] Fix missing memory access options in async mode (#2150) (by xumingkuan)
- [type] Fix struct-for block dim on bit_structs (#2151) (by Yuanming Hu)
- [misc] Add a GitHub Actions workflow to trigger on publishing a release (#2149) (by Ye Kuang)
- [type] Fix arm64 flush to zero (#2148) (by Yuanming Hu)
- [type] Update custom data type APIs (#2147) (by Yuanming Hu)
- [type] Support basic custom int/float types on Metal (#2145) (by Ye Kuang)
- [type] Local adder structure (#2136) (by Xuanda Yang)