Changelog:
- [Type] Support bit-level load and store (#1996) (by Jiafeng Liu)
- [sparse] Fix allocator initialization (#2010) (by Yuanming Hu)
- [async] Improve benchmarks (#2005) (by Yuanming Hu)
- [metal] Revise NodeManager's implementation due to weak memory order (#2008) (by Ye Kuang)
- [OpenGL] [perf] Utilize glDispatchComputeIndirect to prevent sync when dynamic ranges are used (#2007) (by 彭于斌)