What's Changed
- Extend dbg.value coverage to loadvar for scalar kernel parameters by @jiel-nv in #813
- Fix FP8 uint64 cast flake on Windows by @cpcloud in #829
- Use dbg.declare for scalar kernel parameters by @cpcloud in #828
- Fix mixed-IR liveness for inline overload DCE by @cpcloud in #795
- Use cuda-python for nvvm bindings by @brandon-b-miller in #818
- fix(ci): cudaRoundMode typing failure in FP8 test by @kaeun97 in #834
- Support cuda_bindings FastEnum by @mdboom in #837
- Support cuda.core.GraphBuilder as a kernel-launch stream by @Andy-Jost in #836
- fix: normalize numpy integer types to python int to prevent overflow errors by @kaeun97 in #774
- Bump Version to 0.29.0 by @isVoid in #838
New Contributors
- @Andy-Jost made their first contribution in #836
Full Changelog: v0.28.2...v0.29.0