What's Changed
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes by @atmnp in #478
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes by @atmnp in #466
- [test] Use numpy's tolerance for float16 by @ashermancinelli in #491
- Change dangling imports of numba.core.lowering to numba.cuda.lowering by @VijayKandiah in #475
- [test] Remove dependency on cpu_target by @ashermancinelli in #490
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports by @VijayKandiah in #476
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes by @VijayKandiah in #457
- Vendor in core.registry for CUDA-specific changes by @VijayKandiah in #485
- Add `compile_all` API by @isVoid in #484
- Improve debug value range coverage by @jiel-nv in #461
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 by @rparolin in #488
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization by @VijayKandiah in #373
- Vendor in components from numba.core.runtime for CUDA-specific changes by @VijayKandiah in #498
- Fix Bf16 Test OB Error by @isVoid in #509
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes by @atmnp in #433
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched by @caugonnet in #437
- Remove C extension loading hacks by @gmarkall in #506
- Make the CUDA target the default for CUDA overload decorators by @gmarkall in #511
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes by @atmnp in #494
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes by @atmnp in #493
- Vendor in typeconv for future CUDA-specific changes by @VijayKandiah in #499
- Vendor in _helperlib cext for CUDA-specific changes by @VijayKandiah in #512
- Remove some unnecessary uses of ContextResettingTestCase by @gmarkall in #507
- Updating .gitignore with binaries in the `testing` folder by @rparolin in #516
- Switch back to stable cuDF release in thirdparty tests by @brandon-b-miller in #518
- Don't use `MemoryLeakMixin` for tests that don't use NRT by @gmarkall in #519
- Vendor the imputils module for CUDA refactoring by @ashermancinelli in #448
- chore(dev-deps): add pixi by @cpcloud in #505
- build: allow parallelization of nvcc testing builds by @cpcloud in #521
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes by @VijayKandiah in #502
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback by @rparolin in #479
- Use numba.core.config when available, otherwise use numba.cuda.core.config by @VijayKandiah in #497
- Vendor in numba.core.typing for CUDA-specific changes by @VijayKandiah in #473
- pyproject.toml: add search path for Pyrefly by @gmarkall in #524
- Draft: Vendor in the IR module by @ashermancinelli in #439
- refactor: fully remove `USE_NV_BINDING` by @cpcloud in #525
- test(pixi): update pixi testing command to work with the new `testing` directory by @cpcloud in #522
- test: add benchmarks for kernel launch for reproducibility by @cpcloud in #528
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes by @atmnp in #513
- refactor: remove unnecessary custom map and set implementations by @cpcloud in #530
- perf: reduce the number of `__cuda_array_interface__` accesses by @cpcloud in #538
- perf: remove context threading in various pointer abstractions by @cpcloud in #536
- perf: speed up kernel launch by @cpcloud in #510
- test: speed up ipc tests by ~6.5x by @cpcloud in #527
- chore(perf): add torch to benchmark by @cpcloud in #539
- perf: remove duplicated size computation by @cpcloud in #537
- perf: cache dimension computations by @cpcloud in #542
- bench: add cupy to array constructor kernel launch benchmarks by @cpcloud in #547
- refactor: cleanup device constructor by @cpcloud in #548
- Vendor in types and datamodel for CUDA-specific changes by @VijayKandiah in #533
- feat: add support for `math.exp2` by @kaeun97 in #541
- Handle `cuda.core.Stream` in driver operations by @brandon-b-miller in #401
- ci: add timeout to avoid blocking the job queue by @cpcloud in #556
- [WIP] Port numpy reduction tests to CUDA by @brandon-b-miller in #523
- Relax the pinning to `cuda-core` to allow it floating across minor releases by @isVoid in #559
- Remove dependencies on target_extension for CUDA target by @VijayKandiah in #555
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes by @atmnp in #534
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments by @cpcloud in #551
- test: revert back to ipc futures that await each iteration by @cpcloud in #564
- test: refactor process-based tests to use concurrent futures in order to simplify tests by @cpcloud in #550
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules by @atmnp in #561
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests by @gmarkall in #566
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes by @atmnp in #565
- test: enable fail-on-warn and clean up resulting failures by @cpcloud in #529
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API by @isVoid in #569
- [CI] Run PR workflow on merge to main by @gmarkall in #572
- ci: replace conda testing with pixi by @cpcloud in #554
- feat: add `math.nextafter` by @kaeun97 in #543
- Fix checks on main by @gmarkall in #576
- Fix the `cuda.is_supported_version()` API by @gmarkall in #571
- ci: ensure that python version in ci matches matrix by @cpcloud in #575
- chore(pixi): set up doc builds; remove most `build-conda` dependencies by @cpcloud in #574
- ci: move pre-commit checks to pre commit action by @cpcloud in #577
- Generalize the concurrency group for main merges by @cryos in #582
- Move frontend tests to `cudapy` namespace by @brandon-b-miller in #558
- refactor: replace device functionality with `cuda.core` APIs by @cpcloud in #581
- Update tests to accept variants of generated PTX by @gmarkall in #585
- Fix freezing in of constant arrays with negative strides by @brandon-b-miller in #589
- refactor: decouple `Context` from `Stream` and `Event` objects by @cpcloud in #579
- chore(docs): format types docs by @kaeun97 in #596
- chore: clean up dead workaround for unavailable `lru_cache` by @cpcloud in #598
- Add DWARF variant part support for polymorphic variables in CUDA debug info by @jiel-nv in #544
- Add sphinx-lint to pre-commit and fix errors by @gmarkall in #597
- Add more thirdparty tests by @gmarkall in #586
- feat: add support for cache-hinted load and store operations by @kaeun97 in #587
- Bump version to 0.21.0 by @gmarkall in #602
New Contributors
- @rparolin made their first contribution in #488
- @caugonnet made their first contribution in #437
- @cpcloud made their first contribution in #505
- @kaeun97 made their first contribution in #541
Acknowledgments
Many thanks to external contributor @kaeun97, who provided support for math.nextafter, math.exp2, cache-hinted load and store operations, and documentation fixes!
Full Changelog: v0.20.0...v0.21.0