NVIDIA/numba-cuda v0.21.0 on GitHub

What's Changed

[Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes by @atmnp in #478
[Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes by @atmnp in #466
[test] Use numpy's tolerance for float16 by @ashermancinelli in #491
Change dangling imports of numba.core.lowering to numba.cuda.lowering by @VijayKandiah in #475
[test] Remove dependency on cpu_target by @ashermancinelli in #490
Vendor in optional, boxing for CUDA-specific changes, fix dangling imports by @VijayKandiah in #476
[Refactor][NFC] Vendor in numba.misc for CUDA-specific changes by @VijayKandiah in #457
Vendor in core.registry for CUDA-specific changes by @VijayKandiah in #485
Add compile_all API by @isVoid in #484
Improve debug value range coverage by @jiel-nv in #461
[MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 by @rparolin in #488
[Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization by @VijayKandiah in #373
Vendor in components from numba.core.runtime for CUDA-specific changes by @VijayKandiah in #498
Fix Bf16 Test OB Error by @isVoid in #509
[Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes by @atmnp in #433
Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched by @caugonnet in #437
Remove C extension loading hacks by @gmarkall in #506
Make the CUDA target the default for CUDA overload decorators by @gmarkall in #511
[Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes by @atmnp in #494
[Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes by @atmnp in #493
Vendor in typeconv for future CUDA-specific changes by @VijayKandiah in #499
Vendor in _helperlib cext for CUDA-specific changes by @VijayKandiah in #512
Remove some unnecessary uses of ContextResettingTestCase by @gmarkall in #507
Updating .gitignore with binaries in the testing folder by @rparolin in #516
Switch back to stable cuDF release in thirdparty tests by @brandon-b-miller in #518
Don't use MemoryLeakMixin for tests that don't use NRT by @gmarkall in #519
Vendor the imputils module for CUDA refactoring by @ashermancinelli in #448
chore(dev-deps): add pixi by @cpcloud in #505
build: allow parallelization of nvcc testing builds by @cpcloud in #521
Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes by @VijayKandiah in #502
[MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback by @rparolin in #479
Use numba.core.config when available, otherwise use numba.cuda.core.config by @VijayKandiah in #497
Vendor in numba.core.typing for CUDA-specific changes by @VijayKandiah in #473
pyproject.toml: add search path for Pyrefly by @gmarkall in #524
Draft: Vendor in the IR module by @ashermancinelli in #439
refactor: fully remove USE_NV_BINDING by @cpcloud in #525
test(pixi): update pixi testing command to work with the new testing directory by @cpcloud in #522
test: add benchmarks for kernel launch for reproducibility by @cpcloud in #528
[Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes by @atmnp in #513
refactor: remove unnecessary custom map and set implementations by @cpcloud in #530
perf: reduce the number of __cuda_array_interface__ accesses by @cpcloud in #538
perf: remove context threading in various pointer abstractions by @cpcloud in #536
perf: speed up kernel launch by @cpcloud in #510
test: speed up ipc tests by ~6.5x by @cpcloud in #527
chore(perf): add torch to benchmark by @cpcloud in #539
perf: remove duplicated size computation by @cpcloud in #537
perf: cache dimension computations by @cpcloud in #542
bench: add cupy to array constructor kernel launch benchmarks by @cpcloud in #547
refactor: cleanup device constructor by @cpcloud in #548
Vendor in types and datamodel for CUDA-specific changes by @VijayKandiah in #533
feat: add support for math.exp2 by @kaeun97 in #541
Handle cuda.core.Stream in driver operations by @brandon-b-miller in #401
ci: add timeout to avoid blocking the job queue by @cpcloud in #556
[WIP] Port numpy reduction tests to CUDA by @brandon-b-miller in #523
Relax the pinning to cuda-core to allow it floating across minor releases by @isVoid in #559
Remove dependencies on target_extension for CUDA target by @VijayKandiah in #555
[Refactor][NFC] Vendor-in errors for future CUDA-specific changes by @atmnp in #534
chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments by @cpcloud in #551
test: revert back to ipc futures that await each iteration by @cpcloud in #564
test: refactor process-based tests to use concurrent futures in order to simplify tests by @cpcloud in #550
[Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules by @atmnp in #561
Fix registration with Numba, vendor MakeFunctionToJITFunction tests by @gmarkall in #566
[Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes by @atmnp in #565
test: enable fail-on-warn and clean up resulting failures by @cpcloud in #529
Propose Alternative Module Path for ext_types and Maintain numba.cuda.types.bfloat16 Import API by @isVoid in #569
[CI] Run PR workflow on merge to main by @gmarkall in #572
ci: replace conda testing with pixi by @cpcloud in #554
feat: add math.nextafter by @kaeun97 in #543
Fix checks on main by @gmarkall in #576
Fix the cuda.is_supported_version() API by @gmarkall in #571
ci: ensure that python version in ci matches matrix by @cpcloud in #575
chore(pixi): set up doc builds; remove most build-conda dependencies by @cpcloud in #574
ci: move pre-commit checks to pre commit action by @cpcloud in #577
Generalize the concurrency group for main merges by @cryos in #582
Move frontend tests to cudapy namespace by @brandon-b-miller in #558
refactor: replace device functionality with cuda.core APIs by @cpcloud in #581
Update tests to accept variants of generated PTX by @gmarkall in #585
Fix freezing in of constant arrays with negative strides by @brandon-b-miller in #589
refactor: decouple Context from Stream and Event objects by @cpcloud in #579
chore(docs): format types docs by @kaeun97 in #596
chore: clean up dead workaround for unavailable lru_cache by @cpcloud in #598
Add DWARF variant part support for polymorphic variables in CUDA debug info by @jiel-nv in #544
Add sphinx-lint to pre-commit and fix errors by @gmarkall in #597
Add more thirdparty tests by @gmarkall in #586
feat: add support for cache-hinted load and store operations by @kaeun97 in #587
Bump version to 0.21.0 by @gmarkall in #602

New Contributors

@rparolin made their first contribution in #488
@caugonnet made their first contribution in #437
@cpcloud made their first contribution in #505
@kaeun97 made their first contribution in #541

Acknowledgments

Many thanks to external contributor @kaeun97, who added support for math.nextafter, math.exp2, and cache-hinted load and store operations, and contributed documentation fixes!

Full Changelog: v0.20.0...v0.21.0