NVIDIA/numba-cuda v0.21.0

What's Changed

  • [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes by @atmnp in #478
  • [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes by @atmnp in #466
  • [test] Use numpy's tolerance for float16 by @ashermancinelli in #491
  • Change dangling imports of numba.core.lowering to numba.cuda.lowering by @VijayKandiah in #475
  • [test] Remove dependency on cpu_target by @ashermancinelli in #490
  • Vendor in optional, boxing for CUDA-specific changes, fix dangling imports by @VijayKandiah in #476
  • [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes by @VijayKandiah in #457
  • Vendor in core.registry for CUDA-specific changes by @VijayKandiah in #485
  • Add compile_all API by @isVoid in #484
  • Improve debug value range coverage by @jiel-nv in #461
  • [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 by @rparolin in #488
  • [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization by @VijayKandiah in #373
  • Vendor in components from numba.core.runtime for CUDA-specific changes by @VijayKandiah in #498
  • Fix Bf16 Test OB Error by @isVoid in #509
  • [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes by @atmnp in #433
  • Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched by @caugonnet in #437
  • Remove C extension loading hacks by @gmarkall in #506
  • Make the CUDA target the default for CUDA overload decorators by @gmarkall in #511
  • [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes by @atmnp in #494
  • [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes by @atmnp in #493
  • Vendor in typeconv for future CUDA-specific changes by @VijayKandiah in #499
  • Vendor in _helperlib cext for CUDA-specific changes by @VijayKandiah in #512
  • Remove some unnecessary uses of ContextResettingTestCase by @gmarkall in #507
  • Updating .gitignore with binaries in the testing folder by @rparolin in #516
  • Switch back to stable cuDF release in thirdparty tests by @brandon-b-miller in #518
  • Don't use MemoryLeakMixin for tests that don't use NRT by @gmarkall in #519
  • Vendor the imputils module for CUDA refactoring by @ashermancinelli in #448
  • chore(dev-deps): add pixi by @cpcloud in #505
  • build: allow parallelization of nvcc testing builds by @cpcloud in #521
  • Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes by @VijayKandiah in #502
  • [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback by @rparolin in #479
  • Use numba.core.config when available, otherwise use numba.cuda.core.config by @VijayKandiah in #497
  • Vendor in numba.core.typing for CUDA-specific changes by @VijayKandiah in #473
  • pyproject.toml: add search path for Pyrefly by @gmarkall in #524
  • Draft: Vendor in the IR module by @ashermancinelli in #439
  • refactor: fully remove USE_NV_BINDING by @cpcloud in #525
  • test(pixi): update pixi testing command to work with the new testing directory by @cpcloud in #522
  • test: add benchmarks for kernel launch for reproducibility by @cpcloud in #528
  • [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes by @atmnp in #513
  • refactor: remove unnecessary custom map and set implementations by @cpcloud in #530
  • perf: reduce the number of __cuda_array_interface__ accesses by @cpcloud in #538
  • perf: remove context threading in various pointer abstractions by @cpcloud in #536
  • perf: speed up kernel launch by @cpcloud in #510
  • test: speed up ipc tests by ~6.5x by @cpcloud in #527
  • chore(perf): add torch to benchmark by @cpcloud in #539
  • perf: remove duplicated size computation by @cpcloud in #537
  • perf: cache dimension computations by @cpcloud in #542
  • bench: add cupy to array constructor kernel launch benchmarks by @cpcloud in #547
  • refactor: cleanup device constructor by @cpcloud in #548
  • Vendor in types and datamodel for CUDA-specific changes by @VijayKandiah in #533
  • feat: add support for math.exp2 by @kaeun97 in #541
  • Handle cuda.core.Stream in driver operations by @brandon-b-miller in #401
  • ci: add timeout to avoid blocking the job queue by @cpcloud in #556
  • [WIP] Port numpy reduction tests to CUDA by @brandon-b-miller in #523
  • Relax the pinning to cuda-core to allow it floating across minor releases by @isVoid in #559
  • Remove dependencies on target_extension for CUDA target by @VijayKandiah in #555
  • [Refactor][NFC] Vendor-in errors for future CUDA-specific changes by @atmnp in #534
  • chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments by @cpcloud in #551
  • test: revert back to ipc futures that await each iteration by @cpcloud in #564
  • test: refactor process-based tests to use concurrent futures in order to simplify tests by @cpcloud in #550
  • [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules by @atmnp in #561
  • Fix registration with Numba, vendor MakeFunctionToJITFunction tests by @gmarkall in #566
  • [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes by @atmnp in #565
  • test: enable fail-on-warn and clean up resulting failures by @cpcloud in #529
  • Propose Alternative Module Path for ext_types and Maintain numba.cuda.types.bfloat16 Import API by @isVoid in #569
  • [CI] Run PR workflow on merge to main by @gmarkall in #572
  • ci: replace conda testing with pixi by @cpcloud in #554
  • feat: add math.nextafter by @kaeun97 in #543
  • Fix checks on main by @gmarkall in #576
  • Fix the cuda.is_supported_version() API by @gmarkall in #571
  • ci: ensure that python version in ci matches matrix by @cpcloud in #575
  • chore(pixi): set up doc builds; remove most build-conda dependencies by @cpcloud in #574
  • ci: move pre-commit checks to pre commit action by @cpcloud in #577
  • Generalize the concurrency group for main merges by @cryos in #582
  • Move frontend tests to cudapy namespace by @brandon-b-miller in #558
  • refactor: replace device functionality with cuda.core APIs by @cpcloud in #581
  • Update tests to accept variants of generated PTX by @gmarkall in #585
  • Fix freezing in of constant arrays with negative strides by @brandon-b-miller in #589
  • refactor: decouple Context from Stream and Event objects by @cpcloud in #579
  • chore(docs): format types docs by @kaeun97 in #596
  • chore: clean up dead workaround for unavailable lru_cache by @cpcloud in #598
  • Add DWARF variant part support for polymorphic variables in CUDA debug info by @jiel-nv in #544
  • Add sphinx-lint to pre-commit and fix errors by @gmarkall in #597
  • Add more thirdparty tests by @gmarkall in #586
  • feat: add support for cache-hinted load and store operations by @kaeun97 in #587
  • Bump version to 0.21.0 by @gmarkall in #602
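As background for the `__cuda_array_interface__` access-reduction change (#538): any Python object can act as a device-array producer by exposing that property, and consumers re-read it on each conversion. A minimal illustrative producer follows; the class name and fields are hypothetical and not part of numba-cuda, shown only to sketch the protocol shape.

```python
class FakeDeviceArray:
    """Minimal producer of the CUDA Array Interface (version 3).

    Consumers such as numba-cuda read ``__cuda_array_interface__``
    when converting objects to device arrays; reducing the number of
    those property accesses is what #538 optimizes on the launch path.
    """

    def __init__(self, device_ptr, shape, typestr="<f4"):
        self._ptr = device_ptr        # integer device pointer
        self._shape = tuple(shape)
        self._typestr = typestr       # e.g. "<f4" = little-endian float32

    @property
    def __cuda_array_interface__(self):
        # Returns a fresh dict on every access, which is why callers
        # benefit from grabbing it once and reusing it.
        return {
            "shape": self._shape,
            "typestr": self._typestr,
            "data": (self._ptr, False),   # (pointer, read_only)
            "version": 3,
        }
```

A consumer would read the dict once per conversion, e.g. `iface = obj.__cuda_array_interface__`, then derive shape and strides from that single snapshot.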

Acknowledgments

Many thanks to external contributor @kaeun97, who provided support for math.nextafter, math.exp2, cache-hinted load and store operations, and documentation fixes!
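For reference, the newly supported device functions follow CPython's `math` semantics. A quick host-side illustration of what they compute (exercising them inside an `@cuda.jit` kernel requires a CUDA-capable GPU, so only the CPython equivalents are shown here):

```python
import math
import sys

# math.nextafter(a, b) returns the next representable double after a,
# stepping toward b (available in CPython since 3.9).
tiny = math.nextafter(0.0, 1.0)         # smallest positive subnormal double
print(tiny > 0.0)                        # True
print(math.nextafter(1.0, 0.0) < 1.0)    # True: one ulp below 1.0

# math.exp2(x) computes 2**x (available in CPython since 3.11).
if sys.version_info >= (3, 11):
    print(math.exp2(10))                 # 1024.0
```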

Full Changelog: v0.20.0...v0.21.0
