github halide/Halide v21.0.0


Release highlights

We have deliberately skipped version 20.0.0 to align Halide's major version with the LLVM version we now use. Note that if you build against LLVM 21, version 21.1.1 or higher is required, as LLVM 21.1.0 has a major bug in the NVPTX backend.

Major changes

  • The rfactor scheduling directive was rewritten and enhanced. It is now compatible with autoschedulers.
  • The Mullapudi2016 autoscheduler now supports experimental GPU scheduling.
  • The Python bindings have been substantially improved, with many missing bindings filled in.
  • HL_DEBUG_CODEGEN gained a new filtering mode. Debug levels can now be set on a per-file/per-function basis.
  • Support was added for AMD Zen5 and the iOS Simulator.
  • The strict_float feature has been reimplemented and should be much more reliable.
  • Lots of bugfixes, performance improvements, and build system improvements. We spent a lot of time fixing issues with our testing infrastructure and look forward to offering a more stable contribution experience.
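
The strict_float rework matters because ordinary floating-point rewrites such as reassociation do not preserve values. The sketch below is plain IEEE double arithmetic in Python and does not use Halide. It only shows two groupings of the same sum giving different answers, which is the kind of rewrite strict_float is meant to rule out. The function names are illustrative.

```python
def sum_left(a, b, c):
    # Evaluate as (a + b) + c.
    return (a + b) + c


def sum_right(a, b, c):
    # Evaluate as a + (b + c): mathematically equal, numerically not.
    return a + (b + c)


print(sum_left(1e20, -1e20, 1.0))   # 1.0
print(sum_right(1e20, -1e20, 1.0))  # 0.0: the 1.0 is lost when added to -1e20
```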

Deprecations

  • LLVM 19 and below are no longer supported, in keeping with our support policy.
  • Halide_BUNDLE_STATIC will be removed in the next release. If you are using it, please migrate to the shared library instead.
  • Support for Python 3.8 has been dropped.

Changelog

Scheduling

  • The rfactor scheduling directive was rewritten and enhanced.
  • The Mullapudi2016 autoscheduler now supports experimental GPU scheduling.
    • GPU autoscheduling with Mullapudi2016: the reference implementation by @antonysigma in #7787
    • Mullapudi2016-GPU: Reorder to avoid for-loops being sandwiched between gpu_blocks by @antonysigma in #8647
    • Enable experimental Mullapudi2016 GPU scheduler for test-bench by @antonysigma in #8650
    • Highlight Metal GPU code in stmt_html by @antonysigma in #8659
    • Always ensure gpu_threads count >= warp size of 32 by @antonysigma in #8656
  • Fix incorrect natural vector size on Zen4 by @abadams in #8570
  • Make it an error to use a device extern stage without target support by @abadams in #8794
  • Add support for adding tuple outputs in the configure() method by @abadams in #8649
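
The rfactor directive is easiest to picture on a plain sum. In Halide it is called on an update stage (Stage::rfactor) and returns an intermediate Func. The Python sketch below does not call Halide. It only mimics the two stages produced when the reduction variable of a sum has been split by a factor and the inner part is factored out into a new pure variable u. The first stage is an intermediate holding independent partial sums. The second is a final reduction over that intermediate. The regrouping relies on the update being associative, which addition is. The name rfactor_sum and the divisibility requirement are assumptions of this sketch.

```python
def rfactor_sum(values, factor):
    """Sum values in two stages, the way a split-then-rfactor schedule would."""
    n = len(values)
    if n % factor != 0:
        raise ValueError("sketch assumes len(values) is a multiple of factor")
    outer = n // factor

    # Intermediate stage: the inner reduction variable has become a pure
    # variable u. Each intm[u] is an independent reduction over the outer
    # variable, so the u loop could be parallelized or vectorized.
    intm = [0] * factor
    for u in range(factor):
        for rxo in range(outer):
            intm[u] += values[rxo * factor + u]

    # Final stage: reduce the intermediate over its index.
    total = 0
    for rxi in range(factor):
        total += intm[rxi]
    return total, intm


total, partials = rfactor_sum(list(range(16)), 4)
print(total)     # 120
print(partials)  # [24, 28, 32, 36]
```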

Python

Debugging

CodeGen

  • Mark our PTX kernels as kernels, to stop them from being stripped by @abadams in #8571
  • Math functions renaming table for GPU backends to support vectorized evaluation of math functions. by @mcourteaux in #8595
  • Apply version constraints to iOS objects by @alexreinking in #8546
  • Redirect bitwise ops to logical ops when the arguments are bool by @mcourteaux in #8597
  • scalarize select condition for LLVM where possible by @abadams in #8575
  • Add missing addition simplifier rules by @abadams in #8630
  • Bounds and alignment analysis through bitwise ops by @abadams in #8574
  • Make the vld2 pattern more obviously profitable by @abadams in #8765
  • Fix vector shuffle for Vulkan CodeGen by @derek-gerstmann in #8621
  • Suppress warning on Windows for duplicate constant symbols. by @mcourteaux in #8555
  • Use lossless_cast for saturating casts from unsigned to signed on x86 by @abadams in #8527
  • AMD Zen5 support by @changhoon-sung in #8612
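
For context on #8527: a saturating cast from an unsigned to a signed type clamps the value into the destination range instead of wrapping. An unsigned source can never be negative, so only the upper bound can be exceeded. The Python sketch below shows only those semantics. The function name is hypothetical, and the sketch does not reproduce Halide's x86 lowering or its lossless_cast helper.

```python
def saturating_cast_unsigned_to_signed(x, src_bits, dst_bits):
    """Clamp an unsigned src_bits-wide value into a signed dst_bits-wide range."""
    if not 0 <= x < (1 << src_bits):
        raise ValueError("x is not a valid unsigned value of the source width")
    dst_max = (1 << (dst_bits - 1)) - 1
    # The source is unsigned, so x >= 0 and the lower clamp at
    # -2**(dst_bits - 1) can never apply; only the upper bound matters.
    return min(x, dst_max)


print(saturating_cast_unsigned_to_signed(65535, 16, 16))  # 32767
print(saturating_cast_unsigned_to_signed(300, 16, 8))     # 127
print(saturating_cast_unsigned_to_signed(200, 8, 16))     # 200
```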

Compiler

  • Rework strict_float to use individual op intrinsics instead by @abadams in #8641
  • Don't cache mutations of Exprs that have only one reference to them by @abadams in #8518
  • Only use the nodes-visited set for nodes with multiple refs by @abadams in #8547
  • In graph_equal(), call the correct implementation for comparing equalities between statements and expressions by @BachiLi in #8611
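
#8518 and #8547 both rest on the same observation. A node with only one reference can be reached only through its single parent, and that parent is processed at most once, so a traversal never needs to record the node as visited. Halide's IR uses intrusive reference counting. The Node class and sum_dag function below are hypothetical, and their ref_count counts parent links only, which is an assumption of this sketch. In the sketch, only shared nodes consult the visited set.

```python
class Node:
    """Toy DAG node; ref_count counts how many parents point at it."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)
        self.ref_count = 0
        for child in self.children:
            child.ref_count += 1  # each parent link is one reference


def sum_dag(root):
    """Sum each distinct node once; return (total, number of visited-set checks)."""
    visited = set()
    set_checks = 0

    def visit(node):
        nonlocal set_checks
        if node.ref_count > 1:
            # Shared node: another parent may reach it again, so consult the set.
            set_checks += 1
            if id(node) in visited:
                return 0
            visited.add(id(node))
        # Singly referenced (or root) node: reached only via one parent that
        # itself runs at most once, so no bookkeeping is needed.
        return node.value + sum(visit(child) for child in node.children)

    return visit(root), set_checks


# A diamond: root -> a, b; a and b both point at the same leaf.
leaf = Node(3)
a = Node(1, [leaf])
b = Node(2, [leaf])
root = Node(0, [a, b])
print(sum_dag(root))  # (6, 2)
```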

Runtime

  • Support copying the overlapping region from one buffer to another. by @mcourteaux in #8463
  • Add (iOS) simulator target feature. by @alexreinking in #8623
  • Opt out of JIT exceptions by @abadams in #8615
  • Experimental: support removing unused runtime functions via the HL_RUNTIME_DROP_FUNCS environment variable.
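
With #8463, a buffer-to-buffer copy can handle buffers that only partly overlap in coordinate space. The idea reduces to intersecting each dimension's [min, min + extent) range and copying only that region. The plain-Python sketch below uses a hypothetical Buffer2D class and copy_overlap function. It illustrates the idea only and is not the Halide::Runtime::Buffer API.

```python
class Buffer2D:
    """Hypothetical buffer covering [min_x, min_x+width) x [min_y, min_y+height)."""

    def __init__(self, min_x, min_y, width, height, fill=0):
        self.min_x, self.min_y = min_x, min_y
        self.width, self.height = width, height
        self.data = [[fill] * width for _ in range(height)]

    def __getitem__(self, xy):
        x, y = xy  # global coordinates
        return self.data[y - self.min_y][x - self.min_x]

    def __setitem__(self, xy, value):
        x, y = xy
        self.data[y - self.min_y][x - self.min_x] = value


def copy_overlap(dst, src):
    """Copy src into dst where their ranges intersect; return elements copied."""
    x0 = max(dst.min_x, src.min_x)
    x1 = min(dst.min_x + dst.width, src.min_x + src.width)
    y0 = max(dst.min_y, src.min_y)
    y1 = min(dst.min_y + dst.height, src.min_y + src.height)
    copied = 0
    # Empty ranges (no overlap) simply copy nothing.
    for y in range(y0, y1):
        for x in range(x0, x1):
            dst[x, y] = src[x, y]
            copied += 1
    return copied


src = Buffer2D(0, 0, 4, 4, fill=7)
dst = Buffer2D(2, 1, 4, 4)
print(copy_overlap(dst, src))  # 6
print(dst[2, 1], dst[5, 4])    # 7 0
```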

Apps

Documentation

Bugfixes

Testing / CI

Build

Ongoing maintenance

New Contributors

Full Changelog: v19.0.0...v21.0.0
