Major improvements
- Halide is now available for both C++ and Python usage via Pip. Try `pip install halide` today!
- The Vulkan backend has matured substantially.
- The HTML "conceptual statement" output now supports dark mode viewing.
- For developers, CMake 3.28 is now required and we no longer require an internet connection during the build.
- Thread pool improvements mean that workloads that do a small number of small tasks in parallel (e.g. a cheap operation applied to a small image) are up to 3x faster. If you have schedules that do not use parallelism for small inputs because you found it didn't provide any speedup, you may want to re-benchmark.
- You can now query properties of the compiled-for target as Exprs, simplifying helper code that wants to do different things depending on the target architecture. Example: `f(x) = select(target_arch_is(Target::ARM), 3, 7)`. Helpers include `target_arch_is`, `target_os_is`, `target_has_feature`, `target_bits`, and `target_natural_vector_size`. These are resolved to constants at compile-time and simplified away. Use with care, as this (intentionally) results in different behavior on different platforms.
Breaking changes
- We now distribute `libGenGen.a` rather than `GenGen.cpp`.
  - Downstream users should link to this library with `/WHOLEARCHIVE:` or `-Wl,--whole-archive` rather than build `GenGen.cpp` themselves.
  - Users of the CMake package should be unaffected.
- In keeping with our LLVM support policy, support for LLVM 16 has been removed.
- We no longer use the `le64`/`le32` generic targets for compiling runtime modules to LLVM. These targets were removed in LLVM upstream.
What's Changed
Apps and tests
- Reschedule the matrix multiply performance app by @abadams in #8418
- Update lesson_22_jit_performance.cpp by @abadams in #8438
- Add threadpool performance test by @abadams in #8447
- Don't allow internal_error to pass an error test by @alexreinking in #8458
- Get more consistent distributions in parallel scenarios test by @abadams in #8451
Autoschedulers
Build system
- `Python_bindings`-test-as-installed by @LebedevRI in #8355
- Bump Halide version to 19 in main branch by @steven-johnson in #8357
- Remove warning for unsupported compilers by @alexreinking in #8362
- Bump CMake minimum version to 3.28 by @alexreinking in #8363
- Quick CMake fixes enabled by 3.28 by @alexreinking in #8365
- Distribute GenGen as a static library by @alexreinking in #8367
- Clean up serialization build code by @alexreinking in #8369
- List headers with target_sources FILE_SETS by @alexreinking in #8370
- Clean up autoscheduler dependencies by @alexreinking in #8372
- Use a Find module for V8 by @alexreinking in #8373
- Use a Find module for NodeJS by @alexreinking in #8374
- Move dependencies/wasm to use sites by @alexreinking in #8377
- Replace FetchContent with a custom dependency provider by @alexreinking in #8378
- Two more build fixes by @LebedevRI in #8371
- Rework LLVM into Find module and enact new component policy. by @alexreinking in #8379
- Reflow src/CMakeLists.txt in logical groups by @alexreinking in #8383
- Introduce HalideFeatures system for optional components by @alexreinking in #8384
- Scan generated export files to determine dependencies. by @alexreinking in #8385
- Rewrite bundle_static to be much more efficient. by @alexreinking in #8386
- Support using vcpkg to build dependencies on all platforms by @alexreinking in #8387
- Fix bundling error on buildbots by @alexreinking in #8392
- Support CMAKE_OSX_ARCHITECTURES by @alexreinking in #8390
- Fix Homebrew LLVM 19 by @alexreinking in #8431
- Fix CPack package naming when cross-compiling by @alexreinking in #8492
- Fix Apple libtool detection in bundle_static by @alexreinking in #8495
CodeGen
- Select condition vector lanes must match the true and false value by @abadams in #8465
- Emit `vscale_range()` fn attribute in correct syntax by @steven-johnson in #8457
- Fix #8455 (in combination with #8457) by @steven-johnson in #8456
- Fix bonehead mistake in get_md_bool() by @steven-johnson in #8469
- Propagate some facts about inequalities with min/max by @shoaibkamil in #8475
  - This fixed an issue where predicates in `.specialize()` directives weren't able to eliminate `select()` cases. #8443
Debugging
- Add LLDB pretty-printing by @alexreinking in #8460
- Print constants in scientific precision by @antonysigma in #8506
- Adaptive Dark colorscheme for Stmt HTML. Ability to programmatically export conceptual stmt files. by @mcourteaux in #8327
Documentation
- Update README.md by @abadams in #8404
- Big documentation update by @alexreinking in #8410
- Document how to find Halide from a pip installation by @alexreinking in #8411
- Link to PyPI from Doxygen index.html by @alexreinking in #8415
- Include our Markdown documentation in the Doxygen site. by @alexreinking in #8417
- Add missing backslash by @abadams in #8419
Frontend
- Don't let users disguise RVars as Vars by @abadams in #8441
- Add helper functions to query properties of the lowered Target (#8192) by @steven-johnson in #8359
Hardware backends
- Fix injection of GPU buffers that do not go by a Func name (i.e. alloc groups). by @mcourteaux in #8333
- Remove vestigial AMDGPU backend by @alexreinking in #8382
- Add ARMv8.x feature flags by @steven-johnson in #4489
- [vulkan] Fixes to address outstanding validation failures by @derek-gerstmann in #8448
- [vulkan] Reduce descriptor sets, use official headers, improve allocator, remove module destructor by @derek-gerstmann in #8452
- [vulkan] Skip `async_copy_chain` and `gpu_allocation_cache` correctness tests on Windows by @derek-gerstmann in #8503
LLVM
- Don't use le32/le64 by @steven-johnson in #8344
- Fix for the removed DataLayout constructor. by @mcourteaux in #8391
- Drop support for LLVM 16 in main by @steven-johnson in #8358
- Allow LLVM 20 by @steven-johnson in #8352
- Fix for top-of-tree LLVM by @steven-johnson in #8421
- Fix for top-of-tree LLVM by @steven-johnson in #8425
- Fix for top-of-tree LLVM by @steven-johnson in #8442
- Fix datalayout for osx-arm-64 by @abadams in #8449
- Fix top of LLVM. by @mcourteaux in #8454
- Replace all use of getPointerTo() with PointerType::get() by @steven-johnson in #8473
Python
- Fix Numpy 2.0 compatibility bug in lesson 10 by @alexreinking in #8381
- Pip packaging at last! by @alexreinking in #8405
- Update pip package metadata by @alexreinking in #8412
- Fix classifier spelling by @alexreinking in #8413
- Upgrade LLVM to 19.1.0 in pip package by @alexreinking in #8423
- Update PIP LLVM to 19.1.4 by @alexreinking in #8488
- PythonExtensionGen: ~PyHalideBuffer should call device_free() (#8399) by @steven-johnson in #8439
Runtime
- Fix profiler to report time spent on GPU kernels again instead of on 'wait for parallel tasks'. by @mcourteaux in #8453
- Don't spin on the main mutex while waiting for new work by @abadams in #8433
Minor bugfixes / other cleanup
- Remove remaining dregs of tuple_select (oops) by @steven-johnson in #8329
- Fix incorrect output in Python tutorial, lesson 5 by @qqaatw in #8331
- Make pybind11 minimum version check compatible with pybind11 v3. by @rwgk in #8366
- Partially apply clang-tidy fixes we don't enforce yet by @abadams in #8376
- Fix incorrect std::array sizes in Target.cpp by @steven-johnson in #8396
- Fix _Float16 detection on ARM64 GCC<13 by @alexreinking in #8401
- Make run-clang-tidy.sh work on macOS by @alexreinking in #8416
- Some minor fixes for C++23 compilation errors. by @zvookin in #8422
- Fix two warnings on GCC 14.2.1. by @mcourteaux in #8430
- Fix typos by @alexreinking in #8459
- Fix two trivial build errors by @steven-johnson in #8467
- Fix #8470: fuzz_bounds should use select() rather than Select::make() by @steven-johnson in #8471
- Add missing #include by @steven-johnson in #8476
- Fix heap-use-after-free error in find_best_fit() by @steven-johnson in #8483
- Fix typos in comments by @alexreinking in #8485
- Remove unused is_update argument by @alexreinking in #8487
- Backport reverse_view to clean up some code by @alexreinking in #8486
- Remove type inspection helpers from ApplySplitResult and Split by @alexreinking in #8489
- Use std::optional to clean up some code and prevent use-after-free bugs by @abadams in #8484
- Fix comment for Buffer::copy() (Fixes #8498) by @steven-johnson in #8500
- Make the default constructor for ConstantInterval inlinable by @abadams in #8505
- Remove two unused functions by @steven-johnson in #8501
- Move some large stack frames off recursive paths. by @abadams in #8507
New Contributors
Full Changelog: v18.0.0...v19.0.0