halide/Halide v18.0.0 on GitHub

Changes Of Note since Halide 17

Ring-buffering now supported in schedules (Func::ring_buffer()). This is distinct from fold_storage in that it folds across time (the loop variables) rather than folding across space (the pure vars of the Func).
Fixed a longstanding bug in lossless_cast()
Many fixes for the Vulkan backend
OpenGLCompute is no longer supported
Added support for ARM SVE2
Added (basic) support for Intel APX and AVX10
Added support for Hexagon HVX v68
Added support for NumPy's .npy format to Func::debug_to_file() and to the code in halide_image_io.h
Python bindings now support bfloat and int64 properly
Hacky code that auto-named Funcs, Vars, etc. via DWARF introspection was removed
The profiler was revamped to behave better when multiple Halide pipelines are in flight at the same time.
Numerous lowering passes were sped up, resulting in faster compilation for large pipelines. However, time spent in LLVM is still the long pole for most pipelines.
Fixed-point instruction selection has been improved via tracking constant integer bounds of expressions.
Added feature detection for ARM CPUs to the runtime library and to the host target feature computation. Supports Windows, macOS, Linux, iOS, and Android.
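To make the difference from fold_storage concrete, here is a plain C++ sketch (no Halide involved) of storage folded across time. A producer is computed once per outer-loop iteration, and its output goes into one of kBuffers slots chosen by iteration % kBuffers. The storage is allocated once, outside the loop, so its size does not grow with the number of iterations. An async producer could then fill slot t + 1 while the consumer reads slot t. The names kBuffers, kTileSize, produce, consume, and run are hypothetical. kBuffers presumably plays the role of the buffer count passed to Func::ring_buffer(); check the v18 API documentation for the exact usage.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical slot count: how many producer tiles are kept alive at once.
constexpr int kBuffers = 2;
constexpr int kTileSize = 4;

using Tile = std::array<int, kTileSize>;

// Producer: fills one tile of values for outer-loop iteration `t`.
void produce(int t, Tile &out) {
    for (int x = 0; x < kTileSize; x++) out[x] = t * 10 + x;
}

// Consumer: reduces one tile to a single value.
int consume(const Tile &in) {
    int sum = 0;
    for (int v : in) sum += v;
    return sum;
}

// Storage is hoisted out of the loop and holds only kBuffers tiles no matter
// how many iterations run: iteration t writes slot t % kBuffers, so storage is
// folded across the loop variable (time), not across the tile's x dimension
// (space). This sketch runs sequentially; with an async producer, the slot
// being filled and the slot being read would differ.
std::vector<int> run(int iterations) {
    assert(iterations >= 0);
    std::array<Tile, kBuffers> ring{};
    std::vector<int> results;
    for (int t = 0; t < iterations; t++) {
        Tile &slot = ring[t % kBuffers];
        produce(t, slot);
        results.push_back(consume(slot));
    }
    return results;
}
```

For example, run(3) returns {6, 46, 86} while never holding more than two tiles in memory.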

Deprecations / Removals

tuple_select() has been removed in favor of overloads to select().
Various fixed-point operators have been removed from the Halide::Internal namespace and are now in the public Halide namespace.
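For readers new to the fixed-point operators, the sketch below gives scalar reference semantics for saturating and widening arithmetic on 8-bit unsigned values. It also shows the reasoning behind the bounds-based instruction selection noted above. If an expression's constant integer bounds already fit in the narrow type, a saturating narrow can never clamp, so it can be lowered as a plain truncating cast. This is plain C++, not Halide source. The function names and the Interval struct are hypothetical stand-ins, and the exact set of operators exposed in the public Halide namespace should be checked against the v18 headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Widening add: compute the sum in a type wide enough that it cannot overflow.
uint16_t widening_add_u8(uint8_t a, uint8_t b) {
    return static_cast<uint16_t>(a + b);  // at most 510, fits in uint16_t
}

// Saturating narrow: clamp into [0, 255], then truncate to 8 bits.
uint8_t saturating_narrow_u16(uint16_t v) {
    return static_cast<uint8_t>(std::min<uint16_t>(v, 255));
}

// Saturating add: widen, add, then clamp back into the uint8 range.
uint8_t saturating_add_u8(uint8_t a, uint8_t b) {
    return saturating_narrow_u16(widening_add_u8(a, b));
}

// Hypothetical constant integer interval, standing in for the bounds a
// compiler tracks for an expression.
struct Interval {
    int64_t min, max;
};

// If the bounds already fit in uint8, the clamp in a saturating narrow never
// fires, so a plain (cheaper) truncating narrow gives the same result.
bool saturating_narrow_is_plain_cast(Interval bounds) {
    assert(bounds.min <= bounds.max);
    return bounds.min >= 0 && bounds.max <= 255;
}
```

For instance, if a and b are uint8, then a / 2 + b / 2 has constant bounds [0, 254], so saturating_narrow_is_plain_cast({0, 254}) is true. The unrestricted sum a + b has bounds [0, 510] and keeps its clamp.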

What's Changed

Detect ARM CPU features for host target and in runtime (#8298)
Scheduling directive to support ring buffering by @vksnk in #7967
Don't add ring_buffer semaphores if the function is not scheduled as async by @vksnk in #8015
Quick fix for crash that is occurring in SVE2 tests. by @zvookin in #8020
Don't use variable-length arrays by @steven-johnson in #8021
Set warnings on tests as well as src by @steven-johnson in #8022
Stronger chain detection in LoopCarry pass by @vksnk in #8016
adds mappings for f16 variants of halide float math by @mikewoodworth in #8029
Require LLVM >= 16.0 by @steven-johnson in #8003
Add test for #8029 by @steven-johnson in #8032
Tweak the Printer code in runtime for smaller code by @steven-johnson in #8023
Fix bounds_of_nested_lanes by @abadams in #8039
Track whether or not let expressions failed to solve in solver by @abadams in #7982
Fix type error in VectorizeLoops by @abadams in #8055
Update makefile to use test/common/terminate_handler.cpp by @abadams in #8066
add unsafe_promise_clamped by @wraith1995 in #8071
Don't require Halide_WebGPU when using wasm (#8063) by @steven-johnson in #8065
Outsmart the LLVM optimizer by @steven-johnson in #8073
Add hexagon_benchmarks app for CMake builds by @prasmish in #8069
Fix bool conversion bug in Vulkan code generator by @derek-gerstmann in #8067
Better validation of gpu schedules by @abadams in #8068
Add an easy way to print vectors in debug output. by @zvookin in #8072
[WebGPU] Update to latest native headers by @jrprice in #8081
Remove OpenGLCompute by @steven-johnson in #8077
Add checks to prevent people from using negative split factors by @abadams in #8076
Fix rfactor adding too many pure loops by @abadams in #8086
Forward the partition methods from generator outputs by @abadams in #8090
Parallelize some tests by @abadams in #8078
Allow disabling of multithreading in simd op check by @steven-johnson in #8096
clang does not support _Float16 when targeting i386 by @LebedevRI in #8085
tests: correctness/float16_t: mark __extendhfsf2 with default visibility by @LebedevRI in #8084
Fix reduce_expr_modulo of vector in Solve.cpp by @abadams in #8089
[Vulkan] Region allocator fixes for memory requirements and allocations by @derek-gerstmann in #8087
Ensure string(REPLACE) is called with the right number of arguments by @alexreinking in #8097
Strip asserts right at the end of lowering by @abadams in #8094
Fix clang-tidy error in runtime.printer.h (parameter shadows member) by @steven-johnson in #8074
Fix an issue where the Halide compiler hits an internal error for bool types in widening intrinsics. by @zvookin in #8099
Small Tutorial Fix by @2022tgoel in #8111
Optionally print the time taken by each lowering pass by @abadams in #8116
Do less redundant work in UnpackBuffers by @abadams in #8104
Avoid redundant scope lookups by @abadams in #8103
Add Intel APX and AVX10 target flags and LLVM attribute setting. by @zvookin in #8052
Use a caching version of stmt_uses_vars in TightenProducerConsumer nodes by @abadams in #8102
Fix hoist_storage not handling condition correctly. by @abadams in #8123
Rewrite the skip stages lowering pass by @abadams in #8115
Remove two dead vars from the Makefile by @abadams in #8125
Add support for setting the default allocator and deallocator functions in Halide::Runtime::Buffer. by @mcourteaux in #8132
Make realization order invariant to unique_name suffixes by @abadams in #8124
Make gpu thread and block for loop names opaque by @abadams in #8133
Add class template type deduction guides to avoid CTAD warning. by @zvookin in #8135
[vulkan] Add conform API methods to memory allocator to fix block allocations by @derek-gerstmann in #8130
Add sobel in hexagon benchmarks app for CMake builds by @prasmish in #8127
Handle loads of broadcasts in FlattenNestedRamps by @abadams in #8139
Use python itself to get the extension suffix, not python-config by @abadams in #8148
Rewrite the pass that adds mutexes for atomic nodes by @abadams in #8105
Feature: mark a Func as no_profiling, to prevent injection of profiling. (2nd implementation) by @mcourteaux in #8143
Bound allocation extents for hoist_storage using loop variables one-by-one by @vksnk in #8154
Support for ARM SVE2. by @zvookin in #8051
Fix two compute_with bugs. by @abadams in #8152
Python bindings: add_python_test(): do set HL_JIT_TARGET too by @LebedevRI in #8156
fix ub in lower rounding shift right by @abadams in #8173
Add some missing _Float16 support by @steven-johnson in #8174
Add conversion code for Float16 that was missed in #8174 by @steven-johnson in #8178
Tighten bounds of abs() by @rootjalex in #8168
Clarify the meaning of Shuffle::is_broadcast() by @abadams in #8158
Add .npy support to halide_image_io by @steven-johnson in #8175
Update Hexagon Install Instructions by @FabianSchuetze in #8182
Add .npy support to debug_to_file() by @steven-johnson in #8177
Don't print on parallel task entry/exit with -debug flag by @abadams in #8185
Fix corner case in if_then_else simplification by @abadams in #8189
Rewrite IREquality to use a more compact stack instead of deep recursion by @abadams in #8198
[HEXAGON] Keep support for hexagon_remote/Makefile by @aankit-quic in #8186
Faster substitute_facts by @abadams in #8200
Make Interval::is_single_point check for deep equality by @abadams in #8202
Refactor ConstantInterval by @abadams in #8179
Faster vars used tracking in simplify let visitor by @abadams in #8205
More aggressively unify duplicate lets by @abadams in #8204
Update debug_to_file API to remove type_code by @steven-johnson in #8183
[x86 & HVX & WASM] Use bounds inference for saturating_narrow instruction selection by @rootjalex in #7805
Insert apparently-missing break; in IREquality.cpp by @steven-johnson in #8211
Fix Reinterpret cmp in IREquality by @rootjalex in #8217
Fix give-up case in ModulusRemainder by @abadams in #8221
Fix for top-of-tree LLVM by @steven-johnson in #8223
Add some EVAL_IN_LAMBDAs to Simplify_Sub.cpp by @abadams in #8230
Fix saturating add matching in associativity checking by @abadams in #8220
Add HVX_v68 target to support Hexagon HVX v68. by @wangcheng22 in #8232
Mark host_dirty() and device_dirty() with no_discard. by @mcourteaux in #8248
Rework the simplifier to use ConstantInterval for bounds by @abadams in #8222
Remove max size assert from Anderson2021 by @jansel in #8253
Expose BFloat in Python bindings by @jansel in #8255
Fix Metal handling for float16 literals by @shoaibkamil in #8260
Python binding support for int64 literals by @jansel in #8254
Report useful error to user if the promise_clamp all fails to losslessly cast. by @mcourteaux in #8238
It's generally a bad idea for simplifier rules to multiply constants by @abadams in #8234
[vulkan] Fix Vulkan SIMT mappings for GPU loop vars. by @derek-gerstmann in #8259
Stop region costs from complaining about new intrinsics by @abadams in #8262
No longer silently hide errors in Metal completion handlers (alternative approach) by @shoaibkamil in #8240
Use upstream interface for consuming SPIR-V by @alexreinking in #8265
Fix OpenCL positive and negative INF constants. by @alexreinking in #8266
scoped_truth for the loop variable being always less than the loop extent. by @mcourteaux in #8306
Fix incorrect type in emulation of float16 is_inf/nan by @abadams in #8310
Don't try to codegen predicated atomic stores by @abadams in #8285
Add ability to pass explicit RDom to Function::define_update by @abadams in #8284
[vulkan] Dynamically load Vulkan loader library. Avoid Validation Layer crash on exit. by @derek-gerstmann in #8289
Remove Introspection by @steven-johnson in #8273
Per-pipeline-invocation profiling by @abadams in #8153
Fix device slices for Buffer with fixed dimensionality in template. by @mcourteaux in #8313
Remove deprecated operators by @steven-johnson in #8321
Provide a minimum OS version for MachO objects by @alexreinking in #8323
Fix horrifying bug in lossless_cast of a subtract by @abadams in #8155
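PRs #8175 and #8177 above add .npy output to halide_image_io and debug_to_file(). For readers unfamiliar with the format, here is a minimal C++ sketch of the preamble of a version 1.0 .npy file, following NumPy's published format description. The preamble consists of a magic string, version bytes, a little-endian uint16 header length, and an ASCII Python-dict header. The header is padded with spaces and ends in '\n', so the array data starts on a 64-byte boundary. This is not Halide's implementation. npy_v1_header is a hypothetical helper, and the caller is assumed to write the raw array data, in C order and matching descr, immediately after the returned bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Builds a version-1.0 .npy preamble for an array with element type `descr`
// (e.g. "<f4" for little-endian float32) and the given shape.
std::string npy_v1_header(const std::string &descr, const std::vector<size_t> &shape) {
    std::string dict = "{'descr': '" + descr + "', 'fortran_order': False, 'shape': (";
    for (size_t i = 0; i < shape.size(); i++) {
        if (i > 0) dict += ", ";
        dict += std::to_string(shape[i]);
    }
    if (shape.size() == 1) dict += ",";  // (5,) is a 1-tuple in Python syntax
    dict += "), }";

    // Pad so magic (6) + version (2) + length field (2) + dict + '\n'
    // is a multiple of 64 bytes.
    const size_t prefix = 6 + 2 + 2;
    const size_t total = prefix + dict.size() + 1;
    dict.append((64 - total % 64) % 64, ' ');
    dict += '\n';
    assert(dict.size() <= 0xFFFF);  // version 1.0 stores the length in a uint16

    std::string out;
    out += static_cast<char>(0x93);  // magic: \x93NUMPY
    out += "NUMPY";
    out += static_cast<char>(1);     // major version
    out += static_cast<char>(0);     // minor version
    const uint16_t len = static_cast<uint16_t>(dict.size());
    out += static_cast<char>(len & 0xFF);         // header length, low byte
    out += static_cast<char>((len >> 8) & 0xFF);  // header length, high byte
    out += dict;
    return out;
}
```

For example, npy_v1_header("<f4", {2, 3}) produces a preamble whose dict reads {'descr': '<f4', 'fortran_order': False, 'shape': (2, 3), } and whose total length is a multiple of 64.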

New Contributors

@tylerhou made their first contribution in #8013
@wraith1995 made their first contribution in #8071
@prasmish made their first contribution in #8069
@2022tgoel made their first contribution in #8111
@FabianSchuetze made their first contribution in #8182
@FindHao made their first contribution in #8322

Full Changelog: v17.0.2...v18.0.0
