What's Changed
Major changes
- @abadams
- @alexreinking
- Add helper for cross-compiling Halide generators. (#6366)
- @LebedevRI
- @steven-johnson
- @zvookin
- Timer based profiler (#6642)
Minor changes
- @abadams
- Deprecate JIT runtime override methods that take void * (#6344)
- Allow users to use their own cuda contexts and streams in JIT mode (#6345)
- Add --help flag to rungenmain, fixing #5323 (#6354)
- Do target-specific lowering of lerp (#6432)
- Reduce overhead of sampling profiler by having only one thread do it (#6433)
- Skip custom cuda context test on older GPUs (#6437)
- Avoid needless gather in fast_integer_divide lowering (#6441)
- Fixes for c++20 (#6446)
- Add a fast integer divide that rounds to zero (#6455)
- Let lerp lowering incorporate a final cast. (#6480)
- Try removing optional buffer added to closure (#6481)
- rounding shift rights should use rounding halving add (#6494)
- Make random faster by putting the innermost var last (#6504)
- Make it possible to interpret a wide type as multiple smaller elements (#6506)
- Handle mixed-width args to mul-shift-right (#6526)
- Attempted redo of faster noise (#6539)
- Better default lowering of absd (#6545)
- Make HALIDE_REGISTER_GENERATOR work with multiple template args (#6556)
- Rename Output to OutputFileType and deprecate Output (#6568)
- Remove incorrect not-multiple-of-16 claim (#6573)
- Fix bug in mul_shift_right matching (#6610)
- @alexreinking
- @ashishUthama
- Include LICENSE.txt in package (#6428)
- @dsharletg
- Fix description of rounding_shift_left/rounding_shift_right (#6549)
- @Elarnon
- Only commutative reductions can be parallelized (#6609)
- @jinderek
- Support new warp shuffle intrinsics after CUDA Volta architecture (#6505)
- @knzivid
- python_bindings: Fix SIGSEGV in HalidePythonCompileTimeErrorReporter (#6635)
- @LebedevRI
- [CMake] Deduplicate Halide_LLVM_VERSION and LLVM_PACKAGE_VERSION (#6646)
- @masahi
- [APP] Fix hexagon_benchmarks build (use two-var prefetch) (#6563)
- @mcleary
- Add support for AMX instructions (#5818)
- @mcourteaux
- @mgharbi
- Fixes the Pytorch Wrapper Codegen for CPU-only machines. (#6590)
- @OmarEmaraDev
- @rootjalex
- Make bounds of let visitor use unique_name() (#6583)
- Remove incorrect docs on widening_add (#6625)
- Disallow Type::narrow() and Type::widen() from producing bitwidths between 1 and 8 bits (#6622)
- Wild match object should not be foldable (#6623)
- Clear bounds info on casts when value bounds are undefined for overflow types (#6640)
- @slomp
- decommissioning StackPrinter (#6470)
- @steven-johnson
- [hannk] Fix MeanOp (#6336)
- Add using OpVisitor::visit; to various OpVisitors to avoid overload warnings for some compilers (#6337)
- [hannk] Add a prepare() method for ops and interp (#6338)
- Fix WASM datalayout for top-of-tree LLVM (#6339)
- Make halide_type_t and halide_type_of constexpr (#6340)
- Harvest IWYU changes for LLVM, WABT (#6341)
- Fix HelloWasm (#6342)
- Fix Makefile for LLVM11 (injection from #5818) (#6343)
- [hannk] requantize() should never skip the operation (#6350)
- [hannk] augment SoftmaxOp to allow specifying axis (#6351)
- Use Node instead of d8 for Wasm AOT testing (#6356)
- [hannk] Add missing call to Interpreter::prepare in benchmark app (#6358)
- [hannk] Allow disabling TFLite+Delegate build in CMake (#6360)
- [hannk] Add support for building/running for wasm (#6361)
- Update Emscripten settings (#6362)
- [hannk] Clean up aliasing (v2) (#6364)
- [hannk] tests should only process .tflite files (#6368)
- Revamp Hannk IR (#6379)
- Fix for top-of-tree LLVM (#6380)
- Remove halide_assert() from halide_default_device_wrap_native (#6381)
- Rename halide_assert -> halide_abort_if_false (#6382)
- Convert various halide_assert -> static_assert (#6383)
- Fix for top-of-tree LLVM (#6386)
- Check results of all runtime function calls (#6389)
- Add halide_debug_assert() macro (#6390)
- [hannk] Have CMake emit .s, .stmt, .ll files (#6392)
- [hannk] Upgrade hannk to use TFLite 2.7.0 by default (#6393)
- Clean up CodeGen_LLVM names to match ASAN nomenclature changes (#6395)
- Drop support for LLVM11 (#6396)
- Move PyTorch test into standalone tests (#6397)
- Remove halide_abort_if_false() usage in runtime/metal (#6398)
- Fix OGLC debug builds (#6399)
- Add defensive checks to halide_buffer_copy_already_locked (#6401)
- _halide_buffer_crop() needs to check for runtime failures (v2) (#6403)
- Fix broken ASAN code (#6408)
- [hannk] Pacify clang-tidy (#6412)
- One more ASAN fix (#6413)
- [hannk] Fix lower_tflite_fullyconnected (#6414)
- Fix Introspection issues (#6424)
- Don't remap the function name or the target in the metadata (#6430)
- Set up SANITIZER_FLAGS and OPTIMIZE for apps/Makefile.inc (#6435)
- Ensure that halide_start_clock() is called before halide_current_time… (#6438)
- Codegen_C: buffer compilation needs to special-case scalar buffers (#6442)
- Add operator<< for Closure (#6443)
- Re-enable performance_async_gpu for D3D12Compute (#6450)
- Tweak Hexagon codegen output (#6461)
- Add LinkageType::ExternalPlusArgv (#6452) (#6463)
- Fix Closure API (#6464)
- Move null check from Printer to halide_string_to_string() (#6467)
- Deal with Printer::scratch (#6469) (#6472)
- Restore support for using V8 as the Wasm JIT interpreter (#6478)
- Fail if no_bounds_query specified for HL_JIT_TARGET (#6489)
- Document the usage of llvm::legacy::PassManager (#6491)
- Update WABT to 1.0.25 (#6497)
- Grab Bag of minor cleanups to LowerParallelTasks (#6498)
- Update simd_op_check for arm64 upz1 code generation (#6499) (#6500)
- Fix size_t -> int conversion warning (#6501)
- Fix simd-op-check for top-of-tree LLVM (#6529)
- Revert "Make random faster by putting the innermost var last" (#6538)
- Fix GeneratorOutput_Buffer::set_estimates() (#6540)
- Revert "Make it possible to interpret a wide type as multiple smaller elements" (#6541)
- Convert apps/hannk/Elementwise to use generate() (#6543)
- Fixes for top-of-tree LLVM (#6546) (#6548)
- Fix deprecation warnings in Python tutorials (#6552)
- Use add_halide_generator() everywhere in apps/ (#6554)
- Fix for top-of-tree LLVM (#6561)
- Enable simd_op_check test for wasm i8x16.popcnt (#6562)
- Revert "Fix for top-of-tree LLVM" (#6564)
- wasm simd cleanup (#6566)
- Add support for wasm-simd ops for integer-integer widening (#6567)
- Add explicit to a handful of Generator-related ctors. (#6569)
- Fix typo in comment in HalideBuffer.h (#6570)
- Allow calling scheduling methods on Output<Buffer[]> (#6577)
- Fix for top-of-tree LLVM (#6579)
- Fix Win32-specific breakage in top-of-tree LLVM (#6581)
- Convert apps/ to use static Buffer dims where useful (#6585)
- Various fixes to static-dimensioned Buffer (#6589)
- Convert Buffer<> usage in python_bindings/ to use static dimensions (#6591)
- Convert Buffer<> usage in test/generators to use static dimensions (#6592)
- Rename BufferDimsUnconstrained -> AnyDims (#6594)
- Allow building with LLVM15 (#6603)
- Update WasmExecutor for WABT API changes (#6612)
- Minor Generator cleanup (#6613)
- Unbreak WABT again by using main instead of a commit (#6614)
- Update apps/hannk to use TFLite 2.8.0 (#6617)
- Update WABT version to the just-released 1.0.27 (instead of main) (#6619)
- Clean up python_binding Makefile (#6634)
- Fix const-correctness in C/C++ backend (Issue #6636) (#6638)
- Convert most remaining Generators to prefer statically-dimensioned In… (#6641)
- Allow profiler feature under wasm iff wasm_threads is enabled (#6643)
- Fix UB in hannk FillWithRandom operation. (#6645)
- Update initialization of WABT store field to work with top-of-tree (#6649)
- Fix apparent typo in PR #6294 (#6653)
- Eliminate some unnecessary clamping in ClampUnsafeAccesses (#6297) (#6654)
- Python Bindings: fix Python bool -> Expr implicit conversion (#6657)
- Fix 'variable set but not used' warning/error (#6658)
- Allow make test_apps to work with ASAN (#6659)
- Add optional runtime H::R::Buffer access checks (#6660)
- Add ldscript code for Python extensions in CMake (#6665)
- Remove the nobuild/partialbuildmethod tests from python_bindings/ (#6668)
- @TH3CHARLie
- @vksnk
- Scheduling directive to set an explicit storage bound (#6327)
- Add include for size_t in constants.h (#6353)
- Add missing widening_absd patterns (#6359)
- Change implementation of round_f* in CodeGen_C to use nearbyint() to match CodeGen_LLVM (#6406)
- Rewrite integer lerp using intrinsics (#6426)
- Avoid double narrowing in widening_add/widening_sub if type is 8-bit (#6629)
- @zvookin
New Contributors
- @TH3CHARLie made their first contribution in (#6334)
- @OmarEmaraDev made their first contribution in (#6310)
- @mcourteaux made their first contribution in (#6444)
- @jinderek made their first contribution in (#6505)
- @masahi made their first contribution in (#6563)
- @knzivid made their first contribution in (#6635)