halide/Halide v14.0.0 on GitHub

What's Changed

Major changes

@abadams
- Add ability to pass a user context in JIT mode (#6313)
- Reenable warning about unscheduled update definitions (#6602)
@alexreinking
- Add helper for cross-compiling Halide generators. (#6366)
@LebedevRI
- Implement SanitizerCoverage support (Refs. #6513) (#6517)
@steven-johnson
- Expand optional static-typing for Buffer to include dimensionality (#6574)
- Deprecate the Generator::build() method (#6580)
- Move GeneratorContext into a standalone class (#6618)
- Python Bindings didn't allow for zero-D Funcs, ImageParams, Buffers (#6633)
@zvookin
- Timer based profiler (#6642)
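Entry #6574 makes a Buffer's dimensionality an optional part of its static type. The later AnyDims rename in #6594 names the "unconstrained" case. The sketch below is a hypothetical, self-contained toy and not Halide's actual Buffer class. `ToyBuffer` and `kAnyDims` are invented names, and the sketch only assumes the general idea. When both sides have a fixed dimensionality, a mismatch is rejected at compile time. When converting from an unconstrained buffer to a fixed one, the check happens at run time.

```cpp
#include <cassert>

// Hypothetical sentinel meaning "dimensionality not known at compile time".
// Halide 14 calls its equivalent AnyDims; this toy does not claim to match
// Halide's actual value or implementation.
constexpr int kAnyDims = -1;

// Toy buffer whose dimensionality may optionally be fixed at compile time.
template <typename T, int Dims = kAnyDims>
class ToyBuffer {
public:
    explicit ToyBuffer(int runtime_dims) : dims_(runtime_dims) {
        // A fixed Dims must agree with the dimensionality given at run time.
        assert(Dims == kAnyDims || Dims == runtime_dims);
    }

    // Conversion between two different fixed dimensionalities fails to
    // compile. Conversions involving kAnyDims are checked at run time instead.
    template <int OtherDims>
    ToyBuffer(const ToyBuffer<T, OtherDims>& other) : dims_(other.dimensions()) {
        static_assert(Dims == kAnyDims || OtherDims == kAnyDims || Dims == OtherDims,
                      "dimensionality mismatch");
        assert(Dims == kAnyDims || Dims == dims_);
    }

    int dimensions() const { return Dims == kAnyDims ? dims_ : Dims; }

private:
    int dims_;
};
```

With this toy, assigning a `ToyBuffer<float, 2>` to a `ToyBuffer<float, 3>` is rejected by the `static_assert`. Round-tripping through `ToyBuffer<float>` compiles, and the dimensionality is verified when the buffer is converted back.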

Minor changes

@abadams
- Deprecate JIT runtime override methods that take void * (#6344)
- Allow users to use their own cuda contexts and streams in JIT mode (#6345)
- Add --help flag to rungenmain, fixing #5323 (#6354)
- Do target-specific lowering of lerp (#6432)
- Reduce overhead of sampling profiler by having only one thread do it (#6433)
- Skip custom cuda context test on older GPUs (#6437)
- Avoid needless gather in fast_integer_divide lowering (#6441)
- Fixes for c++20 (#6446)
- Add a fast integer divide that rounds to zero (#6455)
- Let lerp lowering incorporate a final cast. (#6480)
- Try removing optional buffer added to closure (#6481)
- rounding shift rights should use rounding halving add (#6494)
- Make random faster by putting the innermost var last (#6504)
- Make it possible to interpret a wide type as multiple smaller elements (#6506)
- Handle mixed-width args to mul-shift-right (#6526)
- Attempted redo of faster noise (#6539)
- Better default lowering of absd (#6545)
- Make HALIDE_REGISTER_GENERATOR work with multiple template args (#6556)
- Rename Output to OutputFileType and deprecated Output (#6568)
- Remove incorrect not-multiple-of-16 claim (#6573)
- Fix bug in mul_shift_right matching (#6610)
@alexreinking
- Add super-build for cross-compiling HANNK (#6374)
- Fix empty INSTALL_COMMAND in hannk super-build (#6387)
- Remove halide_config.cmake from Makefile build. Fixes #6615 (#6616)
- Make IRComparer consider nans to be less than non-nans. (#6626)
@ashishUthama
- Include LICENSE.txt in package (#6428)
@dsharletg
- Fix description of rounding_shift_left/rounding_shift_right (#6549)
@Elarnon
- Only commutative reductions can be parallelized (#6609)
@jinderek
- Support new warp shuffle intrinsics after CUDA Volta architecture (#6505)
@knzivid
- python_bindings: Fix SIGSEGV in HalidePythonCompileTimeErrorReporter (#6635)
@LebedevRI
- [CMake] Deduplicate Halide_LLVM_VERSION and LLVM_PACKAGE_VERSION (#6646)
@masahi
- [APP] Fix hexagon_benchmarks build (use two-var prefetch) (#6563)
@mcleary
- Add support for AMX instructions (#5818)
@mcourteaux
- Include GPU source kernels in Stmt and StmtHtml file. (#6444)
- Syntax highlighting for embedded PTX code. (#6447)
@mgharbi
- Fixes the Pytorch Wrapper Codegen for CPU-only machines. (#6590)
@OmarEmaraDev
- Fix default device wrap native function (#6310)
- Fix wrong type in Ramp CodeGen for OpenGLCompute (#6349)
- Vectorize Ramp in OpenGLCompute backend (#6372)
- Support vectorization in OpenGLCompute backend (#6348)
- Support vectorized Select in OpenGLCompute backend (#6371)
@rootjalex
- Make bounds of let visitor use unique_name() (#6583)
- Remove incorrect docs on widening_add (#6625)
- Disallow Type::narrow() and Type::widen() from producing bitwidths between 1 and 8 bits (#6622)
- Wild match object should not be foldable (#6623)
- Clear bounds info on casts when value bounds are undefined for overflow types (#6640)
@slomp
- decommissioning StackPrinter (#6470)
@steven-johnson
- [hannk] Fix MeanOp (#6336)
- Add using OpVisitor::visit; to various OpVisitors to avoid overload warnings for some compilers (#6337)
- [hannk] Add a prepare() method for ops and interp (#6338)
- Fix WASM datalayout for top-of-tree LLVM (#6339)
- Make halide_type_t and halide_type_of constexpr (#6340)
- Harvest IWYU changes for LLVM, WABT (#6341)
- Fix HelloWasm (#6342)
- Fix Makefile for LLVM11 (injection from #5818) (#6343)
- [hannk] requantize() should never skip the operation (#6350)
- [hannk] augment SoftmaxOp to allow specifying axis (#6351)
- Use Node instead of d8 for Wasm AOT testing (#6356)
- [hannk] Add missing call to Interpreter::prepare in benchmark app (#6358)
- [hannk] Allow disabling TFLite+Delegate build in CMake (#6360)
- [hannk] Add support for building/running for wasm (#6361)
- Update Emscripten settings (#6362)
- [hannk] Clean up aliasing (v2) (#6364)
- [hannk] tests should only process .tflite files (#6368)
- Revamp Hannk IR (#6379)
- Fix for top-of-tree LLVM (#6380)
- Remove halide_assert() from halide_default_device_wrap_native (#6381)
- Rename halide_assert -> halide_abort_if_false (#6382)
- Convert various halide_assert -> static_assert (#6383)
- Fix for top-of-tree LLVM (#6386)
- Check results of all runtime function calls (#6389)
- Add halide_debug_assert() macro (#6390)
- [hannk] Have CMake emit .s, .stmt, .ll files (#6392)
- [hannk] Upgrade hannk to use TFLite 2.7.0 by default (#6393)
- Clean up CodeGen_LLVM names to match ASAN nomenclature changes (#6395)
- Drop support for LLVM11 (#6396)
- Move PyTorch test into standalone tests (#6397)
- Remove halide_abort_if_false() usage in runtime/metal (#6398)
- Fix OGLC debug builds (#6399)
- Add defensive checks to halide_buffer_copy_already_locked (#6401)
- _halide_buffer_crop() needs to check for runtime failures (v2) (#6403)
- Fix broken ASAN code (#6408)
- [hannk] Pacify clang-tidy (#6412)
- One more ASAN fix (#6413)
- [hannk] Fix lower_tflite_fullyconnected (#6414)
- Fix Introspection issues (#6424)
- Don't remap the function name or the target in the metadata (#6430)
- Set up SANITIZER_FLAGS and OPTIMIZE for apps/Makefile.inc (#6435)
- Ensure that halide_start_clock() is called before halide_current_time… (#6438)
- Codegen_C: buffer compilation needs to special-case scalar buffers (#6442)
- Add operator<< for Closure (#6443)
- Re-enable performance_async_gpu for D3D12Compute (#6450)
- Tweak Hexagon codegen output (#6461)
- Add LinkageType::ExternalPlusArgv (#6452) (#6463)
- Fix Closure API (#6464)
- Move null check from Printer to halide_string_to_string() (#6467)
- Deal with Printer::scratch (#6469) (#6472)
- Restore support for using V8 as the Wasm JIT interpreter (#6478)
- Fail if no_bounds_query specified for HL_JIT_TARGET (#6489)
- Document the usage of llvm::legacy::PassManager (#6491)
- Update WABT to 1.0.25 (#6497)
- Grab Bag of minor cleanups to LowerParallelTasks (#6498)
- Update simd_op_check for arm64 uzp1 code generation (#6499) (#6500)
- Fix size_t -> int conversion warning (#6501)
- Fix simd-op-check for top-of-tree LLVM (#6529)
- Revert "Make random faster by putting the innermost var last" (#6538)
- Fix GeneratorOutput_Buffer::set_estimates() (#6540)
- Revert "Make it possible to interpret a wide type as multiple smaller elements" (#6541)
- Convert apps/hannk/Elementwise to use generate() (#6543)
- Fixes for top-of-tree LLVM (#6546) (#6548)
- Fix deprecation warnings in Python tutorials (#6552)
- Use add_halide_generator() everywhere in apps/ (#6554)
- Fix for top-of-tree LLVM (#6561)
- Enable simd_op_check test for wasm i8x16.popcnt (#6562)
- Revert "Fix for top-of-tree LLVM" (#6564)
- wasm simd cleanup (#6566)
- Add support for wasm-simd ops for integer-integer widening (#6567)
- Add explicit to a handful of Generator-related ctors. (#6569)
- Fix typo in comment in HalideBuffer.h (#6570)
- Allow calling scheduling methods on Output<Buffer[]> (#6577)
- Fix for top-of-tree LLVM (#6579)
- Fix Win32-specific breakage in top-of-tree LLVM (#6581)
- Convert apps/ to use static Buffer dims where useful (#6585)
- Various fixes to static-dimensioned Buffer (#6589)
- Convert Buffer<> usage in python_bindings/ to use static dimensions (#6591)
- Convert Buffer<> usage in test/generators to use static dimensions (#6592)
- Rename BufferDimsUnconstrained -> AnyDims (#6594)
- Allow building with LLVM15 (#6603)
- Update WasmExecutor for WABT API changes (#6612)
- Minor Generator cleanup (#6613)
- Unbreak WABT again by using main instead of a commit (#6614)
- Update apps/hannk to use TFLite 2.8.0 (#6617)
- Update WABT version to the just-released 1.0.27 (instead of main) (#6619)
- Clean up python_binding Makefile (#6634)
- Fix const-correctness in C/C++ backend (Issue #6636) (#6638)
- Convert most remaining Generators to prefer statically-dimensioned In… (#6641)
- Allow profiler feature under wasm iff wasm_threads is enabled (#6643)
- Fix UB in hannk FillWithRandom operation. (#6645)
- Update initialization of WABT store field to work with top-of-tree (#6649)
- Fix apparent typo in PR #6294 (#6653)
- Eliminate some unnecessary clamping in ClampUnsafeAccesses (#6297) (#6654)
- Python Bindings: fix Python bool -> Expr implicit conversion (#6657)
- Fix 'variable set but not used' warning/error (#6658)
- Allow make test_apps to work with ASAN (#6659)
- Add optional runtime H::R::Buffer access checks (#6660)
- Add ldscript code for Python extensions in CMake (#6665)
- Remove the nobuild/partialbuildmethod tests from python_bindings/ (#6668)
@TH3CHARLie
- Add support for CUDA capability 8.6 (#6334)
- Fix cuda-debug logging (#6346)
@vksnk
- Scheduling directive to set an explicit storage bound (#6327)
- Add include for size_t in constants.h (#6353)
- Add missing widening_absd patterns (#6359)
- Change implementation of round_f* in CodeGen_C to use nearbyint() to match CodeGen_LLVM (#6406)
- Rewrite integer lerp using intrinsics (#6426)
- Avoid double narrowing in widening_add/widening_sub if type is 8-bit (#6629)
@zvookin
- Move parallel/async lowering from LLVM codegen to a standard Halide IR lowering pass. (#6195)
- Fixes to support LLVM with opaque pointers. (#6608)
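Entry #6494 states that rounding shift rights should be lowered using a rounding halving add. The sketch below shows one identity that makes this possible for 8-bit unsigned values. It assumes `rounding_shift_right(x, n)` means `(x + 2^(n-1)) >> n` for 1 <= n <= 8. The function names echo the intrinsics mentioned in the entries, but these are plain C++ helpers, not Halide's API. The lowering Halide actually emits may differ.

```cpp
#include <cassert>
#include <cstdint>

// Rounding halving add: floor((a + b + 1) / 2) without forming the 9-bit sum.
// Because a + b == (a ^ b) + 2 * (a & b), the result equals
// (a & b) + ceil((a ^ b) / 2), which is (a | b) - ((a ^ b) >> 1).
inline uint8_t rounding_halving_add(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((a | b) - ((a ^ b) >> 1));
}

// Rounding shift right for 1 <= n <= 8 (ties round up).
// floor((x + 2^(n-1)) / 2^n) == floor(((x >> (n-1)) + 1) / 2),
// so it reduces to a rounding halving add of (x >> (n-1)) and 0.
inline uint8_t rounding_shift_right(uint8_t x, int n) {
    assert(n >= 1 && n <= 8);
    return rounding_halving_add(static_cast<uint8_t>(x >> (n - 1)), 0);
}

// Reference version that widens to 32 bits before adding the rounding term.
inline uint8_t rounding_shift_right_ref(uint8_t x, int n) {
    assert(n >= 1 && n <= 8);
    return static_cast<uint8_t>((static_cast<uint32_t>(x) + (1u << (n - 1))) >> n);
}
```

The point of this formulation is that no intermediate value needs more than 8 bits. This matters for SIMD targets that offer a native rounding halving add on narrow lanes. An exhaustive comparison against the widening reference covers all 256 inputs and every shift from 1 to 8.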

New Contributors

@TH3CHARLie made their first contribution in #6334
@OmarEmaraDev made their first contribution in #6310
@mcourteaux made their first contribution in #6444
@jinderek made their first contribution in #6505
@masahi made their first contribution in #6563
@knzivid made their first contribution in #6635
