github halide/Halide v14.0.0
Halide 14.0.0


What's Changed

Major changes

  • @abadams
    • Add ability to pass a user context in JIT mode (#6313)
    • Reenable warning about unscheduled update definitions (#6602)
  • @alexreinking
    • Add helper for cross-compiling Halide generators. (#6366)
  • @LebedevRI
    • Implement SanitizerCoverage support (Refs. #6513) (#6517)
  • @steven-johnson
    • Expand optional static-typing for Buffer to include dimensionality (#6574)
    • Deprecate the Generator::build() method (#6580)
    • Move GeneratorContext into a standalone class (#6618)
    • Python bindings: allow zero-D Funcs, ImageParams, and Buffers (#6633)
  • @zvookin
    • Timer based profiler (#6642)
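#6574 lets Buffer carry its dimensionality as an optional template argument, and #6594 (below) names the unconstrained case AnyDims. Halide's headers are not assumed here, so the following stdlib-only toy is a sketch of why the extra parameter helps: a function that only handles 2-D data can say so in its signature. ToyBuffer, kAnyDims and count_pixels are hypothetical names, the -1 sentinel value is an assumption, and none of this reproduces Halide's actual Buffer API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sentinel for "dimensionality known only at runtime".
// Assumption: -1, echoing the AnyDims name adopted in #6594.
constexpr int kAnyDims = -1;

// Toy buffer, NOT Halide::Runtime::Buffer: it stores only the extents.
template <typename T, int Dims = kAnyDims>
class ToyBuffer {
public:
    static_assert(Dims == kAnyDims || Dims >= 0, "Dims must be kAnyDims or non-negative");

    explicit ToyBuffer(std::vector<int> extents) : extents_(std::move(extents)) {
        // With a fixed Dims, a mismatched extent list is a programming error.
        assert(Dims == kAnyDims || static_cast<int>(extents_.size()) == Dims);
    }

    // A compile-time constant when Dims is fixed; otherwise read from the object.
    int dimensions() const {
        if constexpr (Dims != kAnyDims) {
            return Dims;
        } else {
            return static_cast<int>(extents_.size());
        }
    }

    int dim_extent(int i) const { return extents_.at(static_cast<std::size_t>(i)); }

    // Dropping static information (fixed Dims -> kAnyDims) is always safe.
    ToyBuffer<T, kAnyDims> as_any_dims() const { return ToyBuffer<T, kAnyDims>(extents_); }

private:
    std::vector<int> extents_;
};

// A consumer that only makes sense for 2-D input states that in its signature,
// so passing a ToyBuffer<float, 3> fails to compile instead of failing at runtime.
inline int count_pixels(const ToyBuffer<float, 2>& img) {
    return img.dim_extent(0) * img.dim_extent(1);
}
```

Converting in the other direction (kAnyDims to a fixed count) would need a runtime check, which this toy deliberately omits.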

Minor changes

  • @abadams
    • Deprecate JIT runtime override methods that take void * (#6344)
    • Allow users to use their own cuda contexts and streams in JIT mode (#6345)
    • Add --help flag to rungenmain, fixing #5323 (#6354)
    • Do target-specific lowering of lerp (#6432)
    • Reduce overhead of sampling profiler by having only one thread do it (#6433)
    • Skip custom cuda context test on older GPUs (#6437)
    • Avoid needless gather in fast_integer_divide lowering (#6441)
    • Fixes for C++20 (#6446)
    • Add a fast integer divide that rounds to zero (#6455)
    • Let lerp lowering incorporate a final cast. (#6480)
    • Try removing optional buffer added to closure (#6481)
    • Rounding shift rights should use rounding halving add (#6494)
    • Make random faster by putting the innermost var last (#6504)
    • Make it possible to interpret a wide type as multiple smaller elements (#6506)
    • Handle mixed-width args to mul-shift-right (#6526)
    • Attempted redo of faster noise (#6539)
    • Better default lowering of absd (#6545)
    • Make HALIDE_REGISTER_GENERATOR work with multiple template args (#6556)
    • Rename Output to OutputFileType and deprecate Output (#6568)
    • Remove incorrect not-multiple-of-16 claim (#6573)
    • Fix bug in mul_shift_right matching (#6610)
  • @alexreinking
    • Add super-build for cross-compiling HANNK (#6374)
    • Fix empty INSTALL_COMMAND in hannk super-build (#6387)
    • Remove halide_config.cmake from Makefile build. Fixes #6615 (#6616)
    • Make IRComparer consider NaNs to be less than non-NaNs. (#6626)
  • @ashishUthama
    • Include LICENSE.txt in package (#6428)
  • @dsharletg
    • Fix description of rounding_shift_left/rounding_shift_right (#6549)
  • @Elarnon
    • Only commutative reductions can be parallelized (#6609)
  • @jinderek
    • Support the new warp shuffle intrinsics for CUDA Volta and later architectures (#6505)
  • @knzivid
    • python_bindings: Fix SIGSEGV in HalidePythonCompileTimeErrorReporter (#6635)
  • @LebedevRI
    • [CMake] Deduplicate Halide_LLVM_VERSION and LLVM_PACKAGE_VERSION (#6646)
  • @masahi
    • [APP] Fix hexagon_benchmarks build (use two-var prefetch) (#6563)
  • @mcleary
    • Add support for AMX instructions (#5818)
  • @mcourteaux
    • Include GPU source kernels in Stmt and StmtHtml file. (#6444)
    • Syntax highlighting for embedded PTX code. (#6447)
  • @mgharbi
    • Fix the PyTorch wrapper codegen for CPU-only machines. (#6590)
  • @OmarEmaraDev
    • Fix default device wrap native function (#6310)
    • Fix wrong type in Ramp CodeGen for OpenGLCompute (#6349)
    • Vectorize Ramp in OpenGLCompute backend (#6372)
    • Support vectorization in OpenGLCompute backend (#6348)
    • Support vectorized Select in OpenGLCompute backend (#6371)
  • @rootjalex
    • Make bounds of let visitor use unique_name() (#6583)
    • Remove incorrect docs on widening_add (#6625)
    • Disallow Type::narrow() and Type::widen() from producing bitwidths between 1 and 8 bits (#6622)
    • Wild match object should not be foldable (#6623)
    • Clear bounds info on casts when value bounds are undefined for overflow types (#6640)
  • @slomp
    • Decommission StackPrinter (#6470)
  • @steven-johnson
    • [hannk] Fix MeanOp (#6336)
    • Add using OpVisitor::visit; to various OpVisitors to avoid overload warnings for some compilers (#6337)
    • [hannk] Add a prepare() method for ops and interp (#6338)
    • Fix WASM datalayout for top-of-tree LLVM (#6339)
    • Make halide_type_t and halide_type_of constexpr (#6340)
    • Harvest IWYU changes for LLVM, WABT (#6341)
    • Fix HelloWasm (#6342)
    • Fix Makefile for LLVM11 (injection from #5818) (#6343)
    • [hannk] requantize() should never skip the operation (#6350)
    • [hannk] augment SoftmaxOp to allow specifying axis (#6351)
    • Use Node instead of d8 for Wasm AOT testing (#6356)
    • [hannk] Add missing call to Interpreter::prepare in benchmark app (#6358)
    • [hannk] Allow disabling TFLite+Delegate build in CMake (#6360)
    • [hannk] Add support for building/running for wasm (#6361)
    • Update Emscripten settings (#6362)
    • [hannk] Clean up aliasing (v2) (#6364)
    • [hannk] tests should only process .tflite files (#6368)
    • Revamp Hannk IR (#6379)
    • Fix for top-of-tree LLVM (#6380)
    • Remove halide_assert() from halide_default_device_wrap_native (#6381)
    • Rename halide_assert -> halide_abort_if_false (#6382)
    • Convert various halide_assert -> static_assert (#6383)
    • Fix for top-of-tree LLVM (#6386)
    • Check results of all runtime function calls (#6389)
    • Add halide_debug_assert() macro (#6390)
    • [hannk] Have CMake emit .s, .stmt, .ll files (#6392)
    • [hannk] Upgrade hannk to use TFLite 2.7.0 by default (#6393)
    • Clean up CodeGen_LLVM names to match ASAN nomenclature changes (#6395)
    • Drop support for LLVM11 (#6396)
    • Move PyTorch test into standalone tests (#6397)
    • Remove halide_abort_if_false() usage in runtime/metal (#6398)
    • Fix OGLC debug builds (#6399)
    • Add defensive checks to halide_buffer_copy_already_locked (#6401)
    • _halide_buffer_crop() needs to check for runtime failures (v2) (#6403)
    • Fix broken ASAN code (#6408)
    • [hannk] Pacify clang-tidy (#6412)
    • One more ASAN fix (#6413)
    • [hannk] Fix lower_tflite_fullyconnected (#6414)
    • Fix Introspection issues (#6424)
    • Don't remap the function name or the target in the metadata (#6430)
    • Set up SANITIZER_FLAGS and OPTIMIZE for apps/Makefile.inc (#6435)
    • Ensure that halide_start_clock() is called before halide_current_time… (#6438)
    • Codegen_C: buffer compilation needs to special-case scalar buffers (#6442)
    • Add operator<< for Closure (#6443)
    • Re-enable performance_async_gpu for D3D12Compute (#6450)
    • Tweak Hexagon codegen output (#6461)
    • Add LinkageType::ExternalPlusArgv (#6452) (#6463)
    • Fix Closure API (#6464)
    • Move null check from Printer to halide_string_to_string() (#6467)
    • Deal with Printer::scratch (#6469) (#6472)
    • Restore support for using V8 as the Wasm JIT interpreter (#6478)
    • Fail if no_bounds_query specified for HL_JIT_TARGET (#6489)
    • Document the usage of llvm::legacy::PassManager (#6491)
    • Update WABT to 1.0.25 (#6497)
    • Grab Bag of minor cleanups to LowerParallelTasks (#6498)
    • Update simd_op_check for arm64 uzp1 code generation (#6499) (#6500)
    • Fix size_t -> int conversion warning (#6501)
    • Fix simd-op-check for top-of-tree LLVM (#6529)
    • Revert "Make random faster by putting the innermost var last" (#6538)
    • Fix GeneratorOutput_Buffer::set_estimates() (#6540)
    • Revert "Make it possible to interpret a wide type as multiple smaller elements" (#6541)
    • Convert apps/hannk/Elementwise to use generate() (#6543)
    • Fixes for top-of-tree LLVM (#6546) (#6548)
    • Fix deprecation warnings in Python tutorials (#6552)
    • Use add_halide_generator() everywhere in apps/ (#6554)
    • Fix for top-of-tree LLVM (#6561)
    • Enable simd_op_check test for wasm i8x16.popcnt (#6562)
    • Revert "Fix for top-of-tree LLVM" (#6564)
    • wasm simd cleanup (#6566)
    • Add support for wasm-simd ops for integer-integer widening (#6567)
    • Add explicit to a handful of Generator-related ctors. (#6569)
    • Fix typo in comment in HalideBuffer.h (#6570)
    • Allow calling scheduling methods on Output<Buffer[]> (#6577)
    • Fix for top-of-tree LLVM (#6579)
    • Fix Win32-specific breakage in top-of-tree LLVM (#6581)
    • Convert apps/ to use static Buffer dims where useful (#6585)
    • Various fixes to static-dimensioned Buffer (#6589)
    • Convert Buffer<> usage in python_bindings/ to use static dimensions (#6591)
    • Convert Buffer<> usage in test/generators to use static dimensions (#6592)
    • Rename BufferDimsUnconstrained -> AnyDims (#6594)
    • Allow building with LLVM15 (#6603)
    • Update WasmExecutor for WABT API changes (#6612)
    • Minor Generator cleanup (#6613)
    • Unbreak WABT again by using main instead of a commit (#6614)
    • Update apps/hannk to use TFLite 2.8.0 (#6617)
    • Update WABT version to the just-released 1.0.27 (instead of main) (#6619)
    • Clean up python_binding Makefile (#6634)
    • Fix const-correctness in C/C++ backend (Issue #6636) (#6638)
    • Convert most remaining Generators to prefer statically-dimensioned In… (#6641)
    • Allow profiler feature under wasm iff wasm_threads is enabled (#6643)
    • Fix UB in hannk FillWithRandom operation. (#6645)
    • Update initialization of WABT store field to work with top-of-tree (#6649)
    • Fix apparent typo in PR #6294 (#6653)
    • Eliminate some unnecessary clamping in ClampUnsafeAccesses (#6297) (#6654)
    • Python Bindings: fix Python bool -> Expr implicit conversion (#6657)
    • Fix 'variable set but not used' warning/error (#6658)
    • Allow make test_apps to work with ASAN (#6659)
    • Add optional runtime H::R::Buffer access checks (#6660)
    • Add ldscript code for Python extensions in CMake (#6665)
    • Remove the nobuild/partialbuildmethod tests from python_bindings/ (#6668)
  • @TH3CHARLie
    • Add support for CUDA capability 8.6 (#6334)
    • Fix cuda-debug logging (#6346)
  • @vksnk
    • Scheduling directive to set an explicit storage bound (#6327)
    • Add include for size_t in constants.h (#6353)
    • Add missing widening_absd patterns (#6359)
    • Change implementation of round_f* in CodeGen_C to use nearbyint() to match CodeGen_LLVM (#6406)
    • Rewrite integer lerp using intrinsics (#6426)
    • Avoid double narrowing in widening_add/widening_sub if type is 8-bit (#6629)
  • @zvookin
    • Move parallel/async lowering from LLVM codegen to a standard Halide IR lowering pass. (#6195)
    • Fixes to support LLVM with opaque pointers. (#6608)
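Several entries above (#6441, #6455) concern integer-division lowering. For a positive divisor, Halide's / on signed integers rounds toward negative infinity, while C++'s built-in / truncates toward zero, which is the convention a divide that "rounds to zero" matches. The plain C++ sketch below contrasts the two conventions; floor_div and trunc_div are hypothetical helpers with no Halide dependency, and the sketch does not model how Halide actually lowers fast_integer_divide.

```cpp
#include <cassert>

// Hypothetical helper: signed division rounding toward negative infinity,
// the convention Halide documents for `/` when the divisor is positive.
inline int floor_div(int n, int d) {
    assert(d > 0 && "this sketch only handles positive divisors");
    int q = n / d;  // C++ truncates toward zero...
    if (n % d < 0) {
        --q;        // ...so step down when a negative quotient was rounded up.
    }
    return q;
}

// Hypothetical helper: division rounding toward zero, which is exactly
// what the built-in C++ operator does for int.
inline int trunc_div(int n, int d) {
    assert(d != 0);
    return n / d;
}
```

For example, -7 divided by 4 gives -2 under floor_div and -1 under trunc_div; the two agree whenever the numerator is non-negative or exactly divisible by the divisor.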

New Contributors
