Fixed Issues / Improvements
- Consider a WA table entry before inserting a flush sampler instruction
- Location expressions improvements
- Do not split arithmetic instructions in IGC as vISA will handle it
- Backing out Simple push algorithm Optimization
- Fix reg number issue in translate math
- Changes for -O2. Optimizing non-user functions to save compiling time.
- Fix the SWSB when there is no send in kernel
- Add support to generate thread IDs in 2x2 blocks.
- Separate global and local variables to reduce compilation time.
- Don't replace OpDecorate with OpGroupDecorate.
- Add InferAddressSpacesPass only if needed.
- Fix crash in SIMD32 mode caused by pseudo_ret instruction's source operand right bound computation.
- Update DispatchGPGPUWalkerAlongYFirst lookup
- Changed ldms and ldmcs conversion to 16bit. Fixed ldmcs usage in users other than 16bit ldms
- Cleanup unnecessary dynamic allocations.
- Avoid warning of implicit i64->i32 by forcing explicit conversion.
- Optimization for signed remainder for constant power of 2 int32.
- Switch TPM to SVM entirely.
- Do not modify wrregion input in non-overlapping region optimization.
- Simplify usage of IGC_BUILD__VC_ENABLED cmake option. Change IGC_VC_DISABLED macro to more consistent IGC_VC_ENABLED.
- Removed external dependency on llvm_patches and improved llvm setup in project
- Produce truncate instead of __builtin_spirv_OpUConvert for not rounded/saturated converts
- Fix missing barrier when inline ASM is used in a kernel
- Split uniform into thread uniform, work group uniform, and global uniform, which gives more detailed information that can be used to enable better optimization.
- Set InlineAsm usage per function group, to create correct builder for multiple FGs.
- Support for stackcalls with InlineAsm by parsing multiple functions in single text stream.
- Broadcast uniform variables if 'rw' constraint was specified (Inline ASM)
- Optimize generic pointer load for kernels not using local memory.
- Bug fix for SWSB when comparing the footprint.
- Extend GAS phi resolution to all loops, not only top level ones.
- Remove the dependence between dummy csel instructions.
- Add a custom iterator class for FunctionGroup. It iterates through the FunctionGroup class, which uses a 2D vector storage.
- Change OpenCL builtin mad implementation to use fma instruction instead of multiply add.
- Cast Base and Insert parameters to unsigned to avoid sign extension while shifting
- Add check for compute shaders that may need XYZ walk of thread IDs.
- ZEBinary: Fix scratch memory buffer creation.
- Fix nested unmasked regions: previously the most deeply nested llvm.genx.GenISA.UnmaskedRegionEnd intrinsic switched off unmasked code generation, so the enclosing regions were generated as masked code.
- Extra flag has been added to WIAnalysis Runner to not mark some uniform instructions as random.
- Added a field to implicit argument structure for stack calls. Modified layout of local ids based on SIMD size.
- IGA: add disassembler option "--output-on-fail"
- Fix discovery of inlined DISubprogram nodes
- Implement support for both SPV-IR forms for BitFieldInsert builtins
- Introduction of new entry in IGC constant folder for bfrev.
- Update TracePointerSource() function to detect cases where two different resource pointer values describe the same resource.
- Vector backend does not support creation of L0 module with external functions. Insert assert in GenXCisaBuilder, explaining that.
- Take SpillMemOffset into consideration when reporting spill size.
- A split send has a fourth argument (src3), which can be an address register. Make sure to check the dependence on src3 as well.
- Add case when propagating non-generic pointer to store.
- Disable certain transformations when compiling code for debug.
- Add -vc-promote-array-alloca-limit knob to control array promotion total size (2nd edition). Force array promotion for CMRT binary.
- Replace strcat by compound assignment operator
- Now appropriately handling shl instructions with unsupported types.
- More fixes to get local RA to honor declare even-alignment.
- Print SLMsize in compiler output file
- IGA SWSB refactoring: Unify InstType getter function
- Extract vc input handling into another function
- Fix an assertion due to unexpected RAUW with a constant
- Extend supported subtargets in VC
- Solve the memory leak issue of SWSB
- Add control to route some resources to LSC/HDC
- Fix scratch surface allocation for VC
- Remove addrspacecast only if there are no other uses.
- Set alwaysinline on invoke kernels. Don't add stack call or indirect call attributes.
- Add interface target for vc intrinsics headers
- Move stepping into Options instead of a global variable.
- Add DoNotSpill attribute for vISA variables.
- ZEBinary: Support buffer_offset implicit argument
- Treat an instruction as region invariant if all of its operands are region invariant.
- Commit base data structures for implicit argument handling for bindless offsets. Changes in StatelessToBindless promotion will come later.
- For optnone builtins, allow -O0 flag to determine if we should call them as subroutines or stackcalls.
- Allow EnableA64WA env variable in Linux release mode.
- BinaryEncodingIGA: fix math pipe instruction check
- Upgraded error messages with source file locations and names of the kernel causing the error.
- Implement support for both SPV-IR forms for conversion builtins
- Prevent redundant lowering attempt during SIMD CF Conformance
- Make sure trivial RA honors even-alignment.
- ZEBinary: add regkey to enable .bss section for zero-initialized global variables
- Add -vc-promote-array-alloca-limit knob to control array promotion total size
- Add simplify CFG pass to pass manager to simplify work of LICM
- Add an option for GenXPromoteArray threshold
- Debug location expression improvements
- Reduce memory footprint in GraphColor
- Fix binary encoding for simd2 align16 instructions
- Filter out "endif" and "else" when inserting dummy mov.
- Avoid localization of large data for oclbin, use relocation instead.
- Add option for TPM memory placement.
- Correct localization costs for global vectors.
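
To illustrate the signed remainder entry in the list above, here is a minimal sketch of the usual divide-free way to compute a signed remainder by a constant power of 2. It is not taken from the IGC sources; the function name `srem_pow2` and its `log2d` parameter are hypothetical and chosen for this example only. It assumes the divisor is `1 << log2d` with `log2d` between 0 and 30.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an IGC API): computes x % (1 << log2d) with the
// truncating semantics of C/C++ and LLVM srem (the result takes the sign
// of x), using only add, and, and subtract.
// Assumption: 0 <= log2d <= 30, so the divisor fits in a positive int32_t.
inline int32_t srem_pow2(int32_t x, unsigned log2d) {
    // mask == divisor - 1; selects the low log2d bits.
    const uint32_t mask = (uint32_t{1} << log2d) - 1u;
    // Negative inputs are biased by (divisor - 1) so that masking rounds
    // toward zero instead of toward negative infinity.
    const uint32_t bias = (x < 0) ? mask : 0u;
    // Unsigned arithmetic avoids signed-overflow concerns; wraparound is
    // harmless because only the low bits survive the mask.
    const uint32_t low = (static_cast<uint32_t>(x) + bias) & mask;
    // Undo the bias; both operands are below 2^31, so the casts are exact.
    return static_cast<int32_t>(low) - static_cast<int32_t>(bias);
}
```

For example, `srem_pow2(-7, 2)` matches `-7 % 4`, and `srem_pow2(7, 2)` matches `7 % 4`. A plain `x & mask` would be wrong for negative inputs, which is why the bias step is needed.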
Dependencies revisions
- intel/opencl-clang@c8cd72e
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@7ee152a
- KhronosGroup/SPIRV-LLVM-Translator@ab5e12a (for VectorCompiler)
- llvm/llvm-project@llvmorg-10.0.0
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.