intel/intel-graphics-compiler igc-1.0.6410 on GitHub

Fixed Issues / Improvements

Consider a WA table entry before inserting a flush sampler instruction
Location expressions improvements
Do not split arithmetic instructions in IGC as vISA will handle it
Backing out Simple push algorithm Optimization
Fix reg number issue in translate math
Changes for -O2. Optimizing non-user functions to save compiling time.
Fix the SWSB when there is no send in kernel
Add support to generate thread IDs in 2x2 blocks.
Separate global and local variables to reduce compilation time.
Don't replace OpDecorate with OpGroupDecorate.
Add InferAddressSpacesPass only if needed.
Fix crash in SIMD32 mode caused by pseudo_ret instruction's source operand right bound computation.
Update DispatchGPGPUWalkerAlongYFirst lookup
Changed ldms and ldmcs conversion to 16bit. Fixed ldmcs usage in users other than 16bit ldms
Cleanup unnecessary dynamic allocations.
Avoid warning of implicit i64->i32 by forcing explicit conversion.
Optimization for signed remainder for constant power of 2 int32.
Switch TPM to SVM entirely.
Do not modify wrregion input in non-overlapping region optimization.
Simplify usage of IGC_BUILD__VC_ENABLED cmake option. Change IGC_VC_DISABLED macro to the more consistent IGC_VC_ENABLED
Removed external dependency on llvm_patches and improved llvm setup in project
Produce truncate instead of __builtin_spirv_OpUConvert for not rounded/saturated converts
Fix missing barrier when inline ASM is used in a kernel
Split uniform into thread uniform, work group uniform, and global uniform, which gives us more detailed info that could be used to enable better optimization.
Set InlineAsm usage per function group, to create correct builder for multiple FGs.
Support for stackcalls with InlineAsm by parsing multiple functions in single text stream.
Broadcast uniform variables if 'rw' constraint was specified (Inline ASM)
Optimize generic pointer load for kernels not using local memory.
Bug fix for SWSB when comparing the footprint.
Extend GAS phi resolution to all loops, not only top level ones.
Remove the dependence between dummy csel instructions.
Adds custom iterator class for Function Group. Can iterate through the FunctionGroup class, which uses a 2D vector storage.
Change OpenCL builtin mad implementation to use fma instruction instead of multiply add.
Cast Base and Insert parameters to unsigned to avoid sign extension while shifting
Add check for compute shaders that may need XYZ walk of thread IDs.
ZEBinary: Fix scratch memory buffer creation.
If unmasked regions are nested, then the most nested llvm.genx.GenISA.UnmaskedRegionEnd intrinsic switched off unmasked code generation, resulting in the other enclosing regions being generated as masked code.
Extra flag has been added to WIAnalysis Runner to not mark some uniform instructions as random.
Added a field to implicit argument structure for stack calls. Modified layout of local ids based on SIMD size.
IGA: add disassembler option "--output-on-fail"
Fix discovery of inlined DISubprogram nodes
Implement support for both SPV-IR forms for BitFieldInsert builtins
Introduction of new entry in IGC constant folder for bfrev.
Update TracePointerSource() function to detect cases where two different resource pointer values describe the same resource.
Vector backend does not support creation of L0 module with external functions. Insert assert in GenXCisaBuilder, explaining that.
Take SpillMemOffset into consideration when reporting spill size.
Split send has argument no. 4, and it can be an addr register. Make sure to check the dependence on src3 as well.
Add case when propagating non-generic pointer to store.
Disable certain transformations when compiling code for debug.
Add -vc-promote-array-alloca-limit knob to control array promotion total size (2nd edition). Force array promotion for CMRT binary.
Replace strcat by compound assignment operator
Now appropriately handling shl instructions with unsupported types.
More fixes to get local RA to honor declare even-alignment.
Print SLMsize in compiler output file
IGA SWSB refactoring: Unify InstType getter function
Extract vc input handling into another function
Fix an assertion due to unexpected RAUW with a constant
Extend supported subtargets in VC
Solve the memory leak issue of SWSB
Add control to route some resources to LSC/HDC
Fix scratch surface allocation for VC
Remove addrspacecast only if there are no other uses.
Set alwaysinline on invoke kernels. Don't add stack call or indirect call attributes.
Add interface target for vc intrinsics headers
Move stepping into Options instead of a global variable.
Add DoNotSpill attribute for vISA variables.
ZEBinary: Support buffer_offset implicit argument
If all its operands are region invariant, an inst is region invariant.
Commit base data structures for implicit argument handling for bindless offsets. Changes in StatelessToBindless promotion will come later.
For optnone builtins, allow -O0 flag to determine if we should call them as subroutines or stackcalls.
Allow EnableA64WA env variable in Linux release mode.
BinaryEncodingIGA: fix math pipe instruction check
Upgraded error messages with source file locations and names of the kernel causing the error.
Implement support for both SPV-IR forms for conversion builtins
Prevent redundant lowering attempt during SIMD CF Conformance
Make sure trivial RA honors even-alignment.
ZEBinary: add regkey to enable .bss section for zero-initialized global variables
Add -vc-promote-array-alloca-limit knob to control array promotion total size
Add simplify CFG pass to pass manager to simplify work of LICM
Add an option for GenXPromoteArray threshold
Debug location expression improvements
Reduce memory footprint in GraphColor
Fix binary encoding for simd2 align16 instructions
Filter out "endif" and "else" when inserting dummy mov.
Avoid localization of large data for oclbin, use relocation instead.
Add option for TPM memory placement.
Correct localization costs for global vectors.
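
Illustration for the "signed remainder for constant power of 2 int32" entry above. The release notes do not show the generated code, so the following is only a minimal Python model of the usual bit trick for a truncating signed remainder by 2**k; it is an assumption about the general technique, not IGC's implementation. The names srem_pow2 and srem_reference are hypothetical.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def srem_reference(x: int, d: int) -> int:
    """Truncating remainder (C-style): the result takes the sign of x."""
    r = abs(x) % d
    return r if x >= 0 else -r


def srem_pow2(x: int, k: int) -> int:
    """Remainder of an int32 value x by 2**k without a division.

    Hypothetical sketch: negative inputs are biased by (2**k - 1) so that
    clearing the low k bits rounds toward zero, matching C semantics.
    """
    if not (INT32_MIN <= x <= INT32_MAX):
        raise ValueError("x must fit in a signed 32-bit integer")
    if not (0 <= k <= 30):
        raise ValueError("k must be in the range 0..30")
    mask = (1 << k) - 1
    # x >> 31 is -1 for negative x and 0 otherwise (arithmetic shift),
    # so bias is mask for negative x and 0 for non-negative x.
    bias = (x >> 31) & mask
    # Clearing the low k bits of (x + bias) gives x rounded toward zero
    # to a multiple of 2**k; subtracting it from x leaves the remainder.
    return x - ((x + bias) & ~mask)
```

The bias step is what separates this from a plain `x & mask`, which would yield a non-negative result for negative x instead of the truncating remainder.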

Dependencies revisions

intel/opencl-clang@c8cd72e
KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
intel/vc-intrinsics@7ee152a
KhronosGroup/SPIRV-LLVM-Translator@ab5e12a (for VectorCompiler)
llvm/llvm-project@llvmorg-10.0.0

Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.