pocl 1.0

Highlights

Improved automatic local work-group sizing on kernel enqueue, taking
into account standard constraints, the SIMD width for vectorization, and
the number of compute units available on the device.
Support for NVIDIA GPUs via a new CUDA backend (currently experimental).
Removed support for BBVectorizer.
LLVM 5.0 is now supported.
A few build options have been added for distribution builds,
see README.packaging.
Improved scalability in the CPU driver. Programs that launch many
work-items with small kernels can run somewhat faster on CPUs with many cores.
Full conformance with the OpenCL 1.2 standard, enabled by default. There
are some caveats, though; see the documentation.
When conformance is enabled, some kernel library functions might be
slower than in previous releases.
Pocl now reports OpenCL 1.2 instead of 2.0, except in HSA-enabled builds.
Updated format of pocl binaries, which is NOT backwards compatible.
You'll need to clean any kernel caches.
Fixed several memory leaks.
Unresolved symbols (missing or misspelled functions, etc.) in a kernel now
result in an error in clBuildProgram() instead of pocl silently ignoring
them and then aborting at dlopen().
New env variable POCL_MEMORY_LIMIT=N limits the global memory size
reported by pocl to N gigabytes.
New env variable POCL_AFFINITY (defaults to 0): if enabled, sets
the affinity of each CPU driver pthread to a single core.
Improved AVX512 support (with LLVM 5.0). Note that even with LLVM 5.0
there are still a few bugs (see pocl issue #555); AVX512 with LLVM 4.0 is
considerably more broken and probably not worth trying.
POCL_DEBUG env var has been revamped. You can now limit debug output to
these categories (or a combination of them): all,error,warning,general,
memory,llvm,events,cache,locking,refcounts,timing,hsa,tce,cuda
The old setting POCL_DEBUG=1 now equals error+warning+general.
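
The environment variables above (POCL_DEBUG, POCL_MEMORY_LIMIT and
POCL_AFFINITY) could be set for a single run along the lines of the
sketch below. The helper name pocl_env and its parameters are
hypothetical and not part of pocl. The sketch also assumes that
POCL_DEBUG categories are combined with commas, as in the list above,
and that POCL_AFFINITY=1 is the "enabled" value, since the default is 0.

```python
import os

# Categories named in these release notes for POCL_DEBUG.
POCL_DEBUG_CATEGORIES = (
    "all", "error", "warning", "general", "memory", "llvm", "events",
    "cache", "locking", "refcounts", "timing", "hsa", "tce", "cuda",
)


def pocl_env(debug=(), memory_limit_gb=None, affinity=False, base=None):
    """Return a copy of an environment with pocl 1.0 variables set.

    Hypothetical helper for illustration only; it is not part of pocl.
    """
    env = dict(os.environ if base is None else base)

    # Reject category names that the release notes do not list.
    unknown = [c for c in debug if c not in POCL_DEBUG_CATEGORIES]
    if unknown:
        raise ValueError(f"unknown POCL_DEBUG categories: {unknown}")
    if debug:
        # Assumption: a combination is written as a comma-separated list.
        env["POCL_DEBUG"] = ",".join(debug)

    if memory_limit_gb is not None:
        limit = int(memory_limit_gb)
        if limit <= 0:
            raise ValueError("memory limit must be a positive number of GB")
        # POCL_MEMORY_LIMIT=N caps the reported global memory at N gigabytes.
        env["POCL_MEMORY_LIMIT"] = str(limit)

    if affinity:
        # Assumption: "1" enables pinning each CPU driver pthread to one core.
        env["POCL_AFFINITY"] = "1"

    return env


if __name__ == "__main__":
    env = pocl_env(debug=["error", "warning", "llvm"],
                   memory_limit_gb=4, affinity=True, base={})
    for name in sorted(env):
        print(f"{name}={env[name]}")
```

The returned dictionary can be passed as the env argument of
subprocess.run() when launching an OpenCL application, so the settings
apply to that process only and do not change the calling shell.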
