Principal Technical Improvements

Support for Linux perf events.

The Linux perf events interface supports measurement of both application execution and kernel activity, covering both hardware and software events. Through a processor's hardware performance monitoring unit (PMU), perf events can measure an execution using any hardware counter the PMU supports.
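The sketch below is not HPCToolkit code. It illustrates the underlying Linux interface by opening one PMU hardware counter for the calling thread with perf_event_open(2) and reading its count. The helper names sys_perf_event_open, init_hw_counter, and count_instructions are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for perf_event_open(2), so call the raw syscall. */
int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                        int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Describe a counting (not sampling) event backed by a PMU hardware counter. */
void init_hw_counter(struct perf_event_attr *attr, __u64 hw_event)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_HARDWARE;  /* generic hardware event ids */
    attr->config = hw_event;          /* e.g., PERF_COUNT_HW_INSTRUCTIONS */
    attr->disabled = 1;               /* created stopped; enabled via ioctl */
    attr->exclude_kernel = 1;         /* user-level execution only */
    attr->exclude_hv = 1;
}

/* Count instructions retired by work() in the calling thread.
   Returns -1 if the event cannot be opened (no PMU access, or restricted
   by /proc/sys/kernel/perf_event_paranoid) or cannot be read. */
long long count_instructions(void (*work)(void))
{
    struct perf_event_attr attr;
    init_hw_counter(&attr, PERF_COUNT_HW_INSTRUCTIONS);
    int fd = sys_perf_event_open(&attr, 0 /* this thread */, -1 /* any CPU */,
                                 -1 /* no group */, 0);
    if (fd < 0)
        return -1;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    long long count;
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count)
        count = -1;
    close(fd);
    return count;
}
```

Clearing exclude_kernel would also count kernel activity, subject to the perf_event_paranoid setting.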

Frequency-based sampling.

Rather than choosing a sample period for a hardware counter, one can use the Linux perf events interface to specify a desired sampling frequency; the kernel then selects the period and keeps adjusting it to approximate that frequency.
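A minimal sketch of how frequency-based sampling is requested through perf_event_attr, assuming a hardware event and an illustrative helper name (init_freq_sampling). Setting the freq bit tells the kernel to treat sample_freq as samples per second rather than as a period; the kernel also caps the rate at /proc/sys/kernel/perf_event_max_sample_rate.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>

/* Configure a hardware event for frequency-based sampling: the kernel
   chooses and retunes the period to approach `hz` samples per second. */
void init_freq_sampling(struct perf_event_attr *attr, __u64 hw_event, __u64 hz)
{
    assert(attr != NULL && hz > 0);
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = hw_event;
    attr->freq = 1;            /* interpret sample_freq as a frequency */
    attr->sample_freq = hz;    /* shares a union with sample_period */
    /* Record the period in effect for each sample, so that each sample
       can be weighted by the number of events it represents. */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_PERIOD;
    attr->disabled = 1;
}
```

Because the kernel varies the period, each sample's PERF_SAMPLE_PERIOD value, not a fixed constant, tells how many events that sample stands for.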

Multiplexing.

Multiplexing makes it possible to monitor more events in a single execution than a processor has hardware counters per thread. The number of events that can be monitored in a single execution is limited only by the maximum number of concurrent events the kernel allows a user to multiplex through the perf events interface.
When more events are specified than a thread's hardware counters can monitor simultaneously, the kernel multiplexes them: it divides the events into groups, monitors one group at a time, and cycles repeatedly through the groups as the program executes.
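Because a multiplexed event occupies a counter only part of the time, its raw count covers only part of the execution. perf events can report, with each read, how long the event was enabled and how long it was actually running on a counter; a common estimate scales the raw count by their ratio. The sketch below assumes a single ungrouped event, and the names request_multiplexing_times, counter_reading, and scaled_count are illustrative.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>

/* Ask the kernel to report, with each read(), how long the event was
   enabled and how long it actually occupied a hardware counter. */
void request_multiplexing_times(struct perf_event_attr *attr)
{
    attr->read_format |= PERF_FORMAT_TOTAL_TIME_ENABLED |
                         PERF_FORMAT_TOTAL_TIME_RUNNING;
}

/* Layout of read() data for a single (ungrouped) event with exactly
   those two read_format flags set. */
struct counter_reading {
    uint64_t value;
    uint64_t time_enabled;
    uint64_t time_running;
};

/* Extrapolate the raw count to the full enabled time, assuming the
   event occurs at a roughly uniform rate. */
double scaled_count(const struct counter_reading *r)
{
    assert(r->time_running <= r->time_enabled);
    if (r->time_running == 0)
        return 0.0;   /* the event was never scheduled onto a counter */
    return (double)r->value * (double)r->time_enabled / (double)r->time_running;
}
```

For example, an event that counted 1000 while running for a quarter of its enabled time is estimated at 4000.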

Kernel sampling.

This release can use perf events to collect calling contexts that extend into the kernel, augmenting user-level program contexts with kernel calling contexts. Interpreting kernel call chains requires /proc/sys/kernel/kptr_restrict to be 0 and /proc/sys/kernel/perf_event_paranoid to be 1 or 0.
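A minimal sketch for checking these two settings before attempting kernel call-chain collection. The function names are hypothetical; the check simply encodes the values stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Read a single integer from a /proc/sys file; returns false on failure. */
bool read_proc_int(const char *path, int *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;
    bool ok = fscanf(f, "%d", out) == 1;
    fclose(f);
    return ok;
}

/* kptr_restrict must be 0 so kernel symbol addresses are visible.
   The notes require perf_event_paranoid to be 1 or 0; -1 is less
   restrictive and also satisfies the check. */
bool kernel_callchain_settings_ok(int kptr_restrict, int perf_event_paranoid)
{
    return kptr_restrict == 0 && perf_event_paranoid <= 1;
}

/* Convenience wrapper that reads both settings from /proc. */
bool kernel_callchains_available(void)
{
    int kptr, paranoid;
    return read_proc_int("/proc/sys/kernel/kptr_restrict", &kptr) &&
           read_proc_int("/proc/sys/kernel/perf_event_paranoid", &paranoid) &&
           kernel_callchain_settings_ok(kptr, paranoid);
}
```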

Thread blocking.

As a program executes, a thread may block while waiting for the kernel to complete an operation on its behalf, for example, a read, servicing a page fault, or zero-filling a page.
On systems running Linux 4.3 or newer, the perf events sample source can measure how long a thread is blocked and where the blocking occurs.
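These notes do not say how the perf events sample source measures blocking, so the following is only a sketch of one way to do it. Linux 4.3 added the context_switch attribute bit, which makes the kernel emit PERF_RECORD_SWITCH records when a monitored thread is switched out or back in. The sketch assumes those records have already been decoded into a hypothetical struct switch_event; init_switch_tracking and time_switched_out are illustrative names.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A placeholder software event whose only job is to make the kernel emit
   PERF_RECORD_SWITCH records (Linux 4.3+) for the monitored thread. */
void init_switch_tracking(struct perf_event_attr *attr)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_DUMMY;
    attr->context_switch = 1;   /* request switch-in/switch-out records */
    attr->sample_id_all = 1;    /* attach TID and TIME to those records */
    attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
}

/* Hypothetical decoded record: a PERF_RECORD_SWITCH whose header.misc
   has PERF_RECORD_MISC_SWITCH_OUT set becomes switched_out = 1. */
struct switch_event {
    uint64_t time_ns;
    int switched_out;   /* 1: thread left the CPU; 0: thread resumed */
};

/* Sum the intervals between each switch-out and the next switch-in.
   Caveat: a switch-out can also mean preemption, so this is an upper
   bound on the time spent blocked. */
uint64_t time_switched_out(const struct switch_event *ev, size_t n)
{
    assert(ev != NULL || n == 0);
    uint64_t total = 0, out_since = 0;
    int is_out = 0;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].switched_out && !is_out) {
            is_out = 1;
            out_since = ev[i].time_ns;
        } else if (!ev[i].switched_out && is_out) {
            is_out = 0;
            total += ev[i].time_ns - out_since;
        }
    }
    return total;
}
```

Attributing blocked time to code locations would additionally require a call chain captured at each switch-out, which this sketch omits.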

Improvements to call stack unwinding

Members of the project team fixed bugs in libunwind found by testing it within HPCToolkit's measurement infrastructure. They also helped refine libunwind so that an external tool, e.g., HPCToolkit's hpcrun, can cache libunwind recipes for a procedure rather than recomputing them on demand later.
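The notes do not describe hpcrun's cache, so the following is only a hypothetical sketch of the idea: keep one recipe per procedure address range and look it up by program counter with a binary search, invoking the unwinder only on a miss. recipe_entry and lookup_recipe are invented names, and the recipe is an opaque pointer rather than any real libunwind type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: the recipe computed once for the procedure
   occupying addresses [start, end). */
struct recipe_entry {
    uintptr_t start, end;
    const void *recipe;   /* opaque stand-in for the unwinder's result */
};

/* tab must be sorted by start with non-overlapping ranges. Returns the
   cached recipe covering pc, or NULL when the caller must compute one
   (and would then insert it into the table). */
const void *lookup_recipe(const struct recipe_entry *tab, size_t n, uintptr_t pc)
{
    assert(tab != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < tab[mid].start)
            hi = mid;
        else if (pc >= tab[mid].end)
            lo = mid + 1;
        else
            return tab[mid].recipe;
    }
    return NULL;
}
```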

hpctoolkit-externals includes a snapshot of libunwind as of 2 October 2017.

Improved binary analysis

This release of HPCToolkit benefits from refinements to Dyninst that improve hpcstruct's ability to reconstruct control flow graphs for procedures in the presence of jump tables.
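For readers unfamiliar with the term: a jump table is what a compiler commonly emits for a dense switch statement, namely a bounds check followed by an indirect jump through a table of code addresses. Because the case targets appear only in that table, a binary analyzer such as hpcstruct must recover the table to find every successor of the indirect jump. The function below is purely illustrative; whether a given compiler actually uses a jump table for it depends on the compiler version and optimization flags.

```c
#include <assert.h>

/* A dense switch whose cases do different work, so it cannot be reduced
   to a table of constant values; at typical optimization levels it is a
   likely candidate for an indirect jump through a jump table. */
long apply(int op, long a, long b)
{
    switch (op) {
    case 0: return a + b;
    case 1: return a - b;
    case 2: return a * b;
    case 3: assert(b != 0); return a / b;   /* caller must not divide by 0 */
    case 4: return a & b;
    case 5: return a | b;
    case 6: return a ^ b;
    default: return 0;
    }
}
```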

hpctoolkit-externals includes Dyninst 9.3.2 supplemented with patches that include important but unreleased improvements.

HPCToolkit release 2017.10 (tag release-2017.10 of the HPCToolkit/hpctoolkit repository on GitHub).