Instrumentation Profile Format

Overview

Clang supports two types of profiling via instrumentation [1]: frontend-based and IR-based. Both support a variety of use cases [2]. This document describes two binary serialization formats (raw and indexed) for storing instrumented profiles, with an emphasis on the IRPGO use case: where header fields or payload sections are interpreted differently across use cases, the documentation describes the IRPGO interpretation.

Note

Frontend-generated profiles are used together with coverage mapping for source-based code coverage. The coverage mapping format is different from the profile format.

Raw Profile Format

The raw profile is generated by running the instrumented binary. The raw profile data from an executable or a shared library [3] consists of a header and multiple sections, with each section as a memory dump. The raw profile data needs to be reasonably compact and fast to generate.

There are no backward or forward version compatibility guarantees for the raw profile format. That is, compilers and tools require a specific raw profile version to parse the profiles.

To feed profiles back into compilers for an optimized build (e.g., via -fprofile-use for IR instrumentation), a raw profile must be converted into the indexed format.

General Storage Layout

The storage layout of the raw profile data format is illustrated below. When the raw profile is read into a memory buffer, the actual byte offset of a section is inferred from the section’s order in the layout and the size information of all the sections ahead of it.

+----+-----------------------+
|    |        Magic          |
|    +-----------------------+
|    |        Version        |
|    +-----------------------+
H    |   Size Info for       |
E    |      Section 1        |
A    +-----------------------+
D    |   Size Info for       |
E    |      Section 2        |
R    +-----------------------+
|    |          ...          |
|    +-----------------------+
|    |   Size Info for       |
|    |      Section N        |
+----+-----------------------+
P    |       Section 1       |
A    +-----------------------+
Y    |       Section 2       |
L    +-----------------------+
O    |          ...          |
A    +-----------------------+
D    |       Section N       |
+----+-----------------------+

Note

Sections might be padded to meet specific alignment requirements. For simplicity, header fields and data sections that exist solely for padding are omitted from the layout diagram above and from the rest of this document.
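
As a rough illustration of how section offsets follow from the layout, the sketch below computes each section's byte offset from a header size and a list of section sizes. The helper name, the example sizes, and the assumption that every section is padded to an 8-byte boundary are illustrative only and are not taken from the LLVM sources.

```python
def section_offsets(header_size, section_sizes, alignment=8):
    """Return the byte offset of each section within the profile buffer.

    Hypothetical helper: sections are assumed to follow the header in
    layout order, each padded up to a multiple of `alignment` bytes.
    """
    offsets = []
    cursor = header_size
    for size in section_sizes:
        offsets.append(cursor)
        # Round the section size up to the next multiple of `alignment`.
        padded = (size + alignment - 1) // alignment * alignment
        cursor += padded
    return offsets


# Made-up sizes: a 96-byte header followed by three sections.
print(section_offsets(96, [20, 128, 33]))  # [96, 120, 248]
```

The point of the sketch is that no section stores its own offset: a reader only needs the sizes recorded in the header and the fixed section order.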

Payload Sections

Binary Ids

This section stores the binary ids of the instrumented binaries, which associate binaries with profiles for source code coverage. See the binary id RFC for the design.

Profile Metadata

This section stores the metadata to map counters and value profiles back to instrumented code regions (e.g., LLVM IR for IRPGO).

The in-memory representation of the metadata is __llvm_profile_data. Some fields are used to reference data from other sections in the profile. The fields are documented as follows:

NameRef

The MD5 of the function’s PGO name. The PGO name has the format [<filepath><delimiter>]<mangled-name>, where <filepath> and <delimiter> are provided for local-linkage functions to distinguish functions that may otherwise have identical names.
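
A minimal sketch of the naming scheme follows. Both helper names are hypothetical. The delimiter value and the way the 128-bit MD5 digest is reduced to a 64-bit integer (first eight bytes, little-endian) are assumptions made for illustration; check the LLVM sources for the behavior of a given toolchain version.

```python
import hashlib


def pgo_name(mangled_name, filepath=None, delimiter=";"):
    """Build a PGO name of the form [<filepath><delimiter>]<mangled-name>.

    The default delimiter is an assumption, not a documented constant.
    """
    if filepath is None:
        # External-linkage function: the mangled name alone is used.
        return mangled_name
    # Local-linkage function: prefix the file path to disambiguate.
    return f"{filepath}{delimiter}{mangled_name}"


def name_ref(name):
    """Reduce the MD5 digest of a PGO name to a 64-bit integer.

    Assumption: the first 8 digest bytes are read as little-endian.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


# Two static functions with the same mangled name in different files
# get different PGO names, and therefore different NameRef values.
a = pgo_name("_ZL6helperv", filepath="a.cpp")
b = pgo_name("_ZL6helperv", filepath="b.cpp")
print(a, hex(name_ref(a)))
print(b, hex(name_ref(b)))
```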

FuncHash

A checksum of the function’s IR, taking the control flow graph and instrumented value sites into account. See computeCFGHash for details.

CounterPtr

The in-memory address difference between profile data and the start of corresponding counters. Counter position is stored this way (as a link-time constant) to reduce instrumented binary size compared with snapshotting the address of symbols directly. See commit a1532ed for further information.

Note

CounterPtr might represent a different value for non-IRPGO use cases. For example, for binary profile correlation, it represents the absolute address of the counter. When in doubt, check the source code.

BitmapPtr

The in-memory address difference between profile data and the start address of corresponding bitmap.

Note

Similar to CounterPtr, this field may represent a different value for non-IRPGO use cases.

FunctionPointer

Records the function address when the instrumented binary runs. This is used to map the profiled callee address of indirect calls to the NameRef during conversion from raw to indexed profiles.

Values

Represents value profiles in a two-dimensional array. The number of elements in the first dimension is the number of instrumented value sites across all kinds. Each element in the first dimension is the head of a linked list, and each element in the second dimension is a linked list node carrying <profiled-value, count> as payload. This is used by the compiler runtime when writing out value profiles.

Note

Value profiling is supported by frontend and IR PGO instrumentation, but it’s not supported in all cases (e.g., lightweight instrumentation).
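
The shape of the Values field described above can be modeled as a list of linked-list heads, one per instrumented value site. This is only a conceptual sketch: the class and function names are hypothetical, and the update policy shown (append a node for a value not seen before) is a simplification, not the runtime's actual algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValueNode:
    """One linked-list node carrying a <profiled-value, count> pair."""
    value: int
    count: int
    next: Optional["ValueNode"] = None


def record_value(sites, site_index, value):
    """Bump the count for `value` at a site, appending a node if it is new."""
    node = sites[site_index]
    prev = None
    while node is not None:
        if node.value == value:
            node.count += 1
            return
        prev, node = node, node.next
    new_node = ValueNode(value, 1)
    if prev is None:
        sites[site_index] = new_node  # first value seen at this site
    else:
        prev.next = new_node


def site_pairs(sites, site_index):
    """Flatten one site's linked list into (value, count) tuples."""
    pairs = []
    node = sites[site_index]
    while node is not None:
        pairs.append((node.value, node.count))
        node = node.next
    return pairs


# First dimension: one list head per value site (two sites here).
sites = [None, None]
for callee in (0x1000, 0x2000, 0x1000):
    record_value(sites, 0, callee)
print(site_pairs(sites, 0))  # [(4096, 2), (8192, 1)]
```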

NumCounters

The number of counters for the instrumented function.

NumValueSites

This is an array of counters, and each counter represents the number of instrumented sites for a kind of value in the function.

NumBitmapBytes

The number of bitmap bytes for the function.

Profile Counters

For PGO [4], the counters within an instrumented function of a specific FuncHash are stored contiguously and in an order that is consistent with instrumentation points selection.

As mentioned above, the recorded counter offset is relative to the profile metadata. So how are function counters located in the raw profile data?

The profile reader iterates over the profile metadata (from the profile metadata section) and makes use of the recorded relative distances, as illustrated below.

       + --> start(__llvm_prf_data) --> +---------------------+ ------------+
       |                                |       Data 1        |             |
       |                                +---------------------+  =====||    |
       |                                |       Data 2        |       ||    |
       |                                +---------------------+       ||    |
       |                                |        ...          |       ||    |
Counter|                                +---------------------+       ||    |
 Delta |                                |       Data N        |       ||    |
       |                                +---------------------+       ||    |   CounterPtr1
       |                                                              ||    |
       |                                              CounterPtr2     ||    |
       |                                                              ||    |
       |                                                              ||    |
       + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |
                                        |        ...          |       ||    |
                                        +---------------------+  -----||----+
                                        |    Counter for      |       ||
                                        |       Data 1        |       ||
                                        +---------------------+       ||
                                        |        ...          |       ||
                                        +---------------------+  =====||
                                        |    Counter for      |
                                        |       Data 2        |
                                        +---------------------+
                                        |        ...          |
                                        +---------------------+
                                        |    Counter for      |
                                        |       Data N        |
                                        +---------------------+

In the graph,

  • The profile header records CounterDelta with the value as start(__llvm_prf_cnts) - start(__llvm_prf_data). We will call it CounterDeltaInitVal below for convenience.

  • For each profile data record ProfileDataN, CounterPtr is recorded as start(CounterN) - start(ProfileDataN), where ProfileDataN is the N-th entry in __llvm_prf_data, and CounterN represents the corresponding profile counters.

Each time the reader advances to the next data record, it decreases CounterDelta by the size of one ProfileData record.

For the counter corresponding to the first data record, the byte offset relative to the start of the counter section is calculated as CounterPtr1 - CounterDeltaInitVal. When the profile reader advances to the second data record, CounterDelta has been updated to CounterDeltaInitVal - sizeof(ProfileData). Thus the byte offset relative to the start of the counter section is calculated as CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData)).
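
The calculation above can be sketched as a short loop. The function name, the addresses, and the 64-byte record size are made up for the example; only the arithmetic follows the description in this section.

```python
def counter_offsets(counter_ptrs, counter_delta_init, sizeof_profile_data):
    """Return each record's counter offset from the start of the counter section.

    counter_ptrs[i] is start(Counter_i) - start(ProfileData_i), and
    counter_delta_init is start(__llvm_prf_cnts) - start(__llvm_prf_data).
    """
    offsets = []
    counter_delta = counter_delta_init
    for counter_ptr in counter_ptrs:
        offsets.append(counter_ptr - counter_delta)
        # Advancing to the next data record shrinks the delta by one record.
        counter_delta -= sizeof_profile_data
    return offsets


# Hypothetical in-memory layout of the instrumented binary.
DATA_START = 0x1000   # start(__llvm_prf_data)
CNTS_START = 0x2000   # start(__llvm_prf_cnts)
SIZEOF_DATA = 64      # assumed sizeof(ProfileData)
counter_starts = [0x2000, 0x2010, 0x2028]

# CounterPtr as the instrumented binary would record it for each record.
counter_ptrs = [
    start - (DATA_START + i * SIZEOF_DATA)
    for i, start in enumerate(counter_starts)
]

offsets = counter_offsets(counter_ptrs, CNTS_START - DATA_START, SIZEOF_DATA)
print([hex(o) for o in offsets])  # ['0x0', '0x10', '0x28']
```

In this example each computed offset equals the counter's address minus start(__llvm_prf_cnts), which is what the reader needs in order to index into the counter section of the raw profile.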

Bitmap

This section is used for source-based Modified Condition/Decision Coverage (MC/DC) code coverage. See the Bitmap RFC for the design.

Names

This section contains the concatenated string of functions’ PGO names, which may be compressed. If it is compressed, the zlib library is used.

Function names serve as keys in the PGO data hash table when raw profiles are converted into indexed profiles. They are also crucial for llvm-profdata to show the profiles in a human-readable way.

Value Profile Data

This section contains the profile data for value profiling.

The value profiles corresponding to one profile metadata record are serialized contiguously as one record, and value profile records are stored in the same order as the respective profile data. A raw profile reader can therefore advance the pointer to profile data and the pointer to value profile records simultaneously [5] to find the value profiles for each per-function, per-FuncHash profile data record.

Indexed Profile Format

Indexed profiles are generated by llvm-profdata. In the indexed profiles, function data are organized as an on-disk hash table so that compilers can look up profile data for functions in an IR module.

Compilers and tools must retain backward compatibility with indexed profiles. That is, a tool or a compiler built at newer versions of code must understand profiles generated by older tools or compilers.

General Storage Layout

                +-----------------------+---+
                |        Magic          |   |
                +-----------------------+   |
                |        Version        |   |
                +-----------------------+   |
                |        HashType       |   H
                +-----------------------+   E
        +-------|       HashOffset      |   A
        |       +-----------------------+   D
    +-----------|     MemProfOffset     |   E
    |   |       +-----------------------+   R
    |   |    +--|     BinaryIdOffset    |   |
    |   |    |  +-----------------------+   |
+---------------|      TemporalProf-    |   |
|   |   |    |  |      TracesOffset     |   |
|   |   |    |  +-----------------------+---+
|   |   |    |  |   Profile Summary     |   |
|   |   |    |  +-----------------------+   P
|   |   +------>|    Function data      |   A
|   |        |  +-----------------------+   Y
|   +---------->|  MemProf profile data |   L
|            |  +-----------------------+   O
|            +->|    Binary Ids         |   A
|               +-----------------------+   D
+-------------->|  Temporal profiles    |   |
                +-----------------------+---+

Header

Magic

The purpose of the magic number is to be able to tell if the profile is an indexed profile.

Version

Similar to the raw profile version, the lower 32 bits specify the version of the indexed profile and the most significant 32 bits are reserved to specify the variant types of the profile.
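
A minimal sketch of splitting the 64-bit Version field as described above. The function name is hypothetical, and the example input, including which upper bit is set, is made up; the meaning of individual variant bits is defined in the LLVM sources and is not modeled here.

```python
VERSION_MASK = 0xFFFFFFFF  # lower 32 bits hold the format version


def split_version(raw_version):
    """Split the 64-bit Version field into (version, variant_bits)."""
    version = raw_version & VERSION_MASK
    variant_bits = raw_version >> 32  # upper 32 bits: variant flags
    return version, variant_bits


# Made-up field value: version 12 with one variant bit set.
raw = (1 << 56) | 12
version, variant_bits = split_version(raw)
print(version, hex(variant_bits))  # 12 0x1000000
```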

HashType

The hashing scheme for on-disk hash table keys. Only MD5 hashing is used as of this writing.

HashOffset

An on-disk hash table stores the per-function profile records. This field records the offset of this hash table’s metadata (i.e., the number of buckets and entries), which follows right after the payload of the entire hash table.

MemProfOffset

Records the byte offset of MemProf profiling data.

BinaryIdOffset

Records the byte offset of binary id sections.

TemporalProfTracesOffset

Records the byte offset of temporal profiles.

Payload Sections

(CS) Profile Summary

This section comes right after the profile header. It stores the serialized profile summary. For context-sensitive IR-based instrumentation PGO, this section stores an additional profile summary corresponding to the context-sensitive profiles.

Function data

This section stores functions and their profiling data as an on-disk hash table. Profile data for functions with the same name are grouped together and share one hash table entry (the functions may come from different shared libraries, for instance). The profile data for them are organized as a sequence of key-value pairs, where the key is FuncHash and the value is the profiled information (represented by InstrProfRecord) for the function.
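
The two-level lookup described above can be modeled conceptually as follows. An in-memory dict stands in for the on-disk hash table, and a plain dict of counters stands in for InstrProfRecord; the function name, the keys, and all values are hypothetical.

```python
def lookup_record(function_data, name, func_hash):
    """Find the record for `name` whose FuncHash matches, or None."""
    # First level: the hash table entry shared by all functions named `name`.
    for candidate_hash, record in function_data.get(name, []):
        # Second level: pick the record whose FuncHash matches.
        if candidate_hash == func_hash:
            return record
    return None


# Two functions named "foo" with different IR, e.g. from different
# shared libraries, share one entry keyed by name.
function_data = {
    "foo": [
        (0x1111, {"counters": [10, 3]}),
        (0x2222, {"counters": [7]}),
    ],
    "bar": [
        (0x3333, {"counters": [42]}),
    ],
}

print(lookup_record(function_data, "foo", 0x2222))  # {'counters': [7]}
print(lookup_record(function_data, "foo", 0x9999))  # None
```

A miss on FuncHash, as in the last call, corresponds to a function whose IR no longer matches the IR that was profiled.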

MemProf Profile data

This section stores functions’ memory profiling data. See the MemProf binary serialization format RFC for the design.

Binary Ids

This section is used to carry over binary id information from raw profiles.

Temporal Profile Traces

This section is used to carry over temporal profile information from raw profiles. See temporal profiling for the design.

Profile Data Usage

llvm-profdata is the command line tool to display and process instrumentation-based profile data. For supported usages, check out the llvm-profdata documentation.