SPO600 2024 Fall Project

SPO600 2024 Fall Project

Goal

The goal of this project is to write a functioning proof-of-concept prototype for the function-pruning component of Automatic Function Multi-Versioning (AFMV) capability to the Gnu Compiler Collection (GCC) for AArch64 systems, building on previous work.

The Problem

We're attempting to add automatic function multi-versioning (AFMV) to the GCC compiler. This builds upon other capabilities within the compiler:

1. IFUNC - The ifunc capability allows a program to provide multiple implementations of a function, and to use a “resolver function” which is run once at program initialization to determine which implementation will be used. The resolver function returns a pointer to the selected function, which is used from that point on as for the life of the process. This capability is very flexible but requires the programmer to create:

multiple implementations of the desired function with different names
the resolver function (which can select between the implementations based on any criteria, but usually selects based on the hardware capabilities of the runtime system)
a prototype that ties together the desired target name and the resolver function

2. FMV - The GCC compiler includes a function multiversioning capability for x86_32, x86_64, powerpc, and aarch64 architectures (with slightly different implementations). FMV is similar to ifunc, and can be used in two different ways:

With manually-written functions:
- each function has the same name, and an attribute which specifies which architectural variant that function version is to be used on
- the resolver function is provided automatically by the compiler

With function cloning:
- only one version of the function is provided, and an attribute specifies the list of architectural variants. The function is automatically cloned by the complier with one function clone for each architectural variant, and each clone is optimized for a specific variant.
- the resolver function is provided automatically by the compiler

This requires fewer code changes than ifunc, but still requires that the programmer state the architectural variants that will be targetted. The programmer also needs to know (or guess!) which functions would benefit from multiversioning.

3. AFMV - This “automatic function multi-versioning” capability does not exist yet, and is what we're working towards building. AFMV should work like FMV function cloning, but without any source code changes; instead, a compiler option will be used to specify the architectural variants of interest, and any function that would benefit from function multi-versioning will be automatically cloned. It is proposed that AFMV operate in this fashion:

all functions will be cloned and subject to the compiler's optimization process
any function clones which are fundamentally the same after optimization will be pruned back to a single implementation

Our project this semester is to determine how to detect when two different versions of a function are “fundamentally the same”.

We're going to do this work within the GCC compiler.

Project Stage 1: Preparation

Note: the Project Stage 1 requirements were changed on Oct 31 due to issues with the available server systems.

1. Become familiar with the GCC build process. Build the current development version of GCC on ~~x86_64 and~~ aarch64 platforms. Get to know how long a full build takes, how to change the build options, and how to install a local (non-system, personal) copy of GCC. Make sure that you are very comfortable with the build process.

Note: you do not have to produce a full bootstrap build.

1a. Become familiar with producing and reviewing dumps produced during the compilation passes (see the -fdump-tree-all and -fdump-tree-pass compiler options and the associated code, as well as -fdump-rtl-all and -fdump-rtl-pass). Important: you do not (and should not) enable these options during the compilation of the GCC compiler itself, but you should test out using them to compile other (smaller!) programs.

2. Learn how to navigate the GCC codebase. Specifically, find out what code implements these aspects of the compiler and how to add to or change the code: a. Find the code that controls the compilation passes, and how passes can be added. Test out adding a pass (even if just a dummy pass that produces a confirmation that it's running). b. Find out how to access the intermediate representation (IR) of the program being compiled. Ideally, experiment with using the accessor macros (see the GCC documentation) to access and perhaps iterate through the IR. You could, for example, create a pass that performs a scan for all of the function names, and identifies all of the clones of a function. c. Find out how dumps are produced during the compilation passes (see the -fdump-tree-all and -fdump-tree-pass compiler options and the associated code, as well as -fdump-rtl-all and -fdump-rtl-pass). Become familiar with producing and reviewing these dumps. Consider creating a dummy pass that produces a useful diagnostic dump, or add a dump to the an IR scanning pass.

Submitting your Project Stage 1

Blog your results:

Include detailed results for the items above. Be specific and conclusive in your reporting, and include detail such as build options and build time, specific files and directories identified as you learned to navigate the code, and the specific code used in your experimentation.
Enable replication of your results. For example, you could provide links to specific content in a Git repository of your experiments. Avoid presenting code as screenshots whenever possible, because screenshots are not searchable, indexable, testable, nor accessible.
Add your reflections on the experience - what you learned, what you found interesting, what you found challenging, and what gaps you have identified in your knowledge and how you can address those gaps.
I recommend that you blog about your work in multiple sections - blog as you go rather than waiting and writing one massive blog post at the end of each stage.

Resources

Building GCC guide page on this wiki
GCC Documentation
- See especially:
  - The GCC Internals Manual
    - Take particular note of the section on Contributing to GCC.
  - Installing GCC, the GCC project's guide to compiling and installing the GCC compiler from source code
GCC Git
- Note that GCC is mirrored onto GitHub, but the main activity is conducted in the GCC git repository documented in the link above. The GitHub mirror is provided for the convenience of GitHub users, but the GitHub workflow (including approaches such as forking and pull requests) is not used by the GCC project; instead, they discuss and review patches through an email-based workflow.

Specifics: IFUNC

GCC IFUNC documenation:

Specifics: FMV

Current documentation:

1. GCC documentation

Mentions that FMV is only implemented for i386 (aka x86_32) - now false as it's also implemented for x86_64, Power (PPC64), and aarch64
Does not mention target_clones syntax

2. ARM ACLE documentation

Does not talk about the current state of implementation
Mentions that FMV may be disabled at compile time by a compiler flag, but this flag is not documented and does not appear to be implemented
The macro __HAVE_FEATURE_MULTI_VERSIONING (or __FEATURE_FUNCTION_MULTI_VERSIONING or __ARM_FEATURE_FUNCTION_MULTIVERSIONING) does not appear to be defined (as of GCC 14.2.1)

Implementation in GCC

Implemented and tested in (at least) x86_64, PowerPC4, and AArch64
I did not test the PowerPC version
Testing performed with GCC 14.0.1 20240223 with limited testing on 14.2.1 20240912.

On x86:
- Syntax to manually specify a function target: __attribue__((target("nnn"))) - where nnn may take the form of “default”, or “feature” eg., “sse4.2”, or “feature,feature” e.g., “sse4.2,avx2”, or it may take the form “arch=archlevel” e.g., “arch=x86-64-v3” or “arch=atom”
- target_version is not accepted as an alternative to target attribute
- Syntax to manually specify cloning: __attribute__((target_clone("nnn1", "nnn2" [...])))
- Works in both the C and C++ frontends

On AArch64:
- Current support landed Dec 16, 2023; see commit 0cfde688e21 (and the commit messages) in the GCC Git repository (https://gcc.gnu.org/git.html) or GitHub read-only mirror of that repository (https://github.com/gcc-mirror/gcc), or the corresponding discussion on the gcc-patches mailing list. There have been several updates and enhancements since it first landed.
  - Syntax to manually specify a function target: __attribute__((target_version("nnn"))) - where nnn may take the form of “default”, or “feature” e.g., “sve”, or “feature+feature” e.g., “sve+sve2” (Note: in some earlier versions of GCC, a plus-sign was required at the start of the feature list, e.g., “+sve” instead of “sve”. This was changed by gcc 14). Note the use of the attribute target_version as opposed to target (as used on x86) which is compliant with the ACLE specification; it appears possible to use target in some versions of the GCC compiler (apparently with the plus-sign at the start of the feature-list?). Note that the “arch=nnn” format is not supported (and probably should be).
  - Syntax to manually specify cloning: __attribute__((target_clone("nnn", "nnn" [...]))) - note that contrary to the ACLE documentation, there is no automatic “default” argument - the first argument supplied should be “default”
  - Manually specified function target (initially) works in the C++ frontend only, but automatic cloning appears to work in both C and C++. Note that most C code can be compiled with the C++ frontend, except for some recent C enhancements not understood by C++ as well as some C++ keywords that are not reserved in C

Due Date

Stage 1 is due with the second batch of blog posts on November 2 3, 2024.

Project Stage 2: Clone-Pruning Analysis Pass

Create a pass for the GCC compiler which analyzes the program being compiled and:

Identifies one or more functions which have been cloned;
Examines the cloned functions to determine if they are substantially the same or different;
Emits a message in the GCC diagnostic dump for the pass that indicates if the functions should be pruned (in the case that they're substantially the same) or not pruned (if they are different).

It is recommended that you proceed in steps:

Create a basic dummy pass with a diagnostic dump
Add logic to iterate through the code in the program being compiled
Add the logic to find the cloned function(s)
Add the locic to compare the gimple representation of the funtion(s)
Add the code to output a decision on whether the functions should or should not be pruned

To limit complexity, you may make these assumptions:

There is only one cloned function in a program
There are only two versions (clones) of that function (ignoring the function resolver)

It is important that you position your compiler pass late in the compilation/optimization process so that any significant optimizations, such as vectorization, are performed before your analysis. Ideally, it should be one of the last “tree” (gimple) passes performed.

Note that the gimple code for two identical functions may have slight variations. For example, the names of temporary variables will probably be different (because they are sequentially numbered), and generated labels in the code will probably be different (for the same reason). However, these variations by themselves should not be considered to make the function clones different.

Two possible approaches to this problem are (1) to iterate through the statements in each function, comparing them statement-by-statement; or (2) generating some type of hash or signature that uniquely identifies the implementation of the function and which can be compared to the hash/signature of a clone to see if they are different.

Please use these specific strings in your dump file:

PRUNE: name of base function
NOPRUNE: name of base function

Where name of base function is the original name of the function that should (or should not) be pruned.

Your solution should build and execute successfully on both x86_64 and aarch64 systems, and should take into account the differences between the FMV implementations on those two architectures (for example, the munging algorithm used to create the suffixes for the cloned functions is different).

Demo Files for Creating a GCC Pass

Each of the SPO600 Servers has two files /public/spo600-gcc-pass-demo.tgz and spo600-gcc-pass-demo-2.tgz – each is a tar archive containing modified versions of four files from the current (2024-11-20) GCC development head.

These files are all from the gcc subdirectory in the source tree:

passes.def - One line has been added: NEXT_PASS (pass_ctyler);
tree-pass.h - One line has been added: extern gimple_opt_passs *make_pass_ctyler (gcc::context *ctxt);
tree-ctyler.cc - The actual pass code, loosly modelled on tree-nrv.cc - this is the file that is different between the two demo archives
Makefile.in - One line has been added to the OBJS definition: tree-ctyler.o \

Building GCC with these changes will result in a compiler that can output an additional dump, which can be triggered with -fdump-tree-ctyler (or -fdump-tree-all).

Test Cases for Pruning/No-Pruning

Each of the SPO600 Servers has a file /public/spo600-test-clone.tgz which is a tar archive containing code to build test cases on x86_64 or aarch64 systems. On each architecture, two binaries will be built, each containing one cloned function. Building these binaries with a copy of GCC that contains your analysis pass should result in a decision to prune (for the binary test-clone-arch-prune) or not to prune (for the binary test-clone-arch-noprune), where arch is either x86 or aarch64.

Refer to the README.txt file within the tgz file for more detail.

Recommendations for Building GCC in Stage 2

A reminder that the make utility will rebuild a codebase in as few steps as possible. It does this by comparing the timestamps of the dependencies (inputs) for each target (output) to determine which source (or other input files) have changed since the related targets were built, and then rebuilding only those targets.

This can effectively cut the build time for a complex project like GCC from hours to minutes. On my development system (a Ryzen 7735HS with 32 GB RAM), a null rebuild (no source changes - make is checking that everything is up-to-date) takes about 8.3 seconds, and a rebuild with edits to one pass source file take 23-30 seconds. On the SPO600 Servers the rebuild times are similar.

To take advantage of this capability, do an initial full build of GCC in a separate build directory as usual, then make whatever required edits to the source code in the source directory. Run make with appropriate options (including -j job values) in the build directory.

Remember to use screen (or a similar program such as tmux) when building on remote systems in case your network connection gets interrupted, and it's a good idea to time every build (prepend time to your make command) and redirect both stdout and stderr to a log file: time make … |& tee build.log if you also want to see the output on the terminal or time make … &> build.log if you don't want to see the output.

You can do your development work on either architecture, but remember to test your work on both architectures.

Submitting your Project Stage 2

Blog your results:

Include detailed results for the items above. Be specific and conclusive in your reporting, and include detail such as build options and build time, specific files and directories identified as you learned to navigate the code, and the specific code used in your experimentation.
Clearly identify the capabilities and limitations of your code.
Enable replication of your results. For example, you could provide links to specific content in a Git repository of your experiments. Avoid presenting code as screenshots whenever possible, because screenshots are not searchable, indexable, testable, nor accessible.
Add your reflections on the experience - what you learned, what you found interesting, what you found challenging, and what gaps you have identified in your knowledge and how you can address those gaps.
I recommend that you blog about your work in multiple sections - blog as you go rather than waiting and writing one massive blog post at the end of each stage.

Due Date

Stage 2 is due with the third batch of blog posts on December <s>1</s> 5, 2024.

Project Stage 3: Tidy & Wrap

Bring your work to a solid conclusion. Take into account feedback received for Stage 2 and provide any documentation that is missing, incomplete, or unclear, as well as easily-testable code clearly showing your work.

Tidy up any loose ends from your Stage 2 work. If you have an incomplete experiment or investigation that you can complete, include it in your Stage 3.
Provide the source code for the latest version of your project in a form which permits another person to examine, build, and test the changes which you have made. This can be done in any of several different ways (in decreasing order of preference):
- Provide a public git repository from which your code can be pulled. This should be a clone of a base repository with your changes applied as one or more commits. Clearly document how to access this repository, and which branch(es) and commit(s) are of interest.
- Or, provide a set of patch files containing your changes. The easiest way to generate this is with the 'git format-patch' command. Clearly document which upstream sources (e.g., GNU git, GitHub mirror, or another upstream such as the Darwin fork) and which commit number on the upstream server the patches should be applied against.
- Alternatively, provide a tar or zip archive that can be clealy applied to the upsteam sources which provides your updates (like the demo files in /public). Clearly document which upstream sources and which commit number the archive should be applied against.
- Test the access to your code (make sure that someone who is not you can access the source).
Clearly show the state of your work: what you attempted to do, what works and what does not, and what the output (e.g., dump files) look like and how that output should be interpreted.
Test your code with the Test Cases for Pruning/No-Pruning and document the results.
Describe the problems you encountered while writing your code and how you addressed those problems.
Describe the next steps that could/should be taken to continue the investigation.
Write up your specific reflections on the project, if there are updates since Stage 2.

Important note: some students did not demonstrate that they wrote any code for Stage 2. In order to successfully complete stage 3, you must show that you made some reasonable effort towand the goal of implementing a Clone-Pruning Analysis Pass. This must include some experimentation beyond the preliminary steps of building GCC and/or building the demo pass provided to you.

Due Date

Stage 3 is due with the third batch of blog posts on December 11, 2024 (11:59 pm).

Table of Contents

SPO600 2024 Fall Project

Goal

The Problem

Project Stage 1: Preparation

Submitting your Project Stage 1

Resources

Specifics: IFUNC

Specifics: FMV

Due Date

Project Stage 2: Clone-Pruning Analysis Pass

Demo Files for Creating a GCC Pass

Test Cases for Pruning/No-Pruning

Recommendations for Building GCC in Stage 2

Submitting your Project Stage 2

Due Date

Project Stage 3: Tidy & Wrap

Due Date