====== SPO600 2024 Fall Project ======
===== Goal ====
The goal of this project is to write a functioning proof-of-concept prototype for the function-pruning component of Automatic Function Multi-Versioning (AFMV) capability to the Gnu Compiler Collection (GCC) for AArch64 systems, building on previous work.
===== The Problem =====
We're attempting to add automatic function multi-versioning (AFMV) to the GCC compiler. This builds upon other capabilities within the compiler:
1. **IFUNC** - The ifunc capability allows a program to provide multiple implementations of a function, and to use a "resolver function" which is run once at program initialization to determine which implementation will be used. The resolver function returns a pointer to the selected function, which is used from that point on as for the life of the process. This capability is very flexible but requires the programmer to create:
* multiple implementations of the desired function with different names
* the resolver function (which can select between the implementations based on any criteria, but usually selects based on the hardware capabilities of the runtime system)
* a prototype that ties together the desired target name and the resolver function
2. **FMV** - The GCC compiler includes a function multiversioning capability for x86_32, x86_64, powerpc, and aarch64 architectures (with slightly different implementations). FMV is similar to ifunc, and can be used in two different ways:
* With manually-written functions:
* each function has the same name, and an attribute which specifies which architectural variant that function version is to be used on
* the resolver function is provided automatically by the compiler
* With function cloning:
* only one version of the function is provided, and an attribute specifies the list of architectural variants. The function is automatically cloned by the complier with one function clone for each architectural variant, and each clone is optimized for a specific variant.
* the resolver function is provided automatically by the compiler
This requires fewer code changes than ifunc, but still requires that the programmer state the architectural variants that will be targetted. The programmer also needs to know (or guess!) which functions would benefit from multiversioning.
3. **AFMV** - This "automatic function multi-versioning" capability does not exist yet, and is what we're working towards building. AFMV should work like FMV function cloning, but without any source code changes; instead, a compiler option will be used to specify the architectural variants of interest, and any function that would benefit from function multi-versioning will be automatically cloned. It is proposed that AFMV operate in this fashion:
* all functions will be cloned and subject to the compiler's optimization process
* any function clones which are fundamentally the same after optimization will be pruned back to a single implementation
Our project this semester is to determine how to detect when two different versions of a function are "fundamentally the same".
We're going to do this work within the GCC compiler.
===== Project Stage 1: Preparation =====
**Note: the Project Stage 1 requirements were changed on Oct 31 due to issues with the available server systems.**
1. **Become familiar with the GCC build process.** Build the current development version of GCC on x86_64 and aarch64 platforms. Get to know how long a full build takes, how to change the build options, and how to install a local (non-system, personal) copy of GCC. **Make sure that you are __very__ comfortable with the build process.**
Note: you do __not__ have to produce a full bootstrap build.
1a. Become familiar with producing and reviewing dumps produced during the compilation passes (see the ''-fdump-tree-all'' and ''-fdump-tree-//pass//'' compiler options and the associated code, as well as ''-fdump-rtl-all'' and ''-fdump-rtl-//pass//''). Important: you do not (and should not) enable these options during the compilation of the GCC compiler itself, but you should test out using them to compile other (smaller!) programs.
2. **Learn how to navigate the GCC codebase.** Specifically, find out what code implements these aspects of the compiler and how to add to or change the code:
a. Find the code that controls the compilation passes, and how passes can be added. Test out adding a pass (even if just a dummy pass that produces a confirmation that it's running).
b. Find out how to access the intermediate representation (IR) of the program being compiled. Ideally, experiment with using the accessor macros (see the GCC documentation) to access and perhaps iterate through the IR. You could, for example, create a pass that performs a scan for all of the function names, and identifies all of the clones of a function.
c. Find out how dumps are produced during the compilation passes (see the ''-fdump-tree-all'' and ''-fdump-tree-//pass//'' compiler options and the associated code, as well as ''-fdump-rtl-all'' and ''-fdump-rtl-//pass//''). Become familiar with producing and reviewing these dumps. Consider creating a dummy pass that produces a useful diagnostic dump, or add a dump to the an IR scanning pass.
==== Submitting your Project Stage 1 ====
Blog your results:
* Include detailed results for the items above. Be specific and conclusive in your reporting, and include detail such as build options and build time, specific files and directories identified as you learned to navigate the code, and the specific code used in your experimentation.
* Enable replication of your results. For example, you could provide links to specific content in a Git repository of your experiments. Avoid presenting code as screenshots whenever possible, because screenshots are not searchable, indexable, testable, nor accessible.
* Add your reflections on the experience - what you learned, what you found interesting, what you found challenging, and what gaps you have identified in your knowledge and how you can address those gaps.
* I recommend that you blog about your work in multiple sections - blog as you go rather than waiting and writing one massive blog post at the end of each stage.
==== Resources ====
* [[Building GCC]] guide page on this wiki
* [[https://gcc.gnu.org/onlinedocs/|GCC Documentation]]
* See especially:
* The [[https://gcc.gnu.org/onlinedocs/gcc-4.5.0/gccint/|GCC Internals Manual]]
* Take particular note of the section on Contributing to GCC.
* [[https://gcc.gnu.org/install/|Installing GCC]], the GCC project's guide to compiling and installing the GCC compiler from source code
* [[https://gcc.gnu.org/git.html|GCC Git]]
* Note that GCC is [[https://github.com/gcc-mirror/gcc|mirrored]] onto GitHub, but the main activity is conducted in the GCC git repository documented in the link above. The GitHub mirror is provided for the convenience of GitHub users, but the GitHub workflow (including approaches such as forking and pull requests) is not used by the GCC project; instead, they discuss and review patches through an email-based workflow.
=== Specifics: IFUNC ===
GCC IFUNC documenation:
* [[https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Function-Attributes.html#index-g_t_0040code_007bifunc_007d-function-attribute-3095|GCC Manual]]
* [[https://sourceware.org/glibc/wiki/GNU_IFUNC|GCC Wiki]]
=== Specifics: FMV ===
Current documentation:
1. [[https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html|GCC documentation]]
* Mentions that FMV is only implemented for i386 (aka x86_32) - now false as it's also implemented for x86_64, Power (PPC64), and aarch64
* Does not mention ''target_clones'' syntax
2. [[https://arm-software.github.io/acle/main/acle.html#function-multi-versioning|ARM ACLE documentation]]
* Does not talk about the current state of implementation
* Mentions that FMV may be disabled at compile time by a compiler flag, but this flag is not documented and does not appear to be implemented
* The macro ''__HAVE_FEATURE_MULTI_VERSIONING'' (or ''__FEATURE_FUNCTION_MULTI_VERSIONING'' or ''__ARM_FEATURE_FUNCTION_MULTIVERSIONING'') does not appear to be defined (as of GCC 14.2.1)
Implementation in GCC
* Implemented and tested in (at least) x86_64, PowerPC4, and AArch64
* I did not test the PowerPC version
* Testing performed with GCC 14.0.1 20240223 with limited testing on 14.2.1 20240912.
* On x86:
* Syntax to manually specify a function target: ''__attribue__((target("nnn")))'' - where ''nnn'' may take the form of "default", or "feature" eg., "sse4.2", or "feature,feature" e.g., "sse4.2,avx2", or it may take the form "arch=archlevel" e.g., "arch=x86-64-v3" or "arch=atom"
* target_version is not accepted as an alternative to target attribute
* Syntax to manually specify cloning: ''__attribute__((target_clone("nnn1", "nnn2" [...])))''
* Works in both the C and C++ frontends
* On AArch64:
* Current support landed Dec 16, 2023; see commit 0cfde688e21 (and the commit messages) in the GCC Git repository (https://gcc.gnu.org/git.html) or GitHub read-only mirror of that repository (https://github.com/gcc-mirror/gcc), or the corresponding discussion on the [[https://gcc.gnu.org/pipermail/gcc-patches/|gcc-patches mailing list]]. There have been several updates and enhancements since it first landed.
* Syntax to manually specify a function target: ''__attribute__((target_version("nnn")))'' - where ''nnn'' may take the form of "default", or "feature" e.g., "sve", or "feature+feature" e.g., "sve+sve2" (Note: in some earlier versions of GCC, a plus-sign was required at the start of the feature list, e.g., "+sve" instead of "sve". This was changed by gcc 14). Note the use of the attribute ''target_version'' as opposed to ''target'' (as used on x86) which is compliant with the ACLE specification; it appears possible to use ''target'' in some versions of the GCC compiler (apparently with the plus-sign at the start of the feature-list?). Note that the "arch=nnn" format is not supported (and probably should be).
* Syntax to manually specify cloning: ''__attribute__((target_clone("nnn", "nnn" [...])))'' - note that contrary to the ACLE documentation, there is no automatic "default" argument - the first argument supplied should be "default"
* Manually specified function target (initially) works in the C++ frontend only, but automatic cloning appears to work in both C and C++. Note that most C code can be compiled with the C++ frontend, except for some recent C enhancements not understood by C++ as well as some C++ keywords that are not reserved in C
==== Due Date ====
* Stage 1 is due with the second batch of blog posts on November 2 3, 2024.
===== Project Stage 2: Clone-Pruning Analysis Pass =====
Create a pass for the GCC compiler which analyzes the program being compiled and:
(a) Identifies one or more functions which have been cloned;
(b) Examines the cloned functions to determine if they are substantially the same or different;
(c) Emits a message in the GCC diagnostic dump for the pass that indicates if the functions should be pruned (in the case that they're substantially the same) or not pruned (if they are different).
It is recommended that you proceed in steps:
* Create a basic dummy pass with a diagnostic dump
* Add logic to iterate through the code in the program being compiled
* Add the logic to find the cloned function(s)
* Add the locic to compare the gimple representation of the funtion(s)
* Add the code to output a decision on whether the functions should or should not be pruned
To limit complexity, you may make these assumptions:
- There is only one cloned function in a program
- There are only two versions (clones) of that function (ignoring the function resolver)
It is important that you position your compiler pass __late__ in the compilation/optimization process so that any significant optimizations, such as vectorization, are performed before your analysis. Ideally, it should be one of the last "tree" (gimple) passes performed.
Note that the gimple code for two identical functions may have slight variations. For example, the names of temporary variables will probably be different (because they are sequentially numbered), and generated labels in the code will probably be different (for the same reason). However, these variations by themselves should not be considered to make the function clones different.
Two possible approaches to this problem are (1) to iterate through the statements in each function, comparing them statement-by-statement; or (2) generating some type of hash or signature that uniquely identifies the implementation of the function and which can be compared to the hash/signature of a clone to see if they are different.
Please use these specific strings in your dump file:
* ''PRUNE: //name of base function//''
* ''NOPRUNE: //name of base function//''
Where //name of base function// is the original name of the function that should (or should not) be pruned.
Your solution should build and execute successfully on both x86_64 and aarch64 systems, and should take into account the differences between the FMV implementations on those two architectures (for example, the munging algorithm used to create the suffixes for the cloned functions is different).
==== Demo Files for Creating a GCC Pass ====
Each of the [[SPO600 Servers]] has a file ''/public/spo600-gcc-pass-demo.tgz'' which is a tar archive containing modified versions of four files from the current (2024-11-20) GCC development head.
These files are all from the ''gcc'' subdirectory in the source tree:
* passes.def - One line has been added: ''NEXT_PASS (pass_ctyler);''
* tree-pass.h - One line has been added: ''extern gimple_opt_passs *make_pass_ctyler (gcc::context *ctxt);''
* tree-ctyler.cc - The actual pass code, loosly modelled on ''tree-nrv.cc''
* Makefile.in - One line has been added to the OBJS definition: ''tree-ctyler.o \''
Building GCC with these changes will result in a compiler that can output an additional dump, which can be triggered with ''-fdump-tree-ctyler'' (or ''-fdump-tree-all'').
==== Test Cases for Pruning/No-Pruning ====
Each of the [[SPO600 Servers]] has a file ''/public/spo600-test-clone.tgz'' which is a tar archive containing code to build test cases on x86_64 or aarch64 systems. On each architecture, two binaries will be built, each containing one cloned function. Building these binaries with a copy of GCC that contains your analysis pass should result in a decision to prune (for the binary ''test-clone-//arch//-prune'') or not to prune (for the binary ''test-clone-//arch//-noprune''), where ''//arch//'' is either ''x86'' or ''aarch64''.
Refer to the README.txt file within the tgz file for more detail.
==== Recommendations for Building GCC in Stage 2 ====
A reminder that the ''make'' utility will rebuild a codebase in as few steps as possible. It does this by comparing the timestamps of the dependencies (inputs) for each target (output) to determine which source (or other input files) have changed since the related targets were built, and then rebuilding only those targets.
This can effectively cut the build time for a complex project like GCC from hours to minutes. On my development system (a Ryzen 7735HS with 32 GB RAM), a null rebuild (no source changes - make is checking that everything is up-to-date) takes about 8.3 seconds, and a rebuild with edits to one pass source file take 23-30 seconds. On the [[SPO600 Servers]] the rebuild times are similar.
To take advantage of this capability, do an initial full build of GCC in a separate build directory as usual, then make whatever required edits to the source code in the source directory. Run ''make'' with appropriate options (including ''-j'' job values) in the build directory.
Remember to use [[Screen Tutorial|screen]] (or a similar program such as tmux) when building on remote systems in case your network connection gets interrupted, and it's a good idea to time every build (prepend ''time'' to your ''make'' command) and redirect both stdout and stderr to a log file: ''time make ... |& tee build.log'' if you also want to see the output on the terminal or ''time make ... &> build.log'' if you don't want to see the output.
You can do your development work on either architecture, but remember to test your work on both architectures.
==== Submitting your Project Stage 2 ====
Blog your results:
* Include detailed results for the items above. Be specific and conclusive in your reporting, and include detail such as build options and build time, specific files and directories identified as you learned to navigate the code, and the specific code used in your experimentation.
* Clearly identify the capabilities and limitations of your code.
* Enable replication of your results. For example, you could provide links to specific content in a Git repository of your experiments. Avoid presenting code as screenshots whenever possible, because screenshots are not searchable, indexable, testable, nor accessible.
* Add your reflections on the experience - what you learned, what you found interesting, what you found challenging, and what gaps you have identified in your knowledge and how you can address those gaps.
* I recommend that you blog about your work in multiple sections - blog as you go rather than waiting and writing one massive blog post at the end of each stage.
==== Due Date ====
* Stage 2 is due with the second batch of blog posts on December 1, 2024.