SPO600 2025 Winter Project

Before Starting

Before starting this project, please perform Lab 4.

Project Stage 1: Create a Basic GCC Pass

Create a pass for the current development version of the GCC compiler which:

Iterates through the code being compiled;
Prints the name of every function being compiled;
Prints a count of the number of basic blocks in each function; and
Prints a count of the number of gimple statements in each function.

Your code must build on both of the SPO600 Servers.

It is recommended that you proceed in steps:

Create a basic dummy pass with a diagnostic dump
Add logic to iterate through the code in the program being compiled
Incrementally add logic to count the basic blocks and gimple statements
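
The counting logic in the steps above can be prototyped outside GCC before touching the compiler. What follows is a minimal standalone sketch, not GCC code: ToyBlock, ToyFunction, FunctionCounts, count_function, and report_line are hypothetical names invented for illustration, and the report format is an assumption. In a real pass the same two nested loops would typically be written with GCC's own iteration helpers (for example the FOR_EACH_BB_FN macro and a gimple_stmt_iterator), with the text going to the pass's dump file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for the compiler's internal structures (hypothetical,
// for illustration only; these are not GCC types).
struct ToyBlock {
    std::vector<std::string> statements;  // one entry per statement
};

struct ToyFunction {
    std::string name;
    std::vector<ToyBlock> blocks;
};

struct FunctionCounts {
    std::size_t basic_blocks = 0;
    std::size_t statements = 0;
};

// Walk every block, then every statement inside it, tallying both.
// The explicit inner loop mirrors how a pass has to step through
// statements one at a time with an iterator.
FunctionCounts count_function(const ToyFunction &fn) {
    FunctionCounts counts;
    for (const ToyBlock &bb : fn.blocks) {
        ++counts.basic_blocks;
        for (const std::string &stmt : bb.statements) {
            (void)stmt;  // only the count matters here
            ++counts.statements;
        }
    }
    return counts;
}

// Build one line of diagnostic text for a function (format is an assumption).
std::string report_line(const ToyFunction &fn) {
    const FunctionCounts c = count_function(fn);
    return "function " + fn.name + ": " +
           std::to_string(c.basic_blocks) + " basic blocks, " +
           std::to_string(c.statements) + " gimple statements";
}
```

Keeping the counting separate from the formatting makes it easier to add the counts incrementally and to change the dump text later without touching the traversal.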

It is recommended that you position your compiler pass late in the compilation/optimization process.

Resources

Recommendations for Building GCC

A reminder that the make utility will rebuild a codebase in as few steps as possible. It does this by comparing the timestamps of the dependencies (inputs) for each target (output) to determine which source (or other input files) have changed since the related targets were built, and then rebuilding only those targets.

This can effectively cut the build time for a complex project like GCC from hours to minutes. On my development system (a Ryzen 7735HS with 32 GB RAM), a null rebuild (no source changes - make is checking that everything is up-to-date) takes about 8.3 seconds, and a rebuild with edits to one pass source file takes 23-30 seconds. On the SPO600 Servers the rebuild times are similar.

To take advantage of this capability, do an initial full build of GCC in a separate build directory as usual, then make whatever edits are required to the source code in the source directory. Run make with appropriate options (including -j job values) in the build directory.

Remember to use screen (or a similar program such as tmux) when building on remote systems in case your network connection gets interrupted. It's also a good idea to time every build (prepend time to your make command) and redirect both stdout and stderr to a log file: use time make … |& tee build.log if you also want to see the output on the terminal, or time make … &> build.log if you don't.

You can do your development work on either architecture, but remember to test your work on both architectures.

Submitting your Project Stage 1

Blog about your process and results:

Include detailed results for the items above. Be specific and conclusive in your reporting, and include detail such as build options and build time, specific files and directories identified as you learned to navigate the code, and the specific code used in your experimentation.
Clearly identify the capabilities and limitations of your code.
Enable replication of your results. For example, you could provide links to specific content in a Git repository of your experiments. Avoid presenting code as screenshots whenever possible, because screenshots are not searchable, indexable, testable, nor accessible. It must be possible to easily test your code.
Add your reflections on the experience - what you learned, what you found interesting, what you found challenging, and what gaps you have identified in your knowledge and how you can address those gaps.
I recommend that you blog about your work in multiple sections - blog as you go rather than waiting and writing one massive blog post at the end of each stage.
Assuming that the basic work is done well, extending your Stage 1 work with particularly well-formatted dump text or additional detail in the output could improve your mark.

Due Date

Stage 1 is due with the second batch of blog posts ~~on March 9, 2025~~ by 8 am on March 10, 2025.
This stage of the project is worth 15% of the course total.

Project Stage 2: Clone-Pruning Analysis Pass

Create a pass for the GCC compiler which analyzes the program being compiled and:

Identifies one or more functions which have been cloned. These functions will have the name function.variant where the function portion is the same, and there will be a corresponding resolver named function.resolver.
Examines the cloned functions to determine if they are substantially the same or different. “Substantially the same” means that they are identical, with the possible exception of identifiers such as temporary and single static assignment (SSA) variable names, labels, and basic block numbers. For example, two cloned functions may have two different names for the first declared integer variable, but the corresponding variables will appear at exactly the same points in the two functions and are therefore equivalent.
Emit a message in the GCC diagnostic dump for the pass that indicates if the functions should be pruned (in the case that they're substantially the same) or not pruned (if they are different). The diagnostic dump may contain other information.
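
The naming convention above (function.variant plus function.resolver) suggests a simple way to find clone sets from a list of function names. The sketch below is a standalone illustration under stated assumptions: CloneGroup and group_clones are hypothetical names, the base name is taken to be everything before the first dot, and the actual variant suffixes differ by architecture, so check the real names in your own dumps. Note that GCC also generates other dotted names (such as .constprop, .isra, or .part clones), which this simple version would list as variants if the same base also has a resolver.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One set of clones that share a base name (hypothetical helper type).
struct CloneGroup {
    bool has_resolver = false;
    std::vector<std::string> variants;  // full names of the non-resolver clones
};

// Group names of the form "base.suffix" by base name and keep only the
// groups that also have a "base.resolver" entry.
std::map<std::string, CloneGroup>
group_clones(const std::vector<std::string> &names) {
    std::map<std::string, CloneGroup> groups;
    for (const std::string &name : names) {
        const std::size_t dot = name.find('.');
        if (dot == std::string::npos) {
            continue;  // no suffix: not a clone under this convention
        }
        const std::string base = name.substr(0, dot);
        const std::string suffix = name.substr(dot + 1);
        CloneGroup &group = groups[base];
        if (suffix == "resolver") {
            group.has_resolver = true;
        } else {
            group.variants.push_back(name);
        }
    }
    // Drop dotted names that have no matching resolver.
    for (auto it = groups.begin(); it != groups.end();) {
        if (!it->second.has_resolver) {
            it = groups.erase(it);
        } else {
            ++it;
        }
    }
    return groups;
}
```

Because the result is keyed by base name, the same structure extends naturally to programs with several cloned functions or more than two clones per function.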

It is recommended that you proceed in steps:

Start with your code from Stage 1
Add the logic to find the cloned function(s)
Add the logic to compare the gimple representation of the function(s)
Add the code to output a decision on whether the functions should or should not be pruned

To limit complexity, you may make these assumptions:

There is only one cloned function in a program
There are only two versions (clones) of that function (ignoring the function resolver)

However, if you choose to handle multiple cloned functions, or more than two clones, that would be a welcome enhancement!

It is important that you position your compiler pass late in the compilation/optimization process so that any significant optimizations, such as vectorization, are performed before your analysis. Ideally, it should be one of the last “tree” (gimple) passes performed.

Two possible approaches to this problem are (1) to iterate through the statements in each function, comparing them statement-by-statement; or (2) to generate some type of hash or signature that uniquely identifies the implementation of the function and which can be compared to the hash/signature of a clone to see if they are different. In either case, you need to accommodate the variation in variable, label, and basic block names.
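
Both approaches can be sketched on a toy model in which identifiers are renamed in order of first appearance before anything is compared. This is an illustration under stated assumptions, not GCC code: ToyStmt, is_identifier, canonicalize, substantially_same, and signature are hypothetical names, operands are plain strings, and anything that does not start with a digit or a minus sign is treated as a renamable identifier. The signature uses the FNV-1a hash as one possible choice; as with any hash, equal signatures only suggest equality, because collisions are possible.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy statement: an operation plus its operands (not GCC's gimple type).
struct ToyStmt {
    std::string op;
    std::vector<std::string> operands;
};

// Assumption: operands starting with a digit or '-' are constants;
// everything else is a variable, label, or block name that may differ
// between clones.
bool is_identifier(const std::string &operand) {
    if (operand.empty()) return false;
    const char c = operand[0];
    return !(c >= '0' && c <= '9') && c != '-';
}

// Rewrite each identifier as v0, v1, ... in order of first appearance,
// so that two bodies differing only in names produce identical text.
std::vector<std::string> canonicalize(const std::vector<ToyStmt> &body) {
    std::map<std::string, std::string> renamed;
    std::vector<std::string> lines;
    for (const ToyStmt &stmt : body) {
        std::string line = stmt.op;
        for (const std::string &operand : stmt.operands) {
            std::string text = operand;
            if (is_identifier(operand)) {
                auto it = renamed.find(operand);
                if (it == renamed.end()) {
                    it = renamed.emplace(
                        operand, "v" + std::to_string(renamed.size())).first;
                }
                text = it->second;
            }
            line += " " + text;
        }
        lines.push_back(line);
    }
    return lines;
}

// Approach 1: compare the canonical forms statement by statement.
bool substantially_same(const std::vector<ToyStmt> &a,
                        const std::vector<ToyStmt> &b) {
    return canonicalize(a) == canonicalize(b);
}

// Approach 2: reduce the canonical form to a 64-bit FNV-1a signature.
std::uint64_t signature(const std::vector<ToyStmt> &body) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (const std::string &line : canonicalize(body)) {
        for (unsigned char ch : line + "\n") {
            hash ^= ch;
            hash *= 1099511628211ULL;  // FNV prime
        }
    }
    return hash;
}
```

Renaming by first appearance captures the requirement that corresponding variables appear at exactly the same points: if one clone uses a different variable at some position, the canonical names diverge from that statement onward.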

Please output one of these specific strings in your dump file, each on its own line, based on whether the cloned functions are the same (PRUNE) or different (NOPRUNE):

PRUNE: name of base function
NOPRUNE: name of base function

Where name of base function is the original name of the function that should (or should not) be pruned.
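
As a small illustration of the required output, the sketch below builds the decision line from a clone name. The helpers base_name_of and decision_line are hypothetical, and taking the base name as the text before the first dot is an assumption based on the function.variant convention described earlier. In a real pass the resulting text would typically be written to the pass's dump file when one is open.

```cpp
#include <cstddef>
#include <string>

// Strip the ".variant" suffix: "scale.variantA" -> "scale".
// A name without a dot is returned unchanged.
std::string base_name_of(const std::string &clone_name) {
    const std::size_t dot = clone_name.find('.');
    return dot == std::string::npos ? clone_name : clone_name.substr(0, dot);
}

// Build the exact decision string: "PRUNE: name" or "NOPRUNE: name".
// The caller is expected to emit it on its own line.
std::string decision_line(bool substantially_same,
                          const std::string &clone_name) {
    const std::string tag = substantially_same ? "PRUNE: " : "NOPRUNE: ";
    return tag + base_name_of(clone_name);
}
```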

Your solution should build and execute successfully on both x86_64 and aarch64 systems, and should take into account the differences between the FMV implementations on those two architectures (for example, the munging algorithm used to create the suffixes for the cloned functions is different).

Test Cases for Pruning/No-Pruning

Each of the SPO600 Servers has a file /public/spo600-test-clone.tgz which is a tar archive containing code to build test cases on x86_64 or aarch64 systems. On each architecture, two binaries will be built, each containing one cloned function. Building these binaries with a copy of GCC that contains your analysis pass should result in a decision to prune (for the binary test-clone-arch-prune) or not to prune (for the binary test-clone-arch-noprune), where arch is either x86 or aarch64.

Refer to the README.txt file within the tgz file for more detail.

Your code must be able to correctly output PRUNE or NOPRUNE messages for the test programs on each platform.

Submitting your Project Stage 2

Blog your results:

Include detailed results for the items above. Be specific, detailed, and conclusive in your reporting.
Clearly identify the capabilities and limitations of your code.
Enable easy replication of your results. For example, you could provide links to specific content in a Git repository of your experiments. Avoid presenting code as screenshots whenever possible, because screenshots are not searchable, indexable, testable, nor accessible. Your code must be easily testable.
Add your reflections on the experience - what you learned, what you found interesting, what you found challenging, and what gaps you have identified in your knowledge and how you can address those gaps.
Identify technical issues and improvements you would like to work on in Stage 3 of your project.
I recommend that you blog about your work in multiple sections - blog as you go rather than waiting and writing one massive blog post at the end of each stage.

Due Date

Stage 2 is due with the third batch of blog posts on March 6, 2024.
This stage of the project is worth 15% of the course total.

Chris Tyler's Wiki
