====== SPO600 ======
This is the course schedule for SPO600 in Winter 2024. It may be adjusted according to the needs of the participants and changes in standards and technology.
Each topic will be linked to relevant notes as the course proceeds.
^ Week ^ Week of... ^ Class I (Tuesday) ^ Class II (Friday) ^ Deliverables ^
|Week 1|January 8|[[#Week 1 - Class I|Portability and Optimization - Introduction to the Problem]]|[[#Week 1 - Class II|Assembly: 6502 - Basics]]|[[#Week 1 Deliverables|Set up communication tools and perform Lab 1]]|
|Week 2|January 15|[[#Week 2 - Class I|Compilers: Standard Optimizations, Feature Flags]]|[[#Week 2 - Class II|Assembly: 6502 - Math and Jumps, Branches, and Procedures]]|[[#Week 2 Deliverables|Finish Lab 1]]|
|Week 3|January 22|[[#Week 3 - Class I|Compilers: Architecture Targets and Tuning]]|[[#Week 3 - Class II|Assembly: 6502 Strings]]|[[#Week 3 Deliverables|Work on Lab 2]]|
|Week 4|January 29|[[#Week 4 - Class I|Compilers: iFunc]]|[[#Week 4 - Class II|Assembly: Introduction to 64 Bit]]|[[#Week 4 Deliverables|Lab 2; January blog posts; Server access]]|
|Week 5|February 5|[[#Week 5 - Class I|64 Bit Assembly, Continued]]|[[#Week 5 - Class II|SIMD]]|[[#Week 5 Deliverables|Lab 3]]|
|Week 6|February 12|[[#Week 6 - Class I|SVE/SVE2, iFunc]]||[[#Week 6 Deliverables|Continue Blogging]]|
|Week 7|February 19|[[#Week 7|Building Large Software]]||[[#Week 7 Deliverables|Continue Blogging]]|
|Reading Week|February 26|Reading Week - No Classes|||
|Week 8|March 4|[[#Week 8 - Class I|Project Stage 1 Introduction]]|[[#Week 8 - Class II|Project Stage 1 Discussion]]|[[#Week 8 Deliverables|Blog about Project]]|
|Week 9|March 11|[[#Week 9 - Class I|Project Stage 1 Discussion]]|[[#Week 9 - Class II|Project Stage 1 Discussion]]|[[#Week 9 Deliverables|Project Stage 1 Due]]|
|Week 10|March 18|[[#Week 10 - Class I|Project Stage - Task Selection]]|[[#Week 10 - Class II|Project Stage 2 Discussion]]|[[#Week 10 Deliverables|Blog about project]]|
|Week 11|March 25|[[#Week 11 - Class I|Project Review]]|Good Friday - No Class|[[#Week 11 Deliverables|Blog about project]]|
|Week 12|April 1|[[#Week 12 - Class I|Project Discussion]]|Project Discussion|Project Stage 2 Due|
|Week 13|April 8|Project Stage 3 Instructions|Project Discussion|Blog about project|
|Week 14|April 15|Project Discussion|Course Wrap-Up|Project Stage 3 Due|
===== Current Participants =====
See [[SPO600 2024 Winter Participants]]
====== Course Notes ======
Note that content is being converted from the previous wiki. There may be links to content which has not yet been converted -- these will be imported soon.
===== Week 1 =====
==== Week 1 - Class I ====
=== Video ===
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EZ0DK53I6wFGsNHo7ClX7-EB44GLf6ofXHLavSZPInOIXg|Summary Video - January 9, 2024 class]]
=== General Course Information ===
* Course resources are linked from the CDOT wiki, starting at https://wiki.cdot.senecacollege.ca/wiki/SPO600 (Quick find: This page will usually be Google's top result for a search on "SPO600"), arranged by week and class. There will be lots of hyperlinks -- be sure to follow these links.
* Coursework is submitted by blogging. The only exception to this is quizzes.
* Quizzes will be short (~1 page) and will be held without announcement at the start of any synchronous class. There is no opportunity to re-take a missed quiz, but your lowest three quiz scores will not be counted, so do not worry if you miss one or two.
* Students with test accommodations: an alternate monthly quiz can be made available via the Test Centre. Communicate with your professor for details.
* Course marks (see Weekly Schedule for dates):
* 60% - Project Deliverables in three phases (15%, 20%, 25%)
* 20% - Communication (Blog writing, in four phases roughly a month long each, 5% each)
* 20% - Labs and Quizzes (10% labs; 10% for quizzes - lowest 3 quiz scores not counted)
== About SPO600 Classes ==
* Online Classes
* Tuesday and Friday, 10:45 am-12:30 pm
* A summary video will be posted on a best-effort basis (technical issues may prevent posting in some cases). The summary video will be edited and may not include some discussion and questions that take place in the class. The link(s) to the video(s) will be posted on this page under the corresponding date.
* It is strongly recommended that you attend the online sessions and take notes.
* Pre-recorded Classes
* From time to time, an online class may be replaced by pre-recorded videos. The links will be provided on this page under the corresponding date.
=== Introduction to the Problems ===
== Porting and Portability ==
* Most software is written in a **high-level language** which can be compiled into [[Machine Language|machine code]] for a specific computer architecture. In many cases, this code can be compiled or interpreted for execution on multiple computer architectures - this is called 'portable' code. However, there is a lot of existing code that contains architecture-specific fragments which make assumptions about the architecture, resulting in architecture-specific high-level or [[Assembly Language]] code.
* Reasons that code is architecture-specific:
* System assumptions that don't hold true on other platforms
* Variable or [[Word|word]] size
* [[Endian|Endianness]]
* Code that takes advantage of platform-specific features
* Reasons for writing code in Assembly Language include:
* Performance
* [[Atomic Operation|Atomic Operations]]
* Direct access to hardware features, e.g., CPUID registers
* Most of the historical reasons for using assembler are no longer valid. Modern compilers can out-perform most hand-optimized assembly code, atomic operations can be handled by libraries or [[Compiler Intrinsics|compiler intrinsics]], and most hardware access should be performed through the operating system or appropriate libraries.
* A new architecture has appeared: AArch64, which is part of [[http://www.arm.com/products/processors/instruction-set-architectures/armv8-architecture.php|ARMv8]]. This is the first new [[Computer Architecture|computer architecture]] to appear in several years (at least, the first mainstream computer architecture).
* At this point, most key open source software (the software typically present in a Linux distribution such as Ubuntu or Fedora, for example) now runs on AArch64. However, it may not yet be as extensively optimized as on older architectures (such as x86_64).
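The word-size and endianness assumptions listed above can be checked on any system you log in to. This is a minimal sketch, assuming a POSIX shell with the ''getconf'' and ''od'' utilities available; the function names ''word_bits'' and ''byte_order'' are made up for illustration.

```shell
#!/bin/sh
# Report the size of a C "long" in bits on this system (typically 32 or 64).
word_bits() {
    getconf LONG_BIT
}

# Detect byte order by reading the bytes 01 00 as one 16-bit value:
# a little-endian machine reads 0x0001, a big-endian machine reads 0x0100.
byte_order() {
    case "$(printf '\001\000' | od -An -t x2 | tr -d ' \n')" in
        0001) echo little ;;
        0100) echo big ;;
        *)    echo unknown ;;
    esac
}

echo "architecture: $(uname -m)"
echo "long size:    $(word_bits) bits"
echo "byte order:   $(byte_order)-endian"
```

Running this on both class servers is a quick way to see which of these properties differ between the two architectures and which do not.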
== Optimization ==
Optimization is the process of evaluating different ways that software can be written or built and selecting the option that has the best performance tradeoffs.
Optimization may involve substituting software algorithms, altering the sequence of operations, using architecture-specific code, or altering the build process. It is important to ensure that the optimized software produces correct results and does not cause an unacceptable performance regression for other use-cases, system configurations, operating systems, or architectures.
The definition of "performance" varies according to the target system and the operating goals. For example, in some contexts, low memory or storage usage is important; in other cases, fast operation; and in other cases, low CPU utilization or long battery life may be the most important factor. It is often possible to trade off performance in one area for another; using a lookup table, for example, can reduce CPU utilization and improve battery life in some algorithms, in return for increased memory consumption.
Most advanced compilers perform some level of optimization, and the options selected for compilation can have a significant effect on the trade-offs made by the compiler, affecting memory usage, execution speed, executable size, power consumption, and debuggability.
== Benchmarking and Profiling ==
Benchmarking involves testing software performance under controlled conditions so that the performance can be compared to other software, the same software operating on other types of computers, or so that the impact of a change to the software can be gauged.
Profiling is the process of analyzing software performance on a finer scale, determining resource usage per program part (typically per function/method). This can identify software bottlenecks and potential targets for optimization. The resource utilization studies may include memory, CPU cycles/time, or power.
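As a rough illustration of benchmarking, the sketch below runs a command several times and reports the mean elapsed time. It assumes GNU ''date'' (for the ''%N'' nanosecond format), and the names ''now_ns'' and ''bench_us'' are hypothetical. The measurement includes the cost of starting each process, so it is only meaningful for comparing runs made under the same conditions.

```shell
#!/bin/sh
# Current time in nanoseconds (assumes GNU date, which supports %N).
now_ns() {
    date +%s%N
}

# Run a command several times and print the mean elapsed time in microseconds.
# Usage: bench_us RUNS COMMAND [ARGS...]
bench_us() {
    runs=$1
    shift
    total=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        start=$(now_ns)
        "$@" >/dev/null 2>&1
        end=$(now_ns)
        total=$((total + end - start))
        i=$((i + 1))
    done
    echo $((total / runs / 1000))
}

echo "ls /: $(bench_us 10 ls /) us per run (mean of 10)"
```

Repeating the measurement and averaging reduces the effect of other activity on the machine, which is one part of keeping the conditions controlled.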
== Build Process ==
Building software is a complex task that many developers gloss over. The simple act of compiling a program invokes a process with five or more stages, including pre-processing, compiling, optimizing, assembling, and linking. However, a complex software system will have hundreds or even thousands of source files, as well as dozens or hundreds of build configuration options, auto configuration scripts (cmake, autotools), build scripts (such as Makefiles) to coordinate the process, test suites, and more.
The build process varies significantly between software packages. Most software distribution projects (including Linux distributions such as Ubuntu and Fedora) use a packaging system that further wraps the build process in a standardized script format, so that different software packages can be built using a consistent process.
In order to get consistent and comparable benchmark results, you need to ensure that the software is being built in a consistent way. Altering the build process is one way of optimizing software.
Note that the build time for a complex package can range up to hours or even days!
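The stages named above can be run one at a time to see what each produces. This sketch assumes ''gcc'' is on the PATH; the file names and the temporary directory are arbitrary choices for illustration.

```shell
#!/bin/sh
# Walk one source file through the build stages by hand.
workdir=$(mktemp -d)

cat > "$workdir/hello.c" <<'EOF'
#include <stdio.h>

int main(void)
{
    puts("hello");
    return 0;
}
EOF

if command -v gcc >/dev/null 2>&1; then
    gcc -E "$workdir/hello.c" -o "$workdir/hello.i"      # pre-process only
    gcc -S -O2 "$workdir/hello.i" -o "$workdir/hello.s"  # compile and optimize to assembly
    gcc -c "$workdir/hello.s" -o "$workdir/hello.o"      # assemble to an object file
    gcc "$workdir/hello.o" -o "$workdir/hello"           # link into an executable
    "$workdir/hello"
    ls -l "$workdir"
else
    echo "gcc not found; source left in $workdir/hello.c"
fi
```

Comparing the sizes of the intermediate files gives a feel for how much work each stage does, even for a trivial program.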
=== Course Setup ===
Follow the instructions on the **[[SPO600 Communication Tools]]** page to set up a blog, create SSH keys, and send your blog URLs and public key to me.
I will use this information to:
- Update the [[Current SPO600 Participants]] page with your information, and
- Create an account for you on the [[SPO600 Servers]].
This updating is done in batches once or twice a week -- allow some time!
==== Week 1 - Class II ====
=== Video ===
* MS Streams link: [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EXJxBUegk3xAiOeZgxYyjqkBD5ETKnQDi1SnYCnGlfNB4w|Summary Video - January 12, 2024 class]]
=== 6502 Assembly ===
* [[6502]] - Basic information about the processor
* [[6502 Addressing Modes]]
* [[6502 Instructions]]
* [[6502 Emulator]]
* [[6502 Jumps, Branches, and Subroutines]]
=== Lab 1 ===
* [[6502 Assembly Language Lab|Lab 1 - 6502 Assembly Language Lab]]
==== Week 1 Deliverables ====
- Set up your [[SPO600 Communication Tools]]
- Perform [[6502 Assembly Language Lab|Lab 1]] and blog your results
===== Week 2 =====
==== Week 2 - Class I ====
=== Video ===
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EXIuHsfefqxKoAtodJfpDegBzPSj27vfx6bx2A_VmifF6Q?e=dvSJLL|Summary Video - January 16, 2024]]
=== Compilers: Standard Optimizations and Feature Flags ===
* [[Compiler Optimizations]] including a brief discussion of feature flags (''-f'') and optimization levels (bundles of feature flags controlled by ''-O'')
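One way to see which feature flags an optimization level bundles together is to ask GCC itself with ''-Q --help=optimizers''. The sketch below assumes a real GCC (not another compiler installed under the name ''gcc''); the helper names ''parse_enabled'' and ''flags_added'' are made up for illustration.

```shell
#!/bin/sh
# Extract the names of enabled flags from "gcc -Q --help=optimizers" output,
# which lists one flag per line followed by [enabled] or [disabled].
parse_enabled() {
    awk '$2 == "[enabled]" { print $1 }' | sort
}

# Print the flags enabled at the second level but not at the first.
# Usage: flags_added -O1 -O2
flags_added() {
    before=$(mktemp)
    after=$(mktemp)
    gcc -Q "$1" --help=optimizers | parse_enabled > "$before"
    gcc -Q "$2" --help=optimizers | parse_enabled > "$after"
    comm -13 "$before" "$after"
    rm -f "$before" "$after"
}

if command -v gcc >/dev/null 2>&1; then
    flags_added -O1 -O2
fi
```

The exact list depends on the GCC version and the target architecture, so it is worth running on both class servers.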
==== Week 2 - Class II ====
=== Video ===
* An edited summary video covering [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EfF9e-Fn7MtOuXFDbyB1j3EBr6YCq-4gYzXukuURfn0oig?e=Gh68L7|6502 Math / Jumps, Branches]]
=== 6502 Math and Jumps, Branches, and Procedures ===
* [[6502 Math]]
* [[6502 Jumps, Branches, and Procedures]]
==== Week 2 Deliverables ====
* Your [[SPO600 Communication Tools|Communication Tools]] should be set up by now!
* Complete your [[6502 Assembly Language Lab|Lab 1]]
===== Week 3 =====
==== Week 3 - Class I ====
=== Video ===
* An edited summary video covering [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EcUZ63XhltJIhqtHKLvPN-wBZTVomE7PB_vEm7yzRQJV6g?e=BFnO1X|Compiler Targets and Tuning]]
=== Compilers: Targets and Tuning ===
* Resources
* [[https://gcc.gnu.org/onlinedocs/|GCC Online Documentation]]
* [[https://gitlab.com/x86-psABIs/x86-64-ABI|x86_64 psABI]] specification
* [[https://github.com/ARM-software/abi-aa/releases|ARM ABI]] specifications
==== Week 3 - Class II ====
**There is no synchronous (Zoom) class for January 26.**
=== Video ===
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EUpVVC0fyMROkBgUUsBgwjYBZebvaBdfyQX2Q8l1o8ukfg|6502 String Basics]] (26 minutes)
* [[https://web.microsoftstream.com/video/9caa5e8d-0f15-4b8b-9293-0151c82f77b1|6502 String Input]] (72 minutes) Note: references in this video to Lab 3 in a previous semester are relevant to this semester's Lab 2, but the original code requirement has been increased from 25% to 75%.
* [[https://web.microsoftstream.com/video/6a645edd-3537-4910-843c-6d32f6678e79|A 6502 Assembly Language Hack]] (optional) (5 minutes)
=== Lab ===
Now it's your turn to experiment with 6502 assembly language and have some fun. The [[6502 Math and Strings Lab]] (Lab 2) gives you a lot of flexibility to choose an interesting mini-project and execute it.
==== Week 3 Deliverables ====
* [[6502 Math and Strings Lab|Lab 2]]
===== Week 4 =====
==== Week 4 - Class I ====
=== Video ===
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EfEwiH8yd31Bo0ONeAVUAswB0fVZsrmkDSgCYNMKYIw2_w?e=l08o7H|GNU iFunc]]
=== Resources ===
* [[https://sourceware.org/glibc/wiki/GNU_IFUNC|GNU iFunc on the glibc Wiki]]
* [[https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Common-Function-Attributes.html#index-ifunc-function-attribute|iFunc attribute - GCC Documentation]]
=== Experimentation ===
* Make sure you can login to both of the [[SPO600 Servers]]
* Build and test the iFunc demo code (https://github.com/ctyler/ifunc-aarch64-demo) on the AArch64 server
==== Week 4 - Class II ====
=== Video ===
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EcA2QPnKD6BMm3qinyIPIzoBRW-7Lp2YVKNQujZsP7YK8A|Edited Summary Video - 64 Bit Assembler, Part 1]]
=== Resources ===
* [[Assembly Language]]
* [[ELF]] file format
* [[X86_64 Register and Instruction Quick Start]]
* [[Aarch64 Register and Instruction Quick Start]]
* ARM 64-bit CPU Instruction Set and Software Developer Manuals
* ARM Aarch64 documentation
* [[http://developer.arm.com/|ARM Developer Information Centre]]
* [[https://developer.arm.com/docs/den0024/latest|ARM Cortex-A Series Programmer’s Guide for ARMv8-A]]
* The //short// guide to the ARMv8 instruction set: [[https://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf|ARMv8 Instruction Set Overview]] ("ARM ISA Overview")
* The //long// guide to the ARMv8 instruction set: [[https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile|ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile]] ("ARM ARM")
* [[https://developer.arm.com/docs/ihi0055/latest/procedure-call-standard-for-the-arm-64-bit-architecture|Procedure Call Standard for the ARM 64-bit Architecture (AArch64)]]
* x86_64 Documentation
* [[https://developer.amd.com/resources/developer-guides-manuals/|AMD Developer Guide and Manuals]] (see the AMD64 Architecture section, particularly the //AMD64 Architecture Programmer’s Manual Volume 3: General Purpose and System Instructions//)
* [[http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html|Intel Software Developers Manuals]]
* GAS Manual - Using as, The GNU Assembler: https://sourceware.org/binutils/docs/as/
==== Week 4 Deliverables ====
* Complete your [[6502 Math and Strings Lab|Lab 2]]
* Test your ability to access both [[SPO600 Servers]]
* Blog about your test of the [[https://github.com/ctyler/ifunc-aarch64-demo|iFunc demo code]]
* Ensure that your blog is ready for marking by the end of the weekend (February 4, 11:59 pm)
===== Week 5 =====
==== Week 5 - Class I ====
=== Video ===
* Edited summary video: [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EaSTGiyebLZKsj1w7cHPxo8BCIYDb9D-Le5mn88IlalbFg?e=hmNu5n|64-Bit Assembly, Part 2]]
=== Lab 3 ===
* [[64-Bit Assembly Language Lab]] (Lab 3)
* We performed steps 1-3 of the lab
* Steps 4 onward are for you to do
==== Week 5 - Class II ====
=== Video ===
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/ERIy8Z7j5iFGjfFDzL67z18BENxF2MlyiM-t_l3O6Bed-w?e=FZkYHK|Summary video - SIMD]]
==== Week 5 Deliverables ====
* Perform [[64-Bit Assembly Language Lab|Lab 3]] and blog your results
===== Week 6 =====
==== Week 6 - Class I ====
=== Video ===
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EfogwAY02WROp9g_Y5aHyJQBdvNAG9LOTLZsnlTfdfZtJw?e=XV1be6|Multiple Micro-Architectures and iFunc]] (30 minutes) (Note: this video was narrated via TTS)
* A video from a previous semester for background: [[https://web.microsoftstream.com/video/a6b892e4-b408-4bc7-9fc1-d78e4efb8e0e|SVE and SVE2]] (85 minutes)
=== Code Examples ===
* The code used in the video is available in the archive ''/public/spo600-sve-sve2-ifunc-examples.tgz'' on aarch64-001.spo600.cdot.systems
==== Week 6 Deliverables ====
* Lab 4 will be released on Friday.
===== Week 7 =====
==== Video ====
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EZ-L-rLBVDhAgkvBipT3GPEBTIdvLRwLYJNH_ZYoyGMUdg?e=H6cxWE|Building Large Software Projects]]
==== Building GCC ====
These are the steps required to build GCC:
- Obtain the source by anonymously pulling from the main branch of the git repository: ''git clone git://gcc.gnu.org/git/gcc.git''
- Create an empty //build directory// in which to build the software. This should not be __inside__ the source tree; a good place to put it is //beside// the source tree.
- Change your working directory to the build directory.
- **Perform this step ONLY as your regular, non-root user.** Run the ''configure'' script in the source directory using a relative path (e.g., ''../gcc/configure''). Add a ''--prefix=//dir//'' option to specify where the software will be installed, plus any other configuration options you want to specify. The //dir// should be within your home directory, for example ''$HOME/gcc-test-001/''
- Run ''make'' with the ''-j //n//'' option to specify the maximum number of parallel jobs that should be executed at one time. The value of //n// should typically be in the range of (number of cores + 1) to (2 * number of cores + 1) depending on the performance characteristics of the machine on which you're building.
- Run ''make install'' as a non-root user. Assuming you specified the prefix correctly above, the software should install into subdirectories of the prefix directory, e.g., ''//prefix///bin'', ''//prefix///lib64'', and so forth.
- Add your bin directory to your PATH: ''PATH="//prefix///bin:$PATH"''
There is no need to run any of these steps as the root user, and it is dangerous to run the installation step as the root user, because you could overwrite the system's copy of the software you're installing. Use your regular user account instead.
To build another copy of the same gcc version, perhaps with some code or configuration changes, you can either repeat the process above with a fresh build directory (start at step 2), or you can run ''make clean'' in your existing build directory and then repeat the process above (start at step 4). Which option you choose will depend on whether you want to keep the previous build for reference.
**Tip:** Each build takes a lot of disk space (12GB or more in the build directory and 2.7GB or more in the installation directory), so check your available disk space periodically (''df -h .''). Delete unneeded builds regularly. If you're using the class servers and space is getting low, let your professor know and he can adjust the system's storage configuration.
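The build steps above can be collected into a script. This is only a sketch: the ''TOP'' working directory and the function names are assumptions for illustration, and the ''-j'' value is taken from the low end of the suggested range. The build itself is left commented out because it takes a long time.

```shell
#!/bin/sh
# Sketch of the build steps above. TOP and PREFIX are assumed locations.
TOP=${TOP:-$HOME/gcc-work}
PREFIX=${PREFIX:-$HOME/gcc-test-001}

# Choose a -j value at the low end of the suggested range: cores + 1.
pick_jobs() {
    cores=$(nproc 2>/dev/null || echo 1)
    echo $((cores + 1))
}

build_gcc() {
    # Refuse to run as root, as the notes above warn.
    [ "$(id -u)" -ne 0 ] || { echo "do not build as root" >&2; return 1; }

    mkdir -p "$TOP/build" || return 1
    cd "$TOP" || return 1
    [ -d gcc ] || git clone git://gcc.gnu.org/git/gcc.git || return 1

    cd build || return 1                  # build directory beside the source tree
    ../gcc/configure --prefix="$PREFIX" || return 1
    make -j "$(pick_jobs)" || return 1
    make install || return 1

    PATH="$PREFIX/bin:$PATH"
    export PATH
    df -h .                               # check remaining disk space
}

echo "would run: make -j $(pick_jobs)"
# Uncomment to start the (long) build:
# build_gcc
```

Keeping the prefix in a variable makes it easy to build a second copy into a different directory without touching the first.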
=== Testing Your Build ===
To test your build:
* Having altered your PATH as noted above, verify the version of GCC that you're using by running: ''gcc --version'' -- you should see the version reported as the version you cloned with git (GCC 14.//xx//.//yy//) and the build date (immediately after the version number) should match the date on which you built your copy of gcc.
* Optionally, run the [[https://gcc.gnu.org/install/test.html|compiler testsuite]].
* Verify that the compiler operates correctly by using it to build your code. Ideally, test some features that should be present in the new version that won't be present in the system-installed copy of gcc. Remember that "gcc" stands for "GNU Compiler Collection" and includes not just the gcc C compiler, but also the g++ C++ compiler and compilers for other languages, plus supporting libraries and tools.
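A common mistake is to test the system compiler by accident because the PATH was not updated in the current shell. The following sketch checks which ''gcc'' is found first; the function name ''uses_prefix_gcc'' is hypothetical and the default prefix is the example directory used earlier.

```shell
#!/bin/sh
# Succeed only if the gcc found first on the PATH lives in PREFIX/bin.
# Usage: uses_prefix_gcc PREFIX
uses_prefix_gcc() {
    prefix=${1%/}                      # tolerate a trailing slash
    found=$(command -v gcc) || return 1
    [ "$found" = "$prefix/bin/gcc" ]
}

PREFIX=${PREFIX:-$HOME/gcc-test-001}
if uses_prefix_gcc "$PREFIX"; then
    gcc --version | head -n 1
else
    echo "gcc on the PATH is not the copy in $PREFIX/bin"
fi
```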
===== Week 8 =====
==== Week 8 - Class I ====
=== Video ===
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/Eam96HGgsaZAhxwI10LRuCAB3Yq8JRUMKn15vKnid7yCkw?e=bajaHn|Introduction to the Auto-FMV Project]]
=== Project ===
* [[2024 Winter Project]]
==== Week 8 - Class II ====
=== Video ===
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/ERxBv2GnQ9dBoGUISJy3PcwBxMdaBaOl7rk02Lp4E6yDtQ?e=B3zlIp|Project Discussion (1)]]
==== Week 8 Deliverables ====
* Blog about your [[2024 Winter Project|Project Stage 1]] work.
===== Week 9 =====
==== Week 9 - Class I ====
=== Video ===
* [[https://seneca-my.sharepoint.com/:v:/g/personal/chris_tyler_senecapolytechnic_ca/EXSi5lYkgfZKsf1qKy61zCwB8_9g2GHAhnYRs_s993DuSQ?e=0pymet|Project Stage 1 Discussion (2)]]
=== Using AArch64 Software on an x86_64 System ===
The qemu-aarch64 instruction emulator will enable the execution of aarch64 code on any Linux system.
If the system is an aarch64 system, then the majority of the code will run natively on the CPU, and qemu-aarch64 will only handle instructions that are not understood by the system. Therefore, if the CPU is an ARMv8 CPU, and the software is ARMv9 software, then the majority of the instructions will run directly on the CPU and the few instructions that exist in ARMv9 that are not present in ARMv8 (such as SVE2 instructions) will be handled much more slowly by the qemu-aarch64 software. You can use this approach to (for example) run ARMv9 software on the [[SPO600 servers|class aarch64 server]].
To use qemu-aarch64 on an aarch64 system, place the ''qemu-aarch64'' executable in front of the name of the executable you wish to run:
qemu-aarch64 testprogram ...
However, if the system is an x86_64 system, then the CPU will not be able to execute //any// of the aarch64 instructions, and //all// of the instructions will be emulated by the qemu-aarch64 software. That means that the code will execute, but at a fraction of the speed at which it would execute on an actual aarch64 system. However, it will run!
To use qemu-aarch64 on an x86_64 system, you will need the qemu-aarch64 software as well as a full set of userspace files (binaries, libraries, and so forth). You can obtain these from the ''/public'' directory on the [[SPO600 servers|class x86_64 server]]:
$ ll -h /public/aarch64-f38*
-rw-r--r--. 1 chris chris 2.5K Oct 13 10:44 /public/aarch64-f38-root.README
-rw-r--r--. 1 chris chris 934M Oct 13 08:33 /public/aarch64-f38-root.tar.xz
The README file contains installation instructions. The tar.xz file contains the userspace, qemu-aarch64 static binary, and a startup script. Note that the tar.xz file is almost 1 GB in size, and will expand to approximately 3.5 GB when uncompressed.
When the tar.xz file is installed on a Linux system using the instructions in the README file, you will have a full aarch64 Fedora 38 Linux system available. The ''start-aarch64-chroot'' script in the top directory of the unpacked archive will start the qemu environment using a ''chroot'' command. Note that this is //not// a virtual machine -- it's a specific group of processes running under the main system.
The ''/proc'' and ''/sys'' filesystems are not mounted by default in the aarch64 chroot. The best way to mount these is to add these lines to the ''/etc/fstab'' file within the chroot:
proc /proc proc defaults 0 0
sysfs /sys sysfs defaults 0 0
You may want to comment out the lines for ''/boot'' and ''/boot/efi'' at the same time.
Once those changes have been made to the ''/etc/fstab'' file, you can mount the additional filesystems with the command:
mount -a
It may also be useful to set wide-open permissions on the ''/dev/null'' device:
chmod a+rw /dev/null
Note that the chroot environment starts a root shell. You can create other users with the ''useradd'' command, and switch from root to those users with the command ''su - //username//''
To build GCC in the aarch64 chroot, you will need to install these dependencies (use dnf):
gmp-devel
mpfr-devel
libmpc-devel
gcc-c++
=== Using a Raspberry Pi 4 or 5 ===
The Raspberry Pi 4 and 5 utilize aarch64 processors, but are not very fast systems. The Pi 5 is noticeably faster than the Pi 4 and is available with more RAM (8 GB).
You can use a Pi 4 or a Pi 5 to build software. When building code using ''make'', a jobs value of ''-j5'' is probably optimal.
Using Raspberry Pi OS, you will need to install (at least) these dependencies to build GCC (use ''apt install'' to install them):
gcc
make
libmpc-dev
libgmp-dev
libmpfr-dev
Then run ''configure'' and ''make'' with the usual arguments. Note that SD cards may be slow for storage - consider using an external USB3 SSD or the fastest SD card you can find.
Build time for GCC 14 is approximately 168 minutes on a Pi 5 with 8 GB.
=== Using Make Check on GCC ===
The GCC test suite, distributed with the source code, is based on the DejaGNU framework.
As documented in the notes for the [[https://gcc.gnu.org/install/test.html|compiler testsuite]], you __must__ use the ''-k'' option with ''make check'':
make -k check
However, in order for this to succeed, the DejaGNU software must be installed on your target system. On Fedora, you can do this with ''sudo dnf install dejagnu''. On Debian/Ubuntu/Raspberry Pi OS systems, use ''sudo apt install dejagnu''.
Note that the test suite will take **hours** to execute, even on a fast system!
It produces a number of files ending in ''.sum'' which summarize the test results (it will also produce other log files - see the documentation). It's a good idea to merge the stdout and stderr of the ''make'' command and redirect that to a log file, too, perhaps like this:
$ time make -k check |& tee make-check.log
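Once the test run finishes, the ''.sum'' files can be scanned for the results that matter most. The sketch below assumes the usual DejaGNU summary lines (such as ''# of unexpected failures'') and per-test ''FAIL:'' lines; the function names are made up for illustration, and a run with several test variants may report more than one summary per file.

```shell
#!/bin/sh
# Total the "unexpected failures" counts from every .sum file under a directory.
# Usage: count_unexpected [DIR]
count_unexpected() {
    find "${1:-.}" -name '*.sum' -exec grep -h '^# of unexpected failures' {} + |
        awk '{ total += $NF } END { print total + 0 }'
}

# List the individual failing tests recorded in the .sum files.
# Usage: list_failures [DIR]
list_failures() {
    find "${1:-.}" -name '*.sum' -exec grep -h '^FAIL:' {} +
}

# When run from a build directory that holds the log created above,
# print the overall count.
if [ -f make-check.log ]; then
    echo "unexpected failures: $(count_unexpected .)"
fi
```

Saving the output of ''list_failures'' from an unmodified build gives a baseline to compare against after you change the compiler.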
==== Week 9 - Class II ====
=== Video ===
* Summary Video (pending)
==== Week 9 Deliverables ====
* Project Stage 1
===== Week 10 =====
==== Week 10 - Class I ====
=== Video ===
* Summary Video (pending)
=== Task Selection ===
We selected and assigned tasks during this class. The task assignments are visible on the [[spo600:spo600 2024 winter_participants|Participant Page]] as well as the [[spo600:2024_winter_project|Project Page]].
==== Week 10 - Class II ====
=== Video ===
* Summary Video (pending)
==== Week 10 Deliverables ====
* Blog about your project work
===== Week 11 =====
==== Week 11 - Class I ====
==== Week 11 Deliverables ====
* Blog about your Project work
===== Week 12 =====
==== Week 12 - Class I ====
=== Project Review ===
We reviewed the goals and approaches of the project.
== The Problem ==
There are multiple versions of processors of every architecture currently in the market. You can see this when you go into a computer store such as Canada Computers or Best Buy -- there are laptops and desktops with processors ranging from Atoms and Celerons to Ryzen 3/5/7/9 and Core i3/i5/i7/i9 processors, and workstations and servers with processors ranging up to Xeon and Epyc/Threadripper devices. Similarly, cellphones range from devices with Cortex-A35 cores through Cortex-X3 cores.
This wide range of devices supports a diverse range of processor features.
Software developers (and vendors) are caught between supporting only the latest hardware, which limits the market they can sell to, or else harming the performance of their software by not taking advantage of recent processor improvements. Neither option is attractive for a software company wishing to be competitive.
== The Goal ==
To take good advantage of processor features with minimal effort by the software developers.
== Three Solutions ==
There are three solutions in various stages of preparation, each of which builds upon the previous solutions:
- IFUNC - Indirect Functions - This is a solution provided by the development toolchain (compiler, linker, libraries) but which is largely manual for the software developer. The developer provides multiple alternate versions of performance-critical functions which are targeted at different micro-architectural levels, plus a resolver function that selects between the implementations at runtime. Note that IFUNC is the only solution which enables a resolver function that takes into account factors other than the micro-architectural level of the processor. For example, a resolver function could select between alternate functions based on available memory, storage performance, or the speed of the network connection.
- FMV - Function Multi-Versioning - This is a solution that is also supported by the development toolchain but which involves slightly less manual work for the developer. There are two levels of FMV:
- FMV with Manual Alternate Functions - The programmer provides the alternate functions and uses function attributes to specify the microarchitectural level at which each is targeted. The resolver function for each group of alternate functions is automatically generated.
- FMV with Cloned Functions - The programmer provides one version of the function and uses function attributes to specify that clones of that function are to be built, and the micro-architectural targets for each clone. The resolver function for each group of cloned functions is automatically generated. The only difference between the cloned functions is the micro-architectural optimizations that are applied by the compiler. Note that there is nothing to ensure that the clones are actually any better or in fact different from each other.
- AFMV - Automatic Function Multi-Versioning - **This is what we're working on** - This is effectively FMV with Cloned Functions, but the cloning is controlled from the command line rather than using function attributes. This has the advantage that no source changes are required. Every function in the program is cloned, and after the various optimization passes have been applied, the cloned functions are analyzed. If the functions are different, they are kept, but if they are identical, they are removed, and only the default version of the function is used.
== Specifics: IFUNC ==
GCC IFUNC documentation:
* [[https://gcc.gnu.org/onlinedocs/gcc-5.3.0/gcc/Function-Attributes.html#index-g_t_0040code_007bifunc_007d-function-attribute-3095|GCC Manual]]
* [[https://sourceware.org/glibc/wiki/GNU_IFUNC|glibc Wiki]]
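To make the IFUNC idea concrete, the sketch below writes a small C file that uses the ''ifunc'' attribute and tries to build it. It is not taken from the class demo code: the function names are invented, both implementations do the same thing, and the resolver only makes a real choice on x86_64 (using the GCC built-ins ''%%__builtin_cpu_init%%'' and ''%%__builtin_cpu_supports%%''); on other architectures it simply returns the generic version. It assumes GCC on a glibc-based Linux system.

```shell
#!/bin/sh
# Write a small IFUNC example to a temporary directory and try to build it.
dir=$(mktemp -d)

cat > "$dir/ifunc_demo.c" <<'EOF'
#include <stdio.h>

/* Two alternate implementations of the same function. */
static int add_generic(int a, int b) { return a + b; }
static int add_wide(int a, int b)    { return a + b; }

/* Resolver: runs once, when the program is loaded, and returns a pointer
   to the implementation that every later call to add() will use. */
static int (*resolve_add(void))(int, int)
{
#if defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_wide;
#endif
    return add_generic;
}

int add(int a, int b) __attribute__((ifunc("resolve_add")));

int main(void)
{
    printf("2 + 3 = %d\n", add(2, 3));
    return 0;
}
EOF

if command -v gcc >/dev/null 2>&1 &&
   gcc -O2 "$dir/ifunc_demo.c" -o "$dir/ifunc_demo"; then
    "$dir/ifunc_demo"
else
    echo "build skipped or failed; source is in $dir/ifunc_demo.c"
fi
```

Because the resolver is ordinary code, it could just as easily base its decision on something other than CPU features, which is the flexibility that the other two solutions give up.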
== Specifics: FMV ==
Current documentation:
1. [[https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html|GCC documentation]]
* Mentions that FMV is only implemented for i386 (aka x86_32) - now false as it's also implemented for x86_64, aarch64, and Power
* Does not mention ''target_clones'' syntax
2. [[https://arm-software.github.io/acle/main/acle.html#function-multi-versioning|ARM ACLE documentation]]
* Does not talk about the current state of implementation
* Mentions that FMV may be disabled at compile time by a compiler flag, but this flag is not documented
* The macro ''__HAVE_FEATURE_MULTI_VERSIONING'' (or ''__FEATURE_FUNCTION_MULTI_VERSIONING'' or ''__ARM_FEATURE_FUNCTION_MULTIVERSIONING'') does not appear to be defined
Implementation in GCC
* Implemented and tested in (at least) x86_64, PowerPC, and AArch64
* I did not test the PowerPC version
* Testing performed with GCC 14.0.1 20240223
* On x86:
* Syntax to manually specify a function target: ''__attribute__((target("nnn")))'' - where ''nnn'' may take the form of "default", or "feature" e.g., "sse4.2", or "feature,feature" e.g., "sse4.2,avx2", or it may take the form "arch=archlevel" e.g., "arch=x86-64-v3" or "arch=atom"
* target_version is not accepted as an alternative to target attribute
* Syntax to manually specify cloning: ''__attribute__((target_clones("nnn1", "nnn2" [...])))''
* Works in both the C and C++ frontends
* On AArch64:
* Current support landed Dec 16, 2023
* Syntax to manually specify a function target: ''__attribute__((target_version("nnn")))'' - where ''nnn'' may take the form of "default", or "feature" e.g., "sve", or "feature+feature" e.g., "sve+sve2" (Note: in some earlier versions of GCC, a plus-sign was required at the start of the feature list, e.g., "+sve" instead of "sve". This was changed by gcc 14). Note the use of the attribute target_version, as opposed to target (as used on x86), which follows the ACLE. Note that the "arch=nnn" format is not supported.
* Syntax to manually specify cloning: ''__attribute__((target_clones("nnn", "nnn" [...])))'' - note that contrary to some of the documentation, there is no automatic "default" argument - the first argument supplied should be "default"
* Manually specified function target works in the C++ frontend only, but automatic cloning appears to work in both C and C++. Note that most C code can be compiled with the C++ frontend, except for some recent C enhancements not understood by C++ as well as some C++ keywords that are not reserved in C
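The cloning syntax can be tried with a short test program. The sketch below writes a C file that applies ''target_clones'' to one function and attempts to build it with the C frontend; the function, the macro name ''CLONES'', and the chosen features ("avx2" on x86_64, "sve" on AArch64) are illustrative assumptions, and the AArch64 case needs a GCC recent enough to include the support described above.

```shell
#!/bin/sh
# Write a small target_clones example to a temporary directory and try to build it.
dir=$(mktemp -d)

cat > "$dir/fmv_demo.c" <<'EOF'
#include <stdio.h>

/* Pick clone targets for the build architecture; "default" comes first. */
#if defined(__x86_64__)
#define CLONES __attribute__((target_clones("default", "avx2")))
#elif defined(__aarch64__)
#define CLONES __attribute__((target_clones("default", "sve")))
#else
#define CLONES
#endif

/* One source version; the compiler builds a clone per listed target
   and generates the resolver that chooses between them at load time. */
CLONES
void scale(int *v, int n, int k)
{
    for (int i = 0; i < n; i++)
        v[i] *= k;
}

int main(void)
{
    int v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    scale(v, 8, 3);
    printf("%d %d\n", v[0], v[7]);
    return 0;
}
EOF

if command -v gcc >/dev/null 2>&1 &&
   gcc -O3 "$dir/fmv_demo.c" -o "$dir/fmv_demo"; then
    "$dir/fmv_demo"
    nm "$dir/fmv_demo" | grep scale || true   # shows the symbols for the clones and resolver
else
    echo "build skipped or failed; source is in $dir/fmv_demo.c"
fi
```

Disassembling the clones (for example with ''objdump -d'') shows whether the compiler actually produced different code for each target, which is the question the AFMV work has to answer automatically.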
==== Week 12 Deliverables ====
* Submit your [[spo600:2024_winter_project|Project Stage 2]]