
SPO600 - 2025 Winter

This is the SPO600 tentative course schedule. It's a live document and will be revised throughout the semester. Each topic will be linked to notes at the end of this page as the course proceeds.

Classes marked Async will be delivered in asynchronous online mode.

Week | Week of | Class I (Tuesday) | Class II (Friday) | Deliverables
Week 1 | January 6 | Introduction to Software Portability and Optimization; Course Setup (including SSH Keys) | Introduction to the 6502 Processor | Set up Communication Tools, Lab 1
Week 2 | January 13 | Binary Representation of Data, Introduction to Computer Architecture (Async) | Compiler Flags and Microarchitectures (Async) | Finish Lab 1
Week 3 | January 20 | 6502 Math & Flow Control | 6502 Math Lab | Lab 2
Week 4 | January 27 | 6502 Strings | Building GCC; Make & Makefiles | Lab 3, Lab 4, and Blog Posts Group 1
Week 5 | February 3 | Compiler Optimizations & Compiler Internals | Introduction to 64-Bit Systems | Complete Labs 3 & 4
Week 6 | February 10 | 64-Bit Assembler Lab | Project Stage 1 | Lab 5, Project blogging
Week 7 | February 17 | Project Stage 1 | Indirect Functions (IFUNC), Function Multi-Versioning (FMV), Automatic Function Multi-Versioning (AFMV) | Project blogging
Reading Week | February 24 | Study Week
Week 8 | March 3 | Project Discussion; Single Instruction Multiple Data (SIMD) | Single Instruction Multiple Data (SIMD) and Scalable Vector Extensions (SVE and SVE2) (Part Async) | Project Stage 1, Blog Posts Group 2
Week 9 | March 10 | Project Stage II | Project Stage I results | Project blogging
Week 10 | March 17 | Project Discussion | Benchmarking & Profiling (Part Async) | Project blogging
Week 11 | March 24 | Paged Memory Systems | Advanced Memory Concepts (Part Async) | Project blogging
Week 12 | March 31 | Project Discussion | Advanced Memory Concepts | Project Stage 2, Blog Posts Group 3
Week 13 | April 7 | Project Discussion | Project Recommendations | Project blogging
Week 14 | April 14 | Course Wrap-Up | No class | Project Stage 3, Blog Posts Group 4

Week 1

Week 1 - Class I

Video

Note: these summary videos are no substitute for attending class in person! They do not include quizzes and quiz answer discussion, group exercises, or group discussion. It may take several days to process and edit a video before it is made available, and occasionally a class may not record properly, in which case no video will be posted. Do not rely only on the summary videos!

General Course Information

About SPO600 Classes

Introduction to the Problems

Porting and Portability

Most software is written in a high-level language which can be compiled into machine code for a specific computer architecture. In many cases, this code can be compiled or interpreted for execution on multiple computer architectures; such code is called 'portable' code.

However, a great deal of existing code contains architecture-specific fragments that make assumptions about the architecture, resulting in architecture-specific high-level or Assembly Language code. This code must be 'ported', or adapted, to work on other platforms.
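To make "architecture-specific fragments" concrete, here is a sketch of the kind of code that has to be ported: reading a hardware tick counter with inline assembly. It assumes GCC or Clang (for the `__asm__` syntax); `read_tick_counter` is a hypothetical name, and the `#error` branch marks where a port to a new architecture would be needed.

```c
#include <stdint.h>

/* Hypothetical helper: read a free-running hardware tick counter.
   Each branch is architecture-specific inline assembly (GCC/Clang syntax),
   so this function must be ported by hand to every new architecture. */
uint64_t read_tick_counter(void)
{
#if defined(__x86_64__)
    uint32_t lo, hi;
    /* RDTSC returns the time-stamp counter split across EDX:EAX. */
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#elif defined(__aarch64__)
    uint64_t ticks;
    /* CNTVCT_EL0 is the virtual count register of the generic timer. */
    __asm__ volatile ("mrs %0, cntvct_el0" : "=r"(ticks));
    return ticks;
#else
#error "read_tick_counter() has not been ported to this architecture"
#endif
}
```

Even the meaning of the result differs between branches: the x86 time-stamp counter and the Arm generic timer count at different, implementation-specific rates, so code that converts ticks to time with a hard-coded rate embeds a further architectural assumption.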

Reasons that code is architecture-specific:

Reasons for writing code in machine-specific Assembly Language include:

Most of the historical reasons for using assembler are no longer valid. Modern compilers can outperform most hand-optimized assembly code, atomic operations can be handled by libraries or compiler intrinsics, and most hardware access should be performed through the operating system or appropriate libraries.
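As an example of replacing assembly with a library or compiler facility, this sketch performs an atomic increment through C11 `<stdatomic.h>` instead of hand-written instructions. `record_event` and `event_count` are hypothetical names; the compiler picks the instruction sequence for each target, so the source stays portable.

```c
#include <stdatomic.h>

/* Hypothetical shared counter, safe to update from several threads. */
static atomic_long event_count = 0;

/* Atomically increments the counter and returns the new value.
   The compiler chooses the instruction sequence for the target: typically a
   LOCK-prefixed instruction on x86-64, and on AArch64 an LSE atomic,
   a load/store-exclusive loop, or a helper call, depending on options. */
long record_event(void)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(&event_count, 1) + 1;
}
```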

A new architecture has recently appeared: AArch64, which is a 64-bit execution state introduced as part of Arm architecture version 8 (ARMv8). This is the first new computer architecture to appear in several years (at least, the first mainstream computer architecture).

At this point, most key open source software (the software typically present in a Linux distribution such as Ubuntu or Fedora, for example) now runs on AArch64. However, it may not yet be as extensively optimized as on older architectures (such as x86_64).

An additional architecture is on the horizon: RISC-V (pronounced “Risk-Five”). This is a newer architecture with an open, royalty-free instruction set specification, and it may provide a competitive alternative to ARM and other architectures in the future.

Optimization

Optimization is the process of evaluating different ways that software can be written or built and selecting the option(s) that offer the best performance trade-offs for the situation at hand.

Optimization may involve substituting software algorithms, altering the sequence of operations, using architecture-specific code, selecting data types, or altering the build process. It is important to ensure that the optimized software produces correct results and does not cause an unacceptable performance regression for other use-cases, system configurations, operating systems, or architectures.

Optimization is tied to the concept of Portability because optimization techniques vary according to the details of the target architecture.

The definition of “performance” varies according to the target system and the operating goals. For example, in some contexts, low memory or storage usage is important; in other cases, fast operation; and in other cases, low CPU utilization or long battery life may be the most important factor. It is often necessary to trade off performance in one area for another; using a lookup table, for example, can reduce CPU utilization and improve battery life in some algorithms, in return for increased memory consumption.
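As a small illustration of the lookup-table trade-off, this sketch counts the set bits in a byte two ways: by computing on every call, or by spending 256 bytes of memory on a precomputed table. The function names are hypothetical, and which version is faster depends on the target and its cache behaviour, so it should be measured rather than assumed.

```c
#include <stdint.h>

/* Compute-only version: no extra memory, but up to 8 loop iterations per byte. */
unsigned popcount_loop(uint8_t x)
{
    unsigned count = 0;
    while (x) {
        count += x & 1u;   /* add the lowest bit */
        x >>= 1;           /* move to the next bit */
    }
    return count;
}

/* Table version: 256 bytes of memory buy a single load per query. */
static uint8_t popcount_table[256];

void popcount_table_init(void)
{
    for (unsigned i = 0; i < 256; i++)
        popcount_table[i] = (uint8_t)popcount_loop((uint8_t)i);
}

unsigned popcount_lookup(uint8_t x)
{
    return popcount_table[x];   /* valid only after popcount_table_init() */
}
```

Many CPUs also provide a dedicated population-count instruction, which GCC exposes as `__builtin_popcount`; on such targets neither version may be the best choice, which is itself a portability consideration.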

Virtually all compilers (and interpreters) perform some level of optimization, and the options selected for compilation can have a significant effect on the trade-offs made by the compiler, affecting memory usage, execution speed, executable size, power consumption, and debuggability.

However, there are some types of optimization that cannot be applied by the compiler, and which must be applied by the programmer.
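One example of information that only the programmer has is whether two pointers can refer to overlapping memory. The sketch below (hypothetical function names) uses the C99 `restrict` qualifier to make that promise; whether the compiler then vectorizes the loop depends on the compiler, the flags, and the target.

```c
#include <stddef.h>

/* Without 'restrict', the compiler must assume dst and src might overlap,
   which can block vectorization or force runtime overlap checks. */
void add_arrays(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* With 'restrict', the programmer promises the arrays do not overlap:
   knowledge the compiler cannot derive on its own from this function.
   Calling it with overlapping arrays is undefined behaviour. */
void add_arrays_restrict(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```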

Build Process

Building software is a complex task that many developers gloss over. Even the simple act of compiling a single program invokes a process with five or more stages, including pre-processing, compiling, optimizing, assembling, and linking. A complex software system, however, will have hundreds or even thousands of source files, as well as dozens or hundreds of build configuration options, auto-configuration tools (such as CMake and Autotools), build scripts (such as Makefiles) to coordinate the process, test suites, and more.

The build process varies significantly between software packages. Most software distribution projects (including Linux distributions such as Ubuntu and Fedora) use a packaging system that further wraps the build process in a standardized script format, so that different software packages can be built using a consistent process.

In order to get consistent and comparable benchmark results, you need to ensure that the software is being built in a consistent way. Altering the build process is one way of optimizing software.

Note that the build time for a complex package can range up to hours or even days!

Benchmarking and Profiling

Benchmarking involves testing software performance under controlled conditions so that the performance can be compared to other software, the same software operating on other types of computers, or so that the impact of a change to the software can be gauged.
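A minimal timing harness might look like the sketch below. It assumes a POSIX system that provides `clock_gettime()` with `CLOCK_MONOTONIC`. `time_ns` and `busy_work` are hypothetical names, and a serious benchmark would also control for CPU frequency scaling, warm-up effects, and run-to-run variation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* exposes clock_gettime() in strict ISO C builds */
#endif
#include <time.h>

/* Hypothetical workload used only to exercise the timer. */
void busy_work(void *arg)
{
    volatile long *counter = (volatile long *)arg;
    for (int i = 0; i < 1000; i++)
        (*counter)++;
}

/* Returns the elapsed wall-clock time, in nanoseconds, for 'reps' calls of fn(arg).
   CLOCK_MONOTONIC is unaffected by changes to the system date and time. */
double time_ns(void (*fn)(void *), void *arg, int reps)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < reps; i++)
        fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec) * 1e9
         + (double)(end.tv_nsec - start.tv_nsec);
}
```

Calling `time_ns(busy_work, &x, 10)` reports the total time for ten calls; dividing by the repetition count gives a per-call average that can be compared across builds or machines.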

Profiling is the process of analyzing software performance on a finer scale, determining resource usage per program part (typically per function/method). This can identify software bottlenecks and potential targets for optimization. The resource utilization studies may include memory, CPU cycles/time, or power.

Communication Tools Setup

Follow the instructions on the SPO600 Communication Tools page to set up a blog, create SSH keys, and send your blog URLs and public key to me.

I will use this information to:

  1. Update the Current SPO600 Participants page with your information, and
  2. Create an account for you on the SPO600 Servers, if you didn't do that during class.

The updating is done in batches every few days – allow some time!

Week 1 - Class II

Video

6502 Assembly Language Programming

Lab 1

Week 1 Deliverables

Week 2

Week 2 - Class I

This is an asynchronous class - there is no live meeting.

Video

Pre-recorded video:

Week 2 - Class II

This is an asynchronous class - there is no live meeting.

Video

Pre-recorded video:

Week 2 Deliverables

Week 3

Week 3 - Class I

Video

Resources

Week 3 - Class II

Lab 2

Week 3 Deliverables

Week 4

Week 4 - Class I

Video

Lab 3

Week 4 - Class II

Video

Building GCC

Lab 4

Week 4 Deliverables

Week 5

Week 5 - Class I

Video

Resources

Week 5 - Class II

Video

Resources

Week 5 Deliverables

Week 6

Week 6 - Class I

Video

64-bit Class Servers

64-bit Assembly Language

Lab 5

Week 6 - Class II

Video

Week 6 Deliverables

Week 7

Week 7 - Class I

Video

Project Stage 1

Week 7 - Class II

Video

Runtime Codepath Selection

1. IFUNC - The ifunc capability allows a program to provide multiple implementations of a function, and to use a “resolver function” which is run once at program initialization to determine which implementation will be used. The resolver function returns a pointer to the selected function, which is then used for the life of the process. This capability is very flexible but requires the programmer to create:

2. FMV - The GCC compiler includes a function multiversioning capability for x86_32, x86_64, powerpc, and aarch64 architectures (with slightly different implementations). FMV is similar to ifunc, and can be used in two different ways:

This requires fewer code changes than ifunc, but still requires that the programmer state the architectural variants that will be targeted. The programmer also needs to know (or guess!) which functions would benefit from multiversioning.

3. AFMV - This “automatic function multi-versioning” capability does not exist yet, and is what we're working towards building. AFMV should work like FMV function cloning, but without any source code changes; instead, a compiler option will be used to specify the architectural variants of interest, and any function that would benefit from function multi-versioning will be automatically cloned. It is proposed that AFMV operate in this fashion:

Resources

Specifics: IFUNC
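The ifunc mechanism described above can be sketched in C as follows. This assumes GCC (or a compatible compiler) targeting an ELF platform with glibc, such as Linux. `sum_array`, its two implementations, and the AVX2 selection criterion are illustrative only; both variants here are plain C and return identical results.

```c
#include <stddef.h>

/* Two hypothetical implementations of the same operation. */
static long sum_generic(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

static long sum_unrolled(const int *a, size_t n)
{
    long t0 = 0, t1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {   /* two independent accumulators */
        t0 += a[i];
        t1 += a[i + 1];
    }
    if (i < n)                    /* odd element left over */
        t0 += a[i];
    return t0 + t1;
}

typedef long (*sum_fn)(const int *, size_t);

/* Resolver: run once at program start-up to bind sum_array.
   It may run before constructors, so on x86 __builtin_cpu_init()
   must be called before __builtin_cpu_supports(). */
static sum_fn resolve_sum(void)
{
#if defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))   /* illustrative criterion only */
        return sum_unrolled;
#endif
    return sum_generic;
}

/* Callers simply use sum_array(); the implementation chosen by the
   resolver stays fixed for the life of the process. */
long sum_array(const int *a, size_t n) __attribute__((ifunc("resolve_sum")));
```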

GCC IFUNC documentation:

Specifics: FMV
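GCC's `target_clones` attribute is one way to use FMV. In this sketch (x86-64 Linux assumed; `scale_samples` is a hypothetical volume-scaling function), GCC compiles the loop once for the AVX2 target and once for the default target, and generates a resolver that selects a clone for the running CPU. On aarch64 the feature names accepted in the attribute differ and follow the Arm ACLE.

```c
#include <stddef.h>

/* Hypothetical volume-scaling loop. With target_clones, GCC emits one clone
   per listed target plus a resolver that picks a clone for the running CPU.
   The target names below are x86-64 ones; other architectures differ. */
#if defined(__x86_64__) && defined(__linux__)
__attribute__((target_clones("avx2", "default")))
#endif
void scale_samples(short *samples, size_t n, int factor)
{
    /* factor is a fixed-point scale in units of 1/256 (256 = 100%). */
    for (size_t i = 0; i < n; i++)
        samples[i] = (short)(samples[i] * factor / 256);
}
```

Here the attribute is the only source change; AFMV, as described above, aims to replace even that with a compiler option.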

Current documentation:

1. GCC documentation

2. ARM ACLE documentation

Implementation in GCC

Week 7 Deliverables

Week 8

Week 8 - Class I

Video

There are some technical issues with camera focus on the video for this week. My apologies for the low quality!

SIMD Examples

The sound volume scaling examples mentioned in the video may be found in the file /public/spo600-volume-examples.tgz on either of the SPO600 Servers.

Week 8 - Class II

Video

SVE/SVE2 Examples

For some SVE/SVE2 example code, see /public/spo600-sve-sve2-ifunc-examples.tgz on aarch64-001.spo600.cdot.systems. This archive contains:

Week 8 Deliverables

Week 9

Week 9 - Class I

Video

Week 9 - Class II

Project Stage II

Week 9 Deliverables

Week 10

Week 10 - Class I

Video

Code Sample

Here is some of the code discussed in the lecture, the execute() method from a GCC pass:

unsigned int
pass_ctyler::execute (function *fun)
{
  basic_block bb;

  /* Only produce diagnostics when a dump file was requested
     (e.g. with a -fdump-tree-... option for this pass).  */
  if (dump_file)
    {
      fprintf (dump_file, "--------------------------------------------------------------------\n");
      fprintf (dump_file, "Function: %s\n", IDENTIFIER_POINTER (DECL_NAME (fun->decl)));
      fprintf (dump_file, "--------------------------------------------------------------------\n");

      /* Visit every basic block in the function, then every GIMPLE
         statement within that block.  */
      FOR_EACH_BB_FN (bb, fun)
        {
          for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
            {
              gimple *stmt = gsi_stmt (gsi);

              /* Print the statement, then its numeric GIMPLE code and name.  */
              print_gimple_stmt (dump_file, stmt, 0, TDF_NONE);
              fprintf (dump_file, "\n  Gimple code:       %d\n", gimple_code (stmt));
              fprintf (dump_file, "  Gimple code name:  %s\n", gimple_code_name[gimple_code (stmt)]);
              fprintf (dump_file, "--------------------------------------------------------------------\n");
            }
        }

      /* Printed once per function, after all basic blocks have been visited.  */
      fprintf (dump_file, "\n\n#### End ctyler diagnostics, start regular dump of current gimple ####\n\n\n");
    }

  /* The return value holds TODO flags for the pass manager; 0 requests none.  */
  return 0;
}

Week 10 - Class II

Video

Week 10 Deliverables

Week 11

Week 11 - Class I

Video

Week 11 - Class II

Video

Week 11 Deliverables