There are a few different terms swirling around the 64-bit ARM space. This page distinguishes between some of these terms and concepts.
ARM architecture version 9 – known as ARMv9 – was introduced on March 30, 2021. It is an evolutionary advancement of the ARMv8 architecture.
ARM architecture version 8 – known as ARMv8 – was introduced in ~2012 and started to appear in the market in 2013/2014.
ARMv8 has two execution states which support three Instruction Set Architectures:
AArch32 is a 32-bit execution state which supports the A32 (classic ARM) and T32 (Thumb) instruction sets.
AArch64 is a 64-bit execution state which supports the A64 instruction set.
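A quick way to see which of these a given toolchain and kernel are using is a small C program. The sketch below relies on the architecture macros that GCC and Clang predefine (__aarch64__, __arm__, __thumb__) and on the uname(2) call, and simply reports what it finds.

```c
/* Report which ARM instruction set this binary was compiled for, and the
 * machine string the running kernel reports (e.g. "aarch64" or "armv7l"). */
#include <stdio.h>
#include <sys/utsname.h>

int main(void)
{
#if defined(__aarch64__)
	const char *isa = "A64 (AArch64 execution state)";
#elif defined(__thumb__)
	const char *isa = "T32/Thumb (AArch32 execution state)";
#elif defined(__arm__)
	const char *isa = "A32 (AArch32 execution state)";
#else
	const char *isa = "a non-ARM instruction set";
#endif

	struct utsname u;
	if (uname(&u) == 0)
		printf("kernel machine: %s\n", u.machine);
	printf("binary compiled for: %s\n", isa);
	return 0;
}
```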
There are different profiles for ARMv8 devices, including ARMv8-A (Application), ARMv8-R (Real-time), and ARMv8-M (Microcontroller).
Linux systems may support the execution of AArch32 binaries on an AArch64 platform (multiarch support), or they may prohibit it and allow AArch32 binaries only in a virtual machine.
Debian/Ubuntu supports AArch32 binaries on ARMv8 via a multiarch mechanism similar to that used to support x86_32 binaries on x86_64.
Fedora/Red Hat intentionally does not support AArch32 binaries on ARMv8.
The value of supporting AArch32 binaries on ARMv8 is controversial. The argument for supporting them is maximum backward compatibility; the arguments against are that very few proprietary/closed-source 32-bit ARM binaries exist, that those binaries may require recompilation anyway (since AArch32 drops a small part of the ARMv7 instruction set), and that any open-source 32-bit ARM program can readily be rebuilt for AArch64.
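For illustration, whether a given binary is AArch32 or AArch64 can be read straight from its ELF header. The sketch below (assuming a Linux system providing <elf.h>) checks the e_machine field, which is EM_ARM for 32-bit ARM binaries and EM_AARCH64 for 64-bit ones.

```c
/* Report whether an ELF file targets 32-bit ARM (AArch32) or AArch64
 * by inspecting the e_machine field of its ELF header. */
#include <elf.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <elf-file>\n", argv[0]);
		return 1;
	}

	FILE *f = fopen(argv[1], "rb");
	if (!f) {
		perror("fopen");
		return 1;
	}

	/* Elf32_Ehdr and Elf64_Ehdr share the layout of e_ident, e_type and
	 * e_machine, so reading the 64-bit header covers both cases. */
	Elf64_Ehdr hdr;
	size_t got = fread(&hdr, 1, sizeof(hdr), f);
	fclose(f);

	if (got < EI_NIDENT + 4 || memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0) {
		fprintf(stderr, "%s: not an ELF file\n", argv[1]);
		return 1;
	}

	switch (hdr.e_machine) {
	case EM_ARM:
		printf("%s: AArch32 (32-bit ARM) binary\n", argv[1]);
		break;
	case EM_AARCH64:
		printf("%s: AArch64 binary\n", argv[1]);
		break;
	default:
		printf("%s: other machine type (%u)\n", argv[1], hdr.e_machine);
		break;
	}
	return 0;
}
```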
Very few general-purpose 32-bit ARM systems were ever produced - the billions of ARMv6 and ARMv7 devices that exist generally run a dedicated build of an operating system, even if that operating system is open-source. For example, an Android-based cellphone or tablet (which runs Linux) comes with software customized specifically for that device. There is little or no market for general-purpose operating systems that can be installed on a wide range of 32-bit ARM devices, and therefore almost no effort was made to standardize the boot process.
Although most 32-bit development boards and general devices (such as the BeagleBone, Wandboard, PandaBoard, Cubieboard, Cubietruck, Radxa Rock, Utilite, Trim-Slice, and so forth) use a version of the U-Boot bootloader, these are almost always customized and operate in a way that is unique to the device. For example, some U-Boot versions boot only from some combination of NAND/NOR SPI-connected flash memory, eMMC, SD card, or disk; some load the kernel using a configuration stored on the boot device, while others store the boot configuration on the device that holds the U-Boot bootloader (which may be different); some load the U-Boot software itself directly from a particular block offset or FAT slot number, while others load it by name, or load it from SPI-connected flash; and so forth.
Dennis Gilmore of Red Hat and some others attempted to unify the U-Boot situation; however, this has been an uphill battle, as new 32-bit ARM devices have continued to flood onto the market.
In addition to the boot environment, the machine description (describing the devices which make up the system beyond the CPU) was originally handled with a “machine number” passed in by the bootloader. This led to the creation of incompatible patch sets for the kernel, such that the kernel could not be built to work on a variety of devices - it had to be built for a specific machine.
Arnd Bergmann (originally working at IBM, now with Linaro), one of the key ARM kernel maintainers, worked with others to move from machine numbers to using Device Tree to describe the attached hardware. This paved the way to move to a “Single zImage” - a kernel which could run on a variety of different devices by using data in a board-specific Device Tree Blob (DTB) to initialize the correct device drivers with the correct parameters for each device. This in turn has made it much easier for various distros (Fedora, Debian/Ubuntu, Mint, Gentoo, etc.) to support a range of devices.
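On a system that was booted with a Device Tree, the kernel exposes the blob it received under /proc/device-tree (an alias for /sys/firmware/devicetree/base), so the board identity can be inspected from user space. A minimal sketch, assuming such a system:

```c
/* Print the board "model" and "compatible" strings from the Device Tree
 * that the kernel was booted with (exposed at /proc/device-tree). */
#include <stdio.h>

static void print_dt_string(const char *path, const char *label)
{
	FILE *f = fopen(path, "rb");
	if (!f) {
		printf("%s: (not available)\n", label);
		return;
	}

	char buf[256];
	size_t n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);

	/* "compatible" may contain several NUL-separated strings; printing
	 * stops at the first one, which is the most specific board name. */
	buf[n] = '\0';
	printf("%s: %s\n", label, buf);
}

int main(void)
{
	print_dt_string("/proc/device-tree/model", "model");
	print_dt_string("/proc/device-tree/compatible", "compatible");
	return 0;
}
```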
The situation is different in the server space - companies want to be able to buy servers from any vendor and install a standard operating system. Jon Masters of Red Hat and others have led efforts to standardize the boot process and environment for ARMv8 servers, using UEFI for the boot process and ACPI for machine description. The move from Device Tree to ACPI has caused some grumbling from vendors, but it's a relatively straightforward evolutionary step, and much simpler than jumping from the machine number approach directly to ACPI.
This in turn has led to the development of the ARM Server Base System Architecture (SBSA) specification, which details the minimum hardware requirements for a standard ARMv8 server, and the Server Base Boot Requirements (SBBR) specification, which details how the boot firmware should work. Any system following these specifications should be able to boot a standard ARMv8 operating system from any vendor. Since this is a clean design that learns from previous industry mistakes, there is hope that the boot situation on ARMv8 will end up even better standardized than on x86_64.
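On a running Linux system, whether the firmware followed this model can be checked from well-known sysfs paths: /sys/firmware/efi appears when the kernel was booted via UEFI, and /sys/firmware/acpi/tables when ACPI tables were provided. A small sketch:

```c
/* Check whether this Linux system was booted via UEFI, whether the
 * firmware handed the kernel ACPI tables, and whether a Device Tree
 * is present, using well-known sysfs paths. */
#include <stdio.h>
#include <sys/stat.h>

static int dir_exists(const char *path)
{
	struct stat st;
	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

int main(void)
{
	printf("UEFI boot:   %s\n",
	       dir_exists("/sys/firmware/efi") ? "yes" : "no");
	printf("ACPI tables: %s\n",
	       dir_exists("/sys/firmware/acpi/tables") ? "yes" : "no");
	printf("Device Tree: %s\n",
	       dir_exists("/sys/firmware/devicetree/base") ? "yes" : "no");
	return 0;
}
```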
Since EFI and ACPI were previously very x86-specific and tied to particular Windows releases, adopting these for ARM systems and non-Windows operating systems has led to changes in the management and governance of these standards.
It remains to be seen what situation will develop on non-server ARMv8-A devices, which generally fall into two categories: embedded devices and development boards.
Later versions of U-Boot support UEFI, and the Embedded Base Boot Requirements (EBBR) standard codifies the requirements for an embedded system based on UEFI.
Development boards will probably initially ship with U-Boot-based firmware with or without UEFI, but it is hoped that SBSA and/or EBBR will lead to unification of these approaches. The 96Boards project is having a significant impact in bringing under-$100 ARMv8 Consumer Edition (mobile processor-based) and under-$500 Enterprise Edition (enterprise server processor-based) development boards to market; while the Consumer Edition (CE) boards ship with U-Boot and/or UEFI and/or the Android bootloader, the Enterprise Edition (EE) boards are expected to conform to SBSA.
ARM does not manufacture processors itself; except for a few test and development platforms (e.g., the “Juno” ARMv8 systems), ARM licenses its intellectual property to other companies for implementation. This has resulted in dozens of different companies producing ARM processors.
ARM licenses its technology at several different levels, ranging from complete core designs (such as the Cortex-A53 and Cortex-A57) to architecture licenses that allow companies such as Apple and Qualcomm to design their own ARM-compatible cores.
Most ARMv8 implementations are a System-on-a-Chip (SoC), meaning that the processor cores, memory controller, interrupt controller, input/output (IO) bus adapters, and graphics system/GPU are all on a single chip. In some mobile chips, radio (WiFi/LTE/Bluetooth/GPS) systems may also be integrated into the SoC. This means that only the PHY (physical-layer circuits), RAM, flash, and power controller need to be added to create a fully functioning system.
Most SoCs offer more features than are used in any one system and more features than can be exposed on the pins which are physically present on the chip. A pin multiplexer system, or PinMux, is used to select which signals are currently exposed on the SoC's pins. For example, a given group of pins could be used for an SPI serial interface, an I2C serial interface, or general-purpose input/output (GPIO) connections, but is connected to only one of those functions at a time.
In addition, a number of SoCs use high-speed serial interfaces for multiple purposes – a pool of 40 multi-gigabit-per-second serial interfaces might be provided, for example, and it is up to the board designer to decide how many of those interfaces to use as PCIe lanes, gigabit (or faster) Ethernet ports, or SATA ports.
The operating system kernel has the required mechanisms to set up the PinMux as needed for a given board, and to connect serial controllers to the appropriate drivers. In order to do this, it is critical that the kernel receive not only an accurate description of the SoC, but also an accurate description of how that SoC is wired up in the current system. This information is passed in via a Device Tree or an ACPI description (for servers, ACPI is mandated by the SBBR specification).
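As a sketch of the consumer side of this mechanism, the fragment below shows how a Linux platform driver can ask the pinctrl subsystem to route its pins according to the board description. The compatible string "vendor,example-uart" is hypothetical, the board's Device Tree (or ACPI tables) would have to provide a matching node with a "default" pin configuration, and in practice the driver core selects that default state automatically before probing - it is done explicitly here only to make the step visible.

```c
/* Hypothetical platform driver that selects the board-defined "default"
 * pin configuration for its device via the pinctrl consumer API. */
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/pinctrl/consumer.h>
#include <linux/mod_devicetable.h>
#include <linux/err.h>

static int example_probe(struct platform_device *pdev)
{
	struct pinctrl *pctl;
	struct pinctrl_state *state;

	/* Handle to the pin controller configuration for this device,
	 * as described by the board's Device Tree or ACPI tables. */
	pctl = devm_pinctrl_get(&pdev->dev);
	if (IS_ERR(pctl))
		return PTR_ERR(pctl);

	/* Pick the "default" state, e.g. muxing the pins to the UART
	 * function rather than GPIO or SPI. */
	state = pinctrl_lookup_state(pctl, "default");
	if (IS_ERR(state))
		return PTR_ERR(state);

	return pinctrl_select_state(pctl, state);
}

static const struct of_device_id example_of_match[] = {
	{ .compatible = "vendor,example-uart" },	/* hypothetical */
	{ }
};
MODULE_DEVICE_TABLE(of, example_of_match);

static struct platform_driver example_driver = {
	.probe  = example_probe,
	.driver = {
		.name = "example-uart",
		.of_match_table = example_of_match,
	},
};
module_platform_driver(example_driver);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("PinMux selection example");
```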
The ARM space is littered with very confusing (and conflicting) numbering schemes.
ARM cores may be combined in compatible groups of higher-performance, higher-power cores and lower-performance, lower-power cores. These configurations are enabled by a technology which Arm calls big.LITTLE, and by a related but newer cluster technology which Arm calls DynamIQ.
The advantage to big.LITTLE/DynamIQ lies in the ability to turn off cores that are not needed. Thus, when a device such as a cellphone is performing background tasks (screen off), one little core may be used; when the device is performing basic tasks, a couple of little cores or one big core may be used; and when very demanding tasks are performed, several big cores (or all of the cores) may be turned on.
Balancing power vs. performance can be very difficult - for example, will it require less energy to keep a little core on constantly to perform a background task, or run a big core for a fraction of a second every few seconds and sleep all of the cores the rest of the time? Issues such as core affinity and cache coherency also play into balancing decisions.
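One way to see this arrangement from user space is the per-CPU capacity value that arm64 kernels derive from the firmware's topology description; big cores report larger numbers than little ones. A minimal sketch, assuming the cpu_capacity files are present (they are on most recent arm64 kernels):

```c
/* List the relative capacity of each CPU as reported by the kernel;
 * on big.LITTLE/DynamIQ systems the "big" cores show larger values.
 * Reads /sys/devices/system/cpu/cpuN/cpu_capacity, which arm64 kernels
 * populate from the firmware-provided topology description. */
#include <stdio.h>

int main(void)
{
	for (int cpu = 0; ; cpu++) {
		char path[128];
		snprintf(path, sizeof(path),
		         "/sys/devices/system/cpu/cpu%d/cpu_capacity", cpu);

		FILE *f = fopen(path, "r");
		if (!f)
			break;	/* no more CPUs, or capacity info not exposed */

		unsigned long capacity;
		if (fscanf(f, "%lu", &capacity) == 1)
			printf("cpu%d: capacity %lu\n", cpu, capacity);
		fclose(f);
	}
	return 0;
}
```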
Wikipedia has a page on big.LITTLE that includes a list of known implementations.