Instruction Encoding
Instruction encoding, a key aspect of an instruction set architecture (ISA), defines how instructions and their arguments are represented as binary values in the machine code of a system.
Some architectures encode instructions as multi-byte sequences, where one or more bytes specify the operation to be performed and the addressing mode(s) to be used, and additional bytes specify the operands (such as the register numbers, immediate values, addresses, or offsets to be used). Additional prefix bytes may provide hints for prefetching or branch prediction or alter the operation of the instruction.
Other architectures encode instructions as fixed-length bitfields, where various (and varying) subfields within the bitfield specify the operation, addressing mode(s), and operands.
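As a rough illustration of the bitfield approach, the sketch below pulls the subfields out of one 32-bit instruction word. It uses the AArch64 ADD (immediate) layout as its sample format; the helper names bits and decode_add_imm are made up for this example and are not part of any toolchain.

```python
def bits(word, hi, lo):
    """Return the unsigned value of bits hi..lo (inclusive) of an instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_add_imm(word):
    """Split an AArch64 ADD (immediate) word into its subfields (illustrative only)."""
    return {
        "sf": bits(word, 31, 31),     # 1 selects 64-bit registers
        "op": bits(word, 30, 30),     # 0 = add, 1 = subtract
        "S": bits(word, 29, 29),      # 1 = also set the condition flags
        "fixed": bits(word, 28, 23),  # constant pattern identifying this instruction class
        "sh": bits(word, 22, 22),     # 1 = shift the immediate left by 12
        "imm12": bits(word, 21, 10),  # 12-bit unsigned immediate operand
        "Rn": bits(word, 9, 5),       # source register number
        "Rd": bits(word, 4, 0),       # destination register number
    }


# 0x91000420 is the encoding of: add x0, x1, #1
fields = decode_add_imm(0x91000420)
```

Here the operation, the operand size, and all three operands share a single 32-bit word, so the decoder only needs shifts and masks at fixed bit positions.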
The 6502 architecture uses variable-length byte sequences, from 1 to 3 bytes (including the opcode) depending on the addressing mode.
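The relationship between addressing mode and instruction length can be sketched with a small lookup table. The example below covers only a handful of 6502 opcodes and addressing modes, and the names MODE_LENGTH, OPCODES, and split_instructions are hypothetical helpers chosen for this illustration.

```python
# Bytes per instruction (opcode included) for a few 6502 addressing modes.
MODE_LENGTH = {
    "implied": 1,    # no operand bytes
    "immediate": 2,  # opcode + 8-bit constant
    "zero_page": 2,  # opcode + 8-bit address
    "relative": 2,   # opcode + signed 8-bit branch offset
    "absolute": 3,   # opcode + 16-bit address (low byte first)
}

# A small subset of the opcode map: opcode byte -> (mnemonic, addressing mode).
OPCODES = {
    0xEA: ("NOP", "implied"),
    0xA9: ("LDA", "immediate"),
    0xA5: ("LDA", "zero_page"),
    0xAD: ("LDA", "absolute"),
    0x8D: ("STA", "absolute"),
    0x4C: ("JMP", "absolute"),
    0xD0: ("BNE", "relative"),
}


def split_instructions(code):
    """Walk a byte string and return (mnemonic, mode, raw_bytes) per instruction."""
    result = []
    pc = 0
    while pc < len(code):
        mnemonic, mode = OPCODES[code[pc]]  # raises KeyError for opcodes not in the subset
        length = MODE_LENGTH[mode]
        result.append((mnemonic, mode, code[pc:pc + length]))
        pc += length
    return result


# LDA #$01 ; STA $0200 ; NOP
program = bytes([0xA9, 0x01, 0x8D, 0x00, 0x02, 0xEA])
decoded = split_instructions(program)
```

Because the opcode byte alone determines the addressing mode, the decoder knows how many operand bytes follow as soon as it has read the first byte.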
The x86_64 architecture uses variable-length byte sequences, from 1 to 15 bytes (the architectural maximum). In addition to opcodes and arguments, the byte sequence may contain prefix bytes that alter the operation of the instruction or provide execution hints to the processor.
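To make the role of prefix bytes concrete, the following sketch separates the leading prefixes of a single x86_64 instruction from the rest of its bytes. It is a simplification and not a full decoder: it recognizes only the legacy prefix bytes and an optional REX prefix, ignores VEX/EVEX encodings, and assumes the input holds exactly one instruction. The function name split_prefixes is invented for this example.

```python
# Legacy prefix bytes: LOCK, REPNE, REP, segment overrides, operand-size, address-size.
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}


def split_prefixes(insn):
    """Return (legacy_prefixes, rex_byte_or_None, remaining_bytes) for one instruction."""
    i = 0
    while i < len(insn) and insn[i] in LEGACY_PREFIXES:
        i += 1
    legacy = insn[:i]

    rex = None
    # In 64-bit mode, bytes 0x40..0x4F directly before the opcode act as a REX prefix.
    if i < len(insn) and 0x40 <= insn[i] <= 0x4F:
        rex = insn[i]
        i += 1

    return legacy, rex, insn[i:]


# nop: a single opcode byte, no prefixes
print(split_prefixes(bytes.fromhex("90")))
# mov rbp, rsp: REX.W prefix (0x48) + opcode 0x89 + ModRM byte
print(split_prefixes(bytes.fromhex("4889e5")))
# lock cmpxchg [rdi], rcx: LOCK prefix + REX.W + two-byte opcode + ModRM byte
print(split_prefixes(bytes.fromhex("f0480fb10f")))
```

The three samples are 1, 3, and 5 bytes long. Here the prefixes change the operand size (REX.W) or the memory semantics (LOCK) without changing the opcode bytes themselves.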
The AArch64 architecture uses fixed-length 32-bit instructions. Since a 32-bit instruction cannot contain a full 64-bit operand (such as an address), some operands are encoded using compact bit-pattern schemes (such as the run-length-style encoding of logical immediates), relative addressing, or shifted-bitfield techniques. For certain operand values, it may be necessary to build the required value in a register with multiple instructions, such as an adrp followed by an add. (The Armv8 architecture also defines the T32 instruction set, known as Thumb in Armv7, for its AArch32 execution state; T32 mixes 16- and 32-bit instruction encodings to provide higher code density.)
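The adrp and add pairing can be sketched as follows. adrp supplies a signed 21-bit count of 4 KiB pages relative to the page holding the instruction, and add supplies the low 12 bits of the target address. The function names are hypothetical, and the sample addresses are arbitrary values chosen for illustration.

```python
def split_for_adrp_add(pc, target):
    """Return (page_delta, lo12) so that adrp + add can rebuild target from pc."""
    page_delta = (target >> 12) - (pc >> 12)  # distance in 4 KiB pages
    if not -(1 << 20) <= page_delta < (1 << 20):
        raise ValueError("target is outside the +/-4 GiB range of adrp")
    return page_delta, target & 0xFFF


def encode_adrp(rd, page_delta):
    """Encode adrp Xd with a signed 21-bit page offset, split into immhi:immlo."""
    imm = page_delta & 0x1FFFFF  # two's-complement, 21 bits
    immlo = imm & 0x3            # low 2 bits go to instruction bits 30:29
    immhi = imm >> 2             # high 19 bits go to instruction bits 23:5
    return 0x90000000 | (immlo << 29) | (immhi << 5) | rd


def encode_add_imm(rd, rn, imm12):
    """Encode the 64-bit form add Xd, Xn, #imm12 (unshifted immediate)."""
    return 0x91000000 | (imm12 << 10) | (rn << 5) | rd


pc, target = 0x400120, 0x412345
page_delta, lo12 = split_for_adrp_add(pc, target)
adrp_word = encode_adrp(0, page_delta)  # adrp x0, <page of target>
add_word = encode_add_imm(0, 0, lo12)   # add  x0, x0, #0x345

# Rebuilding the address the way the processor would:
rebuilt = (((pc >> 12) + page_delta) << 12) + lo12
```

Neither instruction could hold the full address alone, but the 21-bit page offset and the 12-bit low part together reach any address within roughly 4 GiB of the program counter.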