When you program in assembly language, you're directly programming the “bare metal” hardware. This means that many of the compile-time and run-time checks, error messages, and diagnostics that are present in other languages are not available. The computer will follow your instructions exactly, even if they are completely wrong (like executing data), and when something goes wrong, your program won't terminate until it tries to do something that's not permitted, such as execute an invalid opcode or attempt to access a protected or unmapped region of memory. When that happens, the CPU will signal an exception, and in most cases the operating system will shut down the offending process.
The traditional extension for assembly-language source files is .s (e.g., example.s), or .S for files that need to go through the C preprocessor (cpp).
An assembly-language program consists of:
Assembler directives are used to control the assembly of the code, by specifying output file sections (such as .text (machine code), .data (read/write data), or .rodata (read-only data/constants) in an ELF file) and data formats (such as word size for numeric values), and by defining macros.
Consider this x86_64 assembly language “Hello World” program:
.text
.global _start

stdout = 1

_start:
    mov $len,%rdx       /* message length */
    mov $msg,%rsi       /* message location */
    mov $stdout,%rdi    /* file descriptor stdout */
    mov $1,%rax         /* syscall sys_write */
    syscall

    mov $0,%rdi         /* exit status */
    mov $60,%rax        /* syscall sys_exit */
    syscall

.section .rodata        /* gas has no bare .rodata directive */

msg: .ascii "Hello, world!\n"
.set len , . - msg
This program was written using GNU Assembler (gas) syntax.
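In the GNU assembler, a symbol may be set in one of three ways:
With the .set directive (in the example above, the len line).
With an assignment using = (in the example above, the stdout line), which is equivalent to a .set directive.
As a label (in the example above, _start and msg).
In the program above, stdout = 1 is an assignment: the assembler treats a=1 as equivalent to .set a, 1, and both are counted as directives regardless of the presence of the .set keyword.
Note that symbols are not variables: they are constants whose values are fixed when the program is built, not at run time. However, a symbol may hold the address at which a variable is stored.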
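The three ways of setting a symbol can be seen side by side in a very small program. This is a minimal sketch, assuming an x86_64 Linux target and the GNU assembler; the names exit_code and sys_exit are illustrative and do not appear in the program above.

```asm
        .text
        .globl  _start

        .set    exit_code, 7        /* 1. symbol set with the .set directive */
sys_exit = 60                       /* 2. symbol set by assignment */

_start:                             /* 3. symbol set as a label (an address) */
        mov     $exit_code,%rdi     /* exit status */
        mov     $sys_exit,%rax      /* syscall sys_exit */
        syscall
```

Assembled with as and linked with ld, this should exit immediately with status 7, which can be checked in the shell with echo $?. None of the three symbols occupies memory at run time; each is replaced by its value when the program is built.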
Note also that the syntax will vary from assembler to assembler and from architecture to architecture.
The 6502 Emulator provides a very simple assembler for 6502 code:
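Constants are defined with define name value, like this: define SCREEN $f000
Data bytes are declared with dcb, like this:
dcb 12,34,56
dcb $33,$66,$CC,$FF
dcb "A"
dcb "T","e","s","t",$20,"4",".",0
The assembly location is set with *=, like this: *=$0800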
To create an assembly language program in the emulator, place the source code into the text area by typing, pasting, or using the Load button, and then hit the Assemble button. If no error messages are output, the Run button will be enabled, and pressing that will run your code. To save your code, copy and paste it from the text area into an editor.
On a Linux system, your assembly language program must meet three requirements to work:
Code must be placed in the .text section of the ELF file.
Data must be placed in the .rodata (read-only data) or .data (read/write data) sections of the ELF file.
There must be a global symbol that the linker (ld) will use to find the entry point to your program. If the code is assembled directly with the assembler, this symbol must be _start; if the code is built with gcc, this symbol must be called main (a preamble will be located at _start, which will then transfer control to main).
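The three requirements can be sketched for the gcc case, where the entry symbol is main and not _start. This is an illustrative example only, assuming x86_64 Linux and a gcc that builds position-independent executables by default (hence the RIP-relative lea, not mov $msg); the file name hello_main.S used below is hypothetical.

```asm
        .text                           /* requirement 1: code goes in .text */
        .globl  main                    /* requirement 3: global entry symbol for gcc */

main:
        mov     $1,%rax                 /* syscall sys_write */
        mov     $1,%rdi                 /* file descriptor stdout */
        lea     msg(%rip),%rsi          /* message location, position-independent */
        mov     $len,%rdx               /* message length */
        syscall

        xor     %eax,%eax               /* return 0 from main */
        ret                             /* back to the preamble that started at _start */

        .section .rodata                /* requirement 2: data goes in .rodata or .data */
msg:    .ascii  "Hello from main\n"
        len = . - msg

        .section .note.GNU-stack,"",@progbits   /* optional: marks the stack non-executable */
```

Assuming the file is saved as hello_main.S, something like gcc -g -o hello_main hello_main.S should build it. Because main returns to the C runtime preamble, no explicit sys_exit call is needed here.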
The file extension should be .s for assembler source without preprocessor directives (for use with the assembler directly) or .S for assembler source with preprocessor directives (for building with gcc).
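As a small sketch of why the .S extension matters, the following file uses a C preprocessor macro. The macro name EXIT_STATUS and the file name status.S are hypothetical, and an x86_64 Linux target is assumed.

```asm
/* status.S: contains a preprocessor directive, so it is built with gcc */
#define EXIT_STATUS 3                   /* expanded by cpp before assembly */

        .text
        .globl  main
main:
        mov     $EXIT_STATUS,%eax       /* return value of main becomes the exit status */
        ret
```

Built with something like gcc -o status status.S, the program should exit with status 3. If the same file were passed straight to as, the preprocessor would not run and the macro would not be expanded.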
To assemble and link with the GNU assembler:
as -g test.s -o test.o
ld test.o -o test
Note that the -g option assembles the program with symbolic debugging information included.
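To assemble and link with NASM:
nasm -g -f elf64 -o test.o test.s
ld -o test test.o
Notes:
The -f elf64 option instructs NASM to assemble the code for a 64-bit target (x86_64).
The -g option assembles the program with symbolic debugging information included.
NASM uses its own Intel-style syntax, so a source file written in GNU Assembler syntax (such as the example above) must be rewritten before NASM will accept it.
To assemble and link with gcc:
gcc -g -o test test.S
Note that the -g option includes symbolic debugging information, and that the normal GCC optimization options have no effect on assembly-language source.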
To get started with specific instruction set architectures, see: