Learning assembler on Linux
Summary
For entertainment, I’m learning assembler on Linux. Jotting down some things I learn here.
There are two syntaxes, AT&T and Intel (Go uses its own, because Plan 9). They look very different, but once you get over that the differences are minimal. Linux tradition is mostly AT&T syntax, MS Windows mostly Intel.
There’s no standardisation, so each assembler can do things its own way. as, the GNU Assembler, is the most common one on Linux (and what gcc emits by default), but nasm, the Netwide Assembler, is very popular too. Code written for as will not assemble in nasm.
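To make the syntax difference concrete, here is a minimal sketch that loads the same two registers twice, once in each syntax. It assumes as on x86-64 Linux and relies on the assembler's .intel_syntax and .att_syntax directives to switch mid-file; the file name syntaxes.s is only illustrative.

```asm
# syntaxes.s: the same two loads written in both syntaxes, then exit(0).
# Build: as -o syntaxes.o syntaxes.s && ld -o syntaxes syntaxes.o
.text
.global _start
_start:
    # AT&T: source first, destination last; % on registers, $ on immediates
    mov $60, %eax
    mov $0, %edi

    .intel_syntax noprefix
    # Intel: destination first, no sigils
    mov eax, 60
    mov edi, 0
    .att_syntax prefix

    syscall             # 'exit' syscall (60) with return code 0
```

Both pairs of instructions assemble to the same machine code; only the spelling differs.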
Talking to the Linux kernel is different depending on whether you have a 32-bit (x86) or 64-bit (x86-64) processor:
- The registers to use change
- The instruction to call changes (int 80h vs syscall)
- The syscall numbers change
So before you even get started, you need to pick a syntax, an assembler, and a target. I’m using as, with AT&T syntax, on Linux x86-64.
To learn I’m reading Assembly Language Step-by-Step. It’s definitely helpful, but it’s targeted at a CS 101 class which makes it slow going. It also uses Intel syntax, with nasm, on 32-bit, which takes a bit of mental translating.
Here is the first program from that book, translated, in case you want to play too:
.data
eatmsg:
    .ascii "Eat at Joe's!\n"
eatlen = . - eatmsg
.text
.global _start
_start:
    mov $1, %eax        # 'write' syscall
    mov $1, %edi        # write to stdout (fd 1)
    mov $eatmsg, %rsi   # address of string to write
    mov $eatlen, %edx   # length of string to write
    syscall
    mov $60, %eax       # 'exit' syscall
    mov $0, %edi        # return code 0
    syscall
Save as eatsyscall.s and build with:
as -gstabs -o eatsyscall.o eatsyscall.s
ld -o eatsyscall eatsyscall.o
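For comparison, here is a sketch of the same program for the 32-bit target, showing the three differences listed earlier: different registers, int $0x80 instead of syscall, and different syscall numbers. The file name eat32.s is hypothetical, and running the result on a 64-bit machine assumes the kernel has 32-bit emulation enabled.

```asm
# eat32.s: 32-bit (i386) version of the program above.
# Build: as --32 -o eat32.o eat32.s && ld -m elf_i386 -o eat32 eat32.o
.data
eatmsg:
    .ascii "Eat at Joe's!\n"
eatlen = . - eatmsg
.text
.global _start
_start:
    mov $4, %eax        # 'write' is syscall 4 on i386 (1 on x86-64)
    mov $1, %ebx        # first argument goes in ebx: stdout (fd 1)
    mov $eatmsg, %ecx   # second argument in ecx: address of string
    mov $eatlen, %edx   # third argument in edx: length of string
    int $0x80           # trap into the kernel, 32-bit style
    mov $1, %eax        # 'exit' is syscall 1 on i386 (60 on x86-64)
    mov $0, %ebx        # return code 0
    int $0x80
```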
Other bookmarks I keep open:
- GNU Assembler manual. Extremely terse, but it’s there.
- Kernel calling convention. Because I forget which registers to use (RDI, RSI, RDX, R10, R8, and R9 – yes, 10, 8, 9 at the end; that’s not a typo).
- AMD manuals, particularly Part 3 – General Purpose Instructions.
- /usr/include/x86_64-linux-gnu/asm/unistd_64.h for the syscall numbers, and man 2 <syscall name> for what to pass them.
- Programming from the ground up. This looks promising, and uses the same syntax and assembler as me. I haven’t gotten to reading it yet.
I’ve already learnt two interesting things, about starting and stopping programs.
Programs don’t start at main, they start at _start. When you build a C program, _start is put in for you, does some setup, then calls main. _start is the symbol the linker ld looks up to know what address to put in the ELF header as the entry point address. For a different example, the Go start symbol (on x86-64 Linux) is _rt0_amd64_linux.
Programs have to explicitly exit. If you don’t call the exit (or exit_group) system call, your program keeps on running, tries to get its next instruction from whatever comes right after it in memory, and crashes.
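As a small illustration, this sketch is about the shortest program that still exits cleanly. The file name exit42.s and the status 42 are arbitrary choices for the example.

```asm
# exit42.s: does nothing except exit with status 42.
# Build: as -o exit42.o exit42.s && ld -o exit42 exit42.o
# Run:   ./exit42; echo $?
.text
.global _start
_start:
    mov $60, %eax       # 'exit' syscall (231 would be exit_group)
    mov $42, %edi       # status handed back to the shell
    syscall             # the kernel ends the process; this does not return
```

After running it, echo $? should report 42. Dropping the final syscall lets execution run past the end of the code, which is the crash described above.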
You can call all of the C stdlib functions from assembler, by using gcc to link, or passing the right arguments to ld. Or you can not be so lazy, and do everything yourself!
That’s the part I’m most excited about. How am I going to allocate memory, without malloc? No, don’t answer that. The fun is in figuring it out.