Graham King

Solvitas perambulum

How memory is allocated

Summary
Last year, while learning assembler, I explored how to allocate memory without using `malloc`, which is not a system call but a library function. On Linux amd64, each process has a 128 TiB virtual address space, with the heap located between the program code at the bottom and the stack at the top. To allocate memory, I called the `brk` system call to adjust the program break. First, I retrieved the current break position by passing `brk` a zero value. Then, to allocate four bytes, I passed the new position to a second `brk` call. That let me store data at the newly allocated address. To free memory, I simply moved the break back down.

tl;dr man 2 brk

Last year, when I was learning assembler, I asked myself how to allocate memory without malloc. Usually memory is either allocated for us by our language, or we do it with new or malloc. But malloc is a library function, not a system call. How does malloc itself get memory from the kernel? To answer that we need to look at the layout of a program in memory.

On Linux amd64, every process gets its own 128 TiB virtual address space. The program code, global data, debugging information and so on are loaded at the bottom of that space, working ‘upwards’ (bigger numeric addresses). Then comes the heap, where we are going to allocate some memory. Where the heap ends is called the program break. Then there is a very large gap, which the heap will grow into. At the top of the address space (0x7fffffffffff) is the stack, which will grow downwards, back towards the top of the heap. Here is a graphic of virtual memory layout.

To allocate memory on the heap, we simply ask the kernel to move the program break up. The space between the old program break and the new one is our memory. The system call is brk. First we have to find out where the break is now. The raw system call returns the current position, so we simply have to call it. We pass it 0, which is an invalid value, so the kernel changes nothing and just reports the break. (The C library wrapper documented in man 2 brk behaves differently: it returns 0 or -1 rather than the break.)

    mov $12, %rax   # brk syscall number
    mov $0, %rdi    # 0 is invalid, want to get current position
    syscall

When that returns, the current break is in rax. Let’s allocate four bytes by asking the kernel to move the break up by that much:

    mov %rax, %rsi  # save current break

    mov %rax, %rdi  # move top of heap to here ...
    add $4, %rdi    # .. plus 4 bytes we allocate
    mov $12, %rax   # brk, again
    syscall

We can now store anything we want at the address pointed at by rsi, where we saved the start of our allocated space. The full assembly program, alloc.s, puts “HI\n” into that space and prints it out. Compile, link, run:

    as -o alloc.o alloc.s
    ld -o alloc alloc.o
    ./alloc
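For reference, here is a minimal sketch of what such a program could look like. It is not the original alloc.s. The file name alloc_sketch.s, the failure check after the second brk, and the exit codes are my own assumptions, and it only applies to Linux on x86-64, where write is syscall 1 and exit is syscall 60.

```asm
# alloc_sketch.s (hypothetical name): grow the heap by 4 bytes,
# store "HI\n" there and print it. Linux x86-64, AT&T syntax, no libc.
    .globl _start
    .text
_start:
    mov $12, %rax           # brk syscall number
    mov $0, %rdi            # 0 is invalid, so the kernel only reports the break
    syscall
    mov %rax, %rsi          # rsi = old break = start of our new memory

    lea 4(%rsi), %rdi       # requested break: old break plus 4 bytes
    mov $12, %rax           # brk, again
    syscall

    lea 4(%rsi), %rdx       # rdx = the break we asked for
    cmp %rdx, %rax          # on failure the kernel returns the unchanged break
    jb fail                 # returned break below the request: nothing allocated

    movl $0x000a4948, (%rsi)  # little-endian bytes: 'H', 'I', '\n', 0

    mov $1, %rax            # write syscall number
    mov $1, %rdi            # file descriptor 1 is stdout
                            # rsi already points at our 4 bytes
    mov $3, %rdx            # print 3 bytes: "HI\n"
    syscall

    mov $60, %rax           # exit syscall number
    mov $0, %rdi            # status 0: success
    syscall

fail:
    mov $60, %rax           # exit syscall number
    mov $1, %rdi            # status 1: brk did not move the break
    syscall
```

It builds with the same as and ld commands shown above, with the file name swapped in. The single 32-bit store fills all four allocated bytes, which is why the fourth byte ends up as zero.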

To free memory, you do the opposite: you move the break back down. That allows the kernel to re-use that space. Happy allocating!
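Freeing can be sketched the same way. The program below is an illustration under my own assumptions (the file name free_sketch.s and the choice to report the result through the exit status are not from the original): it allocates four bytes, moves the break back to where it started, and checks that the kernel reports the original position.

```asm
# free_sketch.s (hypothetical name): allocate 4 bytes, then give them back.
# Linux x86-64, AT&T syntax, no libc.
    .globl _start
    .text
_start:
    mov $12, %rax           # brk syscall number
    mov $0, %rdi            # invalid address: only report the current break
    syscall
    mov %rax, %rbx          # rbx = original break

    lea 4(%rbx), %rdi       # allocate: move the break up by 4 bytes
    mov $12, %rax           # brk
    syscall

    mov %rbx, %rdi          # free: move the break back to where it started
    mov $12, %rax           # brk
    syscall

    xor %rdi, %rdi          # assume exit status 0
    cmp %rbx, %rax          # does the kernel report the original break?
    je done
    mov $1, %rdi            # no: exit status 1
done:
    mov $60, %rax           # exit syscall number
    syscall
```

After running it, `echo $?` in the shell shows the exit status, which should be 0 if the break ended up back at its original position.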