The Underrust: Rust's assembly output

August 31, 2022 software rust underrust assembly

What secrets await the brave, what horrors await the foolish, only the imagination can reveal - until the stillness is disturbed. This is the Underdark.

Updated 2022-10-18

If you want to know what a program actually does, you look at the assembly it executes. And I want to know what Rust actually does.

Underrust will be a series looking at Rust’s assembly output. I will attempt to answer questions like:

Does it really matter whether I use a u8 or a u32? What about an i32?
What is the performance cost of sync::Once?
How does Rust return multiple values?
What does vec![0u8; 1024] really do?
How do u128 and i128 work? There are no general purpose 128-bit registers.
Does Rust optimize the field order of a struct?

Some of this is documented, but this series will look at what Rust actually does. If sync::atomic memory ordering is a lie, which it is, what else is?

Those who do escape to the safety of their surface homes return changed. Their eyes have seen the shadows and the gloom, the inevitable doom of the Underdark.

How? The template

I will be using Rust 1.64+ on Linux (Fedora 36+) and a 64-bit Intel Tiger Lake processor (i7-1185G7).

All of the Underrust series will use this same template.

If you try looking at the assembly output of a regular Rust program you get an awful lot of noise because of all the things Rust does for you. Add in the truly remarkable optimizations that LLVM gives us, and mapping your Rust code to specific lines of assembly gets challenging. Instead, we’ll use a very stripped down Rust program taken from A very small Rust binary indeed.

Cargo.toml

[package]
name = "underrust"
version = "0.1.0"
edition = "2021"

[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"

The important part is panic = "abort". Without it you will get an obscure error about a missing eh_personality ("eh" stands for exception handling, which is how Rust unwinds on panic).

src/main.rs

#![no_std]
#![no_main]

use core::arch::asm;

#[no_mangle]
pub extern "C" fn _start() -> ! {

    // Our code will go here
    let mut ret = 2;
    ret *= 4;

    // Programs have to exit. Rust's stdlib usually does that for us,
    // but stdlib is for surface dwellers.
    unsafe {
        asm!(
            "mov rdi, rsi", // rdi holds the exit status
            "mov eax, 60",  // 60 is the x86-64 Linux `exit` syscall number
            "syscall",
            in("rsi") ret,
            options(nostack, noreturn),
        )
        // nostack: the asm never pushes to the stack, so the compiler need not
        // align rsp before it (otherwise it may emit a push/pop rax to do that)
        // noreturn: the asm never returns, so no 'ret' is emitted after it
    }
}

#[panic_handler]
fn my_panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

If you’re not sure why main is called _start or what that panic_handler is all about, those are covered in A very small Rust binary indeed.

Build

RUSTFLAGS="-Ctarget-cpu=native -Clink-args=-nostartfiles" cargo build --release

Target your specific CPU so that we can see LLVM at its best, and tell the linker we brought our own entry point (_start) so we don't need it to call main for us. We also nearly always want --release because that's what we're going to run in production, and it elides a lot of noisy debug-only checks, such as integer overflow checks.

Inspect

objdump -Mintel -d target/release/underrust | rustfilt

This should output the assembly. rustfilt (cargo install rustfilt) is optional. It de-mangles symbol names.

Tada! Our first assembly output

0000000000001000 <_start>:
    1000:       be 08 00 00 00          mov    esi,0x8
    1005:       48 89 f7                mov    rdi,rsi
    1008:       b8 3c 00 00 00          mov    eax,0x3c
    100d:       0f 05                   syscall
    100f:       0f 0b                   ud2

The first interesting thing is that the compiler did the maths upfront for us, replacing ret = 2; ret *= 4 with 8.
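To make the folding concrete, here is a minimal sketch that moves the template's arithmetic into an ordinary std function. The names template_ret and TEMPLATE_RET are hypothetical, for illustration only. In a release build LLVM is free to replace the whole body with the constant, which is what happened to our _start; the const shows the same value computed explicitly at compile time.

```rust
/// The template's arithmetic as an ordinary function (hypothetical name).
/// With optimizations on, LLVM may compile this down to "return 8".
pub fn template_ret() -> u32 {
    let mut ret: u32 = 2;
    ret *= 4;
    ret
}

/// What the optimizer effectively replaced it with: a value known at compile
/// time. A `const` forces that evaluation; the optimizer simply chose to do it.
pub const TEMPLATE_RET: u32 = 2 * 4;
```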

The final ud2 is an invalid instruction. I believe LLVM adds it wherever execution should never arrive, here right after our noreturn asm block, where reaching it would be undefined behavior. Our function claims not to return (that's what -> ! means), and the ud2 instruction will raise an exception if we don't stick to the contract.
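A small sketch of what -> ! promises, written in ordinary std Rust rather than our no_std template. The names bail and status_or_bail are hypothetical. A function returning ! never returns, so the compiler lets a call to it stand in for a value of any type and treats anything after it as unreachable.

```rust
/// Hypothetical helper: it never returns, so its return type is `!`.
fn bail() -> ! {
    std::process::exit(1)
}

/// Because `!` coerces to any type, `bail()` can stand in for a `u32` here.
/// The compiler also knows nothing after `bail()` can run.
pub fn status_or_bail(status: Option<u32>) -> u32 {
    match status {
        Some(code) => code,
        None => bail(),
    }
}
```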

Template variation: Standard library

That stripped down Rust file makes for very clear assembly output, but it doesn’t include the standard library. You can’t see what Box or Vec look like, for example. That’s easy enough to change.

Remove #![no_std] from the top
Remove the panic handler my_panic function

Build instructions don’t change.

The objdump output will be a lot bigger, so we request disassembly of only the _start symbol (function):

objdump -Mintel --disassemble=_start target/release/underrust | rustfilt

A trick

Earlier we tried to do a multiplication (ret = 2; ret *= 4), but Rust pre-calculated the final result at compile time. What if we don't want that?

We need to find a value that the compiler doesn’t know. We will use the trick from A random number you already have and use the address of a local variable on the stack.

let mut ret = 2;
ret += &ret as *const _ as u32;
ret *= 4;

and now we get something closer to our source code, here heavily annotated

; make space for a 4 byte (u32) local variable (ret) on the stack
; this is unnecessary
sub    rsp,0x4

; move the address of `ret` (the stack pointer) into a register
mov    rax,rsp

; the maths, still being clever
; rax*4 is &ret * 4
; 2*4 is still pre-computed to 8
lea    esi,[rax*4+0x8]

; store `ret` in its reserved space on the stack
; this is unnecessary
mov    DWORD PTR [rsp],esi

; our `asm!`
mov    rdi,rsi
mov    eax,0x3c
syscall
ud2

This is quite typical. The compiler is sometimes amazing, sometimes wasteful, and always surprising. Here the Rust equivalent is something like this:

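let mut ret = core::mem::MaybeUninit::<u32>::uninit(); // 4 bytes on the stack, uninitialized
let ret_addr = ret.as_ptr() as u32;                     // low 32 bits of &ret (mov rax,rsp)
ret.write(ret_addr.wrapping_mul(4).wrapping_add(8));   // lea esi,[rax*4+0x8], then the store
let ret = unsafe { ret.assume_init() };                // handed to the exit syscall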
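LLVM can emit a single lea because of plain algebra: (2 + addr) * 4 = addr * 4 + 8, and that identity still holds under 32-bit wrapping arithmetic. A sketch to check it (the function names are hypothetical; the wrapping operations mirror a release build, where overflow checks are off by default):

```rust
/// The source as written: start at 2, add the (truncated) address, times 4.
pub fn as_written(addr: u32) -> u32 {
    let mut ret: u32 = 2;
    ret = ret.wrapping_add(addr);
    ret = ret.wrapping_mul(4);
    ret
}

/// What `lea esi,[rax*4+0x8]` computes: addr * 4 + 8, wrapping at 32 bits.
pub fn as_compiled(addr: u32) -> u32 {
    addr.wrapping_mul(4).wrapping_add(8)
}
```

Both functions agree for every input, including ones that overflow, which is why the compiler is allowed to rewrite one into the other.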

A different way to (partially) defeat compiler optimizations is to pass your value to black_box. As far as I can tell this uses the same trick of taking the stack address, but it does it twice.
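A minimal sketch, assuming Rust 1.66 or later, where std::hint::black_box is stable (on the 1.64 toolchain used in this series it was still nightly-only). The name not_folded is hypothetical. black_box is documented as a best-effort hint, so the optimizer should stop treating the value as the constant 2, but that is not guaranteed.

```rust
use std::hint::black_box;

/// Same arithmetic as the template, but the starting value goes through
/// `black_box`, so the optimizer should no longer fold the result to 8
/// and will likely emit a real multiply (or shift) instead.
pub fn not_folded() -> u32 {
    let mut ret: u32 = black_box(2);
    ret *= 4;
    ret
}
```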

Assembly primer

If you haven't read much assembly recently, there are much better tutorials, but here is the bare minimum to get you going.

Basic stuff

mov esi, 0x8

This translates to esi = 8.

The first part (mov) is always the instruction. You can look them up in Felix Cloutier’s fantastic reference.

After that are a variable number of arguments. Here, in Intel syntax (versus the much-less-legible AT&T syntax), the destination is on the left and is register esi. x86-64 has sixteen general purpose registers (the eight inherited from 32-bit x86 plus r8 to r15), and each has different names for its different-sized parts (pictures).

The last part is the source, here the immediate value 8 expressed in hexadecimal.
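To try mov without reading disassembly, here is a sketch using inline asm!, assuming an x86-64 target (Rust's asm! uses Intel syntax by default). mov_eight is a hypothetical name, and we let the compiler pick the register rather than insisting on esi; {0:e} asks for that register's 32-bit name.

```rust
use std::arch::asm;

/// `mov <reg32>, 0x8` is `reg = 8`: destination on the left, immediate on the right.
pub fn mov_eight() -> u32 {
    let x: u32;
    unsafe {
        // mov touches neither memory, the stack, nor the flags.
        asm!("mov {0:e}, 0x8", out(reg) x, options(nomem, nostack, pure, preserves_flags));
    }
    x
}
```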

Arithmetic

add eax, ebx

Basic arithmetic instructions are the OpAssign version: the destination is also the first operand. This says eax += ebx.
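A sketch of that OpAssign shape with inline asm! (x86-64 only; add_assign is a hypothetical name). The inout operand is the point: one register is both read and overwritten, just like acc += b. Like a release build, add wraps on overflow instead of panicking.

```rust
use std::arch::asm;

/// `add acc, b` means `acc += b`, wrapping on overflow.
pub fn add_assign(a: u32, b: u32) -> u32 {
    let mut acc = a;
    unsafe {
        // `inout`: the register holding `acc` is the source and the destination.
        // `add` changes the flags, so we do not claim `preserves_flags`.
        asm!(
            "add {acc:e}, {b:e}",
            acc = inout(reg) acc,
            b = in(reg) b,
            options(nomem, nostack, pure),
        );
    }
    acc
}
```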

Addresses

mov    BYTE PTR [rsp+0x1a],0x1

Local variables are, as you know, often stored on the stack. rsp is the stack pointer. Square brackets indicate a value used as an address, so this adds 26 (0x1a) to the address currently stored in the stack pointer, and writes the one-byte value 1 there (destination is on the left). This corresponds to let x: u8 = 1;.
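A sketch of a one-byte store through square brackets (x86-64 only; store_one_byte is a hypothetical name). Our own asm can't assume the compiler placed a local at rsp+0x1a, so instead of an offset from rsp we pass the variable's address in a register and write through it.

```rust
use std::arch::asm;

/// Writes the byte 1 into `x` via `mov BYTE PTR [addr], 0x1`.
pub fn store_one_byte() -> u8 {
    let mut x: u8 = 0;
    let p: *mut u8 = &mut x;
    unsafe {
        // The brackets make the register's value an address; BYTE PTR sets the size.
        // No `nomem` option: the compiler must assume memory (here `x`) changed.
        asm!("mov BYTE PTR [{p}], 0x1", p = in(reg) p, options(nostack, preserves_flags));
    }
    x
}
```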

Load Effective Address is weird but important

lea    eax,[rsi+rdi*1]

x86 has a very fast lea (Load Effective Address) instruction designed to calculate an address. It can do wonderful things like calculating the address of the second field of the third element in an array, often in a single CPU cycle, and because of this compilers use it for regular maths all the time.

This is the one big exception to the rule that square brackets indicate an address. lea never reads memory; it only calculates the value inside the brackets, so here it's just doing maths.

This says eax = rsi + rdi. In the System V AMD64 calling convention used on Linux, the first two integer parameters to a function are passed in rdi and rsi, so this is probably the output for the second line here:

fn f(a: u32, b: u32) { // a is in rdi, b is in rsi
    let x = a + b;     // x is in eax
    // ...
}
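A sketch of lea as plain arithmetic (x86-64 only; the function names are hypothetical). The first function mirrors lea eax,[rsi+rdi*1], with 64-bit registers inside the brackets and a 32-bit destination. The second has the same form as the lea esi,[rax*4+0x8] from the trick above. lea reads no memory and leaves the flags alone, hence nomem and preserves_flags.

```rust
use std::arch::asm;

/// `lea x32, [a64 + b64*1]`: x = a + b, no memory access.
pub fn lea_add(a: u32, b: u32) -> u32 {
    let x: u32;
    unsafe {
        // The upper halves of the 64-bit input registers are undefined for u32
        // operands, but a 32-bit result depends only on the low 32 bits.
        asm!(
            "lea {x:e}, [{a:r} + {b:r}*1]",
            x = lateout(reg) x,
            a = in(reg) a,
            b = in(reg) b,
            options(nomem, nostack, pure, preserves_flags),
        );
    }
    x
}

/// `lea x32, [v64*4 + 0x8]`: x = v * 4 + 8, wrapping at 32 bits.
pub fn lea_times4_plus8(v: u32) -> u32 {
    let x: u32;
    unsafe {
        asm!(
            "lea {x:e}, [{v:r}*4 + 0x8]",
            x = lateout(reg) x,
            v = in(reg) v,
            options(nomem, nostack, pure, preserves_flags),
        );
    }
    x
}
```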

That’s it. In future posts we will use this template to explore the Underrust, the secret world beneath the bustling surface of Rust. Grab your scimitars. Let’s go!

Graham King
