The Underrust: Rust's assembly output
Summary
What secrets await the brave, what horrors await the foolish, only the imagination can reveal - until the stillness is disturbed. This is the Underdark.
Updated 2022-10-18
If you want to know what a program actually does, you look at the assembly it executes. And I want to know what Rust actually does.
Underrust will be a series looking at Rust’s assembly output. I will attempt to answer questions like:
- Does it really matter whether I use a u8 or a u32? What about an i32?
- What is the performance cost of sync::Once?
- How does Rust return multiple values?
- What does vec![0u8; 1024] really do?
- How do u128 and i128 work? There are no general purpose 128-bit registers.
- Does Rust optimize the field order of a struct?
Some of this is documented, but this series will look at what Rust actually does. If sync::atomic memory ordering is a lie, which it is, what else is?
Those who do escape to the safety of their surface homes return changed. Their eyes have seen the shadows and the gloom, the inevitable doom of the Underdark.
How? The template
I will be using Rust 1.64+ on Linux (Fedora 36+) and a 64-bit Intel Tiger Lake processor (i7-1185G7).
All of the Underrust series will use this same template.
If you try looking at the assembly output of a regular Rust program you get an awful lot of noise because of all the things Rust does for you. Add in the truly remarkable optimizations that LLVM gives us, and mapping your Rust code to specific lines of assembly gets challenging. Instead, we’ll use a very stripped down Rust program taken from A very small Rust binary indeed.
Cargo.toml
[package]
name = "underrust"
version = "0.1.0"
edition = "2021"
[profile.dev]
panic = "abort"
[profile.release]
panic = "abort"
The important part is panic = "abort". Without it you will get obscure errors about eh_personality (the “exception handling personality” function used to unwind the stack during a panic).
src/main.rs
#![no_std]
#![no_main]
use core::arch::asm;
#[no_mangle]
pub extern "C" fn _start() -> ! {
    // Our code will go here
    let mut ret = 2;
    ret *= 4;
    // Programs have to exit. Rust's stdlib usually does that for us,
    // but stdlib is for surface dwellers.
    unsafe {
        asm!(
            "mov rdi, rsi", // the exit code goes in rdi, the first syscall argument
            "mov eax, 60",  // 60 is the exit syscall number on x86-64 Linux
            "syscall",
            in("rsi") ret,
            options(nostack, noreturn),
        )
        // nostack prevents `asm!` from push/pop rax
        // noreturn prevents it putting a 'ret' at the end
    }
}
#[panic_handler]
fn my_panic(_info: &core::panic::PanicInfo) -> ! {
    loop {}
}
If you’re not sure why main is called _start or what that panic_handler is all about, those are covered in A very small Rust binary indeed.
Build
RUSTFLAGS="-Ctarget-cpu=native -Clink-args=-nostartfiles" cargo build --release
Target your specific CPU so that we can see LLVM at its best, and tell the linker we brought our own entry point (_start), so it should leave out the C runtime startup files that would otherwise provide _start and call main for us. We also nearly always want --release because that’s what we’re going to run in production, and it leaves out a lot of noisy debug-only checks, such as integer overflow checks.
Inspect
objdump -Mintel -d target/release/underrust | rustfilt
This should output the assembly. rustfilt (cargo install rustfilt) is optional. It demangles symbol names.
Tada! Our first assembly output
0000000000001000 <_start>:
1000: be 08 00 00 00 mov esi,0x8
1005: 48 89 f7 mov rdi,rsi
1008: b8 3c 00 00 00 mov eax,0x3c
100d: 0f 05 syscall
100f: 0f 0b ud2
The first interesting thing is that the compiler did the maths upfront for us, replacing ret = 2; ret *= 4 with 8.
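The final ud2 is a deliberately invalid instruction. I believe LLVM adds it wherever execution must never reach, because reaching it would be undefined behavior. Our function claims not to return (that’s what -> ! means), and the asm! block is marked noreturn. If we don’t stick to that contract, the ud2 instruction raises an invalid opcode exception (SIGILL on Linux) instead of running whatever bytes happen to come next.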
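The -> ! promise isn’t special to _start. Here is a minimal std sketch with made-up names (die and parse_or_die). Because a diverging call never produces a value, the compiler lets it stand in wherever any type is expected:

```rust
/// A diverging function: the `-> !` return type promises it never returns.
/// Here it panics; the template's `_start` ends with the exit syscall instead.
fn die(msg: &str) -> ! {
    panic!("{msg}")
}

/// Because `die` never returns, its "result" can stand in for any type,
/// here the `u32` this match has to produce.
fn parse_or_die(s: &str) -> u32 {
    match s.parse() {
        Ok(v) => v,
        Err(_) => die("not a number"),
    }
}
```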
Template variation: Standard library
That stripped down Rust file makes for very clear assembly output, but it doesn’t include the standard library. You can’t see what Box or Vec look like, for example. That’s easy enough to change.
- Remove #![no_std] from the top
- Remove the panic handler function, my_panic
Build instructions don’t change.
The objdump output will be a lot bigger, so we request disassembly of only the _start symbol (function):
objdump -Mintel --disassemble=_start target/release/underrust | rustfilt
A trick
Earlier we tried to do a multiplication, but Rust pre-calculated the final result at compile time. What if we don’t want that?
We need to find a value that the compiler doesn’t know. We will use the trick from A random number you already have and use the address of a local variable on the stack.
let mut ret = 2;
// add the address of `ret` itself, truncated to a u32: a value the compiler can't know
ret += &ret as *const _ as u32;
ret *= 4;
And now we get something closer to our source code, here heavily annotated:
; make space for a 4 byte (u32) local variable (ret) on the stack
; this is unnecessary
sub rsp,0x4
; move the address of `ret` (the stack pointer) into a register
mov rax,rsp
; the maths, still being clever
; rax*4 is &ret * 4
; 2*4 is still pre-computed to 8
lea esi,[rax*4+0x8]
; store `ret` in its reserved space on the stack
; this is unnecessary
mov DWORD PTR [rsp],esi
; our `asm!`
mov rdi,rsi
mov eax,0x3c
syscall
ud2
This is quite typical. The compiler is sometimes amazing, sometimes wasteful, and always surprising. Here the Rust equivalent is something like this:
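// reserve 4 bytes on the stack for `ret`, without initializing them
let mut ret = core::mem::MaybeUninit::<u32>::uninit();
// the address of `ret`, truncated to 32 bits
let ret_addr = ret.as_ptr() as u32;
// (2 + ret_addr) * 4, with 2 * 4 pre-computed to 8 (wrapping, as in release)
ret.write(ret_addr.wrapping_mul(4).wrapping_add(8));
// `ret` is now initialized, and is passed to the exit syscall as before
let ret = unsafe { ret.assume_init() };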
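To check that the single lea in the annotated output really computes the same thing as the source, here is a sketch in ordinary Rust. It assumes release-build wrapping arithmetic (no overflow checks). It models lea esi,[rax*4+0x8] as 64-bit maths whose low 32 bits land in esi. The function names are hypothetical.

```rust
/// The source code: `ret = 2; ret += &ret as *const _ as u32; ret *= 4;`
/// The pointer-to-u32 cast keeps only the low 32 bits of the address.
/// Release builds don't check for overflow, so this uses wrapping maths.
fn source_form(addr: u64) -> u32 {
    let mut ret: u32 = 2;
    ret = ret.wrapping_add(addr as u32);
    ret = ret.wrapping_mul(4);
    ret
}

/// What `lea esi,[rax*4+0x8]` computes: the full 64-bit address maths,
/// with only the low 32 bits written to the 32-bit register `esi`.
fn lea_form(rax: u64) -> u32 {
    rax.wrapping_mul(4).wrapping_add(8) as u32
}
```

Multiplication and addition only carry upward, so the low 32 bits of the result depend only on the low 32 bits of the address. That is why LLVM can skip the truncation and distribute the * 4 over 2 + addr, folding 2 * 4 into the + 8.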
A different way to (partially) defeat compiler optimizations is to pass your value to black_box (core::hint::black_box, stable since Rust 1.66). As far as I can tell this uses the same trick of taking the stack address, but it does it twice.
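Here is a sketch of the black_box approach in a standard library build, assuming Rust 1.66 or later. In a no_std build you would use core::hint::black_box instead. Both functions return 8, so the difference should only show up in the disassembly. #[inline(never)] keeps each one as its own symbol so you can find it in the objdump output. The function names are made up, and black_box is documented as a best-effort hint, not a guarantee.

```rust
use std::hint::black_box;

/// LLVM is free to fold this whole function down to the constant 8.
#[inline(never)]
fn folded() -> u32 {
    let mut ret = 2;
    ret *= 4;
    ret
}

/// `black_box` hides the starting value from the optimizer, so the
/// multiplication should survive into the assembly.
#[inline(never)]
fn not_folded() -> u32 {
    let mut ret = black_box(2u32);
    ret *= 4;
    ret
}
```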
Assembly primer
If you haven’t read much assembler recently there are much better tutorials, but here is the bare minimum to get you going.
Basic stuff
mov esi, 0x8
This translates to esi = 8.
The first part (mov) is always the instruction. You can look them up in Felix Cloutier’s fantastic reference.
After that comes a variable number of arguments (operands). Here, in Intel syntax (versus the much-less-legible AT&T syntax), the destination is on the left and is register esi. x86-64 has sixteen general purpose registers (32-bit x86 had eight), and each one has different names for different parts of it (pictures).
The last part is the source, here the immediate value 8 expressed in hexadecimal.
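Here is a small model of how those names overlap, treating one 64-bit register as a u64 (the function names are hypothetical). The zero-extension rule for 32-bit writes is why mov esi,0x8 followed by mov rdi,rsi in our first output hands exactly 8 to exit.

```rust
// How the names of one x86-64 register overlap:
// rax is all 64 bits, eax the low 32, ax the low 16, al the low 8,
// and ah is bits 8..16.
fn eax(rax: u64) -> u32 { rax as u32 }
fn ax(rax: u64) -> u16 { rax as u16 }
fn al(rax: u64) -> u8 { rax as u8 }
fn ah(rax: u64) -> u8 { (rax >> 8) as u8 }

/// Writing a 32-bit register (e.g. `mov esi,0x8`) zeroes the upper 32 bits
/// of the full 64-bit register.
fn write_32(_old: u64, value: u32) -> u64 { value as u64 }

/// Writing a 16-bit or 8-bit register leaves the other bits untouched.
fn write_16(old: u64, value: u16) -> u64 { (old & !0xffff) | value as u64 }
fn write_8(old: u64, value: u8) -> u64 { (old & !0xff) | value as u64 }
```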
Arithmetic
add eax, ebx
Two-operand arithmetic instructions work like Rust’s OpAssign operators: the destination is also the first input. This says eax += ebx.
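To make the OpAssign reading concrete, here is a hypothetical op_assign_demo. It models a few two-operand instructions on a u32 standing in for eax:

```rust
/// Models three two-operand instructions on `eax`, returning its value after
/// each one. x86 wraps on overflow, so the model uses `wrapping_add` rather
/// than `+=` (which panics on overflow in debug builds).
fn op_assign_demo(mut eax: u32, ebx: u32) -> [u32; 3] {
    eax = eax.wrapping_add(ebx); // add eax, ebx  =>  eax += ebx
    let after_add = eax;
    eax <<= 1; // shl eax, 1  =>  eax <<= 1 (bits shifted out are lost)
    let after_shl = eax;
    eax ^= eax; // xor eax, eax  =>  the usual way to zero a register
    [after_add, after_shl, eax]
}
```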
Addresses
mov BYTE PTR [rsp+0x1a],0x1
Local variables are, as you know, stored on the stack, and rsp is the stack pointer. Square brackets indicate a value used as an address, so this adds 26 (0x1a) to the address currently stored in the stack pointer and writes the value 1 there (the destination is on the left). BYTE PTR says the write is one byte wide. This corresponds to let x: u8 = 1;
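Here is a rough model of bracketed writes like mov BYTE PTR [rsp+0x1a],0x1 and the earlier mov DWORD PTR [rsp],esi. It treats the stack frame as a plain byte array and rsp as an offset into it. This only illustrates the addressing (real code uses the actual stack). The names are hypothetical, and 0xdead_beef is an arbitrary value. The size keyword sets the width: BYTE is 1 byte, WORD 2, DWORD 4 and QWORD 8, stored little-endian on x86.

```rust
/// A pretend stack frame, with `rsp` as an offset into it.
fn stack_writes() -> [u8; 32] {
    let mut frame = [0u8; 32];
    let rsp = 0usize;

    // mov BYTE PTR [rsp+0x1a],0x1  =>  one byte at rsp + 26
    frame[rsp + 0x1a] = 0x1;

    // mov DWORD PTR [rsp],esi  =>  four bytes at rsp, little-endian
    let esi: u32 = 0xdead_beef;
    frame[rsp..rsp + 4].copy_from_slice(&esi.to_le_bytes());

    frame
}
```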
Load Effective Address is weird but important
lea eax,[rsi+rdi*1]
x86 has a very fast lea (Load Effective Address) instruction designed to calculate an address. It can do wonderful things like calculating the address of the second field of the third element in an array in a single instruction, often in a single CPU cycle, and because of this compilers use it for regular maths all the time.
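This is the one big exception to the rule that square brackets indicate an address. The brackets still describe an address calculation, but lea never reads memory; it just stores the calculated number, so here it is plain maths.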
This says eax = rsi + rdi. On Linux (the System V AMD64 calling convention), the first two integer or pointer parameters to a function are passed in rdi and rsi, so this is probably the output for the second line here:
fn f(a: u32, b: u32) { // a is in rdi, b is in rsi
let x = a + b; // x is in eax
...
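The address maths lea does can be modelled in a few lines of Rust. This minimal sketch uses hypothetical names and models only the arithmetic, not the timing. The Pair array shows the “second field of the third element” case: base + 2 * 8 + 4.

```rust
/// Models the arithmetic `lea` performs: base + index * scale + displacement.
/// x86 only allows a scale of 1, 2, 4 or 8. Unlike `mov`, `lea` never reads
/// memory; the result is just the number.
fn lea(base: u64, index: u64, scale: u64, disp: i64) -> u64 {
    assert!(matches!(scale, 1 | 2 | 4 | 8), "scale must be 1, 2, 4 or 8");
    base.wrapping_add(index.wrapping_mul(scale))
        .wrapping_add(disp as u64)
}

/// `lea eax,[rsi+rdi*1]` for `f(a, b)`: `a` arrives in rdi and `b` in rsi.
/// Writing the 32-bit eax keeps only the low 32 bits, so this is `a + b`
/// with wrapping.
fn add_via_lea(a: u32, b: u32) -> u32 {
    lea(b as u64, a as u64, 1, 0) as u32
}

/// Two u32 fields; `repr(C)` fixes the layout: `second` sits at offset 4,
/// and each element is 8 bytes.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C)]
struct Pair {
    first: u32,
    second: u32,
}
```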
That’s it. In future posts we will use this template to explore the Underrust, the secret world beneath the bustling surface of Rust. Grab your scimitars. Let’s go!