Graham King

Solvitas perambulum

Return Value Optimization in Rust

software rust

In the program below, where in memory is St created, and where does it end up? Or, to ask it a different way, what does this program print?

struct St {
    a: u64,
    _b: u64,
    _c: u64,
}

fn main() {
    let v: u32 = 1;
    let x = sub1();
    println!("main {:p}, {:p}", &v, &x.a);
}

fn sub1() -> St {
    let v: u32 = 1;
    let x = sub2();
    println!("sub1 {:p}, {:p}", &v, &x.a);
    x
}

fn sub2() -> St {
    let v: u32 = 1;
    let x = St{ a: 42, _b: 0, _c: 0 };
    println!("sub2 {:p}, {:p}", &v, &x.a);
    x
}

Try it out on the Rust playground.

I assumed that St would be created on sub2’s stack, copied from there into sub1’s stack, and then copied again into main’s stack. That’s not what we see. The output looks like this:

sub2 0x7ffdac0fea04, 0x7ffdac0feb90
sub1 0x7ffdac0fead4, 0x7ffdac0feb90
main 0x7ffdac0feb8c, 0x7ffdac0feb90

The first address on each line is that function’s local v, which is clearly going to be on the local stack. We print it to see where each function’s stack frame is.

The curious and wonderful part is the second address, which points into St: it doesn’t move! St is created in its final resting place on main’s stack. Notice that this address is higher than the address of main’s v, so it is in main’s stack frame (remember that the stack grows downwards).

This is return value optimization (RVO). C++ compilers have long performed it, and since C++17 the standard requires it when a function returns a temporary, so as far as I can tell Rust gets it from LLVM for free.

The process stack for the example above looks like this in memory (showing only the last four hex digits of each address):

	0xeba0 St._c (u64: 8 bytes)
	0xeb98 St._b (8 bytes)
	0xeb90 St.a  (8 bytes)
	0xeb8c v     (u32: 4 bytes)
	.. lots of space for println!
	0xead4 v
	.. lots of space for println!
	0xea04 v
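The offsets in that diagram are just arithmetic on three consecutive u64 fields. Here is a small sketch that checks it, with one assumption of my own: I added `#[repr(C)]` so the field order is guaranteed, because Rust’s default layout (used by the original St) is allowed to reorder fields. The helper name `field_offsets` is hypothetical.

```rust
use std::mem::size_of;

// Assumption: #[repr(C)] is added here so field order and offsets are
// guaranteed. The original St uses Rust's default layout.
#[repr(C)]
struct St {
    a: u64,
    _b: u64,
    _c: u64,
}

/// Byte offset of each field from the start of the struct,
/// measured on a live value by subtracting addresses.
fn field_offsets(s: &St) -> [usize; 3] {
    let base = s as *const St as usize;
    [
        &s.a as *const u64 as usize - base,
        &s._b as *const u64 as usize - base,
        &s._c as *const u64 as usize - base,
    ]
}

fn main() {
    let s = St { a: 42, _b: 0, _c: 0 };
    let offsets = field_offsets(&s);
    println!("St is {} bytes, field offsets {:?}", size_of::<St>(), offsets);

    // Last four hex digits of x.a's address from the example run above.
    let a_addr: usize = 0xeb90;
    for (name, off) in ["a", "_b", "_c"].into_iter().zip(offsets) {
        println!("{:#06x} St.{}", a_addr + off, name);
    }
    // main's v is a u32 sitting directly below St.
    println!("{:#06x} v", a_addr - size_of::<u32>());
}
```

This prints the same addresses as the diagram (0xeb90, 0xeb98, 0xeba0 for the fields and 0xeb8c for v), just in ascending order.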

If you look at the assembly output, it’s clear what’s happening: the address to store St in is passed down into sub1 and sub2 as a hidden function parameter.

	; --- main ---
	; rsp is the stack pointer, so [rsp + <num>] means something on the local stack
	sub rsp, 200					; space on main's stack
	mov dword ptr [rsp + 44], 1		; the local variable 'v'
	; rdi gets the address, on main's stack, where x (struct St) will go
	lea rdi, [rsp + 48]				; lea = Load Effective Address
	call playground::sub1

	; --- sub1 ---
	; .. stuff removed, but rdi doesn't change ..
	call playground::sub2

	; --- sub2 ---
	sub rsp, 184					; most of this space is for println!
	mov rax, rdi					; the x86-64 ABI returns this address in rax,
	mov qword ptr [rsp + 16], rax   ;   so a copy is probably saved for the return
	mov dword ptr [rsp + 52], 1		; let v: u32 = 1
	; rdi still has an address in main's stack, 0x7ffdac0feb90 in our earlier example
	mov qword ptr [rdi], 42			; x.a = 42
	mov qword ptr [rdi + 8], 0		; x._b = 0
	mov qword ptr [rdi + 16], 0		; x._c = 0

That’s Intel assembly syntax: the destination is on the left and the source on the right. On x86-64 Linux the calling convention (System V AMD64) puts the first function argument in register rdi. None of these functions takes a parameter, yet rdi is being passed.
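To make that hidden parameter visible, here is a sketch that writes it out by hand: each function receives a `&mut St` pointing at the caller’s slot, the way rdi carries main’s address. This is my own illustration, not what the compiler emits; the `out` parameter, the `run` function standing in for main, and the returned addresses are hypothetical additions so the addresses can be compared. Safe Rust also needs St to start out initialized, so it begins zeroed.

```rust
struct St {
    a: u64,
    _b: u64,
    _c: u64,
}

// Hypothetical: `out` plays the role of rdi. Each function returns the
// address of `a` as it sees it.

// sub2 writes straight into the caller's slot, like `mov qword ptr [rdi], 42`.
fn sub2(out: &mut St) -> usize {
    *out = St { a: 42, _b: 0, _c: 0 };
    &out.a as *const u64 as usize
}

// sub1 never creates an St of its own; it just forwards the same address.
fn sub1(out: &mut St) -> [usize; 2] {
    let in_sub2 = sub2(out);
    [&out.a as *const u64 as usize, in_sub2]
}

// Plays the part of main: the only storage for St lives in this frame.
fn run() -> (u64, [usize; 3]) {
    let mut x = St { a: 0, _b: 0, _c: 0 };
    let [in_sub1, in_sub2] = sub1(&mut x);
    let in_main = &x.a as *const u64 as usize;
    (x.a, [in_main, in_sub1, in_sub2])
}

fn main() {
    let (a, [in_main, in_sub1, in_sub2]) = run();
    println!("a = {a}");
    println!("main {in_main:#x}, sub1 {in_sub1:#x}, sub2 {in_sub2:#x}");
}
```

Because the reference is passed explicitly, all three addresses are equal by construction; RVO gives the same result without us having to write the parameter.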

If you comment out the println! in sub2 and generate the assembly again (the playground can do this: select the Intel assembly flavor under Config), sub2’s stack space goes down to exactly the 4 bytes needed for v (a u32). There is no space for St in there.

In effect, LLVM has transformed our code into this:

fn main() {
    let v: u32 = 1;

    // space for x on main's stack
    let mut x = St{};       // !! Uninitialized, not valid Rust
    sub1(&mut x);           // pass x's address so sub1/2 can fill it in

    println!("main {:p}, {:p}", &v, &x.a);
}
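The pseudocode above can be turned into valid Rust with `std::mem::MaybeUninit`, the standard library’s way to reserve space for a value without initializing it. This is a sketch of the shape of the transformation under my own assumptions, not what rustc or LLVM literally produce: the `slot` name and the `&mut MaybeUninit<St>` parameters are mine, and the one `unsafe` block is our promise that sub2 fully initializes the slot before main reads it.

```rust
use std::mem::MaybeUninit;

struct St {
    a: u64,
    _b: u64,
    _c: u64,
}

// sub2 fills in the caller's slot. MaybeUninit::write stores the value
// without reading or dropping the uninitialized bytes already there.
fn sub2(out: &mut MaybeUninit<St>) {
    out.write(St { a: 42, _b: 0, _c: 0 });
}

// sub1 only passes the slot's address along.
fn sub1(out: &mut MaybeUninit<St>) {
    sub2(out);
}

fn main() {
    let v: u32 = 1;

    // Space for x on main's stack, deliberately left uninitialized.
    let mut slot = MaybeUninit::<St>::uninit();
    sub1(&mut slot); // pass the slot's address so sub1/sub2 can fill it in

    // SAFETY: sub2 has written a complete St into the slot.
    let x: &St = unsafe { slot.assume_init_ref() };

    println!("main {:p}, {:p}", &v, &x.a);
}
```

Reading x through a reference into `slot` means the value is never moved, so the address printed in main is the one sub2 wrote to.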

This happens because Rust loves you and you have a heart of gold.

My colleague Geoffry told me what this optimization is called. Thank you.
I don't know for sure whether it's LLVM doing this. I'm guessing, based on C++ requiring it and the fact that it happens in both Rust's debug and release modes.