Return Value Optimization in Rust

February 26, 2022 software rust

Summary

In the Rust program below, the struct `St` is created in `sub2`, yet it never moves: it is built directly in `main`'s stack frame. The printed address of `St` is the same in `sub2`, `sub1` and `main`, showing that it is not copied between stack frames. The assembly shows how this works: `main` passes the address of its slot for `St` down to `sub1` and `sub2` as a hidden parameter. This is return value optimization. Since Rust 1.71 the optimization is no longer visible if you print the address, but it still happens when the address is not observed.

Update Nov 2024: Return value optimization changed starting in Rust 1.71. See the last section for details.

In the program below, where in memory is St created, and where does it end up? Or, to ask it a different way, what does this program print?

struct St {
    a: u64,
    _b: u64,
    _c: u64,
}

fn main() {
    let v: u32 = 1;
    let x = sub1();
    println!("main {:p}, {:p}", &v, &x.a);
}

#[inline(never)]
fn sub1() -> St {
    let v: u32 = 1;
    let x = sub2();
    println!("sub1 {:p}, {:p}", &v, &x.a);
    x
}
#[inline(never)]
fn sub2() -> St {
    let v: u32 = 1;
    let x = St{ a: 42, _b: 0, _c: 0 };
    println!("sub2 {:p}, {:p}", &v, &x.a);
    x
}

Try it out on the Rust playground.

I assumed that St would be created on sub2’s stack. It would be copied from there into sub1’s stack, and then again into main’s stack, but that’s not what we see. The output looks like this:

sub2 0x7ffdac0fea04, 0x7ffdac0feb90
sub1 0x7ffdac0fead4, 0x7ffdac0feb90
main 0x7ffdac0feb8c, 0x7ffdac0feb90

The first address is that of each function's local v variable, which is clearly on that function's own stack frame. We print it to see where each frame is.

The curious and wonderful part is the second address, the address of St: it doesn't move! It is created in its final resting place on main's stack. Notice that it is higher than main's v address, so it is in main's stack frame (remember that the stack grows downwards).

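That is return value optimization. C++ compilers do it routinely (C++17 even mandates it when returning a temporary, although for a named local like x it remains optional), so as far as I can tell Rust gets it from LLVM for free.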

The process stack for the example above looks like this in memory (showing only the last four hex digits, i.e. the bottom two bytes, of each address):

main
	0xeba0 St._c (u64: 8 bytes)
	0xeb98 St._b (8 bytes)
	0xeb90 St.a  (8 bytes)
	0xeb8c v     (u32: 4 bytes)
	.. lots of space for println!
sub1
	0xead4 v
	.. lots of space for println!
sub2
	0xea04 v
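The diagram places St.a, St._b and St._c 8 bytes apart. A minimal sketch to check those offsets, assuming `#[repr(C)]` (added here, not in the original program) so that the field order is guaranteed; with the default Rust representation the compiler is allowed to reorder fields, although for three u64 fields it has no reason to. `offset_of!` needs Rust 1.77 or later.

```rust
use std::mem::{offset_of, size_of};

// #[repr(C)] is an assumption added for this sketch: it pins the field order,
// so the offsets printed below are guaranteed rather than merely likely.
#[repr(C)]
#[allow(dead_code)]
struct St {
    a: u64,
    _b: u64,
    _c: u64,
}

fn main() {
    // Low address bits of St.a taken from the diagram above.
    let base = 0xeb90_usize;
    println!("St.a  at {:#x}", base + offset_of!(St, a));
    println!("St._b at {:#x}", base + offset_of!(St, _b));
    println!("St._c at {:#x}", base + offset_of!(St, _c));
    println!("size of St: {} bytes", size_of::<St>());
}
```

With `repr(C)` this prints 0xeb90, 0xeb98 and 0xeba0, matching the diagram, and a size of 24 bytes.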

If you look at the assembly output it’s clear what’s happening. The address to store St in is passed down into sub1 and sub2 as a function parameter.

main:
	; rsp is the stack pointer, so [rsp + <num>] means something on local stack
	sub rsp, 200					; space on main's stack
	mov dword ptr [rsp + 44], 1		; the local variable 'v'
	; rdi gets the address, on main's stack, where x (struct St) will go
	lea rdi, [rsp + 48]				; lea = Load Effective Address
	call playground::sub1

playground::sub1:
	; .. stuff removed, but rdi doesn't change ..
	call playground::sub2

playground::sub2:
	sub rsp, 184					; most of this space is for println!
	mov rax, rdi					; probably because the ABI returns the ..
	mov qword ptr [rsp + 16], rax   ;   .. return-slot address in rax; spilled in debug builds
	mov dword ptr [rsp + 52], 1		; let v: u32 = 1
	; rdi still has an address in main's stack, 0x7ffdac0feb90 in our earlier example
	mov qword ptr [rdi], 42			; x.a = 42
	mov qword ptr [rdi + 8], 0		; x._b = 0
	mov qword ptr [rdi + 16], 0		; x._c = 0

That's Intel assembly syntax: the destination is on the left, the source on the right. The x86-64 System V calling convention (used on Linux) puts the first function parameter in register rdi. None of these functions take a parameter, and yet rdi is being passed: it carries the hidden address where the return value should be written.
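Why does St get a hidden pointer at all? A rough sketch of the size rule, assuming the x86-64 System V convention for `extern "C"` functions and a struct made only of integer fields: up to 16 bytes comes back in the two registers rax and rdx, anything bigger is written through a caller-supplied pointer in rdi. The helper `returned_via_hidden_pointer` and the struct `Pair` are hypothetical, and the real classification has more cases (floats, unaligned fields). Rust's own default ABI is not specified; it happens to return St the same way, as the listing shows, but that is not guaranteed.

```rust
use std::mem::size_of;

// Hypothetical helper: a simplified model of the System V return rule for
// integer-only structs. Not the full classification algorithm.
fn returned_via_hidden_pointer(size_in_bytes: usize) -> bool {
    size_in_bytes > 16 // more than fits in the two return registers rax and rdx
}

#[allow(dead_code)]
struct Pair { a: u64, b: u64 }          // 16 bytes: fits in rax:rdx

#[allow(dead_code)]
struct St { a: u64, _b: u64, _c: u64 }  // 24 bytes: too big for two registers

fn main() {
    for (name, size) in [("Pair", size_of::<Pair>()), ("St", size_of::<St>())] {
        println!(
            "{name}: {size} bytes, returned via hidden pointer: {}",
            returned_via_hidden_pointer(size)
        );
    }
}
```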

If you comment out the println! in sub2 and generate the assembly again (the playground can do this: select the Intel assembly flavor under Config), sub2's stack space goes down to exactly the 4 bytes needed for v (a u32). There is no space for St in there.

LLVM has transformed our code to this:

fn main() {
    let v: u32 = 1;

    // space for x on main's stack
    let mut x: St;          // !! uninitialized; passing it below is not valid Rust
    sub1(&mut x);           // pass x's address so sub1/sub2 can fill it in

    println!("main {:p}, {:p}", &v, &x.a);
}
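The transformation above can be written in valid Rust by making the hidden pointer explicit with `std::mem::MaybeUninit`. This is a hand-written sketch of the idea, not what the compiler actually emits: the `sub1`/`sub2` signatures here are hypothetical, and they return the address they wrote to only so we can check it. Because the caller owns the storage and passes its address down, the address is the same everywhere by construction, on any Rust version.

```rust
use std::mem::MaybeUninit;

struct St {
    a: u64,
    _b: u64,
    _c: u64,
}

// Hypothetical explicit version of the hidden return-slot pointer.
#[inline(never)]
fn sub1(out: &mut MaybeUninit<St>) -> *const St {
    sub2(out) // just forward the caller's slot, as rdi was forwarded
}

#[inline(never)]
fn sub2(out: &mut MaybeUninit<St>) -> *const St {
    // Initialize the caller's slot in place.
    out.write(St { a: 42, _b: 0, _c: 0 });
    out.as_ptr()
}

fn main() {
    let v: u32 = 1;
    // Space for x on main's stack; nothing has been written to it yet.
    let mut slot = MaybeUninit::<St>::uninit();
    let written_at = sub1(&mut slot);
    // SAFETY: sub2 fully initialized the slot with `write`.
    let x: &St = unsafe { slot.assume_init_ref() };
    println!("main {:p}, {:p}", &v, &x.a);
    assert_eq!(written_at, x as *const St); // same storage, by construction
}
```

Using `assume_init_ref` rather than `assume_init` keeps x in the slot; `assume_init` would return the value by move, which could copy it elsewhere.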

The reason this happens is because Rust loves you and you have a heart of gold.

Update: Rust 1.71 and beyond

The example above stopped working as of Rust 1.71. Rust / LLVM still do return value optimization, but only if you don't observe the address. If you run the example above, each function prints a different address for x: Rust constructs the object on the local function's stack, prints that address, then moves the value into the caller's stack. This holds even if, instead of printing the address, you save it in an array passed in or in a global variable.
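Here is a sketch of the "save it in an array passed in" variant, assuming a hypothetical `addrs` parameter threaded through both functions. The program does not claim a particular result: whether the three addresses match depends on the compiler version and build profile, and per the observation above they differ on 1.71 and later.

```rust
struct St {
    a: u64,
    _b: u64,
    _c: u64,
}

// `addrs` is a hypothetical out-parameter that records where x.a lives in
// sub2, sub1 and main, instead of printing it inside each function.
#[inline(never)]
fn sub1(addrs: &mut [usize; 3]) -> St {
    let x = sub2(addrs);
    addrs[1] = &x.a as *const u64 as usize;
    x
}

#[inline(never)]
fn sub2(addrs: &mut [usize; 3]) -> St {
    let x = St { a: 42, _b: 0, _c: 0 };
    addrs[0] = &x.a as *const u64 as usize;
    x
}

fn main() {
    let mut addrs = [0usize; 3];
    let x = sub1(&mut addrs);
    addrs[2] = &x.a as *const u64 as usize;
    let same = addrs.iter().all(|&p| p == addrs[0]);
    println!("sub2 {:#x}, sub1 {:#x}, main {:#x}", addrs[0], addrs[1], addrs[2]);
    println!("same address everywhere: {same}");
}
```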

So how do I know it still works? We have to look at the assembly.

If you remove the println! statements from sub1 and sub2, and don't store or otherwise access the address of x, you get some very nice return value optimization. This is from 1.83-nightly with the release profile (--release):

0000000000013b40 <rto::main>:
   ; stuff removed
   lea    rbx,[rsp+0x68]                ; St will be constructed here
   mov    rdi,rbx
   call   13be0 <rto::sub1>
   ; more stuff removed

0000000000013be0 <rto::sub1>:           ; Nothing removed, this is the entire function!
   jmp    13bf0 <rto::sub2>             ;  💖

0000000000013bf0 <rto::sub2>:
   mov    QWORD PTR [rdi],0x2a          ; x.a = 42
   vxorps xmm0,xmm0,xmm0                ; Create 16 bytes of 0 (xmm0 is a 128-bit register)
   vmovups XMMWORD PTR [rdi+0x8],xmm0   ; x._b = 0 and x._c = 0 in one move
   ret                                  ; Jump directly back to main

Those two functions are only there because I annotated them with #[inline(never)] so we can see what's going on. sub1 is lovely: it doesn't even call sub2 (a call would push a return address onto the stack, requiring its own ret). Instead it jumps straight into sub2, a tail call, so when sub2 executes ret it pops main's return address and goes directly back there.

sub2 constructs St directly at the address in rdi, which main set to point into its own stack frame (lea rbx,[rsp+0x68]). This is exactly the return value optimization we saw in earlier Rust versions.
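The print-free program behind that listing isn't shown in full, so here is a sketch of what it likely looks like, assuming main prints only the value of x.a (never its address). One way to inspect the result, assuming a Linux build: run `cargo build --release` and disassemble the binary with `objdump -d -M intel`, which produces listings in the format above.

```rust
struct St {
    a: u64,
    _b: u64,
    _c: u64,
}

#[inline(never)]
fn sub1() -> St {
    sub2() // no println!, and x's address is never taken
}

#[inline(never)]
fn sub2() -> St {
    St { a: 42, _b: 0, _c: 0 }
}

fn main() {
    let x = sub1();
    // Print the value, not the address, so nothing observes where x lives.
    println!("{}", x.a);
}
```

This prints 42. The interesting part is in the disassembly, not the output.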

My colleague Geoffry told me what this optimization is called. Thank you.
Pedro from Brazil informed me that it stopped working at some point, which led me to adding the last section. Thank you.
I don't know for sure if it's LLVM doing this. I'm guessing based on C++ requiring it, and the fact that it happens in both Rust's debug and release modes.

Graham King
