Underrust: Multiple Return Values
Summary
Part of the Underrust series.
How does Rust return values, and does it make any difference to us programmers? The ABI defines both how to pass values to a function and how to return values. Let’s investigate.
One or two integers: rax and rdx
The most common case is a single integer return value (which includes pointers). That goes in rax. If you have a second value, or the first value is bigger than 64 bits, that goes in rdx. And this is what we see.
[Rust source and generated assembly, shown side by side]
If ret returned a u128 we would see the upper 64 bits in rdx and the lower 64 bits in rax.
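For anyone who wants to reproduce this, here is a minimal sketch of the kind of function being discussed. The function bodies and the names ret_wide and split are my own illustration, not the original listing. The register assignments in the comments are the ones described in the text for x86-64 and may differ between compiler versions and optimization levels.

```rust
// Two 32-bit integers: per the text, the first comes back in rax (eax)
// and the second in rdx (edx). #[inline(never)] keeps the function
// visible as its own symbol when you disassemble the binary.
#[inline(never)]
pub fn ret(a: u32) -> (u32, u32) {
    (a, a)
}

// Hypothetical wide variant: a single value bigger than 64 bits.
// Per the text, the lower 64 bits travel in rax and the upper 64 in rdx.
#[inline(never)]
pub fn ret_wide(a: u64) -> u128 {
    ((a as u128) << 64) | (a as u128)
}

// Split a u128 the same way the register pair does: (lower, upper).
pub fn split(v: u128) -> (u64, u64) {
    (v as u64, (v >> 64) as u64)
}
```

Building in release mode and disassembling (for example with objdump -d) should let you compare ret and ret_wide against the description above.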
Returning a struct follows the same rules as returning individual values because at the assembly level those two things are identical. This will generate the exact same assembly as the function above.
Rust
struct Obj {
    x: u32,
    y: u32,
}

fn ret(a: u32) -> Obj {
    Obj { x: a, y: a }
}
Indeed, if you had both ret functions in the same program, like this:
fn ret1(a: u32) -> (u32, u32)
fn ret2(a: u32) -> Obj
then LLVM will output a single function, and both call sites will call it.
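A complete version of that pair might look like the sketch below. The function bodies and the helper sum_both are assumptions of mine, added so both functions have a caller. Whether the two functions really get merged into one depends on the optimization level and LLVM version, so treat the comments as something to check in your own disassembly rather than a guarantee.

```rust
#[derive(Debug, PartialEq)]
pub struct Obj {
    pub x: u32,
    pub y: u32,
}

// Tuple version.
#[inline(never)]
pub fn ret1(a: u32) -> (u32, u32) {
    (a, a)
}

// Struct version: same two u32 values, so the text expects identical
// machine code to ret1.
#[inline(never)]
pub fn ret2(a: u32) -> Obj {
    Obj { x: a, y: a }
}

// Hypothetical caller that uses both, giving the compiler two call sites.
pub fn sum_both(a: u32) -> u32 {
    let (p, q) = ret1(a);
    let o = ret2(a);
    p + q + o.x + o.y
}
```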
Returning a function returns its address in rax as a function pointer. The caller then calls it via the register: call rax.
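As a small illustration, here is a sketch of a function that returns another function. The names double, pick and call_picked are hypothetical. With optimizations on, the compiler may see through the indirection and call double directly, which is why pick is marked #[inline(never)] here.

```rust
fn double(x: u32) -> u32 {
    x * 2
}

// Returns a plain function pointer. Per the text, the address of
// `double` comes back in rax.
#[inline(never)]
pub fn pick() -> fn(u32) -> u32 {
    double
}

// The caller receives the pointer and calls through it, which is where
// an indirect `call rax` would be expected to appear.
pub fn call_picked(x: u32) -> u32 {
    let f = pick();
    f(x)
}
```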
One or two floats: xmm0 and xmm1
Returning floating point values uses xmm0 and xmm1.
[Rust source and generated assembly, shown side by side]
That unpronounceable vcvtusi2sd converts our u32 param to an f64 return value. It is using a 128 bit register because that’s the ABI. vmovapd is simply mov for SSE/AVX registers.
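A function along these lines should show the behaviour described. The body is my own reconstruction from the description (a u32 converted to f64 and returned twice), not the original listing. Note that vcvtusi2sd is an AVX-512 instruction, so I would only expect to see it when the build targets a CPU with that feature; other targets will use a different conversion sequence.

```rust
// Converts the u32 parameter to f64 and returns it twice.
// Per the text, the two floats come back in xmm0 and xmm1.
#[inline(never)]
pub fn ret(a: u32) -> (f64, f64) {
    let f = a as f64; // u32 -> f64 is exact, every u32 fits in an f64
    (f, f)
}
```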
Three or more: The caller’s stack
Beyond two values we use the caller’s stack. Our first parameter (rdi) becomes the address to write the return values to, and a is now in the second parameter (rsi).
[Rust source and generated assembly, shown side by side]
In the general case it goes on like this forever, with each additional value written to the caller’s stack.
If you return an object (struct) it’s the same, as we saw earlier. The struct is returned as its component values on the stack. From the assembler’s point of view there is no such thing as a struct.
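One way to picture the hidden pointer is to write it out by hand. In the sketch below, ret is the three-value function under discussion (body assumed), and ret_explicit is a hypothetical hand-written equivalent in which the caller supplies the destination explicitly. This is only an analogy for what the text describes; the compiler is not obliged to generate identical code for the two.

```rust
// Three return values: per the text, the caller passes the address of a
// stack slot in rdi and `a` moves to rsi.
#[inline(never)]
pub fn ret(a: u32) -> (u32, u32, u32) {
    (a, a, a)
}

// The same idea spelled out: the caller owns the storage and passes a
// pointer to it as the first argument.
#[inline(never)]
pub fn ret_explicit(out: &mut [u32; 3], a: u32) {
    *out = [a, a, a];
}

// Hypothetical caller showing both forms side by side.
pub fn caller(a: u32) -> bool {
    let mut slot = [0u32; 3]; // storage lives in the caller's frame
    ret_explicit(&mut slot, a);
    let (x, y, z) = ret(a);
    slot == [x, y, z]
}
```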
If you have a chain of function calls and a return value that gets passed straight back up you will see return value optimization. If your call chain goes a -> b -> c, and b returns the output of c, then instead of c writing the values into b’s stack, and then b copying them to a’s stack, c will write them directly to a’s stack, eliding a copy.
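The call chain described above could be sketched like this. The function bodies are assumptions chosen to match the a -> b -> c shape; whether the copy is actually elided is up to the optimizer, so check the disassembly of b to see if it simply forwards its destination pointer to c.

```rust
// Innermost function: produces three values, so it writes through the
// return-slot pointer it is given.
#[inline(never)]
pub fn c(v: u32) -> (u32, u32, u32) {
    (v, v, v)
}

// Middle function: passes c's result straight back up. Per the text,
// b can hand a's slot on to c instead of using a slot of its own.
#[inline(never)]
pub fn b(v: u32) -> (u32, u32, u32) {
    c(v)
}

// Outermost function: owns the stack slot the values end up in.
pub fn a(v: u32) -> u32 {
    let (x, y, z) = b(v);
    x + y + z
}
```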
Aside: An elegant optimization
In the example I’m using I am returning the same value multiple times. SIMD instructions are really good at working with the same value multiple times. Hence when we go to four return values LLVM does something really elegant. It packs the values into a larger AVX register and does a single write to the stack. It’s an unusual case but it’s pleasing to look at, so here it is.
[Rust source and generated assembly, shown side by side]
The caller then picks the 32-bit words off the stack like this:
Assembly
lea rdi,[rsp+0x8] ; where to write the return values
call 1030 <underrust::ret>
mov esi,DWORD PTR [rsp+0xc] ; second return value (0xc - 0x8 = 4 bytes after the first)
add esi,DWORD PTR [rsp+0x8] ; add the first return value
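Rust source that could plausibly produce this pattern is sketched below. The body of ret and the function caller are my reconstruction from the description and the assembly (four copies of the same u32, with the caller adding the second and first values), not the original listing. Whether LLVM uses a single vector store depends on the target features and compiler version.

```rust
// Four copies of the same value: per the text, LLVM can broadcast `a`
// into a vector register and write all four with one store.
#[inline(never)]
pub fn ret(a: u32) -> (u32, u32, u32, u32) {
    (a, a, a, a)
}

// Hypothetical caller matching the assembly above: load the second
// value, then add the first. wrapping_add avoids an overflow panic in
// debug builds.
pub fn caller(a: u32) -> u32 {
    let r = ret(a);
    r.1.wrapping_add(r.0)
}
```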
Conclusions
Here’s what I learnt:
- If you return one or two primitive values they will be in registers, making them zero cost.
- If you return anything beyond that you will use stack memory. That’s most likely L1 cache, 3-5 cycles per read/write, so still very fast but no longer free.
- If your function is inlined none of this matters.
And compiler optimizations are endlessly fascinating.
Thanks for reading. Have some energy: Neon Hearts.