Underrust: How does u128 work on a 64-bit processor?
Summary
In the shadow-haunted depths of the Underrust, where twisted caverns echoed with the drip of ancient water and the gleam of forgotten ores, Developer pressed onward. His hand gripped the hilt of a mystical u128 sword - an instrument of arcane power wrought in the art of modern smithing. The flicker of phosphorescent fungi revealed a figure seated upon a rock, his many legs clicking softly in measured cadence. This was Compiler Crab, a wizened sage whose shell bore the etched runes marking him as a member of the Assembly.
Developer: Wise Compiler Crab, I seek to unravel the mysteries of my u128 sword. How can it wield such power in a realm ruled by 64-bit registers?
Compiler Crab: Ah, intrepid Developer, you stand at the crossroads of ancient sorcery and modern ingenuity. Yes, the x86-64 processor has no general-purpose registers wider than 64 bits. But behold! Even a modest u128, though forged in 128 bits, will split its might into two 64-bit halves. Consider this Rust incantation:
let x: u128 = 4;
Which, when translated by the arcane compiler, manifests as:
mov QWORD PTR [rsp+0x8],0x0 # upper 64 bits
mov QWORD PTR [rsp],0x4 # lower 64 bits
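The same split can be reproduced in plain Rust. This is only a minimal sketch: `split_halves` is a hypothetical helper name, not something the compiler emits, and it simply mirrors the two stores above, with the lower half at the lower address.

```rust
/// Hypothetical helper: splits a u128 into its (lower, upper) 64-bit halves.
fn split_halves(x: u128) -> (u64, u64) {
    let lower = x as u64; // a truncating cast keeps the low 64 bits
    let upper = (x >> 64) as u64; // shifting brings the high 64 bits down
    (lower, upper)
}

fn main() {
    let x: u128 = 4;
    let (lower, upper) = split_halves(x);
    println!("lower = {lower:#x}, upper = {upper:#x}");

    // In little-endian byte order the first 8 bytes hold the lower half
    // (the store to [rsp]) and the last 8 hold the upper half ([rsp+0x8]).
    let bytes = x.to_le_bytes();
    println!("lower bytes = {:?}", &bytes[..8]);
    println!("upper bytes = {:?}", &bytes[8..]);
}
```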
Developer: I see. Is that just for small values?
Compiler Crab: Indeed not. When you brandish a more formidable enchantment, the compiler divides the magic into upper and lower halves. Witness this powerful rune:
let a_value: u128 = 0xABCD_0000_FFFF_0000_FFFF_0000_FFFF_0000;
It transmutes into the following runes carved into the stone of machine code:
# Upper half
14fcf: 48 be 00 00 ff ff 00 movabs rsi,0xabcd0000ffff0000
14fd9: 48 89 75 c8 mov QWORD PTR [rbp-0x38],rsi
# Lower half
14fdd: 48 b8 00 00 ff ff 00 movabs rax,0xffff0000ffff0000
14fe7: 48 89 45 c0 mov QWORD PTR [rbp-0x40],rax
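To check that the two movabs immediates really are the halves of the rune, they can be glued back together. A small sketch, in which `join_halves` is a hypothetical helper introduced only for illustration:

```rust
/// Hypothetical helper: rebuilds a u128 from its two 64-bit halves.
fn join_halves(upper: u64, lower: u64) -> u128 {
    ((upper as u128) << 64) | (lower as u128)
}

fn main() {
    let a_value: u128 = 0xABCD_0000_FFFF_0000_FFFF_0000_FFFF_0000;
    let upper: u64 = 0xabcd0000ffff0000; // immediate loaded into rsi
    let lower: u64 = 0xffff0000ffff0000; // immediate loaded into rax

    // The two 64-bit stores together hold exactly the original 128-bit value.
    assert_eq!(join_halves(upper, lower), a_value);
    println!("{:#034x}", join_halves(upper, lower));
}
```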
Developer: OK, that makes sense. What about in a function?
Compiler Crab: Ah, brave Developer, observe a deeper enchantment - a function that multiplies the essence of your blade and then binds it with an arcane mask:
fn f(x: u128) -> u128 {
let y = x * 3;
y & 0xABCD_0000_FFFF_0000_FFFF_0000_FFFF_0000
}
Transfiguring this spell, the compiler conjures these runes:
# x is passed in rdi (lower half) and rsi (upper half)
15114: b8 03 00 00 00 mov eax,0x3
# x's lower half to rdx
15119: 48 89 fa mov rdx,rdi
# let y = x * 3
# Multiply rdx (implicit operand) by rax (3rd operand), and store the 128-bit result in rax:rcx (high:low)
# rcx is now the bottom 64 bits of y = x * 3, and rax holds whatever overflowed past bit 63
1511c: c4 e2 f3 f6 c0 mulx rax,rcx,rax
# Now multiply the top half by three by computing rsi = rsi + rsi*2
15121: 48 8d 34 76 lea rsi,[rsi+rsi*2]
# Add any overflow from the bottom half multiplication into the top half
15125: 48 01 c6 add rsi,rax
# y & 0xABCD_0000_FFFF_0000_FFFF_0000_FFFF_0000
15128: 48 b8 00 00 ff ff 00 movabs rax,0xffff0000ffff0000
15132: 48 21 c8 and rax,rcx # bottom half
15135: 48 ba 00 00 ff ff 00 movabs rdx,0xabcd0000ffff0000
1513f: 48 21 f2 and rdx,rsi # top half
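The same steps can be written out by hand on two 64-bit halves. This is a sketch under assumptions, not the compiler's own method: `f_native` and `f_halves` are hypothetical names, the widening multiply is modelled with a u128 cast standing in for mulx, and the arithmetic wraps on overflow as the disassembly above does.

```rust
const MASK: u128 = 0xABCD_0000_FFFF_0000_FFFF_0000_FFFF_0000;

/// Reference version using native u128 arithmetic (wrapping on overflow).
fn f_native(x: u128) -> u128 {
    x.wrapping_mul(3) & MASK
}

/// Hypothetical model of the generated code, working on (lower, upper) halves.
fn f_halves(lower: u64, upper: u64) -> (u64, u64) {
    // 64 x 64 -> 128-bit multiply of the lower half, the role played by mulx.
    let wide = (lower as u128) * 3;
    let y_lower = wide as u64; // like rcx: bottom 64 bits of y
    let carry = (wide >> 64) as u64; // like rax: overflow out of the bottom half

    // Top half: upper * 3 (the lea), plus the carry (the add), wrapping.
    let y_upper = upper.wrapping_mul(3).wrapping_add(carry);

    // Apply each half of the mask separately (the two and instructions).
    (y_lower & (MASK as u64), y_upper & ((MASK >> 64) as u64))
}

fn main() {
    let x: u128 = 0x0123_4567_89AB_CDEF_FEDC_BA98_7654_3210;
    let (lower, upper) = f_halves(x as u64, (x >> 64) as u64);
    let joined = ((upper as u128) << 64) | (lower as u128);
    assert_eq!(joined, f_native(x));
    println!("{joined:#034x}");
}
```

Note that in a debug build the original `x * 3` would panic on overflow rather than wrap; the sketch uses the wrapping methods so that both versions behave alike in any build.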
Developer: But, euh, haven’t processors had 128-bit registers for over twenty years, since SSE?
Compiler Crab: I see thou knowest some of the Underrust’s subtler secrets. Beyond the duality of registers, ancient x86 processors bestow upon us eight 128-bit registers, the fabled xmm0 to xmm7, and their x86-64 descendants double the count to sixteen, xmm0 to xmm15. Though these are intended for vector sorcery, and regular incantations may not be used upon them, they sometimes lend their might to our endeavors. Consider the simplest of spells:
let x: u128 = 0;
It then becomes:
14fcf: c5 f8 57 c0 vxorps xmm0,xmm0,xmm0 # xmm0 = 0
14fd3: c5 f8 29 45 b0 vmovaps XMMWORD PTR [rbp-0x50],xmm0 # move xmm0 into x (on the stack)
Developer: I think I get it. That vxorps is xor for xmm registers; xor-ing xmm0 with itself sets the whole register to zero in a single operation.
Most of the instructions applicable to xmm registers are for SIMD, for example treating the 128 bits as four packed 32-bit values. SSE does stand for “Streaming SIMD Extensions” after all. Instructions that do arithmetic on a whole 128-bit integer are rare, and 128-bit integers cannot be used as packed arithmetic lanes in the 256-bit AVX or AVX-512 instructions either.
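A scalar model makes the difference visible. The sketch below is an illustration only: `packed_add_u32x4` is a hypothetical function, not a real intrinsic, that imitates how a packed 32-bit add treats a 128-bit register. Carries never cross a lane boundary, which is why such an instruction cannot stand in for a true u128 addition.

```rust
/// Hypothetical scalar model of a packed add over four 32-bit lanes.
fn packed_add_u32x4(a: u128, b: u128) -> u128 {
    let mut result: u128 = 0;
    for lane in 0..4 {
        let shift = lane * 32;
        let x = (a >> shift) as u32; // extract this lane from a
        let y = (b >> shift) as u32; // extract this lane from b
        // Each lane wraps on its own; the carry out of the lane is discarded.
        result |= (x.wrapping_add(y) as u128) << shift;
    }
    result
}

fn main() {
    let a: u128 = 0xFFFF_FFFF; // lowest lane is all ones
    let b: u128 = 1;

    // A true 128-bit add carries into bit 32.
    assert_eq!(a.wrapping_add(b), 0x1_0000_0000);
    // The packed add wraps the lowest lane to zero and loses the carry.
    assert_eq!(packed_add_u32x4(a, b), 0);
}
```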
Is that what you meant by “vector sorcery”?
Compiler Crab: Precisely. And mark well—sometimes the compiler, in its sagacity, perceives that your value need not summon the full majesty of 128 bits. Should the magic fit snugly within 32 or 64 bits, it will employ those lesser vessels instead.
Developer: OK Compiler Crab. And euh, what of the, euh i128 sword, the signed kin of our u128? Does it not bear similar enchantments?
Compiler Crab: It does, dear Developer. The i128 follows the same ancient paths, differing only in its arithmetic incantations, for example where mul is replaced by the signed imul. The nature of its magic, whether dark or light, is but our interpretation of its binary essence.
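That the sign is only an interpretation can be checked directly. In this small sketch, `shift_right_both` is a hypothetical helper; it shows that wrapping addition and multiplication leave identical bits either way, while right shifts, comparisons and division are among the places where signed and unsigned paths part.

```rust
/// Hypothetical helper: shifts the same 128 bits right as unsigned and as signed.
fn shift_right_both(bits: u128, n: u32) -> (u128, i128) {
    (bits >> n, (bits as i128) >> n)
}

fn main() {
    // The same bit pattern read two ways.
    assert_eq!(u128::MAX as i128, -1);

    let a: u128 = 0xFFFF_FFFF_FFFF_FFFF_0000_0000_0000_0007;
    let b: u128 = 12345;

    // Wrapping add and multiply produce identical bit patterns.
    assert_eq!(a.wrapping_add(b), (a as i128).wrapping_add(b as i128) as u128);
    assert_eq!(a.wrapping_mul(b), (a as i128).wrapping_mul(b as i128) as u128);

    // Comparison and division depend on the interpretation.
    assert!(a > b);
    assert!((a as i128) < (b as i128));
    assert_ne!(a / b, ((a as i128) / (b as i128)) as u128);

    // Unsigned shift fills with zeros; signed shift copies the sign bit.
    let (logical, arithmetic) = shift_right_both(u128::MAX, 127);
    println!("logical = {logical}, arithmetic = {arithmetic}");
}
```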
Developer: (Getting into the spirit of the conversation now) Truly, these secrets elevate my understanding. Pray, tell me - how did you come by such profound knowledge of these arcane runes?
Compiler Crab: My studies were conducted with the latest grimoires of Rust, version 1.87-nightly, invoked with the sacred banner RUSTFLAGS="-Ctarget-cpu=native", upon the very heart of an Intel Core i7, a descendant of the Tiger Lake lineage. The broad pattern holds on nearly every x86-64 system, though the exact runes depend on the processor: mulx requires the BMI2 extension and vxorps requires AVX, so older or differently targeted builds will show other instructions.
Developer: Gratitude, wise Compiler Crab. My euh, u128 sword now sings with the power of, as you say, both ancient and modern magic.
Compiler Crab: Go forth, Developer, and let the echoes of our dialogue resonate through the caverns of destiny.
And so, in the dim light of the Underrust, the Developer set off.
With thanks to chatgpt-4o-latest for helping me with the words