Graham King

Solvitas perambulum

Underrust: How does u128 work on a 64-bit processor?

Summary
It's two 64-bit registers, okay.

In the shadow-haunted depths of the Underrust, where twisted caverns echoed with the drip of ancient water and the gleam of forgotten ores, Developer pressed onward. His hand gripped the hilt of a mystical u128 sword - an instrument of arcane power wrought in the art of modern smithing. The flicker of phosphorescent fungi revealed a figure seated upon a rock, his many legs clicking softly in measured cadence. This was Compiler Crab, a wizened sage whose shell bore the etched runes marking him as a member of the Assembly.

Developer: Wise Compiler Crab, I seek to unravel the mysteries of my u128 sword. How can it wield such power in a realm ruled by 64-bit registers?

Compiler Crab: Ah, intrepid Developer, you stand at the crossroads of ancient sorcery and modern ingenuity. Yes, the x86-64 processor has no general-purpose registers wider than 64 bits. But behold! Even a modest u128, though forged in 128 bits, will split its might into two 64-bit halves. Consider this Rust incantation:

    let x: u128 = 4;

Which, when translated by the arcane compiler, manifests as:

    mov    QWORD PTR [rsp+0x8],0x0  # upper 64 bits
    mov    QWORD PTR [rsp],0x4      # lower 64 bits

Developer: I see. Is that just for small values?

Compiler Crab: Indeed not. When you brandish a more formidable enchantment, the compiler divides the magic into upper and lower halves. Witness this powerful rune:

    let a_value: u128 = 0xABCD_0000_FFFF_0000_FFFF_0000_FFFF_0000;

It transmutes into the following runes carved into the stone of machine code:

   # Upper half
   14fcf:	48 be 00 00 ff ff 00 	movabs rsi,0xabcd0000ffff0000
   14fd9:	48 89 75 c8          	mov    QWORD PTR [rbp-0x38],rsi
   # Lower half
   14fdd:	48 b8 00 00 ff ff 00 	movabs rax,0xffff0000ffff0000
   14fe7:	48 89 45 c0          	mov    QWORD PTR [rbp-0x40],rax
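
Should you wish to see the split with your own eyes, the same division can be traced in Rust itself. This is only a sketch of the idea, not what the compiler emits; the helpers `split` and `join` are names of my own invention:

```rust
/// Hypothetical helper: break a u128 into its (upper, lower) 64-bit halves.
fn split(x: u128) -> (u64, u64) {
    let hi = (x >> 64) as u64; // upper 64 bits
    let lo = x as u64;         // lower 64 bits (the cast truncates)
    (hi, lo)
}

/// Hypothetical helper: rebuild the u128 from its two halves.
fn join(hi: u64, lo: u64) -> u128 {
    ((hi as u128) << 64) | lo as u128
}

fn main() {
    // The small value: the upper half is all zeroes.
    assert_eq!(split(4), (0x0, 0x4));

    // The larger rune: the halves match the two movabs constants.
    let a_value: u128 = 0xABCD_0000_FFFF_0000_FFFF_0000_FFFF_0000;
    let (hi, lo) = split(a_value);
    assert_eq!(hi, 0xABCD_0000_FFFF_0000);
    assert_eq!(lo, 0xFFFF_0000_FFFF_0000);
    assert_eq!(join(hi, lo), a_value);
}
```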

Developer: OK, that makes sense. What about in a function?

Compiler Crab: Ah, brave Developer, observe a deeper enchantment - a function that multiplies the essence of your blade and then binds it with an arcane mask:

    fn f(x: u128) -> u128 {
        let y = x * 3;
        y & 0xABCD_0000_FFFF_0000_FFFF_0000_FFFF_0000
    }

Transfiguring this spell, the compiler conjures these runes:

   # x is passed in rdi and rsi

   15114:	b8 03 00 00 00       	mov    eax,0x3
   # x's lower half to rdx
   15119:	48 89 fa             	mov    rdx,rdi

   # let y = x * 3

   # Multiply rdx (implicit operand) by rax (3rd operand), and store the 128-bit result in rax:rcx (high:low)
   # rcx now holds the bottom 64 bits of y = x * 3; rax holds whatever overflowed out of them
   1511c:	c4 e2 f3 f6 c0       	mulx   rax,rcx,rax

   # Now multiply the top half by three by computing rsi = rsi + rsi*2
   15121:	48 8d 34 76          	lea    rsi,[rsi+rsi*2]

   # Add any overflow from the bottom half multiplication into the top half
   15125:	48 01 c6             	add    rsi,rax

   # y & 0xABCD_0000_FFFF_0000_FFFF_0000_FFFF_0000

   15128:	48 b8 00 00 ff ff 00 	movabs rax,0xffff0000ffff0000
   15132:	48 21 c8             	and    rax,rcx  # bottom half
   15135:	48 ba 00 00 ff ff 00 	movabs rdx,0xabcd0000ffff0000
   1513f:	48 21 f2             	and    rdx,rsi  # top half
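
The same steps can be retraced using nothing wider than a u64. Take this as a rough sketch under assumptions of my own: `f_by_halves` and `MASK` are hypothetical names, the widening multiply that mulx performs is imitated with two overflowing additions, and `wrapping_mul` stands in for the unchecked multiply seen in the listing.

```rust
const MASK: u128 = 0xABCD_0000_FFFF_0000_FFFF_0000_FFFF_0000;

/// The function from the text. wrapping_mul mirrors the listing,
/// which carries no overflow check.
fn f(x: u128) -> u128 {
    x.wrapping_mul(3) & MASK
}

/// Hypothetical helper: the same computation on two u64 halves.
/// Takes (lo, hi) and returns (lo, hi).
fn f_by_halves(lo: u64, hi: u64) -> (u64, u64) {
    // Bottom half: lo * 3 computed as lo + lo + lo, counting each
    // carry out of bit 63. This plays the role of mulx.
    let (twice, c1) = lo.overflowing_add(lo);
    let (y_lo, c2) = twice.overflowing_add(lo);
    let carry = c1 as u64 + c2 as u64;

    // Top half: hi * 3 (the lea), plus the carry from below (the add).
    let y_hi = hi.wrapping_mul(3).wrapping_add(carry);

    // Mask each half with its own 64 bits of the constant (the two ands).
    (y_lo & (MASK as u64), y_hi & ((MASK >> 64) as u64))
}

fn main() {
    let samples = [
        0u128,
        4,
        u64::MAX as u128,
        1 << 64,
        u128::MAX,
        0x1234_5678_9ABC_DEF0_0FED_CBA9_8765_4321,
    ];
    for x in samples {
        let (lo, hi) = f_by_halves(x as u64, (x >> 64) as u64);
        let joined = ((hi as u128) << 64) | lo as u128;
        assert_eq!(joined, f(x));
    }
    println!("the halves agree with the native u128");
}
```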

Developer: But, euh, haven’t processors had 128-bit registers for over twenty years, since SSE?

Compiler Crab: I see ye knowest some of the Underrust’s subtler secrets. Beyond the duality of registers, x86 processors since SSE bestow upon us 128-bit registers: eight in the ancient 32-bit realm, the fabled xmm0 to xmm7, and sixteen on x86-64, xmm0 to xmm15. Though these are intended for vector sorcery, and regular incantations may not be used upon them, they sometimes lend their might to our endeavors. Consider the simplest of spells:

    let x: u128 = 0;

It then becomes:

   14fcf:	c5 f8 57 c0          	vxorps xmm0,xmm0,xmm0                   # xmm0 = 0
   14fd3:	c5 f8 29 45 b0       	vmovaps XMMWORD PTR [rbp-0x50],xmm0     # move xmm0 into x (on the stack)

Developer: I think I get it. That vxorps is xor for xmm registers; it sets the whole register to zero in a single operation.

Most of the instructions applicable to xmm registers are for SIMD, for example using the 128 bits as four packed 32-bit values. SSE does stand for “Streaming SIMD Extensions” after all. The instructions applicable directly to 128-bit numbers are rare. 128-bit values themselves cannot be used as packed arithmetic lanes in the 256-bit AVX2 or the AVX-512 instructions.

Is that what you meant by “vector sorcery”?

Compiler Crab: Precisely. And mark well: sometimes the compiler, in its sagacity, perceives that your value need not summon the full majesty of 128 bits. Should the magic fit snugly within 32 or 64 bits, it will employ those lesser vessels instead, as when the multiplier 3 in our function was loaded with a mere mov eax,0x3.

Developer: OK Compiler Crab. And euh, what of the, euh i128 sword, the signed kin of our u128? Does it not bear similar enchantments?

Compiler Crab: It does, dear Developer. The i128 follows the same ancient paths, differing only in its arithmetic incantations, for example where mul is replaced by the signed imul. The nature of its magic, whether dark or light, is but our interpretation of its binary essence.
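
A small sketch may make that "interpretation" tangible. It assumes nothing beyond the standard casts; `bits` is a hypothetical helper of mine that merely re-reads the same 128 bits as unsigned. Addition and wrapping multiplication leave the same bits under either reading, while right shift and division depend on the sign:

```rust
/// Hypothetical helper: reinterpret a signed value as unsigned.
/// No bits change, only how we read them.
fn bits(x: i128) -> u128 {
    x as u128
}

fn main() {
    // -1 is all ones in both halves.
    assert_eq!(bits(-1), u128::MAX);
    assert_eq!((bits(-1) >> 64) as u64, u64::MAX); // upper half
    assert_eq!(bits(-1) as u64, u64::MAX);         // lower half

    let a: i128 = -5;

    // Wrapping add and multiply produce identical bits either way.
    assert_eq!(bits(a.wrapping_add(7)), bits(a).wrapping_add(7));
    assert_eq!(bits(a.wrapping_mul(3)), bits(a).wrapping_mul(3));

    // Right shift differs: signed fills with the sign bit, unsigned with zero.
    assert_eq!(a >> 127, -1);
    assert_eq!(bits(a) >> 127, 1);

    // Division differs too.
    assert_eq!(a / 2, -2);
    assert_ne!(bits(a / 2), bits(a) / 2);
}
```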

Developer: (Getting into the spirit of the conversation now) Truly, these secrets elevate my understanding. Pray, tell me - how did you come by such profound knowledge of these arcane runes?

Compiler Crab: My studies were conducted with the latest grimoires of Rust - version 1.87-nightly - invoked with the sacred banner RUSTFLAGS="-Ctarget-cpu=native", upon the very heart of an Intel Core i7, a descendant of the Tiger Lake lineage. These experiments, woven into the tapestry of nearly every x86-64 system, confirm the veracity of these ancient truths.

Developer: Gratitude, wise Compiler Crab. My euh, u128 sword now sings with the power of, as you say, both ancient and modern magic.

Compiler Crab: Go forth, Developer, and let the echoes of our dialogue resonate through the caverns of destiny.

And so, in the dim light of the Underrust, the Developer set off.


With thanks to chatgpt-4o-latest for helping me with the words