Graham King

Solvitas perambulum

Rust is also C

software rust c

Wouldn’t it be nice if C had some modern conveniences such as a hash map, a growable array, and maybe a sensible UTF-8 based string type? C++ without the baggage, and where it doesn’t take 260 pages to explain move semantics.

Don’t let all that Rust talk of safety cloud the language for you. You can almost directly translate from C into Rust. Here for example is selection sort from O’Reilly’s C in a Nutshell:



/* C */

#include <stdint.h>
#include <stdio.h>

void swap(uint8_t *p1, uint8_t *p2) {
	uint8_t tmp = *p1;
	*p1 = *p2;
	*p2 = tmp;
}
void selection_sort(uint8_t a[], int n) {
	if (n <= 1) {
		return;
	}
	uint8_t *p;
	uint8_t *min_p;
	uint8_t *last = a + n-1;
	for ( ; a < last; a++ ) {
		min_p = a;
		for ( p = a+1; p <= last; ++p ) {
			if (*p < *min_p) {
				min_p = p;
			}
		}
		swap(a, min_p);
	}
}
int main() {
	uint8_t input[] = {8, 43, 2, 5, 1};
	int n = sizeof(input);
	selection_sort(input, n);
	for (int i=0; i<n; i++) {
		printf("%d ", input[i]);
	}
	printf("\n");
}
/* Rust */

// No #include,  Rust auto-imports
// a standard 'prelude'

unsafe fn swap(p1: *mut u8, p2: *mut u8) {
    let tmp = *p1;
    *p1 = *p2;
    *p2 = tmp;
}
unsafe fn selection_sort(a: &mut[u8]) {
    if a.len() <= 1 {
        return;
    }
    let first = a.as_mut_ptr() as usize;
    let mut min_p: usize;
    let last = first + a.len()-1;
    for a in first..last {
        min_p = a;
        for p in a+1..=last {
            if *(p as *const u8) < *(min_p as *const u8) {
                min_p = p;
            }
        }
        swap(a as *mut u8, min_p as *mut u8);
    }
}
fn main() {
    let mut input = [8, 43, 2, 5, 1];
    unsafe { selection_sort(&mut input) };
    println!("{input:?}");
}

There are some immediately nice things in the Rust version.

  • Line 29: println! can print complex types such as arrays, here through the {:?} debug format.
  • Line 9 we don’t have to pass the length n because the slice reference &mut [u8] is a fat pointer, a tuple of the address and the length. In C it feels like we’re always passing a pointer and a length, so Rust bundles them up. And .len() will always be correct, whereas line 28 int n = sizeof(input) only works because we’re using a byte array.
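
To make the fat pointer point concrete, here is a minimal sketch. The helper names pointer_sizes and element_count are made up for illustration, and the "twice the size of a thin pointer" result is what current compilers produce rather than a documented guarantee.

```rust
use std::mem::{size_of, size_of_val};

/// Returns (size of a thin pointer, size of a slice reference), in bytes.
/// Hypothetical helper, only here to show the difference.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<*const u8>(), size_of::<&[u8]>())
}

/// The slice carries its own length, whatever the element type is.
fn element_count(a: &[u32]) -> usize {
    a.len()
}

fn main() {
    let (thin, fat) = pointer_sizes();
    println!("thin pointer: {thin} bytes, slice reference: {fat} bytes");

    // Five 4-byte elements: C's sizeof would report the byte count (20),
    // which is only the element count when the elements are single bytes.
    let words = [8u32, 43, 2, 5, 1];
    println!(
        "{} elements, {} bytes",
        element_count(&words),
        size_of_val(&words)
    );
}
```

With u32 elements the two numbers on the last line differ, which is exactly the case where the C sizeof trick stops working.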

There are even some straightforward safety features:

  • Lines 14 and 15 in Rust show that min_p is going to change (it is mutable) but last is an invariant.
  • Lines 13 and 23: Rust has different types for a pointer-sized number (usize, which is 64 bits wide on 64-bit targets but is its own type, not an alias for u64) and a pointer (*mut u8). You can’t de-reference a number and you can’t use arithmetic operators on pointers (you can compute an offset instead, for example with the add method). You can easily cast between them (e.g. my_ptr as usize).
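
The two styles of address arithmetic can be sketched side by side. The function names read_at and read_at_via_usize are invented for this illustration; both do the same read, one with the pointer's add method and one by going through usize as the selection sort does.

```rust
/// Reads the byte `offset` elements past the start of `a`,
/// using the pointer's own offset method.
fn read_at(a: &[u8], offset: usize) -> u8 {
    assert!(offset < a.len());
    let base: *const u8 = a.as_ptr();
    // `base + offset` does not compile; `add` computes the offset pointer.
    unsafe { *base.add(offset) }
}

/// Same read, but doing the arithmetic on a plain number and
/// casting back to a pointer at the end.
fn read_at_via_usize(a: &[u8], offset: usize) -> u8 {
    assert!(offset < a.len());
    let addr = a.as_ptr() as usize + offset; // ordinary integer addition
    unsafe { *(addr as *const u8) }
}

fn main() {
    let input = [8u8, 43, 2, 5, 1];
    println!("{} {}", read_at(&input, 1), read_at_via_usize(&input, 1));
}
```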

The “C is portable assembler” idea also still works. Here is the swap function, first for C and then for Rust. Aside from Rust’s symbol mangling it’s the same.

0000000000401140 <swap>:
 8a 07                   mov    al,BYTE PTR [rdi]
 8a 0e                   mov    cl,BYTE PTR [rsi]
 88 0f                   mov    BYTE PTR [rdi],cl
 88 06                   mov    BYTE PTR [rsi],al
 c3                      ret
 0f 1f 80 00 00 00 00    nop    DWORD PTR [rax+0x0]
0000000000007c90 <_ZN5crust4swap17h71b4dff6f39d45a9E>:
 8a 07                   mov    al,BYTE PTR [rdi]
 8a 0e                   mov    cl,BYTE PTR [rsi]
 88 0f                   mov    BYTE PTR [rdi],cl
 88 06                   mov    BYTE PTR [rsi],al
 c3                      ret
 0f 1f 80 00 00 00 00    nop    DWORD PTR [rax+0x0]

On line 13 we’re asking the slice (a view into the fixed-size array) for its pointer with as_mut_ptr. You can get a pointer from almost anything: String, Vec, slice (array), and many others, but maybe that feels too managed (it’s really not). Instead of asking nicely, let’s just take it. Earlier I said a slice reference such as &[u8] is a fat pointer, a tuple of two numbers. We can just re-interpret it as that with transmute (from std::mem; Rust does not promise this layout, but it is what current compilers use). Change line 13 to this (and replace a.len() with n), and everything else stays the same:

	let (first, n) = transmute::<&[u8], (usize,usize)>(a);

You can transmute a String or Vec to a three-tuple of pointer, length, and allocated capacity, although Rust does not guarantee which order the three fields come in.
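
If you want the same three numbers without betting on field order, the accessor methods hand them over directly. A minimal sketch, where string_parts is a made-up helper name and the "three machine words" size is an observation about current compilers, not a promise:

```rust
use std::mem::size_of;

/// Reads a String's three parts through its accessor methods.
/// Returns (address, length, capacity). Hypothetical helper.
fn string_parts(s: &String) -> (usize, usize, usize) {
    (s.as_ptr() as usize, s.len(), s.capacity())
}

fn main() {
    let mut s = String::with_capacity(4);
    s.push('A');
    let (addr, len, cap) = string_parts(&s);
    println!("address {addr:#x}, length {len}, capacity {cap}");

    // On current compilers a String is three machine words wide,
    // which is why the three-tuple transmute has the right size.
    println!("String is {} bytes", size_of::<String>());
}
```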

And you can go the other direction with from_raw_parts. You can allocate your own memory. You can have inline assembly. You can do all those things at the same time.

use core::arch::asm;
use std::mem::forget;

fn main() {
    let mut s = unsafe {
        let mem_start = allocate(4);
        *mem_start = 65; // 'A'
        String::from_raw_parts(mem_start, 1, 4) // Length is 1, capacity is 4
    };
    s.push_str("BCD"); // Fits in the capacity of 4, so String never reallocates
    println!("{s}");
    forget(s); // Rust must not try to free the memory because it didn't allocate it
}

// Allocate num_bytes and return the address of region start
unsafe fn allocate(num_bytes: usize) -> *mut u8 {
    // Find current break position
    let current: usize;
    asm!(
        "mov rax, 12", // brk syscall
        "mov rdi, 0",  // 0 is an invalid position, we just care about output
        "syscall",
        out("rax") current, // syscall puts current brk position in rax
        out("rdi") _,       // overwritten by the mov above
        out("rcx") _,       // the syscall instruction clobbers rcx and r11
        out("r11") _,
    );
    // Ask to move it up by num_bytes
    asm!(
        "mov rdi, {}", // ask to move the break to here
        "add rdi, {}", //  + this many bytes
        "mov rax, 12",
        "syscall",
        in(reg) current,
        in(reg) num_bytes,
        out("rax") _, // tell the compiler about every register we overwrite
        out("rdi") _,
        out("rcx") _,
        out("r11") _,
    );
    current as *mut u8
}

Yes, we’re moving the program break to allocate memory, we’re putting some data there, and then telling Rust this is a partially populated String. It’s totally fine.
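
For comparison, here is a sketch of the same from_raw_parts move on memory that Rust's allocator did hand out, which means the rebuilt String can be dropped normally and no forget is needed. The function name round_trip is invented for this example; the ManuallyDrop pattern follows the one in the standard library documentation for String::from_raw_parts.

```rust
use std::mem::ManuallyDrop;

/// Takes a String apart into (pointer, length, capacity) and rebuilds it.
fn round_trip(s: String) -> String {
    // Stop the original String from freeing the buffer when it goes out of scope.
    let mut s = ManuallyDrop::new(s);
    let (ptr, len, cap) = (s.as_mut_ptr(), s.len(), s.capacity());
    // Sound only because all three values came from a String
    // that the global allocator created.
    unsafe { String::from_raw_parts(ptr, len, cap) }
}

fn main() {
    let mut s = round_trip(String::from("A"));
    s.push_str("BCD");
    println!("{s}"); // prints ABCD
}
```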

Don’t let the safety fun sponges tell you how to code. You can sing your own special song with pointer arithmetic and still have nice things like a print function that can handle arrays.

Make your own kind of music.


  • The asm macro is stable since Rust 1.59; older toolchains need nightly rust for it.
  • The syscall syntax is Linux only, and the assembly is x86-64 only.
  • I built the C like this: clang -march=native -O3 -o main main.c
  • I built the Rust like this: RUSTFLAGS="-C target-cpu=native" cargo build --release
  • To get the assembly I added #[inline(never)] to the Rust swap function and ran objdump -D -M intel <binary>.

But seriously, are there any practical applications? This low-level power was always a primary goal of Rust, so very much yes:

  1. Talking to libraries that have a C ABI, doing low-level things in the standard library, or both.
  2. We have some extremely well battle tested C code. It’s sometimes safer to port it as is rather than rewrite it.
  3. Reference implementations of cryptographic algorithms are often in C, and we want to port them as exactly as possible.
  4. Performance. Always performance.
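
As a small taste of the first point, here is a sketch of what the C ABI looks like from the Rust side. c_style_len is a hypothetical stand-in written in Rust, not a binding to a real library; it only shows the C calling convention, a raw C string pointer, and a safe wrapper around both.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C library function: counts the bytes before
/// the NUL terminator, like strlen. `extern "C"` gives it the C calling convention.
unsafe extern "C" fn c_style_len(s: *const c_char) -> usize {
    // The caller must pass a valid, NUL-terminated string.
    let c_str = unsafe { CStr::from_ptr(s) };
    c_str.to_bytes().len()
}

/// Safe wrapper: owns the NUL-terminated copy for the duration of the call.
fn len_via_c_abi(text: &str) -> usize {
    let owned = CString::new(text).expect("text must not contain a NUL byte");
    unsafe { c_style_len(owned.as_ptr()) }
}

fn main() {
    println!("{}", len_via_c_abi("Rust is also C"));
}
```

Calling into an actual C library looks the same from the wrapper's point of view, except that the function is declared in an extern block instead of being defined in Rust.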