Rust is also C
Summary
Wouldn’t it be nice if C had some modern conveniences such as a hash map, a growable array, and maybe a sensible UTF-8 based string type? C++ without the baggage, and where it doesn’t take 260 pages to explain move semantics.
Don’t let all that Rust talk of safety cloud the language for you. You can almost directly translate from C into Rust. Here for example is selection sort from O’Reilly’s C in a Nutshell:
[Listing: selection sort in C and in Rust, side by side]
There are some immediately nice things in the Rust version.
- Line 29: `println!` can print complex types.
- Line 9: we don’t have to pass the length `n`, because the slice reference `&mut [u8]` is a fat pointer, a tuple of the address and the length. In C it feels like we’re always passing a pointer and a length, so Rust bundles them up. And `.len()` will always be correct, whereas line 28’s `int n = sizeof(input)` only works because the elements are single bytes; for any other element type it would have to be `sizeof(input) / sizeof(input[0])`.
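The side-by-side listing didn’t survive here, so as a stand-in, a minimal sketch of selection sort written against `&mut [u8]`. It is an illustrative version, not the book’s code or the original listing, so the line numbers above don’t map onto it. The function takes only the slice and asks it for `.len()`; a sub-slice carries its own length the same way.

```rust
// Selection sort over a byte slice. `&mut [u8]` carries the address and
// the length together, so there is no separate `n` parameter.
fn selection_sort(a: &mut [u8]) {
    let n = a.len();
    for i in 0..n {
        // Find the smallest element in a[i..] ...
        let mut min_idx = i;
        for j in (i + 1)..n {
            if a[j] < a[min_idx] {
                min_idx = j;
            }
        }
        // ... and move it to position i.
        a.swap(i, min_idx);
    }
}

fn main() {
    let mut input = [9u8, 4, 7, 1, 8, 2];
    selection_sort(&mut input);
    println!("{:?}", input); // [1, 2, 4, 7, 8, 9]

    // A sub-slice knows its own length too: sort just the middle four.
    let mut partial = [9u8, 4, 7, 1, 8, 2];
    selection_sort(&mut partial[1..5]);
    println!("{:?}", partial); // [9, 1, 4, 7, 8, 2]
}
```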
There are even some straightforward safety features:
- Lines 14 and 15 in Rust show that `min_p` is going to change (it is `mut`able) but `last` never changes.
- Lines 13 and 23: Rust has different types for a pointer-sized number (`usize`, which is 64 bits wide on 64-bit targets but is its own type, not an alias for `u64`) and a pointer (`*mut u8`). You can’t dereference a number, and you can’t use arithmetic operators on pointers (you compute an offset instead, e.g. with `add`). You can easily cast between them (e.g. `my_ptr as usize`).
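A small sketch of that pointer/number split, using a hypothetical helper `bump_at`: the offset goes through `add` rather than `+`, and turning a pointer into a `usize` address is an explicit `as` cast.

```rust
// Hypothetical helper: increment the byte at index `i` through a raw pointer.
// The bounds check up front is what keeps the unsafe block in range.
fn bump_at(buf: &mut [u8], i: usize) {
    assert!(i < buf.len(), "index out of bounds");
    let base: *mut u8 = buf.as_mut_ptr();
    // SAFETY: i < buf.len(), so base.add(i) stays inside the slice.
    unsafe {
        // There is no `base + i` for raw pointers; `add` offsets by i elements.
        let elem: *mut u8 = base.add(i);
        *elem = (*elem).wrapping_add(1);
    }
}

fn main() {
    let mut buf = [10u8, 20, 30, 40];
    bump_at(&mut buf, 2);
    println!("{:?}", buf); // [10, 20, 31, 40]

    // Pointers and pointer-sized integers are different types,
    // but an explicit `as` cast converts between them.
    let first: *const u8 = buf.as_ptr();
    let last: *const u8 = &buf[3];
    let distance: usize = last as usize - first as usize;
    println!("{distance}"); // 3: one byte per u8 element
}
```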
The “C is portable assembler” idea also still works. Here is the `swap` function for C on the left and Rust on the right. Aside from Rust’s symbol mangling, it’s the same.
[Listing: swap, C (left) and Rust (right)]
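The listing isn’t reproduced here; below is a hypothetical byte-sized `swap` through raw pointers, an illustration rather than the exact function from the comparison. It is marked `#[inline(never)]`, the same trick the build notes use, so it stays a separate symbol you can find in `objdump` output.

```rust
// Swap two bytes through raw pointers: two loads, two stores.
// #[inline(never)] keeps `swap` a separate symbol, so it shows up in objdump output.
#[inline(never)]
unsafe fn swap(p1: *mut u8, p2: *mut u8) {
    // SAFETY: the caller promises both pointers are valid for reads and writes.
    unsafe {
        let tmp = *p1;
        *p1 = *p2;
        *p2 = tmp;
    }
}

fn main() {
    let mut a = [1u8, 2, 3];
    let p = a.as_mut_ptr();
    // SAFETY: offsets 0 and 2 are both inside `a`.
    unsafe { swap(p, p.add(2)) };
    println!("{:?}", a); // [3, 2, 1]
}
```

The standard library already offers this as `std::ptr::swap`; the hand-written version just keeps the loads and stores visible.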
On line 13 we’re asking the slice (a fixed-length view of the array) for its pointer with `as_mut_ptr`. You can get a pointer from almost anything: `String`, `Vec`, slices (arrays), and many others, but maybe that feels too managed (it’s really not). Instead of asking nicely, let’s just take it. Earlier I said the slice reference `&[u8]` is a fat pointer, a tuple of two numbers. We can just reinterpret it as that with `transmute`. Change line 13 to this (and replace `a.len()` with `n`), and everything else stays the same:
```rust
let (first, n) = transmute::<&[u8], (usize, usize)>(a);
```
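You can transmute a `String` or `Vec` to a three-tuple too, one of the three numbers being its allocated capacity. Which position holds which number isn’t guaranteed: the `Vec` documentation describes it as a (pointer, capacity, length) triplet whose field order is unspecified.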
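For comparison, a sketch of the documented route to the same numbers, with hypothetical helpers `into_parts` and `from_parts`: the accessor methods instead of `transmute`, so nothing depends on how the compiler happens to lay out the fields. `ManuallyDrop` keeps the buffer alive while it’s in pieces.

```rust
use std::mem::ManuallyDrop;

// Hypothetical helper: split a Vec<u8> into (pointer, length, capacity)
// through the documented accessors instead of transmute.
fn into_parts(v: Vec<u8>) -> (*mut u8, usize, usize) {
    // ManuallyDrop stops the Vec from freeing its buffer when `v` goes out of scope.
    let mut v = ManuallyDrop::new(v);
    (v.as_mut_ptr(), v.len(), v.capacity())
}

// Hypothetical helper: reassemble the Vec, which owns (and will free) the buffer again.
// SAFETY (caller): the parts must come from `into_parts` and be used only once.
unsafe fn from_parts(ptr: *mut u8, len: usize, cap: usize) -> Vec<u8> {
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}

fn main() {
    let mut v: Vec<u8> = Vec::with_capacity(8);
    v.extend_from_slice(b"abc");
    let (ptr, len, cap) = into_parts(v);
    println!("len={len} cap={cap}"); // cap is at least 8
    let v = unsafe { from_parts(ptr, len, cap) };

    // A slice reference splits into (pointer, length) the same way.
    let s: &[u8] = &v[1..];
    let (p, n) = (s.as_ptr(), s.len());
    // SAFETY: p and n come from a live slice borrowed from `v`.
    let back: &[u8] = unsafe { std::slice::from_raw_parts(p, n) };
    assert_eq!(back, &b"bc"[..]);
}
```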
And you can go the other direction with from_raw_parts. You can allocate your own memory. You can have inline assembly. You can do all those things at the same time.
```rust
use core::arch::asm;
use std::mem::forget;

fn main() {
    let mut s = unsafe {
        let mem_start = allocate(4);
        *mem_start = 65; // 'A'
        String::from_raw_parts(mem_start, 1, 4) // Length is 1, capacity is 4
    };
    s.push_str("BCD");
    println!("{s}");
    forget(s); // Rust must not try to free the memory because it didn't allocate it
}

// Allocate num_bytes and return the address of region start (Linux x86-64 only)
unsafe fn allocate(num_bytes: usize) -> *mut u8 {
    // Find current break position
    let current: usize;
    asm!(
        "mov rax, 12", // brk syscall
        "mov rdi, 0",  // 0 is an invalid position, we just care about output
        "syscall",
        out("rax") current, // syscall puts current brk position in rax
        out("rdi") _,       // we overwrite rdi
        out("rcx") _,       // the syscall instruction clobbers rcx and r11
        out("r11") _,
    );
    // Ask to move it up by num_bytes
    asm!(
        "mov rdi, {}", // ask to move the break to here
        "add rdi, {}", // + this many bytes
        "mov rax, 12",
        "syscall",
        in(reg) current,
        in(reg) num_bytes,
        // Declaring these as (non-late) outputs also stops the two inputs
        // above from being assigned to rax, rdi, rcx or r11
        out("rax") _,
        out("rdi") _,
        out("rcx") _,
        out("r11") _,
    );
    current as *mut u8
}
```
Yes, we’re moving the program break to allocate memory, we’re putting some data there, and then telling Rust this is a partially populated String. It’s totally fine.
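If you want the `from_raw_parts` part without the `brk` part, here is a sketch that takes the four bytes from Rust’s global allocator via `std::alloc::alloc` instead (`string_from_fresh_buffer` is a hypothetical name). Because the buffer comes from the same allocator `String` uses, with alignment 1, there’s no need for `forget`: the `String` owns the memory and frees it on drop.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};

// Same idea as `allocate` above, but the 4 bytes come from Rust's global
// allocator, so the resulting String can own the buffer and free it normally.
fn string_from_fresh_buffer() -> String {
    let cap = 4;
    // Alignment 1 and size `cap`: the layout String expects for its buffer.
    let layout = Layout::array::<u8>(cap).expect("layout too large");
    unsafe {
        let mem_start = alloc(layout);
        if mem_start.is_null() {
            handle_alloc_error(layout);
        }
        *mem_start = b'A';
        // Length 1 (only 'A' is initialized), capacity 4.
        String::from_raw_parts(mem_start, 1, cap)
    }
}

fn main() {
    let mut s = string_from_fresh_buffer();
    s.push_str("BCD"); // fits in the existing capacity, no reallocation
    println!("{s}"); // ABCD
    // No forget needed: dropping `s` frees the buffer via the global allocator.
}
```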
Don’t let the safety fun sponges tell you how to code. You can sing your own special song with pointer arithmetic and still have nice things like a print function that can handle arrays.
- You need Rust 1.59 or later for the `asm!` macro (before that release it was nightly-only).
- The syscall number and register conventions are Linux x86-64 only.
- I built the C like this: `clang -march=native -O3 -o main main.c`
- I built the Rust like this: `RUSTFLAGS="-C target-cpu=native" cargo build --release`
- To get the assembly I added `#[inline(never)]` to the Rust `swap` function and ran `objdump -D -M intel <binary>`.
But seriously, are there any practical applications? This low-level power was always a primary goal of Rust, so very much yes:
- Talking to libraries that have a C ABI, doing low-level things in the standard library, or both.
- We have some extremely well-tested, battle-hardened C code. Sometimes it’s safer to port it as is than to rewrite it.
- Reference implementations of cryptographic algorithms are often in C, and we want to port them as exactly as possible.
- Performance. Always performance.
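For the first item, a minimal sketch of calling a C ABI function: `strlen` from the C standard library, which is linked into Rust programs on mainstream targets. `c_strlen` is a hypothetical wrapper, and the `unsafe extern` block syntax assumes Rust 1.82 or later (older compilers write plain `extern "C"`).

```rust
use std::ffi::{c_char, CString};

// Declaration of C's strlen; size_t maps to usize.
// (`unsafe extern` blocks need Rust 1.82+.)
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// Hypothetical safe wrapper: copy the &str into a NUL-terminated CString
// and hand its pointer to the C function.
fn c_strlen(s: &str) -> usize {
    let c = CString::new(s).expect("string contains an interior NUL byte");
    // SAFETY: `c` is NUL-terminated and lives until after the call returns.
    unsafe { strlen(c.as_ptr()) }
}

fn main() {
    println!("{}", c_strlen("hello")); // 5
    println!("{}", c_strlen("héllo")); // 6: strlen counts bytes, and é is two bytes in UTF-8
}
```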