Rust Performance 101 in 5 Minutes
Summary
Is your Rust program CPU bound? Here are the very first things you can do on Linux.
First, be sure the CPU really is the bottleneck. htop should tell you, as will that aircraft-taking-off fan noise.
Second, isolate the part you are concerned about into a program (or unit test or benchmark) that you can run repeatably.
Third, get some ballpark measurements by running one of these at least four times:
time ./myprog
perf stat -e task-clock ./myprog
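If you isolated the code into its own Rust program, you can also time it from the inside. This is a minimal sketch using only the standard library; `workload` and `measure` are hypothetical names, and `workload` is a placeholder for whatever code you are actually investigating. `std::hint::black_box` keeps the optimizer from throwing the work away.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in for the code you isolated; replace with your own.
fn workload(n: u64) -> u64 {
    // Sum of squares, kept deliberately simple.
    (0..n)
        .map(|x| x.wrapping_mul(x))
        .fold(0u64, |acc, x| acc.wrapping_add(x))
}

// Run `f` `runs` times and return the wall-clock duration of each run.
fn measure<F: FnMut() -> u64>(runs: usize, mut f: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            // black_box stops the optimizer from discarding the result.
            black_box(f());
            start.elapsed()
        })
        .collect()
}

fn main() {
    // At least four runs, as with `time` or `perf stat`.
    let times = measure(4, || workload(black_box(10_000_000)));
    for (i, t) in times.iter().enumerate() {
        println!("run {}: {:?}", i + 1, t);
    }
    println!("fastest: {:?}", times.iter().min().unwrap());
}
```

Looking at the fastest run (rather than only the first) helps filter out one-off noise such as a cold disk cache.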
Build in release mode with link time optimization
Add this to your Cargo.toml:
[profile.release]
lto = true
codegen-units = 1
Then build it:
cargo build --release
This might be all you need; release mode makes a huge difference. Then try building with only one, or neither, of those two profile.release lines, just in case they made things worse. Trust, but verify.
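It is easy to accidentally benchmark a debug build. One quick sanity check, sketched here under the assumption that you have not changed the `debug-assertions` setting in your profiles, is to have the program report how it was built: `debug_assertions` is on by default in the dev profile and off in release. The function name `build_profile` is hypothetical.

```rust
// Reports the build profile, assuming default profile settings:
// `debug_assertions` is enabled for dev builds and disabled for release.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    let profile = build_profile();
    if profile == "debug" {
        eprintln!("warning: timing a debug build, try `cargo build --release`");
    }
    println!("built in {profile} mode");
}
```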
Compile for the target CPU
By default the Rust compiler will only use CPU instructions that even very old CPUs support, because it doesn’t know where you are going to run your program. If you are only going to run it locally, you can allow the compiler to use faster instructions:
RUSTFLAGS="-C target-cpu=native" cargo build --release
Here native is an alias for “this machine’s CPU”.
If you are running on a different machine than the one you build on, but you know which machine it is, target that CPU instead of native. Find valid CPU names like this: rustc --target=x86_64-unknown-linux-gnu --print target-cpus
Aside: rustc prints CPU micro-architecture names like “Nehalem” and “Skylake”. To find yours, run: gcc -march=native -Q --help=target | grep march
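To see what target-cpu actually changes, you can compare the instruction-set features the compiler was allowed to assume (compile time, `cfg!(target_feature = ...)`) with what the running CPU supports (run time, `is_x86_feature_detected!`). This is a small sketch for x86_64 only; the feature list and the `report` function are illustrative choices, not an exhaustive check. Building it once plainly and once with RUSTFLAGS="-C target-cpu=native" should flip some of the "compiled-in" columns to true.

```rust
// Each entry: (feature, allowed at compile time?, supported by this CPU?)
#[cfg(target_arch = "x86_64")]
fn report() -> Vec<(&'static str, bool, bool)> {
    vec![
        ("sse4.2", cfg!(target_feature = "sse4.2"), is_x86_feature_detected!("sse4.2")),
        ("avx2", cfg!(target_feature = "avx2"), is_x86_feature_detected!("avx2")),
        ("fma", cfg!(target_feature = "fma"), is_x86_feature_detected!("fma")),
    ]
}

// On other architectures this sketch has nothing to report.
#[cfg(not(target_arch = "x86_64"))]
fn report() -> Vec<(&'static str, bool, bool)> {
    Vec::new()
}

fn main() {
    for (name, compiled, detected) in report() {
        println!("{name:8} compiled-in: {compiled:5}  cpu has it: {detected}");
    }
}
```

A default build typically shows avx2 as not compiled in even on CPUs that have it, which is exactly the performance left on the table.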
Find the hotspot with a flamegraph
Install cargo flamegraph, then run it and open the result:
cargo install flamegraph
cargo flamegraph
chromium-browser flamegraph.svg
# or however you view SVG files
This will show you where your program spent its time. That’s the part to optimize. Can you avoid doing that thing altogether? Or do it in a different way? Optimizing at the language level will only get you so far; the biggest wins are usually at the software design level.
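If you want to see what a flamegraph looks like before pointing it at real code, here is a toy program to try it on. It is a sketch with hypothetical function names: `hot_path` does about nine times the work of `cold_path`, so its bar should come out roughly nine times wider. `#[inline(never)]` keeps each function as its own frame. Release builds carry little debug info by default, so you may also want `debug = true` under [profile.release] to get readable function names in the graph.

```rust
use std::hint::black_box;

// Shared busy loop (a linear congruential step); black_box on each
// input stops the optimizer from collapsing the loop.
fn spin(n: u64) -> u64 {
    let mut x = 1u64;
    for i in 0..n {
        x = x.wrapping_mul(6364136223846793005).wrapping_add(black_box(i));
    }
    x
}

// Hypothetical hot path: about 90% of the total work.
#[inline(never)]
fn hot_path(n: u64) -> u64 {
    spin(n * 9)
}

// Hypothetical cold path: about 10% of the total work.
#[inline(never)]
fn cold_path(n: u64) -> u64 {
    spin(n)
}

fn main() {
    let n = black_box(50_000_000);
    let result = hot_path(n) ^ cold_path(n);
    println!("{result}");
}
```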
Use a faster HashMap
Often the bottleneck will be in HashMap / HashSet. Here are three things you can do:
- Could you use an array instead? Even a quite large sparse Vec is often much faster than a HashMap.
- Are your hash keys numbers? Try nohash-hasher.
- Otherwise, try rustc-hash or AHash; both should be a fair bit faster than the standard library’s HashMap/HashSet.
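To make the first suggestion concrete, here is a sketch that counts small integer keys two ways: with a HashMap and with a Vec indexed directly by the key. It assumes every key is below a known bound (`max_key`, a hypothetical parameter), so the Vec costs `max_key * 4` bytes however sparse the keys are. A Vec lookup is a bounds check and an index, with no hashing at all.

```rust
use std::collections::HashMap;

// Count occurrences with a HashMap: every access hashes the key.
fn count_with_map(keys: &[u32]) -> HashMap<u32, u32> {
    let mut counts = HashMap::new();
    for &k in keys {
        *counts.entry(k).or_insert(0) += 1;
    }
    counts
}

// Count occurrences with a Vec indexed by the key.
// Assumes every key is < max_key; panics otherwise.
fn count_with_vec(keys: &[u32], max_key: usize) -> Vec<u32> {
    let mut counts = vec![0u32; max_key];
    for &k in keys {
        counts[k as usize] += 1;
    }
    counts
}

fn main() {
    let keys = [3, 7, 3, 1_000, 7, 3];
    let map = count_with_map(&keys);
    let vec = count_with_vec(&keys, 1_001);
    // Both approaches agree on every key.
    for (&k, &c) in &map {
        assert_eq!(vec[k as usize], c);
    }
    println!("key 3 seen {} times", vec[3]); // prints "key 3 seen 3 times"
}
```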
nohash-hasher, rustc-hash and ahash are all almost drop-in replacements, requiring just a few character changes.
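The “few character changes” usually amount to a type alias and constructing the map with `::default()` instead of `::new()`, because `HashMap::new` only exists for the standard hasher. With the real crates you would reach for something like `rustc_hash::FxHashMap`, `ahash::AHashMap` or `nohash_hasher::IntMap` (check each crate’s docs for details). To stay self-contained, the sketch below instead hand-writes a hypothetical `IdentityHasher` that mimics the idea behind nohash-hasher: an integer key is used as its own hash. It is only sound for keys that hash as a single integer, such as `u64`.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Hypothetical identity hasher: the u64 key is returned unchanged as its hash.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    // Arbitrary byte input is not supported by this sketch.
    fn write(&mut self, _bytes: &[u8]) {
        panic!("IdentityHasher only supports u64 keys");
    }
    // `impl Hash for u64` calls this method, so u64 keys land here.
    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
}

// The swap: a type alias with a different hasher parameter.
type FastMap<V> = HashMap<u64, V, BuildHasherDefault<IdentityHasher>>;

fn main() {
    // `::default()` instead of `::new()`; the rest of the code is unchanged.
    let mut m: FastMap<&str> = FastMap::default();
    m.insert(42, "answer");
    m.insert(7, "seven");
    assert_eq!(m.get(&42), Some(&"answer"));
    println!("{} entries", m.len()); // prints "2 entries"
}
```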
Use a faster library
If your bottleneck is in something relatively common (e.g. JSON parsing) there is often a faster library on crates.io. Take a look!
I think our five minutes is up. Happy tuning!
Appendix: Beyond 5 minutes
- Read the Rust performance book.
- Use the amazing perf, via perf one-liners.
- Cancel your summer plans. Read Brendan Gregg’s Systems Performance. Eight hundred pages later, you will know kung-fu.