Go: Slice search vs map lookup
Summary
tl;dr Use a map.
Computer science 101 tells us that maps are constant time access, O(1). But what is the constant? The map has to compute the hash, find the right bucket (array access), the right item within the bucket (another array access), and potentially do that multiple times as we walk a chain of buckets (if we overflowed the original bucket). At what point is it faster to iterate through an array comparing each item until we find the one we want?
Motivated by this comparison in C++ I decided to compare Go’s built-in map and slice types. The code is in a gist, a simple set of benchmark tests.
I tried two cases:
- first a traditional key-value setup comparing
map[string]string
with[]*Item{string,string}
. The break-even point here is just five items. Under that the slice is faster, above it is slower. - second a set of integers, comparing
map[int]struct{}
with[]int
. The break-even point is ten items.
These results are similar to the C++ results. They mean we won’t be doing clever hacks in the name of performance. I call that good news. Use the obvious data structure and it will also be the right choice for performance.
The gory details
Here are the numbers I get on my laptop running the gist. They are only relevant as relative values:
size | map str | slice str | map int | slice int |
---|---|---|---|---|
1 | 8ns | 6ns | 22ns | 6ns |
2 | 18ns | 13ns | 26ns | 10ns |
5 | 100ns | 100ns | 35ns | 30ns |
10 | 100ns | 230ns | 55ns | 45ns |
20 | 100ns | 500ns | 60ns | 100ns |
100 | 100ns | 2200ns | 75ns | 500ns |
I tested with Go’s array (the [10]int
things that we almost never use) but the results were similar to slice, so I didn’t include them. The map timings were very variable over the runs, up to 2x, so don’t read too much into that table. In the 1 or 2 item case the map occasionally beat the slice.
In the first, key-value case (map str vs slice str) I expect it’s the string comparison that costs us. Comparing strings is expensive. If we wanted to speed up the string comparison we would have to … hash the string!
Go’s map
hashes strings with CPU AES instructions, if your processor has them, which it probably does. cat /proc/cpuinfo | egrep 'aes|sse2|sse4_1'
will tell you. I assume using those instructions makes the hash function fast. Go also stores eight items per bucket, and has special-case access methods for string (mapaccess1_faststr
) and int cases.
Even with such a highly optimized map, I don’t understand why the int
(set) case is slower than the map at only 10 items. Here’s the relevant assembly for BenchmarkSetIntSlice
. Each loop is two integers comparisons and a memory fetch. 10 items means up to 10 memory loads and 20 integer comparison, that’s all:
if theArr[j] == q {
# bounds check. rax is `j`, rdi is len(theArr)
45b0b1: cmp %rdi,%rax
45b0b4: jae 45b104 <_access.BenchmarkSetIntSlice+0x144>
# Fetch `theArr[j]`
45b0b6: lea (%r9,%rax,8),%rbx
45b0ba: mov (%rbx),%rbx
# Compare the fetched value with `q` (rsi)
45b0bd: cmp %rsi,%rbx
45b0c0: jne 45b0c9 <_access.BenchmarkSetIntSlice+0x109>
found = true
# We found it! rdx is `found`
45b0c2: mov $0x1,%rdx
The map version is a little more involved, but only runs once. I can only assume that it’s the memory access (lea
, mov
), repeated ten times, that costs us. If you know different please let me know in the comments!