Go: Slice search vs map lookup

Summary
Comparing Go’s built-in map and slice types, I found distinct break-even points. For a traditional key-value setup (`map[string]string` vs `[]*Item{string,string}`), the slice was faster with fewer than five items, and the map outperformed it with five or more. For a set of integers (`map[int]struct{}` vs `[]int`), the slice was quicker below ten items, and the map took the lead at ten. These findings match earlier C++ results, confirming that using the obvious data structure usually also gives the best performance.

tl;dr Use a map.

Computer science 101 tells us that maps are constant time access, O(1). But what is the constant? The map has to compute the hash, find the right bucket (array access), the right item within the bucket (another array access), and potentially do that multiple times as we walk a chain of buckets (if we overflowed the original bucket). At what point is it faster to iterate through an array comparing each item until we find the one we want?
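
To make that concrete, here’s a minimal sketch of the two operations being weighed against each other, using an int set for brevity (the real benchmark code is in the gist mentioned below):

    package access

    // linearFind scans every element: O(n), but each step is just
    // one memory load and one integer comparison.
    func linearFind(s []int, q int) bool {
        for _, v := range s {
            if v == q {
                return true
            }
        }
        return false
    }

    // mapFind is O(1), but that single access hashes q, picks a
    // bucket, and scans up to eight slots within it, following any
    // overflow buckets chained behind.
    func mapFind(m map[int]struct{}, q int) bool {
        _, ok := m[q]
        return ok
    }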

Motivated by this comparison in C++, I decided to compare Go’s built-in map and slice types. The code is in a gist: a simple set of benchmark tests.

I tried two cases:

  • first, a traditional key-value setup, comparing map[string]string with []*Item{string,string} (a slice of pointers to a two-string struct). The break-even point here is just five items: under that the slice is faster, above it the map is. A sketch of these benchmarks follows the list.
  • second, a set of integers, comparing map[int]struct{} with []int. The break-even point is ten items.
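
The gist is the authoritative version, but a minimal sketch of the first, key-value case might look like this. The names (Item, BenchmarkMapStr, BenchmarkSliceStr, size) are my own illustration, not necessarily the gist’s:

    package access

    import (
        "strconv"
        "testing"
    )

    // Item is a hypothetical two-string struct standing in for the
    // element type behind []*Item{string,string}.
    type Item struct {
        Key, Value string
    }

    const size = 10 // vary from 1 to 100 to find the break-even point

    func BenchmarkMapStr(b *testing.B) {
        m := make(map[string]string, size)
        keys := make([]string, size)
        for i := range keys {
            keys[i] = strconv.Itoa(i)
            m[keys[i]] = keys[i]
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _ = m[keys[i%size]] // one hash, bucket pick, bucket scan
        }
    }

    func BenchmarkSliceStr(b *testing.B) {
        items := make([]*Item, size)
        for i := range items {
            k := strconv.Itoa(i)
            items[i] = &Item{Key: k, Value: k}
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            q := items[i%size].Key
            for _, it := range items { // linear scan, one string compare per step
                if it.Key == q {
                    break
                }
            }
        }
    }

The int set case is analogous: a map[int]struct{} membership test versus a linear scan of a []int.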

These results are similar to the C++ results. They mean we won’t be doing clever hacks in the name of performance. I call that good news. Use the obvious data structure and it will also be the right choice for performance.

The gory details

Here are the numbers I get on my laptop running the gist. They are only relevant as relative values:

    size    map str    slice str    map int    slice int
       1        8ns          6ns       22ns          6ns
       2       18ns         13ns       26ns         10ns
       5      100ns        100ns       35ns         30ns
      10      100ns        230ns       55ns         45ns
      20      100ns        500ns       60ns        100ns
     100      100ns       2200ns       75ns        500ns

I also tested Go’s arrays (the [10]int things that we almost never use), but the results were similar to the slice’s, so I didn’t include them. The map timings varied a lot between runs, by up to 2x, so don’t read too much into that table. In the one- or two-item cases the map occasionally beat the slice.

In the first, key-value case (map str vs slice str), I expect it’s the string comparison that costs us. Comparing strings is expensive: every probe of the slice is a full string comparison, so a scan can end up touching the bytes of many keys, while the map only walks the query string’s bytes once, in the hash function. If we wanted to speed up the string comparison we would have to … hash the string!

Go’s map hashes strings with CPU AES instructions, if your processor has them, which it probably does: `cat /proc/cpuinfo | egrep 'aes|sse2|sse4_1'` will tell you. I assume using those instructions makes the hash function fast. Go also stores eight items per bucket, and has special-case access methods for the string (`mapaccess1_faststr`) and int cases.

Even with such a highly optimized map, I don’t understand why the slice loses to the map in the int (set) case at only ten items. Each loop iteration is two integer comparisons and a memory fetch; ten items means at most ten memory loads and twenty integer comparisons, that’s all. Here’s the inner loop reconstructed as Go (a sketch; the names theArr, q, and found come straight from the disassembly annotations):
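
    // A sketch of BenchmarkSetIntSlice's scan, reconstructed from
    // the assembly annotations below; the gist's exact code may differ.
    found := false
    for j := 0; j < len(theArr); j++ {
        if theArr[j] == q {
            found = true
            break
        }
    }

And here’s the relevant assembly for BenchmarkSetIntSlice: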

   if theArr[j] == q {
             # bounds check. rax is `j`, rdi is len(theArr)
  45b0b1:    cmp    %rdi,%rax
  45b0b4:    jae    45b104 <_access.BenchmarkSetIntSlice+0x144>
             # Fetch `theArr[j]`
  45b0b6:    lea    (%r9,%rax,8),%rbx
  45b0ba:    mov    (%rbx),%rbx
             # Compare the fetched value with `q` (rsi)
  45b0bd:    cmp    %rsi,%rbx
  45b0c0:    jne    45b0c9 <_access.BenchmarkSetIntSlice+0x109>
   found = true
             # We found it! rdx is `found`
  45b0c2:    mov    $0x1,%rdx

The map version is a little more involved, but it only runs once per lookup, where the slice loop above runs up to ten times. I can only assume it’s that repeated memory access (the lea and mov), done up to ten times, that costs us. If you know different, please let me know in the comments!