Sieve of Eratosthenes: Benchmarking Rust, Nim, Go against C

How much slower than C are modern compiled languages? With the plethora of programming languages at the world's disposal, is there a large performance cost in using these languages in their most concise and idiomatic form? This experiment benchmarks a recursion algorithm in C, Rust, Nim and Go to try and get an hint of an answer to this popular question.

Prime number generator: Sieve of Eratosthenes

The algorithm chosen to conduct this comparison experiment is the sieve of Eratosthenes, which is a famous recursive algorithm for finding all prime numbers smaller than an integer N. The particular test used in our benchmark consists in generating all prime numbers smaller than 50,000 using this method and print to the terminal how many there are. To make the comparison between compilers as compelling as possible, we will focus on finding the most idiomatic yet optimised implementation we can come up with in each programming language. When push comes to shove, each of these languages offers a syntax than can replicate C code, but the aim here is to understand how much of an impact on performance writing concise and elegant code using the modern features of a language has.

To start it all off, lets have a look at the reference C implementation:

#include <stdio.h>
#include <stdlib.h>

void filter_numbers(uint32_t k, uint32_t* v, uint32_t* size) {
    uint32_t idx = 0;
    for (uint32_t i = 0; i < (*size); ++i)
        if ((v[i] % k != 0) || (v[i] == k)) {
            v[idx] = v[i];
            idx++;
        }
    (*size) = idx;
}

void sieve(uint32_t* v, uint32_t* size) {
    uint32_t idx = 0;
    while (idx < (*size)) {
        filter_numbers(v[idx], v, size);
        idx++;
    }
}

void print_vector(uint32_t* v, uint32_t size) {
    for (int i = 0; i < size; ++i)
        if ( i == 0 ) 
            printf("[%d ", v[i]);
        else if ( i == size - 1 )
            printf("%d]", v[i]);
        else
            printf("%d ", v[i]);
    printf("\n");
}

int main()
{
    uint32_t upper = 50000;
    uint32_t size = upper - 1;
    uint32_t* numbers = malloc(size * sizeof(uint32_t));

    for(uint32_t i = 0; i < size; ++i)
        numbers[i] = i + 2;

    sieve(numbers, &size);

    print_vector(numbers, size);

    free(numbers);

    return 0;
}

The algorithm starts by generating a sequence of all natural numbers between 2 and the upper bound number N inclusive. Then, considering a number of choice, it retains from the list all numbers that are not divisable by it or that are equal to it. This process is run iteratively starting with 2 and moving to the next number that survived the filtering process up until all numbers in the list have been considered.

The C Implementation makes great use of pointers avoiding any unnecessary memory allocation that would be detrimental to runtime performance. Next, our usual language of choice: Rust.

Rust version

The Rust implementation is very concise and makes full use of the Iterator trait of data structures . The filter_numbers function uses the retain method for in-place filtering of Vec structures. All functions take u32 type parameters in order to match the uint32_t in the C version. Here is the code:

fn filter_numbers(k: u32, numbers: &mut Vec<u32>) {
    numbers.retain(|&v| (v == k) || (v % k != 0));
}

fn sieve(upper: u32) -> Vec<u32> {
    let mut numbers = (2..(upper + 1)).collect::<Vec<u32>>();
    let mut idx = 0;
    while idx < numbers.len() {
        filter_numbers( numbers[idx], &mut numbers );
        idx += 1;
    }
    numbers
}

fn main() {
    println!("{:?}", sieve(50000));
}

With a more functional syntax, closer to OCaml than C++, the code is concise and yet very readable with the different steps of the algorithm remaining easily identifiable. Next, the Nim implementation.

Nim version

Even though garbage collection seems to be its default use case, the Nim compiler is very flexible as the language supports reference passing. Our implementation will make full use of the sequtils module to manipulate lists of homogeneous data with variable length. Doing so seems more natural for the language than exactly replicating the looping operations found in the C code. We had to make the function filter_numbers had to accept a seq by reference avoiding costly copying operations.

import std/sequtils

proc filter_numbers(k: uint32, numbers: var seq[uint32]) =
    var idx = 0
    for v in numbers:
        if (v == k) or (v mod k != 0):
            numbers[idx] = v
            inc idx
    numbers.setLen(idx)

proc sieve(upper: int) : seq[uint32] =
    var numbers : seq[uint32] = toSeq(2..upper).mapIt(uint32(it))
    var idx = 0
    while idx < numbers.len():
        numbers[idx].filter_numbers( numbers )
        inc idx
    numbers

when isMainModule:
    echo sieve(50000)

As a side note, I have also tried this version of the filter_number function which was more appropriate for the language as it made full use of the keepIf function in the Standard Library:

import std/sugar

proc filter_numbers(k: uint32, numbers: var seq[uint32]) =
    keepIf(numbers, v => (v == k) or (v mod k != 0))

However, it came at a 10% performance cost even though it is performing the same high-level operations as the explicit version. While we are dabbling with the modern garbage-collected but flexible languages, we can add Go as the last language in our comparison study.

Go version

The Go syntax being so inspired from C, it was difficult to decide on what idiomatic Go would look like in this instance. The language having a slice container with a length property, the decision was to use slice pointers to get close to the C implementation.

package main

import "fmt"

func all_natural_numbers(upper uint32) []uint32 {
    var values []uint32
    for k := 2; k <= int(upper); k++ {
        values = append(values, uint32(k))
    }
    return values
}

func filter_numbers(k uint32, numbers *[]uint32) {
    var idx uint32 = 0
    for i := 0; i < len(*numbers); i++ {
        val := (*numbers)[i]
        if (val == k) || (val % k != 0) {
            (*numbers)[idx] = val
            idx++
        }
    }
    (*numbers) = (*numbers)[:idx]
}

func sieve(n uint32) []uint32 {
    var numbers = all_natural_numbers(n)
    var idx int = 0
    for idx < len(numbers) {
        filter_numbers(numbers[idx], &numbers)
        idx++
    }
    return numbers
}

func main() {
    fmt.Println(sieve(50000))
}

Now that we have all our implementations ready, let's compile and compare the results.

Benchmarking results

All testing has been performed on a MacPro OSX High Sierra Desktop running on two 2.4 GHz 6-Core Intel Xeon and 16GB of RAM. Not the purest testing environment but this exercise is meant to be a rough guide and not a testing standard. All programs were compiled as release builds without any other optimisation options:

C: cmake -DCMAKE_BUILD_TYPE=Release ..
Rust: cargo build --release sieve
Nim: nimble build -d:danger sieve.nim
Go: go build -buildmode=exe sieve.go

The hyperfine application was used to time the executable files generated. A warmup setting of 10 runs was used for all. Here are the results for runs of the executable in each language:

Execution Time	Mean(ms)	StdDev(ms)	#Runs
C	115.2	2.0	21
Rust	114.2	1.8	25
Nim	114.3	2.5	25
Go	137.6	1.5	21

Surprisingly, both Rust and Nim (with extreme optimisation) match C's runtime performance. Rust has the edge in my view since its fastest implementation is its most concise while keeping memory safety features engaged. The equivalent Nim version did not use the most advanced features of the language and the compiler was forced into aggressive optimisation. Go was relatively fast also but its code syntax is less pleasant to read as it remains close to C.

Conclusion

These modern compiled languages can be optimised to run at very comparable speed to C. If the open source communities behind these language write libraries with performance in mind, execution speeds similar to the old established low level language should be expected. So if performance is not the criterion to look into to pick one language over another, then what should it be? Well, it is probably more relevant to look at criteria like readability, the package manager, the community, interoperability, safety and so on. Nailing performance alone may not be enough to popularise adoption, there needs to be something fresh and appealing that gets brought to the table.

If you like this post, don't forget to follow me on Twitter and get notified of the next publication.