How much slower than C are modern compiled languages? With the plethora of programming languages at the world’s disposal, is there a large performance cost in using these languages in their most concise and idiomatic form? This experiment benchmarks a recursion algorithm in C, Rust, Nim and Go to try and get an hint of an answer to this popular question.
Prime number generator: Sieve of Eratosthenes
The algorithm chosen to conduct this comparison experiment is the sieve of Eratosthenes, which is a famous recursive algorithm for finding all prime numbers smaller than an integer N. The particular test used in our benchmark consists in generating all prime numbers smaller than 50,000 using this method and print to the terminal how many there are. To make the comparison between compilers as compelling as possible, we will focus on finding the most idiomatic yet optimised implementation we can come up with in each programming language. When push comes to shove, each of these languages offers a syntax than can replicate C code, but the aim here is to understand how much of an impact on performance writing concise and elegant code using the modern features of a language has.
To start it all off, lets have a look at the reference C implementation:
#include <stdint.h>
#include <stdio.h>
#include "vector.h"
Vector* all_natural_numbers(size_t n) {
Vector* pV = malloc( sizeof(Vector) );
pV->size = n-1;
pV->values = malloc( (n-1) * sizeof(uint32_t) );
for(size_t i = 2; i <= n; i++)
pV->values[i-2] = (uint32_t) i;
return pV;
}
void filter_numbers(uint32_t k, Vector* numbers) {
size_t new_size = 0;
for(size_t i = 0; i < numbers->size; i++) {
uint32_t val = vector_get(numbers,i);
if ((val % k != 0) || (val == k)) {
vector_update(numbers, new_size, val);
new_size++;
}
}
vector_resize(numbers, new_size);
}
Vector* sieve(uint32_t n) {
uint32_t k = 2;
Vector* numbers = all_natural_numbers(n);
filter_numbers(2, numbers);
for(;;) {
int found = 0;
for(size_t i = 0; i < numbers->size; i++) {
uint32_t nxt = vector_get(numbers,i);
if( nxt > k ) {
k = nxt;
filter_numbers(k, numbers);
found = 1;
break;
}
}
if ( found == 0 )
break;
}
return numbers;
}
int main() {
Vector* primes = sieve(50000);
printf("%zu\n", primes->size);
}
The algorithm starts by generating a sequence of all natural numbers between 2 and the upper bound number N. This is the job of the all_natural_numbers
function. Then, we filter out from the list all the numbers that are divisable by the number 2 (the first prime number). Then, find the first number greater than 2 in the list which is the next prime. If no numbers are found, the algorithm stops, otherwise, it filters again the list with this new prime number, and iterates through this process until the exit condition has been met. The function filter_numbers
performs the filtering at each iteration while a small loop finds the next prime number to consider.
This implementation makes use of a Vector
struct which is defined as follows:
#ifndef VECTOR_H
#define VECTOR_H
#include <stdint.h>
#include <stdlib.h>
typedef struct {
size_t size;
uint32_t* values;
} Vector;
void vector_update(Vector* vector, size_t index, uint32_t value);
void vector_resize(Vector* vector, size_t size);
uint32_t vector_get(Vector* vector, size_t index);
#endif
#include "vector.h"
#include <stdint.h>
#include <stdlib.h>
void vector_update(Vector* vector, size_t index, uint32_t value) {
vector->values[index] = value;
}
void vector_resize(Vector* vector, size_t size) {
vector->size = size;
vector->values = realloc( vector->values, size * sizeof( uint32_t ) );
}
uint32_t vector_get(Vector* vector, size_t index) {
return vector->values[index];
}
The main point of focus in C has been the limitation in the number of data structures used. This code initialises a single Vector
instance and modifies it at each recursive call through reference passing. This aspect should have the most direct influence on the final speed of computation. Next, our usual language of choice: Rust.
Rust version
The Rust implementation is a lot more concise and makes full use of the Iterator
trait of data structures . The filter_numbers
function uses the filter
method of iterators, while the next prime is selected using its find
method. All functions take u32
type parameters in order to match the uint32_t
in the C version. Here is the code:
fn filter_numbers(k: u32, numbers: Vec<u32>) -> Vec<u32> {
numbers.into_iter()
.filter(|&i| (i == k) || (i % k != 0))
.collect()
}
fn sieve(n: u32) -> Vec<u32> {
let mut k = 2u32;
let numbers : Vec<u32> = (2..(n+1)).into_iter().collect();
let mut res = filter_numbers(2, numbers);
loop{
match res.iter().find(|&&i| i > k) {
Some(val) => {
k = *val;
res = filter_numbers(k, res);
},
None => {
break;
}
}
}
res
}
fn main() {
println!("{:?}", sieve(50000).len());
}
With a more functional syntax, closer to OCaml than C++, the code is concise and yet very readable. We can easily identify the different steps of the algorithm. Next, the Nim implementation.
Nim version
Even though garbage collection seems to be its default use case, the Nim compiler is very flexible as the language supports reference passing. Our implementation will make full use of the sequtils
module, which allows for operations on sequences like map
or filter
. Doing so seems more natural for the language than exactly replicating the looping operations found in the C code. We had to make the function filter_numbers
had to accept a seq
by reference however to avoid too many time consuming copying operations. The consequence is that the code is not as clear and concise as it could have been. The final code stays however close to the Rust implementation.
import sequtils
proc filter_numbers(k: uint32, numbers: var seq[uint32]) =
var new_size = 0
for i,v in numbers:
if((v == k) or (v mod k != 0)):
numbers[new_size] = v
new_size += 1
numbers = numbers[..(new_size-1)]
proc sieve(n: uint32) : seq[uint32] =
var k : uint32 = 2
var numbers : seq[uint32] = toSeq(2..int(n)).mapIt(uint32(it))
filter_numbers(2, numbers)
while true:
var found = false
for nxt in numbers:
if nxt > k:
k = nxt
filter_numbers(k,numbers)
found = true
break
if not found:
break
result = numbers
when isMainModule:
echo(sieve(50000).len())
While we are dabbling with the modern garbage-collected but flexible languages, we can add Go as the last language in our comparison study.
Go version
The Go syntax being so inspired from C, it was difficult to decide on what idiomatic Go would look like in this instance. The language having a slice container with a length property, the decision was to use slice pointers to get close to the C implementation.
package main
import "fmt"
func all_natural_numbers(upper uint32) []uint32 {
var values []uint32
for k := 2; k <= int(upper); k++ {
values = append(values, uint32(k))
}
return values
}
func filter_numbers(k uint32, numbers *[]uint32) {
var new_size uint32
new_size = 0
for i := 0; i < len(*numbers); i++ {
val := (*numbers)[i]
if (val == k) || (val % k != 0) {
(*numbers)[new_size] = val
new_size++
}
}
(*numbers) = (*numbers)[:new_size]
}
func sieve(n uint32) []uint32 {
var k uint32 = 2
var numbers = all_natural_numbers(n)
filter_numbers(2, &numbers)
for {
var found = false
for _, nxt := range numbers {
if nxt > k {
k = nxt
filter_numbers(k, &numbers)
found = true
break
}
}
if !found {
break
}
}
return numbers
}
func main() {
primes := sieve(50000)
fmt.Println(len(primes))
}
Now that we have all our implementations ready, let’s compile and compare the results.
Benchmarking results
All testing has been performed on a MacPro OSX High Sierra Desktop running on two 2.4 GHz 6-Core Intel Xeon and 32 GB of RAM. Not the purest testing environment but this exercise is meant to be a rough guide and not a testing standard. All programs were compiled as release builds without any other optimisation options:
-
C:
cmake -DCMAKE_BUILD_TYPE=Release ..
-
Rust:
cargo build --release sieve
-
Nim:
nimble build -d:release sieve.nim
-
Go:
go build -buildmode=exe sieve.go
The hyperfine application was used to time the executable files generated. A warmup setting of 10 runs was used for all. Here are the results for runs of the executable in each language:
Surprisingly, C does not produce the fastest executable, Rust does. This is a very surprising result, that could only come in my view from the optimisation settings CMake uses by default. Rust is the clear winner in this test as it has the most concise implementation and the fastest runtime. Go is relatively fast also but the implementation is not as pleasing to read as it remains close to C. Nim’s concise code comes at a substantial performance cost. Are the iterator operations not as well optimised as in Rust? Hard to understand why it is lagging behind.
Conclusion
These modern compiled languages can be optimised to run at very comparable speed to C. If the open source communities behind these language write libraries with performance in mind, execution speeds similar to the old established low level language should be expected. So if performance is not the criterion to look into to pick one language over another, then what should it be? Well, it is probably more relevant to look at criteria like readability, the package manager, the community, interoperability, safety and so on. Nailing performance alone may not be enough to popularise adoption, there needs to be something fresh and appealing that gets brought to the table.
If you like this post, don’t forget to follow me on Twitter and get notified of the next publication.