Google’s Multi-Language Benchmark in F#

Recently, Robert Hundt of Google published a paper titled “Loop Recognition in C++/Java/Go/Scala”

The paper compares various aspects (run-time performance, code size, etc.)  of the implementation of the same algorithm in different languages. See the paper for a detailed analysis. The final (best) performance results for the various languages are:

Benchmark Time [sec]
C++      : 23
Java     : 89
Scala    : 58
Go       : 126

Update: See this post for F# performance on newer hardware and .Net versions.

I ported the elegantly written Scala code by Google’s Daniel Mahler to F# and then further optimized it with the help of Visual Studio 2010 Profiler (which makes profiling almost effortless) to achieve:

F# Performance (updated)
31 sec on Core Duo 2.00 Ghz
20 sec on Core i5

Updated: Independent verification of results here.

While the results are not directly comparable because of the different hardware/OS used, it does show that F# (and by subsumption .Net / C#) offer comparable performance to Java and Scala.

The F# binary and source can be downloaded from http://loopfinder.codeplex.com/

(update: To get the source go to “View all downloads” at the above link)

The equivalent Scala code is here.

Two significant F# optimizations done are:

  • “for” loops are used in performance critical regions (instead of Seq.iter, or Seq.fold)
  • Object.Equals method was used for equality / inequality testing instead of “=” and “<>” operators

As with Scala, the F# version uses the mutable platform collections (List<> and HashSet<>) instead of F# immutable collections.

The source code size for F# and Scala is about the same. Scala has a “short cut” lambda syntax that make for very concise code when writing simple lambda expressions.

Note: Subsequent to the paper, another team optimized Go and C++ code to run in less than 4 seconds however the optimizations were very specific to the benchmark and cannot be applied generally.

update:

Here is the blog post that talks about C++ and Go improvements : http://blog.golang.org/2011/06/profiling-go-programs.html

It has links to the C++ source which is here: http://code.google.com/p/benchgraffiti/source/browse/havlak/havlak6.cc

Reading this post I get the impression that when we have tight loops then even a little bit of allocation makes a big difference.

The main improvements to go and c++ are the reduction in garbage by reusing cached objects when inside tight loops.

The way the benchmark is written it first constructs a graph (of about 20K nodes) and then finds loops in it 50 times.

For each loop-find iteration most implementations generate some garbage. The benchmark is very sensitive to the amount of garbage generated in each iteration.

I think a better benchmark would have been to generate the graph and find the loops in it once and then repeat this cycle 50 times without caching anything across cycles. Recreating the graph and other objects for each cycle is a more realistic benchmark.

The only sensible conclusion we can draw is that F# on Windows seems to be faster than Scala on Linux because the F# code is a direct port of the Scala code (I believe Google used a Core i7 processor for the benchmarks as indicated here: http://www.theregister.co.uk/2011/07/01/go_v_cpluplus_redux/).

Update: Google used an old Pentium for the benchmark study so the results are not comparable with the F# numbers. However read the comments for some additional information.

Also, I ran the same F# code on Windows 8 and .Net 4.5 (Core i5; HP Pavillion g7) and consistently get 11-12 second runs.

About these ads

5 thoughts on “Google’s Multi-Language Benchmark in F#

    • yes you are correct – they are not directly comparable.

      However what is comparable is the original C++ google code and the F# code running on the same machine. I am quoting “Ildjarn” (http://social.msdn.microsoft.com/Forums/en-US/fsharpgeneral/thread/9c1ecb68-11b0-4fd2-a7d7-23a9c062fd0e) who did an independent test. Here is what he said (the link is also in the blog post):

      I compiled and ran your F# code (release, .NET 4.0 SP1, x64) and Google’s reference C++ code (release, VC++ 2010 SP1, x64, WPO enabled) on the same machine. I get ~23.5 seconds for the reference C++ implementation and ~18.4 seconds for your F# implementation. Nice job. :-] (Unfortunately I don’t have a recent version of MinGW installed on this computer, so I can’t see how GCC fares in comparison to VC++.)

  1. Pingback: Performance Improvements From 2010 – 2013 (due to hardware and .Net improvements) | Faisal's space

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s