Stormin’ F#

Apache Storm is a scalable ‘stream computing’ platform that is fast gaining popularity. Hadoop and Storm can share the same cluster and the two complement each other well for different computing needs – batch for Hadoop and near-real-time for Storm.

Storm provides a macro architecture for executing ‘big data’ stream processing ‘topologies’. For example, one easily increase the parallelism of any node in the Storm topology to suit the performance requirements.

For streaming analytics, however, Storm does not offer much help out of the box. Often one has to write the needed analytic logic from scratch. Wouldn’t it be nice if one could use something like Reactive Extensions (Rx) within Storm components?

Luckily Nathan Marz – the original author of Storm – chose to enable Storm with multi-language support. While Storm itself is written in Clojure and Java, it implements a (relatively simple?) language-independent protocol that can be used with basically any mainstream language.

FsStorm is an attempt to allow F# (and .Net Rx) to be used for defining and running Storm topologies.

Storm and Rx are big topics in and of themselves. If your are interesting in leveraging FsStorm, it would be best to first understand Storm and Rx using the official and supplemental documentation (blog posts, videos, etc.). It took me a while to get a decent enough grasp of Storm to start putting together FsStorm. Be sure to view one of Nathan’s videos on Storm.

Given sufficient Storm knowledge, take a look at the sample project FsSimpleTest. It includes a topology a spout and a bolt. The topology is described with StormDSL (which IMHO makes defining topologies much simpler than in Java).

In order to run FsSimpleTest first setup a Storm cluster under Windows (for now). Download the repo and compile FsSimpleTest. To submit the topology to Storm, use the Submit.fsx script, included in the project but before you run the script make sure the ‘binDir’ variable is set correctly for your environment. Also the “jar” command – which comes with Java JDK – should be in the path (it is used to package the built components into a Jar file which is uploaded to the Storm cluster).

You should be able to see the running topology in Storm UI – the browser-based console.

While the repo does not include a sample that uses Rx (yet) I have been successfully running a 3-component Storm topology that leverages Fsharp.Control.Reactive (the Rx wrapper for F#).

FsStorm does not utilize all of the capabilities offered by Storm. In particular Distributed RPC and Transactional topologies are not supported.

This is a very early but promising start for stream computing with F#. Contributors are welcomed!

Advertisements