Wednesday, September 03, 2008

Speeding up Haskell with C – a very short introduction

You are working on your project in Haskell. You notice the performance is not as great as it could be. Here’s what you can do, apart from writing more efficient Haskell code.


First, of course, you have to profile your code to know what’s really slow and not. Using GHC, compile your program with -prof -auto-all.

$ ghc -O2 --make Project.hs -prof -auto-all

Doesn’t work? Then you probably have to recompile the extra libraries you’re using (perhaps from Hackage). These libraries most certainly use the Cabal build system. Pass --enable-library-profiling to configure then build and install as usual.

$ cd /to/the/library/source
$ runghc Setup.hs configure --enable-library-profiling
$ runghc Setup.hs build
$ runghc Setup.hs install

Don’t worry, this will not mess up the current installation, it will just add a special profiling version of the library next to the regular version.

You can now test your code.

$ Project +RTS -p

The GHC profiler is competent and can profile both time and space. With -p, we’re profiling time.

The output is a file, It has lots of interesting stuff in it, most importantly a list with the functions that take the most time. In our fictional example, the profiler tells us the program spends most of its time in the function dctiv, the DCT-IV transform.

dctiv :: Int -> [Double] -> [Double]
dctiv points input = ...

This is probably a typical function you want to write in a low level language. It is used a lot, and is basically number crunching. Of course, a Haskell wizard can probably whip up a DCT-IV with performance very close to C. So we use the DCT-IV only because it's simple to implement, not because it is a domain C solves better than Haskell.

Let’s write C!

Most modern programming languages have facilities for invoking code written in other languages, and so does Haskell. But we worry about that later; let’s write a small C library first.

#ifndef DCTIV_H
#define DCTIV_H
void dctiv(int points, double *in, double *out);
void dctiv(int points, double *in, double *out)
    /* What this does is not really important, other than it doesn't
       do side effects, and writes the output to out. */
    int k, n;
    double sum;

    for (k = 0; k < points; k++) {
        sum = 0.0;
        for (n = 0; n < points; n++) 
            sum += in[n] * cos((PI/points) * (n + 0.5) * (k + 0.5));
        out[k] = sum;

Build an object file:

$ gcc -O2 -c dctiv.c

It’s probably a good idea to write a small test program and see if the little C library behaves as expected.

$ gcc -o testprogram test.c dctiv.o

Let’s write Haskell! With C!

To use our C code from Haskell we’re going to use the FFI – the Foreign Function Interface. This is an addendum to the Haskell 98 standard. Using FFI is quite pleasant, especially if we begin with pure Haskell code that we later port to C. This means we can design the C code with purity in mind.

To use FFI, we have to tell the compiler via a pragma at the top of the source code. We’ll also use a bunch of libraries with functions useful for calling foreign code. The top of the program will look like this:

> {-# LANGUAGE ForeignFunctionInterface #-}

> import Foreign.C.Types 
> import Foreign.Ptr 
> import Foreign.Marshal.Array 
> import System.IO.Unsafe 

Foreign.C.Types and Foreign.Ptr have functions for working with primitive C types and pointers. Foreign.Marshal.Array is for converting between Haskell types and C arrays. System.IO.Unsafe is explained further below.

Next we’re going to import a function from the compiled C library. We can’t use the regular import keyword, instead we use foreign import, and tell Haskell which library we’re interested in, the function we want to use, and what the function looks like:

> foreign import ccall "dctiv.h dctiv"
>     c_dctiv :: CInt -> 
>                Ptr CDouble -> 
>                Ptr CDouble -> 
>                IO ()

Syntax aside the only weird stuff here is the last return value. Our C function returns void, so using () makes sense. But what about the IO? The code will explicitly change regions of memory, and passing two identical pointers may not mean the contents of the pointers are the same.

The function c_dctiv is low level, and it doesn’t look like the pure dctiv function we’re replacing at all! That’s what we’re going to fix now.

The purpose of the Foreign.Marshal.Array library is to convert between Haskell types and C arrays. We’re going to use three functions: withArray takes a Haskell list and writes it to memory, so it can be passed to C functions. allocaArray temporarily allocates memory our C-code can write to. peekArray takes some memory and returns a Haskell list. Here’s an intermediate function that calls c_dctiv for us.

> dctivIO :: Int -> [CDouble] -> IO [CDouble] -- 1
> dctivIO points input -- 2
>     = withArray   input  $ \cinput -> -- 3
>       allocaArray points $ \coutput -> -- 4
>       do c_dctiv (fromIntegral points) cinput coutput -- 5
>          peekArray points coutput -- 6

This function is short, and you can intuitively guess what it does by looking at what’s passed to the C function. The Marshal.Array functions have funky types though, and the code looks slightly arcane. Lets look at the type of withArray as an example to see what's going on:

withArray :: Storable a => [a] -> (Ptr a -> IO b) -> IO b

The Storable type class is meant for types that can be written to memory. All C types are instances of Storable, which makes sense. withArray takes a Haskell list [a], writes it into memory, and then passes the pointer to this memory to a function which is executed. allocaArray too does something with memory and passes it to another function, and these functions are chained together until we're done with the foreign code.

In English, what dctivIO does is the following:

At line 3, withArray takes the CDouble list as input and writes it to memory. It passes the memory location to a new function, that extends all the way down line 6. In this function, the memory location is bound to cinput. From now on we can treat cinput as a double*.

At line 4, allocaArray allocates an array of points elements, which will hold the output. The newly allocated array is passed to yet another function, where it gets bound to coutput when executed. allocaArray will free the memory when it’s done.

The last two lines form an IO block. At line 5 we run our C function c_dctiv, not forgetting to convert our Int to a CInt. The output of c_dctiv will be written to coutput. The last line of the function simply returns a Haskell list (in IO) from the array coutput of length points.

dctivIO is a much more useful and Haskell-like function than the raw c_dctiv, but it’s still not good enough. We want something of type Int -> [Double] -> [Double], not this hack with C-types and IO.

Despite the detail that memory is allocated within withArray, the C function dctiv is really pure. It doesn’t write to files, change state or anything; it does the exact same thing like our pure Haskell version. withArray et al doesn’t know this though, and that’s why we had to work in IO.

As we know dctiv doesn’t really have any side effects, we can purify it using unsafePerformIO from System.IO.Unsafe, which has the magical type IO a -> a. At the same time we add some code to convert between CDouble and Double.

> dctiv :: Int -> [Double] -> [Double]
> dctiv points input = 
>     let cinput  = map realToFrac input
>         coutput = unsafePerformIO (dctivIO points cinput)
>         output  = map realToFrac coutput
>     in output

And we’re done. As the Marshal.Array-functions have to consume the input lists when they are written to memory, converting from the two Double types do no significant harm.

To compile this code, GHC has to know about the C library.

$ ghc -O2 dctiv.o --make  Project.hs

You can even test your code in GHCI, like this:

$ ghci Project.hs dctiv.o


The speed improvement of writing parts of a program in a low level language will of course depend on the task, and how good you are at writing efficient Haskell (and C). For some applications the speed up can be substantial, and learning the basics of FFI can be a good investment.

For more information about FFI and Profiling, here are some more in-depth resources:

The excellent Real World Haskell has chapters on both profiling and the FFI.
HaskellWiki has several articles on FFI and profiling.
There's also the venerable documentation.