Kenneth Ballenegger

Angel Investor, Engineer, Startup Founder

Fibonacci & the Y-Combinator in C

Be warned, this post describes something you should never actually use in production code. However, we will get to play with some very cool concepts and techniques: functional programming in C, closures, implementing autorelease pools from scratch, data structures (linked lists and b-trees), bitwise operations, recursivity, memoization, and the Y-Combinator. If this sounds crazy, don’t be scared. It’s all fairly simple when broken down.

Let’s back up for a second, however. What we’re going to create here is a program that returns a number in the Fibonacci sequence. The Fibonacci sequence is a sequence of numbers in which each integer is equal to the previous two integers added: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, etc.

There are many ways of calculating Fibonacci numbers, but the naïve implementation which follows the mathematical definition is this:

unsigned long long fib(int n) {
    if (n < 3) return 1;
    return fib(n-1) + fib(n-2);
}

So let’s make a program out of this:

// fib.c

#import <stdlib.h>
#import <stdio.h>

// let's make life easier for ourselves:
typedef unsigned long long ull;

ull fib(int n) {
    if (n < 3) return 1;
    return fib(n-1) + fib(n-2);
}

int main(int argc, char **argv) {
    int upto = 20; // which number to compute
    if (argc >= 2) upto = strtol(argv[1], NULL, 10);
    printf("Calculating fib(%d)...\n", upto);
    ull solution = fib(upto);
    printf("fib(%d) == %llu\n", upto, solution);
}

We’re going to be using clang as our compiler. Let’s compile and run our program:

$ clang -o fib -O3 fib.c
$ ./fib 17
Calculating fib(17)...
fib(17) == 1597

However, as you’ll see if you try to compute fib(200), this will not scale. In computer science terms, fib is a function which requires exponential time as n increases. O(2^n). For example:

fib(5) [A] calls fib(4) [B] and fib(3) [C]
fib(4) [B] calls fib(3) [D] and fib(2) [E]
fib(3) [C] calls fib(2) [F] and fib(1) [G]
fib(3) [D] calls fib(2) [H] and fib(1) [I]
fib(2) [E] returns 1
fib(2) [F] returns 1
fib(1) [G] returns 1
fib(2) [H] returns 1
fib(1) [I] returns 1

As you can see, the function is called multiple times with the same argument, and for every time n is decremented (until you reach 1 or 2), the number of function calls increases by a power of two. Imagine how much worse it would be for fib(200). The number is so great that even given unlimited memory and energy, your computer would require billions of years to finish the computation.

Closures

A closure is an anonymous function which may use and capture variables from its parent scope. Imagine this code, in JavaScript:

function print_element_plus(x) {
    return function(e) {
        console.log("-> " + (x + e));
    }
}
[1, 2, 3].forEach(print_element_plus(2));
// -> 3
// -> 4
// -> 5

print_element_plus returns an anonymous function (aka. a closure) which takes one argument and adds it to x. The variable x is captured by the closure, and even though it goes out of scope when print_element_plus returns, it is still retained by the closure until the closure itself goes out of scope and is freed.

C does not natively support closures. Although similar functionality can be implemented in pure C fairly easily using a struct containing the environment and a function pointer, we’re going to instead take advantage of clang’s built-in support for the blocks extension to the language:

$ clang -fblocks -o fib -O3 fib.c -lBlocksRuntime

In C, a block is simply another name for a closure, and its syntax is very similar to defining a function pointer. So with that in mind, let’s rewrite our fib function as a block.

__block ull(^fib)(int) = ^ull(int n) {
    if (n < 3) return 1;
    return fib(n-1) + fib(n-2);
};

Note: this should go in main() and the __block is necessary to enable explicit recursion, so that the block may reference itself.

The Y-Combinator

Recursion implies that a function has a name by which to refer to itself, so it may call itself. While the example above works, it relies on the block capturing the variable to which it is assigned from the parent scope. This is not super clean, and relies on a implementation detail of the C programming language. Identical code would not work in, for example, lisp.

Since we are giving ourself the restriction of not explicitly using the fib variable in order to recur, we will wrap the function in a function whose first and only argument is a function with which to recur. We’ll call this function the generator, because when called with fib as its argument, it will return the correct fib function.

typedef ull(^fib_f)(int);
fib_f(^fib_generator)(fib_f) = ^fib_f(fib_f recur) {
    return ^ull(int n) {
        if (n < 3) return 1;
        return recur(n-1) + recur(n-2);
    };
};

This is where the Y-Combinator comes in handy. The Y-Combinator is a function which enables recursion, using only anonymous functions, and without using recursion in its implementation. It looks somewhat like this (in Scheme):

(define Y 
  (lambda (f)
    ((lambda (x) (x x))
     (lambda (x) (f (lambda (y) ((x x) y)))))))

This article by Mike Vanier does a far better job of explaining the Y-Combinator than I ever could. Suffice to say that when called with a generator function, as defined above, the Y-Combinator will return the recursive fib function. With the Y-Combinator, we could say:

fib_f fib = Y(fib_generator);

Due to C’s explicit typing, declaring higher-order functions can quickly become cumbersome, even with typedefs. So in order to get around that, we’re going to declare a single type to hold every function in our program. We’re going to call it _l, short for lambda.

typedef union _l_t *_l;
typedef union _l_t {
    ull(^function_f)(int);
    _l(^generator_f)(_l);
} _l_t;

Wrapping the type in an union allows it to refer to itself. We’ll be adding a couple more block types to this union shortly, but for now our only two options are: the signature of the fib function, and the generator which both takes and returns a lambda (containing a fib function, though that is unspecified).

Since this type is declared as a pointer, it should live on the heap, and therefore we should write initializer functions for it:

_l function(ull(^f)(int)) {
    _l data = malloc(sizeof(_l_t));
    data->function_f = Block_copy(f);
    return data;
}
_l generator(_l(^f)(_l)) { /* ... same thing ... */ }

Let’s ignore the fact that we’re leaking tons of memory for a moment (we’ll come back to that), and instead refactor our fib generator to use this new type:

_l fib_generator = generator(^_l(_l recur) {
    return function(^ull(int n) {
        if (n <= 2) return 1;
        return recur->function_f(n-1) + recur->function_f(n-2);
    });
});

In C, the basic Y-combinator above looks somewhat like this:

_l y_combine(_l fn) {
    _l x = combiner(^_l(_l recur) {
        return function(^ull(int n) {
            _l resp = recur->combiner_f(recur);
            return fn->generator_f(resp)->function_f(n);
        });
    });
    return x->combiner_f(x);
}

You will also need to add the combiner function type to your union, and create a constructor for it:

_l(^combiner_f)(_l);

_l combiner(_l(^f)(_l)) { /* ... same as above ... */ }

We now have all the pieces in place to use the Y-combinator to replace our natively recursive implementation of fib:

_l fibonacci = y_combine(fib_generator);
int upto = 20;
if (argc >= 2) upto = strtol(argv[1], NULL, 10);
ull fib = fibonacci->function_f(upto);
printf("Fibonacci number %d is %llu.\n", upto, fib);

Auto-release pools

As you have probably noticed, the code above is unfortunately leaking tons of memory. Many of the functions we’ve written allocate _l unions, and these are never released. While we could attempt to always correctly release these as soon as they are no longer needed, a more interesting solution is to use an auto-release pool.

Auto-release pools are a concept familiar to Objective-C programmers, since they are used extensively in all of Apple’s APIs. They work like this: You can create and flush pools, which nest like a stack. You can auto-release a pointer, which means that it is added to the topmost pool, and is freed when the pool is flushed.

The simplest way of implementing auto-release pools is with two linked lists: the first to contain the objects that have been auto-released (this is the pool itself), and the second to contain the stack of pools. Lastly we’re going to need a static variable to refer to the top of the pool stack.

typedef struct autorelease_pool_t *autorelease_pool;
typedef struct autorelease_pool_t {
    void *p;
    void(*fn)(void*);
    autorelease_pool next;
} autorelease_pool_t;

typedef struct autorelease_pool_stack_t     *autorelease_pool_stack;
typedef struct autorelease_pool_stack_t {
    autorelease_pool head;
    autorelease_pool_stack parent;
} autorelease_pool_stack_t;

static autorelease_pool_stack current_pool = NULL;

Creating a pool is easy, we just allocate a reference and push it to the top of the stack:

void push_autorelease_pool() {
    autorelease_pool_stack new_pool = malloc(sizeof(autorelease_pool_stack_t));
    new_pool->parent = current_pool;
    new_pool->head = NULL;
    current_pool = new_pool;
}

Then we can write a function to add pointers to the pool. Since different types may have different free functions, we will also take a reference to the free function to use on the pointer as the second argument to the autorelease function:

void autorelease(void *p, void(*fn)(void*)) {
    // If there is no current pool, we leak memory.
    if (current_pool == NULL) return;

    autorelease_pool head = malloc(sizeof(autorelease_pool_t));
    head->p = p;
    head->fn = fn;
    head->next = current_pool->head;
    current_pool->head = head;
}

And finally, the magic happens when we flush the pool. We’ll simply loop through the current pool, call each element’s free function, and free the reference, and finally pop the pool off the stack:

void flush_autorelease_pool() {
    while (current_pool->head != NULL) {
        (*current_pool->head->fn)(current_pool->head->p);
        autorelease_pool next = current_pool->head->next;
        free(current_pool->head);
        current_pool->head = next;
    }
    autorelease_pool_stack parent = current_pool->parent;
    free(current_pool);
    current_pool = parent;
}

Now, in order to solve our memory leak issues in the code we wrote earlier, we must add code to auto-release the _l unions we allocate, and wrap main with an auto-release pool:

_l function(ull(^f)(int)) {
    _l data = malloc(sizeof(_l_t));
    data->function_f = Block_copy(f);

    // these two lines have been added:
    autorelease(data->function_f, (void(*)(void*))&_Block_release);
    autorelease(data, &free);

    return data;
}

Since both y-combination and final execution allocate lambdas, we’ll want to wrap both main, and then the final execution itself in an auto-release pool:

int main(int argc, char **argv) {
    // wrap in pool
    push_autorelease_pool();

    // ...

    _l fibonacci = y_combine(fib_generator);
    // ...

    push_autorelease_pool();
    ull fib = fibonacci->function_f(upto);
    flush_autorelease_pool();

    printf("Fibonacci number %d is %llu.\n", upto, fib);
    flush_autorelease_pool();
    return 0;
}

Memoization: Wrapping

While our code is now fully functional (ha, ha, coder pun…), it still is horribly inefficient. That’s because we still have not solved the inherent problem of the function having a time complexity of O(2^n). This can be solved using memoization.

Memoization is an optimization technique which consists of storing computations in memory when completed, so that they can be retrieved later on, instead of having to be re-computed. For fib, using memoization reduces the time complexity down to O(n).

In order to use memoization, we need a way to inject a function that will be executed before executing the fib function. We’ll call this wrapping, and as a first example, let’s just use wrapping to demonstrate how bad O(2^n) really is.

_l wrap(_l fn, _l wrap_with) {
    return generator(^_l(_l recur) {
        return function(^ull(int n) {

            _l fn_ = function(^ull(int n) {
                return fn->generator_f(recur)->function_f(n);
            });
            _l wrapped = function(^ull(int n) {
                return wrap_with->wrapper_f(fn_, n);
            });

            return wrapped->function_f(n);
        });
    });
}

Understanding this wrapping function still makes my brain hurt, but suffice to say that it takes a generator and a wrapper as arguments, and returns a generator. The resulting generator can be used in the Y-combinator, and every recursion of the original function will be replaced with a recursion of the wrapper function, which itself calls the original function.

A simple wrapper function that will log every iteration looks like this:

_l log_wrapper = wrapper(^ull(_l f, int n) {
    printf("About to generate # %d\n", n);
    return f->function_f(n);
});

And the final code that uses this looks like this:

_l wrapped_fibonacci = y_combine(wrap(fib_generator, log_wrapper));
ull fib = wrapped_fibonacci->function_f(upto);
printf("Fibonacci number %d is %llu.\n", upto, fib);

Your output should look somewhat like this. As you can see, calling fib(20) evaluates the function 13,529 times.

Memoization: Implementation

In order to write a memoization wrapper function, we need a data structure in which to store the results. Most examples of memoization use a hash table, but since the key in our case is an integer, I decided to go for something a little more exotic: a binary tree, based on the bits of the integer key.

typedef struct cache_node_t *cache_node;
typedef struct cache_node_t {
    bool is_leaf;
    union {
        struct {
            cache_node one;
            cache_node zero;
        } node;
        ull leaf;
    } self;
} cache_node_t;

We’ll also create some simple methods to create and destroy trees:

cache_node cache_tree_new() {
    return calloc(1, sizeof(cache_node_t));
}
void cache_tree_free(cache_node node) {
    if (!node->is_leaf) {
        if (node->self.node.one != NULL) cache_tree_free(node->self.node.one);
        if (node->self.node.zero != NULL) cache_tree_free(node->self.node.zero);
    }
    free(node);
}

In order to store a value in the tree, we iterate through every bit in the int, traversing either through the one or zero node of the tree, and finally setting the leaf to the value we’re trying to cache:

void cache_tree_store(cache_node root, int key, ull value) {
    cache_node node = root;
    for (size_t i = 0; i < sizeof(int) * 8; i++) {
        bool direction = (bool)((key >> i) & (unsigned int)1);
        if (direction == 1) {
            if (node->self.node.one == NULL) {
                node->self.node.one = calloc(1, sizeof(cache_node_t));
            }
            node = node->self.node.one;
        } else {
            if (node->self.node.zero == NULL) {
                node->self.node.zero = calloc(1, sizeof(cache_node_t));
            }
            node = node->self.node.zero;
        }
    }
    // the last node should already be a leaf, if it isn't, we just created it
    if (!node->is_leaf) {
        node->is_leaf = true;
    }
    node->self.leaf = value;
}

Looking up a value is essentially the same thing, with some additional code to report failures:

ull cache_tree_lookup(cache_node root, int key, bool *found) {
    cache_node node = root;
    for (size_t i = 0; i < sizeof(int) * 8; i++) {
        if (node == NULL || node->is_leaf) {
            if (found) *found = false;
            return 0;
        }
        bool direction = (bool)((key >> i) & (unsigned int)1);
        if (direction == 1) {
            node = node->self.node.one;
        } else {
            node = node->self.node.zero;
        }
    }
    if (node == NULL || !node->is_leaf) {
        if (found) *found = false;
        return 0;
    }
    if (found) *found = true;
    return node->self.leaf;
}

And finally, now that we have a tree in which to store cached results, we can write our memoization function. Here, we’re actually adding creating a function called new_memoizer which returns a new instance of the wrapper function with its own (auto-released) cache.

_l(^new_memoizer)() = ^_l {
    cache_node root = cache_tree_new();
    autorelease(root, (void(*)(void*))&cache_tree_free);

    return wrapper(^ull(_l f, int n) {
        bool found;
        ull cached = cache_tree_lookup(root, n, &found);
        if (found == true) {
            return cached;
        } else {
            ull value = f->function_f(n);
            cache_tree_store(root, n, value);
            return value;
        }
    });
};

So, with that, let’s try our program again, but with memoization:

_l fibonacci = y_combine(wrap(wrap(fib_generator, log_wrapper), new_memoizer()));

push_autorelease_pool();
ull fib = fibonacci->function_f(upto);
flush_autorelease_pool();

printf("Fibonacci number %d is %llu.\n", upto, fib);

As you can see in the output, this runs significantly faster:

Generate up to fib # 20...
About to generate # 20
About to generate # 19
About to generate # 18
About to generate # 17
About to generate # 16
About to generate # 15
About to generate # 14
About to generate # 13
About to generate # 12
About to generate # 11
About to generate # 10
About to generate # 9
About to generate # 8
About to generate # 7
About to generate # 6
About to generate # 5
About to generate # 4
About to generate # 3
About to generate # 2
About to generate # 1
Fibonacci number 20 is 6765.

Conclusion

You can peruse the final code used in this article in this gist.

Now, that being said, if you ever have a need to implement a function to return a number in the Fibonacci sequence, it would be wise to forget everything I’ve said above, and just use the following:

ull fib(int n) {
    if (n <= 2) return 1;
    ull first = 1, second = 1, tmp;
    while (--n>1) {
        tmp = first + second;
        first = second;
        second = tmp;
    }
    return second;
}