The concept of coroutines, typically attributed to Donald Knuth, has existed for decades so why do so few mainstream programming languages properly implement it?

Since the concept is not as well-known as it should be, let me start by refreshing your memory of what a coroutine is. Basically, it’s a piece of code which has multiple entry/exit points, as opposed to a subroutine which can have multiple exit points but never has more than one entry point. The local state of a subroutine is lost when it exits; a coroutine keeps its state and will continue where it left off when you call it a second time. Another way to describe it is that with subroutines, the called code needs to finish completely before returning to the caller, whereas with coroutines you have two pieces of code running at the same level of abstraction, passing control back and forth between them.

For a simple example, take a look at the following code:

public static IEnumerable Squares(int count)
    for (int i = 0; i < count; ++i)
        yield return i * i;

This is an example of how a collection-like object can be written in C# 2.0. Whenever the ‘yield return’ statement is encountered, the next value from the for loop will be returned to the calling code, which might look like this:

foreach (int y in Squares(20))
    Console.Write("{0} ", y);

Unfortunately, the C# designers did not go all the way in implementing real coroutines. Instead, the ‘yield’ keyword only works for the specific case of a method which returns IEnumerable.

So, why do I want them, besides the fact that it’s just generally a cool concept? Here’s why. In our current application, written in Java 5, we need to make a deep copy of a rather complex data structure. Because of various issues related to object identity rules, the precise semantics for this operation are rather hairy. Fortunately, we have solved all those issues already, in our XML import/export code. So, the obvious solution would be to connect the output stream of the XML exporter to the input stream of the XML importer, right? There’s a bit of a performance hit, of course, but we were quite willing to accept that.

But that turns out to be a lot more difficult than it would seem at first. What you need is a PipedInputStream/PipedOutputStream pair, which seems like it should do what you want, except that as soon as you run the code you get a deadlock: the import code is sitting at an empty buffer waiting for the next batch of data, which will never arrive because the export code, which it is waiting for, will never run. So you need to run either the export code or the import code on a separate thread, which leads to all kinds of ugly hacks regarding exception handling and properly closing all streams after you’re done with them. Ugh. It works by now, but it’s not exactly the piece of code we’re most proud of.

The problem with threads, in this case, is that they are too powerful. Once you have multiple threads running, they can interrupt each other at unpredictable moments, so you need to manually synchronise things, beware of deadlocks, etc. Also, because they are so generic and powerful, starting/stopping and switching between threads are relatively expensive operations. In our case, we know exactly when we want to do the switch: the import code should run as long as there’s data available in the buffer, and then it should yield to the export code which should run until the buffer is full, at which point it should yield back to the importer again. In other words, a textbook case for coroutines. Only we don’t have them in Java.

Well.. Not really, at least. Some impressive (in the “oh my God I can’t believe you did that” sense) attempts have been made to fake it anyway. For example, the Xalan implementation contains a Coroutine manager which implements logical coroutines on top of physical threads. That way you don’t have the efficiency advantages of coroutines over threads, but you do retain most of the elegant programming model. But a proper implementation really should be implemented inside the VM and operate below the abstraction level of threads, not above it.

Many platforms do offer a feature that looks very much like coroutines: co-operative threads, a.k.a. fibers. Those are logical threads where the code being executed has to explicitly yield (hey!) control to the next thread, typically by calling a system function called “yield” (hey!!). When the fiber implementation you are using is deterministic enough, you could build a coroutine implementation on top of it, and in fact somebody did just that by combining the Win32 Fiber API with the .NET framework. Note the editor’s warning at the top of the article, though: this is not the kind of thing you want to stick into your production code. And his code also does not implement generic coroutines, but only the special case of generator-based collections — only he did it in .NET 1.1, which I have to admit is a pretty cool hack.

Oh well, it looks like I’m going to need to solve my problems the ugly way for now. If you want true coroutines, you’ll need to look into some of the less mainstream languages. Python has had a generator-style iterator mechanism for a while now, pretty much identical to the C# example given above, and Python 1.5 (out in alpha release as I write this) has “bidirectional” generators which come very, very close to being true generic coroutines. Ruby also claims to support coroutines, but as far as I can tell, it only does generators, too, and not the full general case. Most Lisp dialects have the real thing, though, as do some functional languages. Hmm, maybe there’s something to Paul Graham‘s Blub Paradox rant after all?

Comments are closed.