Network protocols, sans I/O
Back in February I started taking a serious look at asynchronous I/O thanks to async/await. One of the things that led me to look into this area was that I couldn't find an HTTP/1.1 library that worked with async/await. A little surprised by this, I went looking for an HTTP header parser so that I could do the asynchronous I/O myself and rely on the parsing library to at least handle the HTTP parts. But that's when I was even more shocked to find out there wasn't any such thing as an HTTP parsing library in Python!
It turns out that historically people have written libraries dealing with network protocols with the I/O parts baked in. While this has been fine up until now thanks to all I/O in Python being done in a synchronous fashion, it is going to be a problem going forward thanks to async/await and the move towards asynchronous I/O. Basically what this means is that network protocol libraries will need to be rewritten so that they can be used with both synchronous and asynchronous I/O.
If we're going to start rewriting network protocol libraries, then we might as well do it right from the beginning. This means making sure the library will work with any sort of I/O. This doesn't mean simply abstracting out the I/O so that you can plug in I/O code that conforms to your abstraction. No, to work with any sort of I/O the network protocol library needs to operate sans I/O: it works directly on the bytes or text coming off the network and performs no reads or writes itself. This allows the user of the protocol library to drive the I/O in the way they deem fit instead of how the protocol library thinks it should be done, which provides the ultimate flexibility in how I/O can be used with a network protocol library.
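To make the idea concrete, here is a minimal sketch using only the standard library. The `LineParser` class and the `parse_chunks` helper are hypothetical names made up for illustration; they are not taken from any library mentioned in this post, and the "protocol" is just CRLF-terminated lines rather than real HTTP. The point is that the parser never touches a socket: the caller hands it bytes and asks what has been parsed so far.

```python
from typing import Iterable, List, Optional


class LineParser:
    """Toy sans-I/O parser: splits a byte stream into CRLF-terminated lines."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def receive_data(self, data: bytes) -> None:
        # The caller performed the I/O; the parser only records the bytes.
        self._buffer += data

    def next_line(self) -> Optional[bytes]:
        # Return one complete line, or None when more bytes are needed.
        end = self._buffer.find(b"\r\n")
        if end == -1:
            return None
        line = bytes(self._buffer[:end])
        del self._buffer[: end + 2]  # drop the line and its CRLF
        return line


def parse_chunks(chunks: Iterable[bytes]) -> List[bytes]:
    """Feed chunks from any source to the parser and collect complete lines."""
    parser = LineParser()
    lines = []
    for chunk in chunks:
        parser.receive_data(chunk)
        while (line := parser.next_line()) is not None:
            lines.append(line)
    return lines
```

Because `parse_chunks` only needs an iterable of byte chunks, those chunks could just as well come from a blocking socket, an asyncio transport, or a test fixture. Chunk boundaries don't need to line up with line boundaries, since the parser buffers whatever is incomplete.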
Luckily I wasn't the first to notice the lack of an HTTP parsing library. Cory Benfield also noticed this and then did something about it. He created the hyper-h2 project to provide a network protocol library for HTTP/2 that does no I/O of its own. Instead, you feed hyper-h2 bytes off the network and it tells you -- through a state machine -- what needs to happen. This flexibility means that hyper-h2 has examples of how to use the library with curio, asyncio, eventlet, and Twisted (and now there's experimental support in Twisted for HTTP/2 using hyper-h2). Cory also gave a talk at PyCon US 2016 on the very topic of this blog post.
And HTTP/2 isn't the only protocol that has an implementation with no I/O. Nathaniel Smith of NumPy has created h11, which does for HTTP/1.1 what hyper-h2 does for HTTP/2. Once again, h11 does no I/O of its own and instead gets fed bytes, which in turn drive a state machine that tells the user what to do.
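The pattern these libraries share (bytes in, state machine, events out) can be sketched with the standard library alone. To be clear, this is not the API of hyper-h2 or h11: `ToyProtocol`, its states, the event tuples, and the two driver functions are all hypothetical, and the "protocol" is an invented one consisting of a `HELLO` greeting line followed by a single body line. What the sketch tries to show is that one protocol object can be driven unchanged by blocking sockets and by a coroutine.

```python
import asyncio
import socket
from enum import Enum, auto


class State(Enum):
    NEED_GREETING = auto()
    NEED_BODY = auto()
    DONE = auto()


class ToyProtocol:
    """Hypothetical protocol: a b"HELLO\\n" greeting, then one body line."""

    def __init__(self):
        self.state = State.NEED_GREETING
        self._buffer = b""

    def receive_data(self, data):
        """Consume bytes and return the list of events they produced."""
        self._buffer += data
        events = []
        while b"\n" in self._buffer and self.state is not State.DONE:
            line, _, self._buffer = self._buffer.partition(b"\n")
            if self.state is State.NEED_GREETING:
                if line != b"HELLO":
                    raise ValueError("expected greeting")
                self.state = State.NEED_BODY
                events.append(("greeting", None))
            else:
                self.state = State.DONE
                events.append(("body", line))
        return events


def run_blocking(payload):
    """Drive the protocol with blocking sockets."""
    proto, events = ToyProtocol(), []
    sender, receiver = socket.socketpair()
    with sender, receiver:
        sender.sendall(payload)
        sender.shutdown(socket.SHUT_WR)  # signal EOF to the reader
        while data := receiver.recv(4):  # deliberately tiny reads
            events += proto.receive_data(data)
    return events


async def run_async(payload):
    """Drive the very same protocol class from a coroutine."""
    proto, events = ToyProtocol(), []
    reader = asyncio.StreamReader()  # stands in for a real network stream
    reader.feed_data(payload)
    reader.feed_eof()
    while data := await reader.read(4):
        events += proto.receive_data(data)
    return events
```

Calling `run_blocking(payload)` and `asyncio.run(run_async(payload))` with the same payload should yield the same events, because all of the protocol logic lives in `ToyProtocol` and neither driver knows anything about greetings or bodies.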
So why am I writing this blog post? I think it's important to promote this approach to implementing network protocols, to the point that I have created a page at https://sans-io.readthedocs.io/ to act as a reference of libraries that have followed the approach I've outlined here. If you're aware of a network protocol library that performs no I/O (remember this excludes libraries that abstract out I/O), then please send a pull request to the GitHub project to have it added to the list. And if you happen to know a network protocol well, then please consider implementing a library that follows this approach of using no I/O so the community can benefit.