Trying out Jupyter Notebooks

I happen to be on the team at Microsoft which recently launched an in-preview Jupyter Notebook service in Azure ML Studio. Before this I had heard of Jupyter Notebooks (née IPython Notebooks), but I had never bothered installing the requisite software to give them a shot. Perhaps it was knowing that scientists seemed to be the major users of Jupyter that led me to never bother, but laziness definitely also played into it. But with our notebook service removing the "installing is a pain" factor, and having met both Brian Granger and Fernando Perez at PyData Seattle 2015 and gotten to spend a bit of time with them, I decided to dive into Jupyter with a project.

Being the author of the "Porting Python 2 code to Python 3" HOWTO means I'm constantly looking for new ways to present the work necessary to make code work under Python 2 and Python 3 simultaneously (especially if it helps show how it doesn't require a ton of work to accomplish). Porting the HOWTO to a Jupyter Notebook seemed to present a unique opportunity to show Python 2 code next to Python 3 code to help demonstrate some of the subtler issues one can run into when porting (the stuff that's easy to do is already handled by various tools, so I don't show it since it isn't worth overwhelming people with unnecessary data... says the man writing this long aside). And since loading the notebook up means you have live code you can play with, it also seemed like a way to bring the HOWTO "alive" and let people play with the examples to help them grasp the concepts being presented.

If you don't want to wait, the notebook is available on nbviewer and basically my thoughts are that notebooks are great. =)

The way I like to think of notebooks is that it's literate programming with extras tossed in for easy data exploration. Basically a notebook is made up of a sequence of cells. The cell types can be Markdown, code, or raw text. The Markdown support is exactly what you would expect and is used to talk about whatever the notebook is about. Code cells hold code that you can execute and have its output displayed in the notebook. Raw cells are usually for passing through text which is to be processed in a non-standard way (e.g., LaTeX that's to be rendered and included in some PDF output).
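To make the "sequence of cells" idea concrete, here is a minimal sketch of a notebook built as plain Python data. It assumes the version 4 notebook JSON layout (a top-level `cells` list where each cell has a `cell_type`, `metadata`, and `source`), and `make_cell` is a hypothetical helper of my own, not part of any Jupyter API. In practice you would probably reach for the `nbformat` package rather than hand-rolling dicts like this.

```python
import json


def make_cell(cell_type, source):
    """Build one cell as a plain dict (hypothetical helper, assumed v4 layout)."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally carry their outputs and an execution count.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        make_cell("markdown", "# Porting example\nSome prose about the code below."),
        make_cell("code", "print(3 / 2)"),
        make_cell("raw", "\\newpage"),  # Passed through untouched, e.g. for LaTeX.
    ],
}

# A notebook file is just this structure serialized as JSON.
notebook_json = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
```

The point is only that a notebook is an ordered list of typed cells; everything else (rendering the Markdown, running the code) is layered on top of that.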

One of the interesting things about Jupyter Notebooks is that they are independent of any programming language. While the code that makes notebooks work is all Python (the project came out of IPython after all), what code is supported in a code cell is determined by what kernel you are using. Basically, for a programming language to be usable in a notebook, someone just has to write a kernel that supports the API that Jupyter requires. Once that exists, Jupyter will launch the kernel you want and send commands over a socket to get the results of the code execution. This is how Jupyter is popular not just in the Python community but also amongst other language communities such as Julia. There is even direct support to make a code cell use Python 2 or 3 no matter what kernel you chose through the %%python2 and %%python3 magic commands.
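If the kernel idea feels abstract, the following toy sketch may help. It is emphatically not Jupyter's real protocol: `ToyKernel` is a hypothetical class that runs in the same process, whereas a real kernel is a separate process talking to the notebook over sockets. What it does show is the basic contract: code goes in as a string, a result comes back, and state carries over from one cell to the next.

```python
import contextlib
import io


class ToyKernel:
    """A stand-in for a kernel: runs code strings and reports what they printed.

    Hypothetical and in-process; a real Jupyter kernel is a separate process
    that exchanges messages with the notebook over sockets.
    """

    def __init__(self):
        self.namespace = {}  # Shared between cells, so state persists.

    def execute(self, code):
        stdout = io.StringIO()
        try:
            # Capture anything the cell prints so it can be sent back.
            with contextlib.redirect_stdout(stdout):
                exec(code, self.namespace)
        except Exception as exc:
            return {"status": "error", "stdout": stdout.getvalue(),
                    "error": type(exc).__name__}
        return {"status": "ok", "stdout": stdout.getvalue()}


kernel = ToyKernel()
first = kernel.execute("x = 3 / 2")   # Defines x, prints nothing.
second = kernel.execute("print(x)")   # Sees x from the previous "cell".
failed = kernel.execute("1 / 0")      # Errors are reported, not raised.
print(first, second, failed)
```

Swap the body of `execute` for something that runs Julia or any other language and the notebook side would not need to care, which is the whole appeal of the kernel design.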

I personally didn't use it in my porting notebook, but the other big sell for Jupyter when using Python is direct support for things such as matplotlib and pandas. This integration is where Jupyter's love among scientists comes from. The ability to easily explore data and have the results displayed right in the notebook can be very helpful. It also makes presenting results to others much easier than sending them a PDF, since they are able to play with the data themselves.

And this ease of playing with notebooks and data is where the notebook service that we launched in Microsoft's Azure ML Studio comes in. The ability to play with notebooks in your browser and have the kernels executing in the cloud is great since the friction of installing any software is gone. And then the ability to toss your data up into Azure Storage and then explore it in the notebook gets you the data end of the story so that you don't have to worry about a dataset being too much for your workstation to handle.

Basically I think notebooks are great in any instance when you have a decent amount of text to go with your code, and this is doubly true when data is involved.