brainysmurf/README.md

Writing a custom kernel for Jupyter

Introduction / motivation

Jupyter Notebook is already a great system for sharing course content, and it lets me keep writing and improving that content over time. However, the course I teach (IB Diploma Computer Science) has one drawback: it teaches pseudocode rather than an established programming language. That means I can't simply ask students to open a terminal, or write code in a text editor, and run it.
But like many open source tools, Jupyter is highly customizable, so I set to work. This is what I wanted:

- students can write pseudocode from scratch
- students can experiment with and improve code provided to them
- the output of the execution (stdout) is displayed

My solution was to write a Jupyter kernel that converts the pseudocode into Python, executes it, and captures stdout. As I began the project and learned more about Jupyter, I realized I could get even more than I had originally hoped for:

- syntax highlighting
- custom error handling
- display of the converted (transpiled) Python
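The core loop described above (translate, execute, capture stdout) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions. `run_pseudocode` and `toy_transpile` are hypothetical names. The one-rule translator only rewrites `output` lines, so the pipeline has something to run. Returning the generated Python alongside stdout is one possible way to support showing the transpiled code.

```python
import contextlib
import io
from typing import Callable, Dict, Tuple


def run_pseudocode(source: str,
                   transpile: Callable[[str], str],
                   namespace: Dict[str, object]) -> Tuple[str, str]:
    """Translate pseudocode to Python, execute it, and capture stdout.

    Returns (python_source, captured_stdout) so the caller can also
    show the transpiled code to the user.
    """
    python_source = transpile(source)
    buffer = io.StringIO()
    # Anything the generated code prints goes into the buffer, not the console.
    with contextlib.redirect_stdout(buffer):
        exec(python_source, namespace)  # namespace persists between calls
    return python_source, buffer.getvalue()


def toy_transpile(source: str) -> str:
    """Hypothetical one-rule translator: 'output X' becomes 'print(X)'."""
    return "\n".join(
        f"print({line[len('output '):]})" if line.startswith("output ") else line
        for line in source.splitlines()
    )
```

Passing the same dictionary on every call lets a variable defined in one cell be used in the next, as in a normal notebook.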

Fortunately, I could build on an existing open source framework called metakernel. After spending a few hours experimenting with it, it was clear that I could implement all of the above features.
metakernel

Someone had already worked out how to wrap the dense Jupyter machinery for a custom kernel in a framework; I only needed to learn how to get it going, and that turned out to be pretty straightforward. All the developer needs to do is define a class that inherits from MetaKernel and set some class variables on it.
Then, the developer defines the do_execute_direct method on the custom class, which returns a status update to the frontend. Since whatever that method returns is also displayed to the end user, I simply overrode the do_execute method to call its superclass method with silent=True.
The do_execute_direct method reads the provided pseudocode and uses regular expressions to handle the subtle differences between the source pseudocode and the target Python.
The different Jupyters

Open source can be confusing because several slightly different versions of the same thing often exist. There is JupyterLab, JupyterHub, and two or three more Jupyters beyond those.
I chose to stay with the "standard" Jupyter Notebook for a few reasons:


- JupyterHub, which is a way to host Jupyter notebooks on private servers, looks great but is overkill for one class. Instead, I'll share the project with my students on GitHub, and they can keep it up to date by pulling.

- JupyterLab, which I would actually prefer, doesn't implement syntax highlighting via the kernel (you have to write an extension instead). I didn't think this project warranted learning how to do that as well.


Development environment

In my case, I forked metakernel itself (found here) and added a package inside it to build the custom kernel. Inside a virtual environment, all I had to do was make sure my fork was importable and my custom kernel was executable as a module. With pip, that meant "installing" it into the virtual environment in development mode (setup.py develop, or pip install -e .), which made development fairly painless.
Installing the kernel with metakernel itself is done with the command:
```shell
python -m <kernel_package_name> install
```
Then all I had to do was change the code, restart the kernel process, and reload the notebook in the browser.
The %%transpile cell magic

A magic is Jupyter's way of sending special commands to the kernel, and there are two kinds: line magics, which start with % and apply to a single line, and cell magics, which start with %% and apply to the whole cell. Since I wanted a magic that outputs the transpiled Python for a cell's code, a cell magic was the right choice.
Since metakernel already has magic machinery built in, this was fairly straightforward. The only difficult part was keeping the line numbers consistent. Because the cell magic removes the line containing the magic and also strips off any leading whitespace, errors found in the pseudocode were reported with the wrong line number. The fix was to add a special attribute to the kernel app object that holds this line offset, so it can be added back when an error is reported.
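The offset bookkeeping can be sketched as follows. This is a minimal, hypothetical version: `strip_cell_magic` imitates the magic removing its own line and any leading blank lines, and Python's `compile()` stands in for the pseudocode checker as a source of error line numbers. The idea is the same as the attribute described above: record how many lines were removed and add that number back when reporting an error.

```python
from typing import Optional, Tuple


def strip_cell_magic(cell: str) -> Tuple[str, int]:
    """Drop a leading '%%...' line and any blank lines after it.

    Returns the remaining body and the number of lines removed,
    which is the offset needed to map body lines back to cell lines.
    """
    lines = cell.splitlines()
    removed = 0
    if lines and lines[0].startswith("%%"):
        lines, removed = lines[1:], 1
    while lines and not lines[0].strip():
        lines, removed = lines[1:], removed + 1
    return "\n".join(lines), removed


def error_line_in_cell(cell: str) -> Optional[int]:
    """Return the 1-based cell line of the first syntax error, or None."""
    body, offset = strip_cell_magic(cell)
    try:
        compile(body, "<cell>", "exec")  # stand-in for the pseudocode checker
    except SyntaxError as err:
        return (err.lineno or 1) + offset
    return None
```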
Syntax highlighting

This proved to be the most difficult part of the project, as it required me to add a feature to metakernel and pull together documentation from various sources.
The feature I added to metakernel (via this pull request) packages JavaScript code directly into the kernelspec. When a kernel is installed with the command above (`python -m <kernel_package_name> install`), metakernel writes the required kernel.json file, but there was no mechanism for also including a kernel.js file, which is the standard way for a kernel to supply frontend code. It's a simple addition.
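To show what the packaged kernelspec ends up containing, here is a standard-library sketch that writes kernel.json and copies a kernel.js file into one directory. It is not the code from the pull request, and `build_kernelspec` is a hypothetical helper. A directory like this could then be installed with jupyter_client's `KernelSpecManager.install_kernel_spec`. The classic notebook loads kernel.js from the kernelspec when the kernel is selected.

```python
import json
import shutil
import tempfile
from pathlib import Path


def build_kernelspec(dest: Path, kernel_json: dict, js_file: Path) -> Path:
    """Write kernel.json and copy the frontend JavaScript into one kernelspec dir."""
    spec_dir = dest / kernel_json["name"]
    spec_dir.mkdir(parents=True, exist_ok=True)
    (spec_dir / "kernel.json").write_text(json.dumps(kernel_json, indent=2))
    shutil.copy(js_file, spec_dir / "kernel.js")  # the notebook looks for this name
    return spec_dir


# Usage sketch: build a spec in a throwaway directory and list its files.
with tempfile.TemporaryDirectory() as tmp:
    js = Path(tmp) / "pseudocode.js"
    js.write_text("define([], function() { return {onload: function() {}}; });")
    spec = build_kernelspec(
        Path(tmp) / "specs",
        {"name": "pseudocode", "display_name": "Pseudocode",
         "language": "pseudocode", "argv": []},
        js,
    )
    print(sorted(p.name for p in spec.iterdir()))  # ['kernel.js', 'kernel.json']
```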
Next, I had to write the JavaScript that runs the user's code through a lexer for highlighting. I found that the Jupyter frontend expects an AMD module, which follows the pattern:
```javascript
define(
    ['path/to/import'],
    function(importedObject) {
        return {
            onload: function(){
                // code here
            }
        };
    }
);
```

What took me a long time to figure out was CodeMirror, the JavaScript editor component that the Jupyter notebook uses for its code cells. Because it's such a large codebase, you have to import each piece of functionality you use as well as the base object itself. Otherwise the defineSimpleMode method used below is unavailable, because it is never attached to the CodeMirror object:
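```javascript
define(
    ['codemirror/lib/codemirror', 'codemirror/addon/mode/simple'],
    function(CodeMirror, _) {
        return {
            onload: function(){
                CodeMirror.defineSimpleMode('language_name', {
                    start: [
                        // Illustrative rules only; replace with your language's own.
                        {regex: /\/\/.*/, token: 'comment'},
                        {regex: /"(?:[^\\"]|\\.)*"/, token: 'string'},
                        {regex: /\b(?:loop|while|from|to|if|then|else|end|output)\b/, token: 'keyword'},
                        {regex: /\d+(?:\.\d+)?/, token: 'number'}
                    ]
                });
            }
        };
    }
);
```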

Then I just made sure that language_name exactly matched the language_info.name property defined in the MetaKernel child class, and Jupyter would start using my lexer.
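A mismatch here typically just leaves the cells without highlighting and raises no error, so a small check can be worth adding to a test suite. Below is a minimal sketch. It assumes the JavaScript passes a literal string as the first argument of `defineSimpleMode`; `mode_name_matches` is a hypothetical helper.

```python
import re


def mode_name_matches(kernel_js: str, language_info: dict) -> bool:
    """Compare the name given to CodeMirror.defineSimpleMode with language_info['name']."""
    match = re.search(r"defineSimpleMode\(\s*['\"]([^'\"]+)['\"]", kernel_js)
    return match is not None and match.group(1) == language_info.get("name")
```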
I also needed a way to find out which token names were valid when defining the lexer. Of the resources I found, this Token Inspector for CodeMirror was the most useful.