takluyver/ipep2.rst

## ipep2.rst

      
N.B. We decided to keep the document in the issue, so the up-to-date version is there, not in this file.

Created: 2012-08-12
Author: Thomas Kluyver

The state of our input transformation machinery has come up a couple of times recently, and I'd promised to look into it.

### Requirements

A line-by-line input filter is needed for two main reasons:

- We need to avoid transforming content inside multi-line strings.
- Line-based frontends (the terminal and Qt console) decide whether another line is required by attempting to compile the current buffer as Python code. So constructs like %magic commands, which aren't Python syntax, have to be intercepted as each line is entered.

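To make the second point concrete, here is a minimal sketch that uses the standard library's `codeop.compile_command` as a stand-in for the frontend's "is this cell complete?" check (an assumption; IPython's frontends may implement the check differently). `compile_command` returns `None` when more input is needed and raises `SyntaxError` for invalid code, so a raw `%magic` line has to be rewritten before the check. `transform_magic` and `compile_cell` are hypothetical helpers, and the `get_ipython().run_line_magic(...)` call they emit is modelled on recent IPython versions rather than taken from this proposal.

```python
import codeop


def transform_magic(line):
    """Hypothetical line transformer: rewrite '%name args' as a Python call.

    The emitted get_ipython().run_line_magic(...) call is an assumption for
    illustration. This naive version does not know whether the line sits
    inside a multi-line string.
    """
    stripped = line.lstrip()
    if not stripped.startswith("%"):
        return line
    indent = line[: len(line) - len(stripped)]
    name, _, args = stripped[1:].partition(" ")
    return f"{indent}get_ipython().run_line_magic({name!r}, {args!r})"


def compile_cell(lines):
    """Transform each line, then ask codeop whether the cell is complete.

    Returns a code object if the cell is complete, None if more lines are
    needed, and raises SyntaxError if the code is invalid.
    """
    source = "\n".join(transform_magic(line) for line in lines)
    return codeop.compile_command(source, "<input>", "single")


if __name__ == "__main__":
    try:
        codeop.compile_command("%time x = 1")  # raw magic is not Python
    except SyntaxError:
        print("raw magic must be transformed first")
    print(compile_cell(["%time x = 1"]) is not None)  # True: complete once transformed
    print(compile_cell(["if True:"]) is None)  # True: the frontend asks for another line
```

Note that `transform_magic` rewrites every line it sees, including lines inside a multi-line string, which is exactly the first problem in the list above.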
We also need to do some transformations which are only possible with access to the interactive namespace, i.e. they must be done in the kernel. Examples include the autocall system (which lets you type `exit` to exit), macros, automagics (using magics without the `%` prefix) and aliases for shell commands (like `ls`). We refer to these as 'dynamic transformations'.
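As an illustration of why autocall is a dynamic transformation, the sketch below only rewrites a line when the leading name is bound to a callable in the namespace it is given, so the same text is transformed differently depending on what the user has defined. The `autocall` function is hypothetical and deliberately simplified; IPython's real autocall rules (for example its different autocall levels and its treatment of builtins) are more involved.

```python
import keyword
import re

# Optional indentation, an identifier, then optionally whitespace and arguments.
_NAME_AND_ARGS = re.compile(r"^(\s*)([A-Za-z_]\w*)(?:\s+(.*))?$")


def autocall(line, namespace):
    """Hypothetical dynamic transformer: call a callable without parentheses.

    'exit' -> 'exit()' and 'f 1, 2' -> 'f(1, 2)', but only when the name is
    bound to a callable in `namespace`, which is why this has to run in the
    kernel rather than in the frontend.
    """
    m = _NAME_AND_ARGS.match(line)
    if not m:
        return line
    indent, name, args = m.groups()
    if not callable(namespace.get(name)):
        return line
    if not args:
        # A bare name becomes a call with no arguments.
        return f"{indent}{name}()"
    first_word = args.split(None, 1)[0]
    # Leave lines like 'f = 3' or 'f is None' alone: only plain arguments
    # (starting with a letter, digit, underscore or quote) trigger a call.
    if keyword.iskeyword(first_word) or not (args[0].isalnum() or args[0] in "_'\""):
        return line
    return f"{indent}{name}({args})"


if __name__ == "__main__":
    ns = {"exit": lambda: None, "f": lambda *a: a, "x": 3}
    print(autocall("exit", ns))    # exit()
    print(autocall("exit", {}))    # exit  (nothing callable bound to the name)
    print(autocall("f 1, 2", ns))  # f(1, 2)
```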
Finally, we need an extensible system that third parties can hook into without having to monkeypatch lots of our code.

### Current situation

InputSplitter does line-by-line transformation (the name's a little confusing, as its primary role is no longer splitting input). It also handles cell magics, but the implementation feels somewhat awkward to me. For line-based frontends, inputsplitter is run twice: once by the frontend, and again by run_cell(), which is called with the raw, untransformed code.
Prefilter does dynamic transformations using a mixture of Transformer subclasses and Checker/Handler subclass pairs. We've struck the compromise that dynamic transforms only happen on single-line cells, because the frontend can't make them valid syntax on its own. This is the primary extension point for third parties, but it's somewhat awkward to use (subclassing from Transformer isn't simple), and it doesn't work as extension authors might expect (it only transforms single-line cells).
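The checker/handler split can be sketched as follows, using shell aliases as the example: a checker inspects the line together with the namespace (a variable named `ls` should shadow the alias) and, if it claims the line, returns a handler that produces the rewritten code. All class and function names here are hypothetical and much simpler than IPython's actual prefilter classes, and the `get_ipython().system(...)` target is an assumption; `prefilter_cell` also encodes the single-line compromise described above.

```python
class AliasChecker:
    """Hypothetical checker: claims a line whose first word is a shell alias,
    unless a variable of the same name shadows it in the namespace."""

    def __init__(self, aliases):
        self.aliases = aliases  # e.g. {"ls": "ls -F"}

    def check(self, line, namespace):
        words = line.split(None, 1)
        if words and words[0] in self.aliases and words[0] not in namespace:
            return AliasHandler(self.aliases[words[0]])
        return None


class AliasHandler:
    """Hypothetical handler: rewrites the claimed line as a system call."""

    def __init__(self, command):
        self.command = command

    def handle(self, line):
        words = line.split(None, 1)
        rest = words[1] if len(words) > 1 else ""
        command = f"{self.command} {rest}".strip()
        return f"get_ipython().system({command!r})"


def prefilter_cell(cell, namespace, checkers):
    """Apply the first checker/handler pair that claims the line."""
    lines = cell.splitlines()
    if len(lines) != 1:
        return cell  # the compromise: multi-line cells are left untouched
    for checker in checkers:
        handler = checker.check(lines[0], namespace)
        if handler is not None:
            return handler.handle(lines[0])
    return cell


if __name__ == "__main__":
    checkers = [AliasChecker({"ls": "ls -F"})]
    print(prefilter_cell("ls -l", {}, checkers))       # get_ipython().system('ls -F -l')
    print(prefilter_cell("ls", {"ls": 3}, checkers))   # ls  (shadowed by a variable)
```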
Several bits of functionality are duplicated in inputsplitter and prefilter: the transformations for %magic, !system, the assignment forms of both (foo = %magic), help? (and ?help, morehelp??), escapes for various kinds of calls (/callme arg, ,quoteseparate a b c, ;quotetogether a b c), and stripping Python/IPython input prompts. As far as I know, we only use the inputsplitter versions of these functions, since Fernando fixed %paste to use inputsplitter.
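The static escapes listed above need no namespace, so a single line transformer could serve both code paths. The following is a minimal sketch covering `%magic`, `!system`, their assignment forms and the `obj?`/`obj??` help suffixes; it does not handle `?help` prefixes or the `/`, `,` and `;` call escapes. `static_transform` is hypothetical, and the generated calls (`system`, `getoutput`, and `run_line_magic` with `pinfo`/`pinfo2`) are modelled on recent IPython versions; treat the exact output as an assumption.

```python
import re

# Simplified patterns for a few of the static escapes.
_ASSIGN = re.compile(r"^(\s*)([\w.]+)\s*=\s*([!%])(.*)$")  # foo = !cmd / foo = %magic
_SYSTEM = re.compile(r"^(\s*)!(.*)$")                       # !cmd
_MAGIC = re.compile(r"^(\s*)%(.*)$")                        # %magic args
_HELP = re.compile(r"^(\s*)([\w.]+)(\?\??)\s*$")            # obj? / obj??


def _magic_call(rest):
    """Turn 'name args' into a run_line_magic(...) call (assumed target)."""
    name, _, args = rest.partition(" ")
    return f"get_ipython().run_line_magic({name!r}, {args!r})"


def static_transform(line):
    """Hypothetical shared transformer for escapes that need no namespace."""
    m = _ASSIGN.match(line)
    if m:
        indent, target, escape, rest = m.groups()
        if escape == "!":
            return f"{indent}{target} = get_ipython().getoutput({rest!r})"
        return f"{indent}{target} = {_magic_call(rest)}"
    m = _SYSTEM.match(line)
    if m:
        indent, cmd = m.groups()
        return f"{indent}get_ipython().system({cmd!r})"
    m = _MAGIC.match(line)
    if m:
        indent, rest = m.groups()
        return f"{indent}{_magic_call(rest)}"
    m = _HELP.match(line)
    if m:
        indent, obj, marks = m.groups()
        magic = "pinfo" if marks == "?" else "pinfo2"  # '??' asks for more detail
        return f"{indent}{_magic_call(magic + ' ' + obj)}"
    return line


if __name__ == "__main__":
    for line in ["!ls -l", "files = !ls", "t = %time f()", "os.path??", "x = 1"]:
        print(static_transform(line))
```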

### Suggestions

I suggest that we make InputSplitter the main point of contact for extension authors to transform input. It works on multi-line cells, and knows to ignore text within strings, which almost all transformations will want to leave alone. This will involve developing InputSplitter to make it easier to extend (details to be fleshed out in discussion), and improving the documentation to point extension authors towards this rather than prefilter.
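One possible shape for an extensible InputSplitter is sketched below, under the assumption that an extension is just a callable taking and returning one line of source. The chain skips all transformers while the accumulated buffer ends inside a multi-line string or an open bracket, which it detects by letting the standard `tokenize` module fail at end of input. `InputTransformerChain` and `inside_open_construct` are hypothetical names, not an existing IPython API.

```python
import io
import tokenize


def inside_open_construct(source):
    """Return True if `source` ends inside a multi-line string or open bracket.

    tokenize raises TokenError when it reaches end of input in either case.
    Assumes the earlier lines were already turned into valid Python tokens.
    """
    try:
        for _ in tokenize.generate_tokens(io.StringIO(source).readline):
            pass
    except tokenize.TokenError:
        return True
    return False


class InputTransformerChain:
    """Hypothetical extensible line filter: third parties append callables."""

    def __init__(self, transformers=()):
        self.transformers = list(transformers)  # each takes and returns a str
        self.lines = []

    def push(self, line):
        buffer = "".join(prev + "\n" for prev in self.lines)
        # Skip transformations while inside a string (or brackets), so that
        # string contents are never rewritten.
        if not inside_open_construct(buffer):
            for transform in self.transformers:
                line = transform(line)
        self.lines.append(line)

    def source(self):
        return "\n".join(self.lines)


if __name__ == "__main__":
    chain = InputTransformerChain([str.upper])  # a toy extension
    for line in ['x = 1', 'doc = """', 'keep me', '"""', 'y = 2']:
        chain.push(line)
    print(chain.source())
```

With the toy `str.upper` transformer, `x = 1` and `y = 2` are upper-cased, while `keep me`, which sits inside the triple-quoted string, passes through untouched.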
For all the duplicated functionality, strip it out of prefilter and rely on the inputsplitter versions.
Rename InputSplitter to something more meaningful, before many third parties are depending on it. InputAccumulator? InputFilter?
Allow inputsplitter transformers to maintain state between lines. This should allow a less special-cased system for catching cell magics, as well as correctly stripping the prompts in a pasted block like the following:
```
>>> a = """1
... 2
... 3"""
```
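A stateful transformer can handle the pasted block above by deciding once, from the first line of the cell, whether prompts are present, and then stripping `... ` from every following line, including lines that fall inside the triple-quoted string (a filter that skipped string contents would leave the prompt on `... 2` in place). `PromptStripper` and its `push`/`reset` methods are a hypothetical sketch of such a transformer.

```python
import re

_PS1 = re.compile(r"^>>> ?")     # primary prompt
_PS2 = re.compile(r"^\.\.\. ?")  # continuation prompt


class PromptStripper:
    """Hypothetical stateful line transformer for pasted interpreter sessions."""

    def __init__(self):
        self.reset()

    def reset(self):
        # None means "not decided yet"; the first line of each cell decides.
        self.stripping = None

    def push(self, line):
        if self.stripping is None:
            self.stripping = bool(_PS1.match(line))
        if not self.stripping:
            return line
        line = _PS1.sub("", line, count=1)
        return _PS2.sub("", line, count=1)


if __name__ == "__main__":
    stripper = PromptStripper()
    pasted = ['>>> a = """1', '... 2', '... 3"""']
    cleaned = [stripper.push(line) for line in pasted]
    print(cleaned)  # ['a = """1', '2', '3"""']
```

Because the decision is remembered, a cell that does not start with `>>> ` keeps any literal `... ` text it contains.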
Add a later transformation hook which acts on the AST, rather than a string of the code. This would support cases like SymPy's intention to wrap integer literals (reference). The ast module already has a NodeTransformer class to support this kind of thing. This approach is limited to code that is already valid Python syntax before the transformation, but it should be powerful and reliable in those situations.
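A minimal sketch of the AST hook using `ast.NodeTransformer`: every integer literal is wrapped in a call to a name chosen by the caller. So that the sketch does not depend on SymPy, `fractions.Fraction` stands in for SymPy's `Integer`; `WrapIntegers` and `run_transformed` are hypothetical names.

```python
import ast
from fractions import Fraction


class WrapIntegers(ast.NodeTransformer):
    """Wrap every integer literal in a call to `wrapper_name`."""

    def __init__(self, wrapper_name):
        self.wrapper_name = wrapper_name

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude True/False explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            call = ast.Call(
                func=ast.Name(id=self.wrapper_name, ctx=ast.Load()),
                args=[node],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


def run_transformed(source, namespace, transformer):
    """Parse a cell, apply an AST transformer, then compile and run it."""
    tree = transformer.visit(ast.parse(source, mode="exec"))
    ast.fix_missing_locations(tree)  # the new Name node has no location yet
    exec(compile(tree, "<cell>", "exec"), namespace)


if __name__ == "__main__":
    ns = {"Integer": Fraction}  # Fraction stands in for SymPy's Integer
    run_transformed("x = 1/3 + 1/6", ns, WrapIntegers("Integer"))
    print(repr(ns["x"]))  # Fraction(1, 2)
```

With the wrapper in place, `1/3 + 1/6` evaluates exactly to `Fraction(1, 2)` instead of the float `0.5`. A real implementation would need to decide which literals to leave alone: for instance, a `Fraction` cannot be used as a list index.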