marijnh/highlighting.md

## highlighting.md

      
(This is a response to https://github.com/google/xi-editor/blob/master/doc/rope_science/rope_science_11.md that didn't fit in a tweet.)
CodeMirror uses a largely similar approach, with a somewhat different framing.
Firstly, it stores the mode (highlighter) state directly in the document data structure (which I probably wouldn't do again if I were to redesign this), not in a separate cache. Each line can have an optional 'highlight state after this line' field. During highlighting, a cached state is left every N lines.
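
To make the storage scheme concrete, here is a minimal sketch, not CodeMirror's actual code. The toy comment-tracking mode, the names `DocLine`, `stateAfterLine`, and `runMode`, and the value of `CACHE_EVERY` (the "N" above) are all assumptions made for illustration.

```typescript
// Hypothetical toy mode: its whole state is "are we inside a /* ... */ comment?".
interface CommentState { inComment: boolean }

interface DocLine {
  text: string;
  stateAfter?: CommentState; // optional "highlight state after this line"
}

const CACHE_EVERY = 3; // "N" in the text; the value here is arbitrary

function initialState(): CommentState {
  return { inComment: false };
}

// Advance the state across one line. Only tracks state, produces no styles.
function stateAfterLine(text: string, before: CommentState): CommentState {
  let inComment = before.inComment;
  let pos = 0;
  while (pos < text.length) {
    const marker = inComment ? "*/" : "/*";
    const found = text.indexOf(marker, pos);
    if (found === -1) break;
    inComment = !inComment;
    pos = found + marker.length;
  }
  return { inComment };
}

// Run the mode over lines [from, to), leaving a cached state on every
// CACHE_EVERY-th line and clearing stale states on the others.
function runMode(lines: DocLine[], from: number, to: number, start: CommentState): CommentState {
  let state = start;
  lines.slice(from, to).forEach((line, offset) => {
    state = stateAfterLine(line.text, state);
    const lineNo = from + offset;
    if ((lineNo + 1) % CACHE_EVERY === 0) line.stateAfter = state;
    else delete line.stateAfter;
  });
  return state;
}
```

Because each call to `stateAfterLine` returns a fresh object, the cached states can be stored without copying. A mode that mutates its state in place would need a copy step before caching.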
The frontier is simply the line number up to which point highlighting has happened. When you edit above the frontier, it is moved back to the line before the change.
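
The frontier bookkeeping is small enough to sketch in a few lines. The class name `HighlightFrontier` and its methods are hypothetical, and I'm assuming here that the frontier is stored as the index of the first line that has not been highlighted yet, so that "the line before the change" is the last line still considered done.

```typescript
class HighlightFrontier {
  // Index of the first line that has NOT been highlighted yet (an assumption
  // of this sketch; everything before it is considered up to date).
  frontier = 0;

  // Called once highlighting has processed every line before `line`.
  advanceTo(line: number): void {
    this.frontier = Math.max(this.frontier, line);
  }

  // Called when an edit starts on `changedLine`. Edits at or below the
  // frontier leave it alone; edits above it pull it back.
  onChange(changedLine: number): void {
    this.frontier = Math.min(this.frontier, changedLine);
  }
}
```

For example, after `advanceTo(100)` an edit on line 40 moves the frontier to 40, while a later edit on line 70 leaves it there.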
Highlighting never proceeds past the end of the viewport. So startup is cheap (only highlight the first screenful), and in the 99% case of changes happening inside the viewport, re-highlighting complexity is bounded by the size of the viewport, since only the area between the change and the end of the viewport needs to be processed.
For changed lines or lines drawn for the first time, highlighting happens synchronously, when drawing. A pseudo-background process (a function scheduled to run in moments of inactivity, which does work in limited time slices) will advance the frontier when it is above the viewport bottom, and redraw lines whose syntax highlighting changed (but only if it changed).
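
The pseudo-background process could look roughly like the sketch below. This is an illustration under assumptions, not the real implementation: `BackgroundDoc`, `highlightSlice`, `startBackgroundHighlight`, and the default delay and time budget are all made up, and the scheduler is passed in as a parameter so the sketch does not depend on any particular timer API.

```typescript
interface BackgroundDoc {
  lineCount: number;
  frontier: number;        // first line not yet highlighted
  viewportBottom: number;  // one past the last visible line
  highlightLine(n: number): boolean; // hypothetical: true if the line's styles changed
  redrawLine(n: number): void;       // hypothetical: repaint a single line
}

type Scheduler = (task: () => void, delayMs: number) => void;

// Do one bounded slice of work. Returns true if more work remains.
function highlightSlice(
  doc: BackgroundDoc,
  budgetMs: number,
  now: () => number = Date.now
): boolean {
  // Never proceed past the end of the viewport.
  const limit = Math.min(doc.viewportBottom, doc.lineCount);
  const deadline = now() + budgetMs;
  while (doc.frontier < limit) {
    const line = doc.frontier;
    // Redraw only when the highlighting of this line actually changed.
    if (doc.highlightLine(line)) doc.redrawLine(line);
    doc.frontier = line + 1;
    if (now() >= deadline) break;
  }
  return doc.frontier < limit;
}

// Keep scheduling slices until the frontier reaches the viewport bottom.
function startBackgroundHighlight(
  doc: BackgroundDoc,
  schedule: Scheduler,
  delayMs = 100,
  budgetMs = 20
): void {
  schedule(() => {
    if (highlightSlice(doc, budgetMs)) {
      startBackgroundHighlight(doc, schedule, delayMs, budgetMs);
    }
  }, delayMs);
}
```

In a browser, the scheduler would typically be a thin wrapper around `setTimeout`, for instance `(task, ms) => { setTimeout(task, ms); }`.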
When starting to highlight at a given point, first scan backwards to the nearest cached state. If that state is more than M lines away, give up and start highlighting with an initial state. In that case, the frontier is not moved forward by highlighting, so that the background process will eventually catch up and highlight the region with a proper state.
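
The backwards scan can be sketched as follows. The names `CachedLine`, `StartPoint`, and `findStartPoint` are hypothetical, `maxScan` plays the role of M, and starting the approximate pass exactly at the target line is a simplifying assumption of this sketch.

```typescript
interface CachedLine<S> {
  stateAfter?: S; // cached "state after this line", if any
}

interface StartPoint<S> {
  state: S;          // state to start highlighting with
  firstLine: number; // first line the mode should be run over
  exact: boolean;    // false when we gave up and guessed an initial state
}

// Find a state to start from when highlighting line `target`.
function findStartPoint<S>(
  lines: CachedLine<S>[],
  target: number,
  initial: () => S,
  maxScan: number
): StartPoint<S> {
  const lowest = Math.max(0, target - maxScan);
  for (let i = target - 1; i >= lowest; i--) {
    const cached = lines[i]?.stateAfter;
    if (cached !== undefined) {
      return { state: cached, firstLine: i + 1, exact: true };
    }
  }
  // Reaching the top of the document is fine: the initial state is correct there.
  if (lowest === 0) return { state: initial(), firstLine: 0, exact: true };
  // Otherwise give up and fake it. The caller must not advance the frontier.
  return { state: initial(), firstLine: target, exact: false };
}
```

A caller would only move the frontier forward when `exact` is true, which is what lets the background process come back later and redo the region properly.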
One issue that we occasionally run into is looking ahead past line boundaries. This is currently simply verboten, to make caching states simple, and modes mostly manage to work well without it, but for some things, it'd really be helpful. To support that, we'd need to add information to cached states about the amount of lookahead they depended on, and invalidate states when the frontier touches these lines.
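
Purely as a thought experiment, the lookahead bookkeeping might look something like this. Nothing like it exists in CodeMirror as described here; the `lookAhead` field, the `LookaheadLine` type, and `invalidateForChange` are all hypothetical, and the frontier is again assumed to be the first line not yet highlighted.

```typescript
interface LookaheadLine<S> {
  stateAfter?: S;
  // Hypothetical: how many lines past this one the mode peeked at
  // while computing stateAfter.
  lookAhead?: number;
}

// Drop cached states that depended on `changedLine` through lookahead,
// and return the new frontier.
function invalidateForChange<S>(
  lines: LookaheadLine<S>[],
  changedLine: number,
  frontier: number
): number {
  let newFrontier = Math.min(frontier, changedLine);
  const end = Math.min(changedLine, lines.length);
  for (let i = 0; i < end; i++) {
    const line = lines[i];
    if (!line || line.stateAfter === undefined) continue;
    const reach = i + (line.lookAhead ?? 0);
    if (reach >= changedLine) {
      // This state was computed by looking at the changed line, so it is stale.
      delete line.stateAfter;
      delete line.lookAhead;
      newFrontier = Math.min(newFrontier, i);
    }
  }
  return newFrontier;
}
```

This scans every line above the change, which is fine for a sketch. A real version would presumably bound the scan by the maximum lookahead a mode is allowed to use.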
(Coincidentally, I'm currently working on a more mode-writer friendly notation for modes based on grammars, since I'm finding the classical regexp-state-machine approach is way too limiting, but writing a mode directly as a program that threads its state through appears to be quite challenging for many (JavaScript) developers.)