@xeoncross
Created October 22, 2019 18:43
Existing survey of text parsers / lexers in Go.

Lexer / Parser in Go

In Go, one of the cleanest examples of parsing text is the lexer design used in text/template. You can find the current source here.

https://github.com/golang/go/blob/c007ce824d/src/text/template/parse/lex.go

You can find the explanation here:

Video: Lexical Scanning in Go - Rob Pike https://www.youtube.com/watch?v=HxaD_trXwRE

Slides: https://talks.golang.org/2011/lex.slide

This design has been used by many others as a simple way to reason about state without writing a complex state machine.
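As a rough illustration of the idea from the talk, here is a minimal sketch of a state-function lexer. Each state is a function that consumes some input and returns the next state, so the "state machine" is just a loop. The token kinds, the `lex` helper, and the word/number grammar are assumptions made up for this example; the talk's version emits tokens over a channel from a goroutine, while this sketch collects them in a slice to stay short.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// tokenKind and its values are hypothetical names for this sketch.
type tokenKind int

const (
	tokenWord tokenKind = iota
	tokenNumber
	tokenEOF
)

type token struct {
	kind tokenKind
	val  string
}

// stateFn is one lexer state. It returns the next state, or nil to stop.
type stateFn func(*lexer) stateFn

type lexer struct {
	input  string
	start  int // start of the token currently being scanned
	pos    int // current read position
	tokens []token
}

const eof = -1

// next consumes and returns the next rune, or eof at end of input.
func (l *lexer) next() rune {
	if l.pos >= len(l.input) {
		return eof
	}
	r, w := utf8.DecodeRuneInString(l.input[l.pos:])
	l.pos += w
	return r
}

// peek returns the next rune without consuming it.
func (l *lexer) peek() rune {
	if l.pos >= len(l.input) {
		return eof
	}
	r, _ := utf8.DecodeRuneInString(l.input[l.pos:])
	return r
}

// emit records input[start:pos] as a token and moves start forward.
func (l *lexer) emit(k tokenKind) {
	l.tokens = append(l.tokens, token{k, l.input[l.start:l.pos]})
	l.start = l.pos
}

// lexStart skips whitespace and picks the state for the next token.
func lexStart(l *lexer) stateFn {
	switch r := l.peek(); {
	case r == eof:
		l.emit(tokenEOF)
		return nil
	case unicode.IsSpace(r):
		l.next()
		l.start = l.pos // drop the whitespace
		return lexStart
	case unicode.IsDigit(r):
		return lexNumber
	default:
		return lexWord
	}
}

// lexNumber consumes a run of digits.
func lexNumber(l *lexer) stateFn {
	for unicode.IsDigit(l.peek()) {
		l.next()
	}
	l.emit(tokenNumber)
	return lexStart
}

// lexWord consumes everything up to the next whitespace or end of input.
func lexWord(l *lexer) stateFn {
	for r := l.peek(); r != eof && !unicode.IsSpace(r); r = l.peek() {
		l.next()
	}
	l.emit(tokenWord)
	return lexStart
}

// lex runs the state functions until one of them returns nil.
func lex(input string) []token {
	l := &lexer{input: input}
	for state := stateFn(lexStart); state != nil; {
		state = state(l)
	}
	return l.tokens
}

func main() {
	for _, t := range lex("abc 123 x9") {
		fmt.Printf("%d %q\n", t.kind, t.val)
	}
}
```

Adding a new token type means adding one more state function and one more case in `lexStart`; no transition table needs to change.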

In the years since it was first written (and the talk was given), the only major changes (besides small bug fixes and features) are the additional tracking of which line we are on and the concept of starting/ending delimiters.

https://github.com/golang/go/blame/c007ce824d9a4fccb148f9204e04c23ed2984b71/src/text/template/parse/lex.go

Existing Work

https://github.com/bmatsuo/go-lexer/blob/master/lexer.go has a nice walkthrough of what is going on. While lacking unit tests, it seems to be the best API for a re-usable lexer.

Illustrated guide to how the cursors move: https://github.com/lestrrat-go/lex/blob/master/reader.go#L46

Simple example of a lexer that just reverses words: https://github.com/aarongreenlee/golang-lexer-example/blob/master/lexer.go

Simple lexer expecting you to provide stateFn to run: https://github.com/bbuck/go-lexer

Lexer designed for io.Reader streams and parsing of text tokens for NLP: https://github.com/chewxy/lingo/blob/master/lexer/lexer.go This lexer also keeps track of the line number and column of the tokens it finds. It was originally used for scanning source code, and it handles complex runs like URLs.
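To make the io.Reader and line/column idea concrete, here is a small sketch that reads runes from a stream and records where each whitespace-separated word starts. This is not the lingo API; `posToken`, `scanWords`, and `scanString` are hypothetical names for this example, and splitting on whitespace only is an assumption that happens to keep a URL together as one token.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"unicode"
)

// posToken is a hypothetical token type that records where a word started.
type posToken struct {
	val       string
	line, col int // both 1-based; col counts runes, not bytes
}

// scanWords reads runes from r and returns whitespace-separated words
// together with the line and column of their first rune.
func scanWords(r io.Reader) ([]posToken, error) {
	br := bufio.NewReader(r)
	var (
		out                 []posToken
		buf                 strings.Builder
		line, col           = 1, 0
		startLine, startCol int
	)
	// flush emits the word collected so far, if any.
	flush := func() {
		if buf.Len() > 0 {
			out = append(out, posToken{buf.String(), startLine, startCol})
			buf.Reset()
		}
	}
	for {
		ch, _, err := br.ReadRune()
		if err == io.EOF {
			flush()
			return out, nil
		}
		if err != nil {
			return out, err
		}
		col++
		if unicode.IsSpace(ch) {
			flush()
			if ch == '\n' {
				line++
				col = 0
			}
			continue
		}
		if buf.Len() == 0 {
			startLine, startCol = line, col
		}
		buf.WriteRune(ch)
	}
}

// scanString is a convenience wrapper for in-memory input.
func scanString(s string) ([]posToken, error) {
	return scanWords(strings.NewReader(s))
}

func main() {
	toks, err := scanString("go lex\n  http://x.io")
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Printf("%d:%d %q\n", t.line, t.col, t.val)
	}
}
```

Because the scanner only ever asks for the next rune, it works the same on a file, a network connection, or a string.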
