Existing survey of text parsers / lexers in Go.

Lexer / Parser in Go

In Go, one of the cleanest examples of parsing text is the lexer design used in text/template. You can find the current source here.

https://github.com/golang/go/blob/c007ce824d/src/text/template/parse/lex.go

You can find the explanation here:

Video: Lexical Scanning in Go - Rob Pike https://www.youtube.com/watch?v=HxaD_trXwRE

Slides: https://talks.golang.org/2011/lex.slide

This design has been used by many others as a simple way to reason about state without writing a complex state machine.

In the years since it was first written (and the talk was given), the only major changes (besides small bug fixes and features) are the additional tracking of which line we are on and the concept of starting/ending delimiters.

https://github.com/golang/go/blame/c007ce824d9a4fccb148f9204e04c23ed2984b71/src/text/template/parse/lex.go
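
To make the state-function idea concrete, here is a minimal sketch in the spirit of the talk. It is not the text/template code: the item types, the state names (lexStart, lexWord, lexNumber), and the choice to collect items in a slice instead of sending them on a channel are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

type itemType int

const (
	itemWord itemType = iota
	itemNumber
)

// item is one token produced by the lexer (hypothetical type for this sketch).
type item struct {
	typ itemType
	val string
}

// stateFn is one lexer state. It does some scanning and returns the
// next state, or nil when lexing is finished.
type stateFn func(*lexer) stateFn

type lexer struct {
	input string
	start int // start of the item currently being scanned
	pos   int // current read position
	items []item
}

const eof = -1

// peek returns the next rune without consuming it, or eof at end of input.
func (l *lexer) peek() rune {
	if l.pos >= len(l.input) {
		return eof
	}
	r, _ := utf8.DecodeRuneInString(l.input[l.pos:])
	return r
}

// next consumes one rune. It does nothing at end of input.
func (l *lexer) next() {
	if l.pos < len(l.input) {
		_, w := utf8.DecodeRuneInString(l.input[l.pos:])
		l.pos += w
	}
}

// emit records input[start:pos] as an item and starts a new one.
func (l *lexer) emit(t itemType) {
	l.items = append(l.items, item{t, l.input[l.start:l.pos]})
	l.start = l.pos
}

// lexStart skips whitespace and picks the state for the next item.
func lexStart(l *lexer) stateFn {
	switch r := l.peek(); {
	case r == eof:
		return nil
	case unicode.IsSpace(r):
		l.next()
		l.start = l.pos // drop the whitespace
		return lexStart
	case unicode.IsDigit(r):
		return lexNumber
	default:
		return lexWord
	}
}

// lexNumber scans a run of digits.
func lexNumber(l *lexer) stateFn {
	for unicode.IsDigit(l.peek()) {
		l.next()
	}
	l.emit(itemNumber)
	return lexStart
}

// lexWord scans up to the next whitespace or end of input.
func lexWord(l *lexer) stateFn {
	for r := l.peek(); r != eof && !unicode.IsSpace(r); r = l.peek() {
		l.next()
	}
	l.emit(itemWord)
	return lexStart
}

// lex runs the state machine: each state returns the next one until nil.
func lex(input string) []item {
	l := &lexer{input: input}
	for state := stateFn(lexStart); state != nil; {
		state = state(l)
	}
	return l.items
}

func main() {
	for _, it := range lex("go 1 lexer 42") {
		fmt.Printf("%d %q\n", it.typ, it.val)
	}
}
```

The whole "state machine" is the loop in lex. There is no state enum and no switch on the current state, because each state is a function that already knows what comes next.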

Existing Work

https://github.com/bmatsuo/go-lexer/blob/master/lexer.go has a nice walkthrough of what is going on. While lacking unit tests, it seems to be the best API for a re-usable lexer.

Illustrated guide to how the cursors move: https://github.com/lestrrat-go/lex/blob/master/reader.go#L46

Simple example of a lexer that just reverses words: https://github.com/aarongreenlee/golang-lexer-example/blob/master/lexer.go

Simple lexer expecting you to provide stateFn to run: https://github.com/bbuck/go-lexer

Lexer designed for an io.Reader stream and parsing of text tokens for NLP: https://github.com/chewxy/lingo/blob/master/lexer/lexer.go This lexer also keeps track of the line number and column of the found tokens. Originally used when scanning source code. Handles complex runs like URLs.
