Indentation sensitive parser combinators
An indentation sensitive parser combinator language is one that helps you express
ideas like "this parse only succeeds if it's within the current indentation block".
The concept is somewhat small and elegant and is implemented in a few libraries. In
this writeup, I will use examples from
Background and goal
The direct goal will be to write the
sameOrIndented parser combinator with a type
like (but not identical to)
sameOrIndented :: Parser s u () -- aspirational, not accurate
It fails whenever the current parse is at an indentation level less than a local
"reference" level which is set by
withPos :: Parser s u a -> Parser s u a -- aspirational, not accurate
The idea being that you wrap a parse with
withPos after parsing the indentation whitespace and
the "inner" parse will have its reference set to the location where
withPos was applied. It
makes little sense to call
sameOrIndented (or other combinators like it) outside of a
block but the types do not prohibit it.
Context for indentation
Indentation-sensitive parses must carry with them information about the current "reference"
indentation level. To do so, we use a monad transformer with a
State SourcePos layer.
Parsec type describing a location in the source text during parsing.
The function of
withPos is to set this reference context locally. Now that we have
to work with this is easy, we use the
Parsec monad to get the "current" location then temporarily
store it in the
State monad during the inner computation
withPos run = do here <- getPosition -- Parsec effect prev <- get -- State effect, stored to restore after `run`ning put here res <- run put prev return res
Operation within the context
Once we've established an indentation reference using
withPos, implementation of combinators
sameOrIndented is easy: we just get the "current position" and compare it with the "reference
position". To make this work with
sameOrIndented we need to determine if our column count is
equal-to-or-greater-than the reference one.
sameOrIndented = do here <- getPosition ref <- get if (sourceColumn here) >= (sourceColumn ref) then return () else parseFail "bad identation"
Use in practice
With these combinators and ones like them you can annotate your parsers with "indentation block
begins here" markers via
withPos. Parses within these blocks may need to respect indentation
and therefore contain options which include whitespace but are guarded by combinators like
sameOrIndented and its ilk.