Skip to content

Instantly share code, notes, and snippets.

@MichaelBaker
Last active August 4, 2017 19:43
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save MichaelBaker/c07b2f3416caa4bd0ba9086f05bfc580 to your computer and use it in GitHub Desktop.
Save MichaelBaker/c07b2f3416caa4bd0ba9086f05bfc580 to your computer and use it in GitHub Desktop.

What

A technique for writing parsers.

Why

  • Easy to understand
  • Generally applicable
  • Full power of the programming language at your disposal
  • Declarative

The idea

Parsing is the act of translating a string representation of a data structure into the data structure itself.

The string "[1, 2, 3]" gets parsed into the list [1, 2, 3]

The simplest parser then is a function which takes a string as input and returns the data structure as ouput.

parser :: String -> a

Here's an ES6 parser for a single digit.

const parseDigit = (input) => {
  const char = input[0] 

  if (char === "0") {
    return 0
  } else if (char ===  "1") {
    return 1
  }
  ...
}

parseDigit("123") => 1

The problem with this parser is that we can't keep going after the first character because we don't know how much of the input was consumed. The solution is to have every parser return the unconsumed part of the input along with the parsed value.

parser :: String -> (a, String)

Here's the digit parser using the new scheme:

const parseDigit = (input) => {
  const char = input[0] 
  const remaining = input.slice(1)

  if (char === "0") {
    return [0, remaining]
  } else if (char ===  "1") {
    return [1, remaining]
  }
  ...
  } else {
    return null
  }
}

parseDigit("123") => [1, "23"]

If the input doesn't start with a digit, null is returned, indicating that the parse failed.

Now we'd like to be able to build more complex parsers out of more primitive ones. This is called a parser combinator. It's a function which takes a parser as input and produces a parser as output.

Here's a parser combinator that will run a parser over and over until it fails.

const many = (parser) => {
  return (input) => {
    const results = []
    let result = null
    let remaining = input

    for(;;) {
      result = parser(remaining)

      if (result) {
        results.push(result[0])
        remaining = result[1]
      } else {
        return [results, remaining]
      }
    }
  }
}

many(parseDigit)("123a") => [[1, 2, 3], "a"]

Here's a combinator which runs a parser and then skips a character.

const skip = (parser, char) => {
  return (input) => {
    const result = parser(input)

    if (result && result[1] === char) {
      return [result[0], result[1].slice(1)]
    } else {
      return null
    }
  }
}

Finally, here's a parser for comma separated list of digits.

const commaDigits = many(skip(parseDigit, ","))

commaDigits("1,2,3,") => [[1, 2, 3], ""]
commaDigits("1,2,3") => [[1, 2], "3"]

Often nicer than regular expressions

/\d*\w+/

vs

followedBy(
  many(digit),
  oneOrMore(wordChar)
)

And not just for strings!

That was super easy and you can see how powerful it is. It promotes code reuse with combinators and the result tends to be very declarative which makes it easy to understand. But this technique is even more powerful because there's no reason it has to take a string as its input.

Here's the type of our default parser:

stringParser :: String -> (a, String)

You could also make a parser for a stream of bytes off of a socket:

byteParser :: [Byte] -> (a, [Byte])

Or a list of tokens that you've gotten some other way:

tokenParser :: [Token] -> (a, [Token])

Widely available

There are parser combinator libraries in most languages at this point:

  • Haskell: Parsec, Trifecta
  • Javascript: Parsimmon, Bennu
  • Ruby: Parby, Kramer
  • Elixir: Combine, Paco

And so on...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment