@sjl
Created September 14, 2012 19:12

Rough grammar:

full ::= lang
        ["-" variant]

lang ::= base
         *2("-" extension)

base ::= 2alpha
       / 3alpha

extension ::= 3alpha

variant ::= 4alpha
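As a cross-check of what the grammar should accept, here is a rough sketch of the same rules as a regular expression using only the standard `re` module. The names `TAG` and `split_tag` are made up for illustration, and the sketch assumes "alpha" means ASCII letters only.

```python
import re

# Hypothetical translation of the rough grammar, assuming ASCII letters only.
TAG = re.compile(r"""
    (?P<base>[A-Za-z]{2,3})               # base: 2alpha or 3alpha
    (?P<extensions>(?:-[A-Za-z]{3}){0,2}) # up to two "-" 3alpha extensions
    (?:-(?P<variant>[A-Za-z]{4}))?        # optional "-" 4alpha variant
    \Z                                    # nothing may follow
""", re.VERBOSE)

def split_tag(tag):
    """Return (base, extensions, variant), or None if tag does not match."""
    m = TAG.match(tag)
    if m is None:
        return None
    # "-cat-rat" -> ["cat", "rat"]; the empty string -> []
    extensions = [e for e in m.group("extensions").split("-") if e]
    return m.group("base"), extensions, m.group("variant")

print(split_tag("en-cat-rat-dogs"))  # ('en', ['cat', 'rat'], 'dogs')
print(split_tag("en-dogs"))          # ('en', [], 'dogs')
```

The regex engine backtracks out of the optional extensions when the rest of the input does not fit, so this is handy as an oracle for the expected results, not as a replacement for the parser.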

In funcparserlib:

import funcparserlib.parser as p
import string

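# Match a single literal character and drop it from the result.
skipch = lambda c: p.skip(p.a(c))
# string.ascii_letters exists on both Python 2 and 3 (string.letters is Python 2 only).
alpha = p.some(lambda c: c in string.ascii_letters)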

base = (alpha + alpha + alpha) | (alpha + alpha)
extension = skipch("-") + alpha + alpha + alpha
possible_extensions = p.maybe(extension + p.maybe(extension))

lang = base + possible_extensions

variant = alpha + alpha + alpha + alpha

full = lang + p.maybe(skipch("-") + variant) + p.skip(p.finished)

# These parse:
full.parse('en')
full.parse('en-cat')
full.parse('en-cat-rat')
full.parse('en-cat-rat-dogs')

# These raise NoParseError:
full.parse('en-dogs')
full.parse('en-cat-dogs')

Works fine:

en-cat
base: en
extension: cat
variant: nil

Also works fine:

en-cat-rat
base: en
extension: cat, rat
variant: nil

en-cat-rat-dogs
base: en
extension: cat, rat
variant: dogs

This one breaks: the parser consumes "en-dog" as en with an extension of dog, and then can't parse the remaining s. I need it to backtrack and say "oh, that didn't work, let's try it without the optional extension(s)". The result I want:

en-dogs
base: en
extension: nil
variant: dogs

Also breaks similarly ("-dog" gets taken as a second extension, leaving the s). The result I want:

en-cat-dogs
base: en
extension: cat
variant: dogs
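One way to get the backtracking described above is to have every parser yield all of its possible results instead of committing to the first one. The following is a minimal sketch of that idea in plain Python, not funcparserlib; the helpers `alpha`, `dash`, `upto` and `parse_tag` are hypothetical names invented for this illustration, and "alpha" is assumed to mean ASCII letters.

```python
import string

def alpha(n):
    """Parser for exactly n ASCII letters; yields (value, rest) on success."""
    def parse(s):
        head = s[:n]
        if len(head) == n and all(c in string.ascii_letters for c in head):
            yield head, s[n:]
    return parse

def dash(parser):
    """Require a leading '-' and then run parser on what follows."""
    def parse(s):
        if s.startswith("-"):
            for value, rest in parser(s[1:]):
                yield value, rest
    return parse

def upto(parser, limit):
    """Match parser 0..limit times, yielding every alternative, longest first."""
    def parse(s, count=limit):
        if count > 0:
            for value, rest in parser(s):
                for values, rest2 in parse(rest, count - 1):
                    yield [value] + values, rest2
        # Fallback: match nothing here. This is the backtracking step.
        yield [], s
    return parse

def parse_tag(tag):
    """Return (base, extensions, variant) for the first parse that consumes all of tag."""
    extensions = upto(dash(alpha(3)), 2)
    variant = dash(alpha(4))
    for n in (3, 2):
        for base, rest in alpha(n)(tag):
            for exts, rest2 in extensions(rest):
                if rest2 == "":
                    return base, exts, None
                for var, rest3 in variant(rest2):
                    if rest3 == "":
                        return base, exts, var
    return None

print(parse_tag("en-dogs"))      # ('en', [], 'dogs')
print(parse_tag("en-cat-dogs"))  # ('en', ['cat'], 'dogs')
```

For en-dogs, `upto` first offers ['dog'] with "s" left over. That fails, so the loop moves on to the empty extension list and the variant matches.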