Rough grammar:
    full      ::= lang ["-" variant]
    lang      ::= base *2("-" extension)
    base      ::= 2alpha / 3alpha
    extension ::= 3alpha
    variant   ::= 4alpha
In funcparserlib:
    import string
    import funcparserlib.parser as p

    # Match a literal character and drop it from the result.
    skipch = lambda c: p.skip(p.a(c))
    # string.ascii_letters exists on Python 2 and 3; string.letters is Python 2 only.
    alpha = p.some(lambda c: c in string.ascii_letters)
    # Try the 3-letter base first, then fall back to 2 letters.
    base = (alpha + alpha + alpha) | (alpha + alpha)
    extension = skipch("-") + alpha + alpha + alpha
    # Zero, one, or two extensions.
    possible_extensions = p.maybe(extension + p.maybe(extension))
    lang = base + possible_extensions
    variant = alpha + alpha + alpha + alpha
    full = lang + p.maybe(skipch("-") + variant) + p.skip(p.finished)

    full.parse('en')
    full.parse('en-cat')
    full.parse('en-cat-rat')
    full.parse('en-cat-rat-dogs')
    full.parse('en-dogs')      # breaks
    full.parse('en-cat-dogs')  # breaks
Works fine:
en-cat
base: en
extension: cat
variant: nil
Also works fine:
en-cat-rat-dogs
base: en
extension: cat, rat
variant: dogs
This one breaks: the parser consumes "en-dog" as base en with an extension of dog, and then can't parse the trailing s. I need it to backtrack and say "oh, that didn't work, let's try it without the optional extension(s)".
en-dogs
base: en
extension: nil
variant: dogs
Also breaks similarly:
en-cat-dogs
base: en
extension: cat
variant: dogs
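To make the behaviour I'm after concrete, here is a minimal sketch in plain Python, deliberately without funcparserlib. The helper names (`take_alpha`, `take_dash_alpha`, `parse_tag`) are made up for illustration. It greedily collects up to two extensions but remembers every stopping point, then retries the rest of the grammar from the longest match down to zero extensions:

```python
import string

ALPHA = set(string.ascii_letters)

def take_alpha(s, pos, n):
    """Return the position after exactly n letters starting at pos, or None."""
    end = pos + n
    chunk = s[pos:end]
    if len(chunk) == n and all(c in ALPHA for c in chunk):
        return end
    return None

def take_dash_alpha(s, pos, n):
    """Match "-" followed by exactly n letters; return the new position or None."""
    if pos < len(s) and s[pos] == "-":
        return take_alpha(s, pos + 1, n)
    return None

def parse_tag(s):
    """Return (base, extensions, variant), or None if s does not match."""
    for base_len in (3, 2):  # same order as the funcparserlib version
        pos = take_alpha(s, 0, base_len)
        if pos is None:
            continue
        # Greedily collect up to two extensions, keeping every stopping point.
        stops = [(pos, [])]
        while len(stops) <= 2:
            last_pos, exts = stops[-1]
            nxt = take_dash_alpha(s, last_pos, 3)
            if nxt is None:
                break
            stops.append((nxt, exts + [s[last_pos + 1:nxt]]))
        # Backtrack: try the longest extension list first, then shorter ones.
        for ext_pos, exts in reversed(stops):
            if ext_pos == len(s):
                return s[:base_len], exts, None
            var_pos = take_dash_alpha(s, ext_pos, 4)
            if var_pos == len(s):
                return s[:base_len], exts, s[ext_pos + 1:var_pos]
    return None

print(parse_tag("en-dogs"))      # ('en', [], 'dogs')
print(parse_tag("en-cat-dogs"))  # ('en', ['cat'], 'dogs')
```

The key point is that the end-of-input check sits inside the retry loop, so a greedy extension match that leaves a stray "s" is discarded instead of failing the whole parse. What I'm looking for is the equivalent of that retry in funcparserlib.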