
pxeger/lexer.py Secret

Last active March 28, 2022 17:42
def lex(source):
    """Lex a very simple C-like language.

    In this language, tokens can be:
        (
        )
        a string, delimited by "double quotes" (no escapes)
        a number

    This uses a generator, but split into multiple functions which roughly
    correspond to the state of the lexer, to allow separation of the different
    syntaxes.

    Each state function yields zero or more tokens and returns the state
    function to call for the next character.

    `return (yield from ...)` is used when the lexer wants to delegate the
    handling of the current character to a different state function.
    """
    def initial():
        if char == "(":
            yield "L_PAREN"
            return initial
        elif char == ")":
            yield "R_PAREN"
            return initial
        elif char == '"':
            # opening quote: the following characters belong to a string
            return string
        elif char.isdigit():
            # delegate to `number`
            return (yield from number())
        elif char == "":
            yield "EOF"
        else:
            raise SyntaxError("invalid character")

    def string(value=""):
        if char == '"':
            # end of string
            yield "STRING", value
            return initial
        elif char == "":
            # EOF
            raise SyntaxError("unclosed string")
        else:
            # next time, continue with `value` having the character appended
            value += char
            return lambda: string(value)

    def number(value=""):
        if char.isdigit():  # part of the number
            # next time, continue with `value` having the character appended
            value += char
            return lambda: number(value)
        else:  # not a digit, so not part of the number
            # emit the number token
            assert value
            yield "NUMBER", int(value)
            # since this character wasn't a digit, it needs to be lexed accordingly
            # using `return (yield from ...)` we delegate back to `initial` to
            # handle the same character
            return (yield from initial())

    func = initial
    # the state functions read `char` from this enclosing scope
    for char in source:
        func = yield from func()
    char = ""  # empty "character" to indicate EOF
    yield from func()


def lex2(source):
    print(*lex(source))


lex2("(12")      # prints: L_PAREN ('NUMBER', 12) EOF
lex2("34")       # prints: ('NUMBER', 34) EOF
lex2("56(")      # prints: ('NUMBER', 56) L_PAREN EOF
lex2('"hello"')  # prints: ('STRING', 'hello') EOF
lex2('"hello')   # raises SyntaxError: unclosed string
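
The docstring's point about `return (yield from ...)` may be easier to see in isolation. The following is a minimal sketch, separate from the lexer: the names `first`, `second`, `third` and `drive` are hypothetical and exist only for illustration. It assumes nothing beyond standard generator semantics, where a `yield from` expression evaluates to the return value of the generator it delegates to.

```python
# Hypothetical state functions, not part of lexer.py.
def first():
    yield "first"
    return second  # hand control to `second` on the next step


def second():
    yield "second"
    # delegate immediately: run `third` now and pass along whatever it returns
    return (yield from third())


def third():
    yield "third"
    return None  # no further state, so the driver stops


def drive(start):
    state = start
    while state is not None:
        # `yield from` re-yields the state's items and evaluates to its return value
        state = yield from state()


tokens = list(drive(first))
print(tokens)  # prints: ['first', 'second', 'third']
```

Returning a function defers the next state until the driver's next step, while `return (yield from ...)` runs the other state right away. The lexer uses the first form to wait for the next character and the second to reprocess the current one.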