@MegaIng
Created February 6, 2022 14:52
from typing import Any, Iterator

from lark import Lark, Token
from lark.common import LexerConf
from lark.lexer import Lexer, BasicLexer, LexerState


class RecursiveLexerThread:
    """Lexer thread that keeps a stack of lexer states, one per file being read.

    When an _INCLUDE token is seen, a new state for the included text is pushed
    and lexing continues there until that text is exhausted.
    """

    def __init__(self, lexer: Lexer, text: str):
        print(lexer, text)
        self.lexer = lexer
        self.state_stack = [LexerState(text)]

    def lex(self, parser_state):
        while self.state_stack:
            lexer_state = self.state_stack[-1]
            lex = self.lexer.lex(lexer_state, parser_state)
            try:
                token = next(lex)
            except StopIteration:
                self.state_stack.pop()  # We are done with this file
            else:
                if token.type == "_INCLUDE":
                    name = token.value.split()[-1]  # Get just the string
                    name = name[1:-1]  # Remove the surrounding quotes
                    # test_files is the module-level dict defined further down
                    self.state_stack.append(LexerState(test_files[name]))
                yield token  # The parser still expects this token either way

parser = Lark(r"""
start: (statement|_INCLUDE)+

statement: NAME "=" value -> assignment

!?value: (value ("+"|"-"))? mul
!?mul: (mul ("*"|"/"))? atom
?atom: NAME -> variable
     | NUMBER -> number
     | "(" value ")"

_INCLUDE.1: "INCLUDE" /\s+/ STRING

%import common.CNAME -> NAME
%import common.SIGNED_INT -> NUMBER
%import common.ESCAPED_STRING -> STRING
%ignore /\s+/
""", _plugins={
    "LexerThread": RecursiveLexerThread
}, parser="lalr")

test_files = {
    "a": """
pi = 31415/10000
""",
    "b": """
INCLUDE "a"
tau = pi * 2
"""
}

tree = parser.parse("""
INCLUDE "b"
result = tau
""")
print(tree.pretty())

@westgate

westgate commented Feb 8, 2022

This appears to work. I still need to bang on it a little. I can see that this goes a little deeper inside Lark than I had gone so far.

@westgate

westgate commented Feb 8, 2022

This works well until there are errors. I think the lexer state should hold a copy of the file name so it can report it with an error (along with the line number and column, which it already reports). I'm going to try to add that on top of this. Ideally, it could give you the file names all the way up the include chain.
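
A minimal sketch of that idea, kept independent of Lark so it runs on its own. `IncludeFrame`, `IncludeStack`, and `describe` are hypothetical names, not part of Lark or of the gist; in the gist, `state` would be the `LexerState` that `RecursiveLexerThread` pushes onto `state_stack`, and the line and column would come from whatever position information that state already tracks.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class IncludeFrame:
    """One entry on the include stack (hypothetical helper, not a Lark class)."""
    filename: str
    state: Any  # assumption: in the gist this would hold a LexerState


@dataclass
class IncludeStack:
    """Stack of files currently being lexed, outermost first."""
    frames: List[IncludeFrame] = field(default_factory=list)

    def push(self, filename: str, state: Any) -> None:
        self.frames.append(IncludeFrame(filename, state))

    def pop(self) -> IncludeFrame:
        return self.frames.pop()

    def chain(self) -> List[str]:
        """File names from the outermost file down to the current one."""
        return [frame.filename for frame in self.frames]

    def describe(self, line: int, column: int) -> str:
        """Format a position in the current file plus every including file."""
        if not self.frames:
            raise ValueError("no file is currently being lexed")
        *outer, current = self.chain()
        location = f"{current}:{line}:{column}"
        for name in reversed(outer):
            location += f"\n  included from {name}"
        return location


stack = IncludeStack()
stack.push("<main>", None)
stack.push("b", None)
stack.push("a", None)
print(stack.describe(2, 5))
```

With something like this, the thread could push a frame wherever it currently pushes a bare `LexerState`, and an error handler could call `describe` to name the file and every file that included it.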

@westgate

westgate commented Feb 9, 2022

Curious: why didn't you make RecursiveLexerThread inherit from LexerThread?
