The holy tokenizer
def tokenize(instring, delimiters=[',', ':', ';', '[', ']', '+', '-']):
    '''
    Tokenize a string of ASM code, splitting based on special characters
    but at the same time including delimiters (but not whitespace) in the set
    '''
    tokens = instring.split()
    for d in delimiters:
        newtokens = list()
        for t in tokens:
            raw = t.split(d)
            for r_idx, r_token in enumerate(raw):
                if r_token != '':
                    '''
                    element will be empty when delimiter begins or
                    ends the string that was split,
                    so don't add empty elements
                    '''
                    newtokens.append(r_token)
                if r_idx != len(raw) - 1:
                    newtokens.append(d)
        tokens = newtokens
    return tokens

test = "MOV [ R7 :R8],R0 ; Testing stuff"
print(tokenize(test))
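The same split-and-keep-delimiters behavior can be had in one pass with `re.split()`, since a capturing group in the pattern makes the matched delimiters appear in the result list. This is a hedged alternative sketch, not part of the gist; the function name `tokenize_re` is made up for illustration, and it assumes all delimiters are single characters as in the original.

```python
import re

def tokenize_re(instring, delimiters=",:;[]+-"):
    # Split on whitespace first, then split each chunk on any single
    # delimiter character. The capturing group "(...)" makes re.split()
    # return the matched delimiters as elements of the result list.
    pattern = "([" + re.escape(delimiters) + "])"
    tokens = []
    for chunk in instring.split():
        # re.split() yields empty strings when a delimiter starts or
        # ends the chunk, so filter those out just like the original.
        tokens.extend(t for t in re.split(pattern, chunk) if t != "")
    return tokens

print(tokenize_re("MOV [ R7 :R8],R0 ; Testing stuff"))
# → ['MOV', '[', 'R7', ':', 'R8', ']', ',', 'R0', ';', 'Testing', 'stuff']
```

This should produce the same token stream as the loop-based version for single-character delimiters, while avoiding the repeated rebuild of the token list for each delimiter.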
Well, I'm not super excited to have it out there since it still feels a bit hacky. But it is relatively stable right now, and of course always happy to have help on passion projects like the conference badges ;-)
Here's a snapshot to play with: https://gist.github.com/szczys/b9a19714ea27d50be01d1a8479f97795