@szczys
Last active November 28, 2020 07:32
The holy tokenizer
def tokenize(instring, delimiters=(',', ':', ';', '[', ']', '+', '-')):
    '''
    Tokenize a string of ASM code, splitting on special characters while
    keeping the delimiters (but not whitespace) in the returned list
    '''
    tokens = instring.split()
    for d in delimiters:
        newtokens = list()
        for t in tokens:
            raw = t.split(d)
            for r_idx, r_token in enumerate(raw):
                # An element is empty when the delimiter begins or ends
                # the string that was split, so don't add empty elements
                if r_token != '':
                    newtokens.append(r_token)
                # Re-insert the delimiter between the split pieces
                if r_idx != len(raw)-1:
                    newtokens.append(d)
        tokens = newtokens
    return tokens

test = "MOV [ R7 :R8],R0 ; Testing stuff"
print(tokenize(test))
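
For comparison, here is a rough sketch of the same idea using `re.split` with a capturing group, which keeps the delimiters in the result in a single pass. The name `tokenize_re` is made up for this example and is not part of the gist, and the sketch assumes every delimiter is a single character (the loop-based `tokenize` above has no such restriction).

```python
import re

def tokenize_re(instring, delimiters=(',', ':', ';', '[', ']', '+', '-')):
    '''
    Hypothetical regex-based variant of tokenize().
    Assumes every delimiter is exactly one character long.
    '''
    # Build a character class such as [,:;\[\]\+\-]; re.escape protects
    # characters like '[' and '-' that are special inside a class.
    # The surrounding parentheses make re.split keep the delimiters.
    pattern = '([' + ''.join(re.escape(d) for d in delimiters) + '])'
    tokens = []
    # Split on whitespace first so whitespace never appears as a token
    for chunk in instring.split():
        # re.split yields '' around leading, trailing, or adjacent
        # delimiters, so drop the empty strings
        tokens.extend(t for t in re.split(pattern, chunk) if t)
    return tokens

test = "MOV [ R7 :R8],R0 ; Testing stuff"
print(tokenize_re(test))
# ['MOV', '[', 'R7', ':', 'R8', ']', ',', 'R0', ';', 'Testing', 'stuff']
```

Note that in both versions `;` is just another delimiter, so the words of a trailing comment come back as ordinary tokens; anything that treats `;` as the start of a comment would have to handle that after tokenizing.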

szczys commented Nov 28, 2020

Well, I'm not super excited to have it out there since it still feels a bit hacky. But it's relatively stable right now, and of course I'm always happy to have help on passion projects like the conference badges ;-)

Here's a snapshot to play with: https://gist.github.com/szczys/b9a19714ea27d50be01d1a8479f97795
