@Mizzlr
Created October 11, 2018 06:02
Convert a given camelCase Python codebase to snake_case.
import os
import pathlib
import tokenize

import stringcase
import tqdm

# Work on a copy of src/ so the original tree is left untouched.
os.system("mkdir -p converted/src/")
os.system("rm -rf converted/src/*")
os.system("cp -r src/* converted/src/")

# Names that must keep their original spelling.
IGNORE = {'dictConfig', 'Box'}
keywords = set()
source_path = pathlib.Path('converted/src/')

# Pass 1: collect identifiers (NAME tokens) from every Python file.
# String literals are not collected, though pass 2 still rewrites matching text inside them.
for filename in source_path.glob('**/*.py'):
    print(f'Scanning tokens {filename} ...')
    with open(filename, 'rb') as rfile:
        for token in tokenize.tokenize(rfile.readline):
            if token.type == tokenize.NAME:
                keywords.add(token.string)
print(len(keywords))

# Only names starting with a lowercase letter are converted; all others map to themselves.
conversions = [(old, stringcase.snakecase(old) if old[0].islower() and old not in IGNORE else old) for old in keywords]

# Pass 2: apply the renames as plain text replacement in .py and .json files.
for filename in list(source_path.glob('**/*.py')) + list(source_path.glob('**/*.json')):
    print(f'Converting file {filename} ...')
    with open(filename, 'r') as rfile:
        code = rfile.read()
    for old, new in tqdm.tqdm(conversions, desc="Converting"):
        code = code.replace(old, new)
    with open(filename, 'w') as wfile:
        wfile.write(code)
print('Done')
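
The conversion loop uses `str.replace`, which matches substrings, so a short name such as `fooBar` is also rewritten inside a longer name such as `fooBarBaz`, and the result depends on the iteration order of the set. A minimal sketch of one alternative, assuming whole-identifier matching is what is wanted: the helper name `rename_identifiers` is hypothetical and not part of the gist, and it still rewrites matches inside strings and comments.

```python
import re


def rename_identifiers(code, conversions):
    """Rename whole identifiers in one pass (hypothetical helper, not from the gist)."""
    # Drop no-op pairs so the pattern only contains names that actually change.
    mapping = {old: new for old, new in conversions if old != new}
    if not mapping:
        return code
    # \b keeps 'fooBar' from matching inside 'fooBarBaz'.
    # Longest names go first so the alternation tries the most specific one first.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(n) for n in names) + r')\b')
    return pattern.sub(lambda m: mapping[m.group(0)], code)


sample = "fooBar = fooBarBaz + foo"
pairs = [("fooBar", "foo_bar"), ("fooBarBaz", "foo_bar_baz"), ("foo", "foo")]
print(rename_identifiers(sample, pairs))
```

Because every rename happens in a single `sub` call, an already converted name is never fed back into a later replacement.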
@vivaria
vivaria commented Jul 20, 2023

Thank you so, so much for this script! Saved me a ton of time tackling this task on my codebase: vivaria/tja2fumen#8

Some small nitpicks:

  • I needed to add encoding='utf8' to both open commands to avoid UnicodeDecodeErrors
  • Any full-caps subwords (e.g. acronyms like BPM) will be treated as multiple words by stringcase.snakecase.
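
For the acronym case, one possible workaround is to split only at lowercase-to-uppercase boundaries and at the end of an uppercase run. This is a rough sketch using the standard `re` module rather than `stringcase`. The function name `acronym_aware_snakecase` is made up for illustration, and it assumes plain ASCII identifiers.

```python
import re

# Insert an underscore where a lowercase letter or digit meets an uppercase letter,
# or where an uppercase run is followed by a capitalized word (e.g. HTTP|Response).
_BOUNDARY = re.compile(r'(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')


def acronym_aware_snakecase(name):
    """Convert camelCase to snake_case, keeping acronyms as one word (hypothetical helper)."""
    return _BOUNDARY.sub('_', name).lower()


for example in ("getBPM", "parseHTTPResponse", "fooBar"):
    print(example, "->", acronym_aware_snakecase(example))
```

With this approach `getBPM` becomes `get_bpm` rather than being split letter by letter. It could stand in for `stringcase.snakecase` when building the `conversions` list.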
