@jshhrrsn
Last active January 15, 2020 12:41
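import os
import shutil

# Language directories to keep; every other subdirectory of lang/ is deleted.
# The trailing comma makes this a tuple; ('en') alone is just a string.
KEEP = ('en',)


def remove_dirs(lang_dir='lang'):
    """Delete every subdirectory of lang_dir whose name is not in KEEP."""
    for name in os.listdir(lang_dir):
        path = os.path.join(lang_dir, name)
        if os.path.isdir(path) and name not in KEEP:
            shutil.rmtree(path)


if __name__ == "__main__":
    remove_dirs()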
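One detail in `KEEP` is easy to get wrong: `('en')` is not a tuple. It is the string `'en'` in grouping parentheses, so `dir_ not in KEEP` becomes a substring test rather than a membership test. A trailing comma (`('en',)`) makes it a one-element tuple; with two or more entries, such as `('en', 'de')`, it is already a tuple. The snippet below shows the difference; `'e'` is only an illustrative name.

```python
# ('en') is a plain string; the parentheses only group the expression.
keep_str = ('en')
# ('en',) is a one-element tuple because of the trailing comma.
keep_tuple = ('en',)

print(type(keep_str).__name__)    # str
print(type(keep_tuple).__name__)  # tuple

# With a string, `in` checks for a substring, so 'e' counts as "kept".
print('e' in keep_str)    # True
print('e' in keep_tuple)  # False

# Both forms agree on the exact name 'en'.
print('en' in keep_str, 'en' in keep_tuple)  # True True
```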

jshhrrsn commented Jan 15, 2020

This is intended to remove unneeded language data from spaCy (e.g. to slim the package down enough to run in AWS Lambda). To use:

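  1. Save this code as prune_langs.py in the spacy/ package directory (the one that contains lang/); the script looks for lang/ relative to the current working directory.
  2. Update KEEP with the language codes you need, written as a tuple (e.g. ('en',) or ('en', 'de')). Every other language directory under lang/ will be deleted.
  3. From the spacy/ directory, run: python prune_langs.py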
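Because `shutil.rmtree` deletes without asking, it can help to preview the result first. The sketch below is not part of the original gist, and `find_package_subdir` and `dirs_to_remove` are hypothetical helpers. The first uses `importlib.util.find_spec` to locate an installed package's directory, so the script does not have to be copied into site-packages; the second lists the subdirectories that would be deleted for a given keep set without touching anything.

```python
import importlib.util
import os


def find_package_subdir(package, subdir):
    """Return the path to <package>/<subdir> for an installed package, or None.

    Hypothetical helper: find_package_subdir("spacy", "lang") should point at
    spaCy's language-data directory if spaCy is installed as a regular package.
    """
    spec = importlib.util.find_spec(package)  # None if the package is not found
    if spec is None or spec.origin is None:
        return None
    path = os.path.join(os.path.dirname(spec.origin), subdir)
    return path if os.path.isdir(path) else None


def dirs_to_remove(lang_dir, keep):
    """List (without deleting) the subdirectories of lang_dir not named in keep."""
    return sorted(
        name
        for name in os.listdir(lang_dir)
        if os.path.isdir(os.path.join(lang_dir, name)) and name not in keep
    )


if __name__ == "__main__":
    lang_dir = find_package_subdir("spacy", "lang")
    if lang_dir is None:
        print("spaCy does not appear to be installed")
    else:
        # Dry run only: print what remove_dirs() would delete.
        for name in dirs_to_remove(lang_dir, keep=("en",)):
            print("would remove:", os.path.join(lang_dir, name))
```

Once the printed list looks right, the same `lang_dir` path could be passed to the deletion step instead of relying on the current working directory.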
