@yang-zhang
Created May 30, 2018 01:36
Fixup function for text from fast.ai
import html
import re

# Collapse runs of spaces into a single space.
re1 = re.compile(r' +')

def fixup(x):
    # Repair escaped entities and tokenizer artifacts left in the raw text.
    x = x.replace('#39;', "'").replace('amp;', '&').replace('#146;', "'").replace(
        'nbsp;', ' ').replace('#36;', '$').replace('\\n', "\n").replace('quot;', "'").replace(
        '<br />', "\n").replace('\\"', '"').replace('<unk>','u_n').replace(' @.@ ','.').replace(
        ' @-@ ','-').replace('\\', ' \\ ')
    return re1.sub(' ', html.unescape(x))
# https://github.com/fastai/fastai/blob/master/courses/dl2/imdb.ipynb
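
The chain of replace calls in fixup can be easier to read and extend as an ordered table of pairs. The following is only a sketch of that idea, not code from fast.ai: the names `REPLACEMENTS`, `MULTI_SPACE`, and `fixup_table` are hypothetical, and the pairs are copied in the same order as the original chain, since order matters (the backslash rule has to run last).

```python
import html
import re

# Hypothetical table form of the replace chain; same pairs, same order.
REPLACEMENTS = [
    ('#39;', "'"), ('amp;', '&'), ('#146;', "'"), ('nbsp;', ' '),
    ('#36;', '$'), ('\\n', "\n"), ('quot;', "'"), ('<br />', "\n"),
    ('\\"', '"'), ('<unk>', 'u_n'), (' @.@ ', '.'), (' @-@ ', '-'),
    ('\\', ' \\ '),  # pad any remaining backslashes with spaces
]

# Matches one or more consecutive spaces.
MULTI_SPACE = re.compile(r' +')

def fixup_table(text):
    # Apply each literal replacement in order.
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    # Decode any remaining HTML entities, then collapse repeated spaces.
    return MULTI_SPACE.sub(' ', html.unescape(text))

sample = '3 @.@ 5 stars<br />it#39;s   great'
cleaned = fixup_table(sample)
print(repr(cleaned))  # "3.5 stars\nit's great"
```

Note that the rules match entity fragments without the leading `&` (for example `amp;` rather than `&amp;`), so they appear to target text where that character is already missing; complete entities such as `&lt;` are left for `html.unescape` to decode.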