
@paulera
Created January 30, 2023 13:20
Clean up text data for creating wordclouds

Cleaning up text data for wordcloud

Apply the regular expression replacements below, in the order presented, to strip out words and characters that would otherwise clutter the resulting cloud.

Step                              Replace...      With...
Remove punctuation                [;\.\(\)!/"]    space
Add space to line start and end   ^|$             space
Remove stopwords                  [^a-z](share|and|at|by|the|is|that|it|or|to|it's|of|a|an|btw|be|in|if|amd|just|get|'ll's)[^a-z]    space
Break into lines                  [ \r\n\t]+      \n
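
The four replacements above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the function name `clean_for_wordcloud` is hypothetical, the text is lowercased first on the assumption that the `[^a-z]` guards are meant for lowercase input, and the stopword step uses a lookahead for the trailing separator (a small variation on the pattern in the table) so that two stopwords in a row are both caught in a single pass. The start/end padding step is written as `^|$` in multiline mode.

```python
import re

# Stopwords copied from the table (duplicates dropped); extend as needed.
STOPWORDS = [
    "share", "and", "at", "by", "the", "is", "that", "it", "or", "to",
    "it's", "of", "a", "an", "btw", "be", "in", "if", "amd", "just",
    "get", "'ll's",
]

# Longest entries first, so "it's" is tried before "it".
_ALTERNATION = "|".join(
    re.escape(word) for word in sorted(STOPWORDS, key=len, reverse=True)
)
# Assumption: the trailing [^a-z] is a lookahead instead of a consumed
# character, so the separator can also start the next match.
_STOPWORD_RE = re.compile(r"[^a-z](?:" + _ALTERNATION + r")(?=[^a-z])")


def clean_for_wordcloud(text: str) -> str:
    """Return the remaining words of `text`, one per line."""
    # Assumption: fold case so the [^a-z] guards behave as intended.
    text = text.lower()
    # Step 1: replace punctuation with a space.
    text = re.sub(r'[;\.\(\)!/"]', " ", text)
    # Step 2: add a space at the start and end of every line, so that
    # stopwords at the edges are surrounded by non-letters too.
    text = re.sub(r"^|$", " ", text, flags=re.MULTILINE)
    # Step 3: replace each stopword (and its leading separator) with a space.
    text = _STOPWORD_RE.sub(" ", text)
    # Step 4: collapse every run of whitespace into a single newline.
    text = re.sub(r"[ \r\n\t]+", "\n", text)
    # Drop the leading/trailing newline left over from the padding.
    return text.strip()


if __name__ == "__main__":
    print(clean_for_wordcloud("It's the end of the world; just get a map (btw)!"))
```

Run as a script, this prints `end`, `world` and `map` on separate lines. Note that the punctuation class from the table does not include commas, so a word followed by a comma keeps it unless the class is extended.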

Wordcloud tools
