@jilm
Created February 1, 2019 19:39
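# Take field 13 of the semicolon-delimited file, lowercase it, turn punctuation into spaces, split into one word per line, de-duplicate, and stem.
cat document | cut -d ';' -f 13 | tr '[:upper:]' '[:lower:]' | tr '[:punct:]' ' ' | tr ' ' '\n' | sort | uniq | ./stemmer > stems.txt
# Same pipeline without the stemmer: the unique words themselves.
cat document | cut -d ';' -f 13 | tr '[:upper:]' '[:lower:]' | tr '[:punct:]' ' ' | tr ' ' '\n' | sort | uniq > words.txt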
# Print each word wrapped in double quotes.
awk '{print "\"" $0 "\""}' words.txt
paste words.txt stems.txt
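
The two long pipelines above differ only in the final stemmer step, so the shared part can be factored into a function. The following is a minimal sketch, not the author's method: the name `extract_words` and its optional field argument are hypothetical, and it assumes `./stemmer` reads words on standard input, as the pipeline above suggests. Unlike the original commands, it also squeezes runs of spaces and drops empty lines.

```shell
#!/bin/sh
# Hypothetical helper: print the unique, lowercased words found in one
# field of a semicolon-delimited file.
# Usage: extract_words FILE [FIELD]   (FIELD defaults to 13, as above)
extract_words() {
    cut -d ';' -f "${2:-13}" "$1" |
        tr '[:upper:]' '[:lower:]' |   # lowercase everything
        tr '[:punct:]' ' ' |           # punctuation becomes a space
        tr -s ' ' '\n' |               # one word per line, repeats squeezed
        sed '/^$/d' |                  # drop any remaining empty line
        sort -u                        # sort and de-duplicate
}

# Possible usage (assumes ./stemmer reads stdin):
#   extract_words document > words.txt
#   ./stemmer < words.txt > stems.txt
#   paste words.txt stems.txt
```

Building stems.txt from words.txt, rather than rerunning the whole pipeline, means both files come from the same word list.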