@jappy
Created March 11, 2012 07:54
Unix commands to list the trigrams in a text in order of frequency
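# Case sensitive version
# Split the text into one word per line (each run of non-letters becomes a single newline)
tr -sc 'A-Za-z' '\n' < textfile > textfile.words
# Shift the word list by one and by two positions (tail -n +2 is the POSIX form of the obsolete tail +2)
tail -n +2 textfile.words > textfile.nextwords
tail -n +2 textfile.nextwords > textfile.nextnextwords
# Line up each word with its two successors, then count identical trigrams
paste textfile.words textfile.nextwords textfile.nextnextwords | sort | uniq -c > textfile.trigrams
# List trigrams from most to least frequent
sort -nr < textfile.trigrams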
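# Case insensitive version
# Lowercase the text first, then split it into one word per line
tr 'A-Z' 'a-z' < textfile | tr -sc 'a-z' '\n' > textfile.words
# Shift the word list by one and by two positions (tail -n +2 is the POSIX form of the obsolete tail +2)
tail -n +2 textfile.words > textfile.nextwords
tail -n +2 textfile.nextwords > textfile.nextnextwords
# Line up each word with its two successors, then count identical trigrams
paste textfile.words textfile.nextwords textfile.nextnextwords | sort | uniq -c > textfile.trigrams
# List trigrams from most to least frequent
sort -nr < textfile.trigrams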
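The same idea can be sketched as a single pipeline that writes no intermediate files. This is only one possible variant, not part of the original recipe: the function name `trigrams` is hypothetical, and `awk` stands in for the `tail`/`paste` steps by remembering the previous two words. Unlike the `paste` approach, it emits only complete trigrams, so the two trailing rows with empty fields do not appear.

```shell
# Hypothetical helper (not in the original gist): print case-insensitive
# trigram counts for the file named in $1, most frequent first.
trigrams() {
  tr 'A-Z' 'a-z' < "$1" |      # lowercase the text
    tr -sc 'a-z' '\n' |        # one word per line
    # p2 and p1 hold the two previous words; NF skips any empty line
    awk 'NF { if (p2 != "") print p2 "\t" p1 "\t" $0; p2 = p1; p1 = $0 }' |
    sort | uniq -c | sort -nr  # count identical trigrams, sort by count
}
```

Running `trigrams textfile` prints one line per distinct trigram, with the count first and the three words separated by tabs.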