@jappy
Created March 11, 2012 07:51
Unix commands to sort the bigrams in a text in order of frequency
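# Case-sensitive version
# Split into one word per line: each run of non-letters becomes a single newline
tr -sc 'A-Za-z' '\n' < textfile > textfile.words
# Same list starting at line 2, so line i holds word i+1
# (the obsolete form "tail +2" is rejected by current GNU tail)
tail -n +2 textfile.words > textfile.nextwords
# Pair each word with its successor and count identical pairs
paste textfile.words textfile.nextwords | sort | uniq -c > textfile.bigrams
# Most frequent bigrams first
sort -nr < textfile.bigrams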
# Case-insensitive version
# Lowercase everything first, then split into one word per line
tr 'A-Z' 'a-z' < textfile | tr -sc 'a-z' '\n' > textfile.words
tail -n +2 textfile.words > textfile.nextwords
paste textfile.words textfile.nextwords | sort | uniq -c > textfile.bigrams
sort -nr < textfile.bigrams
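
The same case-insensitive count can probably be done in a single pipeline with no intermediate files. The sketch below is one possible variant, not part of the original recipe: the function name `bigrams` is made up for illustration, and `awk` replaces the `tail`/`paste` step by remembering the previous word. Unlike the file-based version, it skips empty lines and does not emit a trailing pair for the last word.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): read text on stdin,
# print "count<TAB-separated bigram>" lines, most frequent first.
bigrams() {
    tr 'A-Z' 'a-z' |                  # fold to lowercase
    tr -sc 'a-z' '\n' |               # one word per line
    awk 'NF { if (prev != "") print prev "\t" $0; prev = $0 }' |  # pair each word with its predecessor
    sort | uniq -c |                  # count identical pairs
    sort -nr                          # highest counts first
}
```

It would be called as `bigrams < textfile`. Dropping the first `tr` stage and using `'A-Za-z'` in the second gives a case-sensitive variant.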