@judotens
Last active December 29, 2015 01:49
Find the top unigrams and bigrams using bash
#!/bin/bash
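# Usage: cat artikel.txt | ./ubigram.sh   (or: ./ubigram.sh artikel.txt)
# Output is sorted by frequency, highest first.
# Columns: frequency, keyword (a single word or a two-word phrase)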
cat "$@" |
tr -cs "a-zA-Z0-9" '\012' | tr '[:upper:]' '[:lower:]' |
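{
# "aaa." is a sentinel: a trailing "." means "no previous word yet".
old="aaa."
while read -r new
do
    # tr emits an empty first line when the input starts with a non-alphanumeric character.
    [ -n "$new" ] || continue
    case "$old" in
    *.) : ;;                  # first word: nothing to pair it with yet
    *) echo "$old $new"       # bigram
       echo "$old" ;;         # unigram for the previous word
    esac
    old="$new"
done
# The loop prints a word only once its successor arrives, so emit the last word here.
case "$old" in
*.) : ;;
*) echo "$old" ;;
esac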
} | sort | uniq -c |
sort -k1,1nr |   # sort on the count column (field 1), highest first
awk '{ printf "%s\t%s", $1, $2; if (NF > 2) printf " %s", $3; print "" }'
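
For a quick sanity check, here is a minimal sketch of the same idea written as two shell functions and run on a one-line sample. The function names `tokenize` and `ngram_counts` are hypothetical (they are not part of the gist), and the bigram step uses awk instead of the `while read` loop, so treat it as an illustration rather than the script's own method.

```shell
#!/bin/bash
# Hypothetical helper: split stdin into lowercase alphanumeric tokens,
# one per line, dropping the empty line tr can emit at the start.
tokenize() {
    tr -cs 'a-zA-Z0-9' '\n' | tr '[:upper:]' '[:lower:]' | grep -v '^$'
}

# Hypothetical helper: print "count<TAB>ngram" for every unigram and bigram,
# highest count first, ties broken alphabetically.
ngram_counts() {
    tokenize |
    awk '
        { print $0 }                    # unigram
        NR > 1 { print prev " " $0 }    # bigram: previous word + current word
        { prev = $0 }
    ' |
    LC_ALL=C sort | uniq -c |
    LC_ALL=C sort -k1,1nr -k2 |
    awk '{ printf "%s\t%s", $1, $2; if (NF > 2) printf " %s", $3; print "" }'
}

printf 'The cat saw the cat.\n' | ngram_counts
# Output (tab-separated):
# 2	cat
# 2	the
# 2	the cat
# 1	cat saw
# 1	saw
# 1	saw the
```

Letting awk remember the previous token counts every word, including the last one, without needing a sentinel value.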