@josephwilk
Created March 27, 2019 22:25
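# Arguments: $1 = URL of the data to fetch, $2 = word list (optional)
# The word list defaults to /usr/share/dict/words
dict="${2:-/usr/share/dict/words}"
# Fetch binary data
wget "$1" -O data
# Convert into strings: split before capitals, drop non-letters (keeping the
# inserted spaces), lowercase, then put one token per line and drop empty lines
strings data | sed 's/\([A-Z]\)/ \1/g' | sed 's/[^a-zA-Z ]//g' | tr '[:upper:]' '[:lower:]' | tr -s '[:blank:]' '\n' | awk 'NF' > strings.txt
# Find dictionary words in data (only tokens longer than 2 characters)
awk 'length > 2' strings.txt > strings.big.txt
awk 'FNR==NR{dict[$1]++;next} {for(i=1;i<=NF;i++)if(!($i in dict))next}1' "$dict" strings.big.txt | uniq > words.txt
# Find connectives; this list ships with BSD/macOS word lists and may be missing elsewhere
awk 'FNR==NR{dict[$1]++;next} {for(i=1;i<=NF;i++)if(!($i in dict))next}1' /usr/share/dict/connectives strings.txt | uniq > connect.txt
# Join connectives and words, alternating one of each
paste -d '\n' connect.txt words.txt | tr '\n' ' '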
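
The dictionary lookup in the script above relies on awk's two-file idiom. While `FNR==NR` holds, awk is still reading the first file, so each word is loaded into an array. For the second file, any line containing a token that is not in the array is skipped. Here is a minimal sketch of that step on its own. The `dict_filter` helper name and the tiny sample inputs are made up for illustration and are not part of the gist.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $2 whose every field
# appears in the word list $1 (one word per line).
dict_filter() {
  awk 'FNR==NR { dict[$1]; next }
       { for (i = 1; i <= NF; i++) if (!($i in dict)) next } 1' "$1" "$2"
}

# Made-up sample data, for illustration only.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' apple and banana > "$tmp/dict"
printf '%s\n' apple xqz banana banana and > "$tmp/in"

# "xqz" is dropped by the filter; uniq then collapses the adjacent "banana" lines.
dict_filter "$tmp/dict" "$tmp/in" | uniq
```

Because `uniq` only removes adjacent duplicates, a word that reappears later in the input is kept. Pipe through `sort -u` instead if each word should appear once, at the cost of losing the original order.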