find {../data/www.huffingtonpost.com,../data/www.thenation.com} -type f -print0 |\
xargs -0 pv |\
iconv -c -t UTF8 |\
gsed "s/['’]s//g" | gsed "s/s['’]//g" |\
gsed 's/http[^ ]* //g' |\
gsed "s|[“”,‘/\"—…:;()#@!<>{}?=% &*_]| |g" |\
gtr -d "'" |\
gtr -d "" |\
gtr "[:upper:]" "[:lower:]" |\
gsed 's/[0-9]/ /g' |\
gsed 's/--/ /g' |\
gtr '[' ' ' |\
gtr '.' ' ' |\
gsed -E "s/[[:space:]]+/ /g" |\
gsed "s/creepilybut/creepily but/g" |\
gsed 's/-year-old//g' |\
gsed 's/-month-old//g' |\
gsed 's/ - / /g' |\
gtr ']' ' ' > $@