@ericleasemorgan
Last active January 27, 2023 01:09
Some one-liners to extract URLs, email addresses, and a dictionary (word list) from a text file
# extract all URLs from a text file, normalizing https to http and trimming trailing punctuation (\W assumes GNU sed)
cat file.txt | grep -Eo 'https?://[^[:space:]]+' | sed -E -e 's/^https:/http:/' -e 's/\W+$//' | sort | uniq -c | sort -bnr
# extract domains from URLs found in a text file (\W assumes GNU sed)
cat file.txt | grep -Eo 'https?://[^[:space:]]+' | sed -E -e 's/\W+$//' -e 's|^https?://||' -e 's|/.*$||' | sort | uniq -c | sort -bnr
# extract email addresses (top-level domains of two or more letters; \+ assumes GNU grep)
cat file.txt | grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,\}' | sort | uniq -c | sort -bnr
# list all words in a text file
cat file.txt | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | sort | uniq -c | sort -bnr
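The word list above treats "The", "the", and "the." as three different entries. As a rough sketch of one way to fold those together, the function below lowercases the text and splits on anything that is not a letter. The name `word_freq` is hypothetical and not part of the original one-liners, and splitting on non-letters is a deliberate simplification: contractions such as "don't" come out as "don" and "t".

```shell
# word_freq FILE (hypothetical helper, not in the original gist)
# Assumption: a "word" is any run of alphabetic characters.
word_freq() {
  # lowercase everything so "The" and "the" are counted together
  tr '[:upper:]' '[:lower:]' < "$1" |
    # turn every run of non-letters into a single newline
    tr -cs '[:alpha:]' '\n' |
    # drop the empty line left when the file starts with a non-letter
    grep -v '^$' |
    # count each word and list the most frequent first
    sort | uniq -c | sort -bnr
}
```

For example, `word_freq file.txt | head -n 20` would list the twenty most frequent words.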