Last active February 8, 2020 11:19
exploring targets by word frequency from masscan output

Doing a big big big masscan and grabbing headers. I currently have targets in mind for a project, but wanted a way to explore the other stuff active on the same ports. I used this deeply terrible one-liner to split the HTTP banners into tokens and then count token frequency.

fgrep -i http masscan.json | sed 's/[,]$//'\
 | jq -s ".[].ports[].service.banner" | sed 's/[";:,<>()]//g'\
 | sed "s/[']//g" | sed -E 's/([\\]r|[\\]n)+/ /g'\
 | sed 's/[\/=]/ /g' | awk '{ for (i=1; i<=NF; i++) { print $i}}'\
 | tr '[:upper:]' '[:lower:]' | grep -E '^.{4,}$'\
 | grep -Ev '^[0-9.]+$' | sort | uniq -c | sort -n
  • fgrep -i http masscan.json - only care about records that found HTTP
  • | sed 's/[,]$//' - records have a trailing comma I need to strip
  • | jq -s '.[].ports[].service.banner' - iterate the records and extract just the HTTP banner
  • | sed 's/[";:,<>()]//g' - strip quotes and other junk that mess with the frequency counting
  • | sed "s/[']//g" - strip out single quotes (not sure how to escape this in the line above lol)
  • | sed -E 's/([\\]r|[\\]n)+/ /g' - convert the literal \r and \n escape sequences (the newline references in the JSON-encoded banner) into spaces
  • | sed 's/[\/=]/ /g' - convert forward slashes and equals signs into spaces
  • | awk '{ for (i=1; i<=NF; i++) { print $i}}' - print each token on a new line
  • | tr '[:upper:]' '[:lower:]' - convert each token to lowercase
  • | grep -E '^.{4,}$' - dump any tokens shorter than 4 characters
  • | grep -Ev '^[0-9.]+$' - dump any tokens that are just numeric (looking at you, version numbers)
  • | sort | uniq -c | sort -n - sort the tokens, count the number of occurrences of each unique token, then sort the results by frequency
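
The token-splitting stages above can be folded into a small shell function. This is only a sketch, not the original one-liner: banner_tokens is a made-up name, it assumes the banners have already been pulled out by the jq step (one JSON-encoded string per line), and it swaps some of the sed and awk stages for tr. Putting the character set in a double-quoted shell string also lets the single quote sit next to the double quote, so the two quote-stripping steps become one.

```shell
# Hypothetical helper (not part of the original gist).
# Reads JSON-encoded banner strings on stdin, one per line,
# and prints "count token" lines sorted by ascending frequency.
banner_tokens() {
  # Delete quotes and punctuation in one pass; the double-quoted set
  # holds both " (escaped for the shell) and '.
  tr -d "\"';:,<>()" \
    | sed -E 's/(\\r|\\n)+/ /g' \
    | tr '/=' '  ' \
    | tr -s '[:space:]' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | grep -E '^.{4,}$' \
    | grep -Ev '^[0-9.]+$' \
    | sort | uniq -c | sort -n
}

# Two made-up banners in the same escaped form the jq step prints.
printf '%s\n' \
  '"HTTP/1.1 200 OK\r\nServer: nginx/1.14.0\r\n"' \
  '"HTTP/1.1 404 Not Found\r\nServer: nginx\r\n"' \
  | banner_tokens
```

On this sample, http, server and nginx each come out with a count of 2 and found with a count of 1. Short tokens such as ok and not, and the version number 1.14.0, are filtered out.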