
@ajxchapman
Created March 8, 2019 10:45
Wordlist generator based on observed words from given URLs
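The generator works in two steps. First it pulls alphabetic tokens out of the fetched text. Then it multiplies each token into case and prefix variants. The following is a minimal, stdlib-only sketch of those two steps. It assumes the page text has already been fetched. The function names `extract_words` and `case_and_prefix_variants` are hypothetical. The inflect-based plural, singular and -ing forms are left out so the sketch runs without third-party packages.

```python
import re

# Prefixes copied from the script below; treat them as illustrative defaults.
PREFIXES = ["get", "set", "get_", "set_"]


def extract_words(text):
    """Hypothetical helper: the set of purely alphabetic tokens, using the script's regex."""
    return set(re.findall(r"\b[A-Za-z]+\b", text))


def case_and_prefix_variants(word, prefixes=PREFIXES):
    """Hypothetical helper: the word as seen, lowercased and capitalized,
    each with and without every prefix."""
    bases = {word, word.lower(), word.capitalize()}
    return bases | {prefix + base for prefix in prefixes for base in bases}
```

For example, `case_and_prefix_variants("userName")` yields 15 strings. These range from `userName`, `username` and `Username` through prefixed forms such as `get_userName` and `setUsername`.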
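import re

import inflect
import requests

# Pages to harvest words from.
seed_urls = [
    "http://www.example.com",
]
# Session cookie sent with every request (example value; replace with your own).
cookies = {"session": "2eyhsb2dnZxWRJ9biI6dHJ1ZXr0"}
# Prefixes prepended to every word to guess accessor-style names.
prefixes = ["get", "set", "get_", "set_"]

# Fetch every seed page and concatenate the raw responses (HTML, inline JS, etc.).
data = ""
for url in seed_urls:
    # A timeout stops a stalled server from hanging the script indefinitely.
    r = requests.get(url, cookies=cookies, timeout=30)
    data += r.text

# Every purely alphabetic token in the fetched text.
wordset = set(re.findall(r'\b[A-Za-z]+\b', data))
# print(wordset)

e = inflect.engine()
# Iterate over a snapshot, because wordset grows inside the loop.
for word in list(wordset):
    # The word as seen, lowercased and capitalized, each with and without every prefix.
    bases = [word, word.lower(), word.capitalize()]
    variants = bases + [prefix + base for base in bases for prefix in prefixes]
    for _word in variants:
        wordset.add(_word)
        wordset.add(e.plural(_word))
        # singular_noun() returns False when the word is not a plural noun.
        wordset.add(e.singular_noun(_word))
        wordset.add(e.present_participle(_word))

# Drop the False values added by singular_noun() and print the sorted wordlist.
for x in sorted(x for x in wordset if isinstance(x, str)):
    print(x)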
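The script's pattern has a blind spot. `\b` only matches between a word character and a non-word character, and `_` and digits both count as word characters. As a result, identifiers such as `id_token` or `user2` contribute nothing to the wordlist. If those fragments are wanted, one hedged alternative is to match letter runs without word boundaries and optionally split camelCase runs as well. `extract_words_loose` is a hypothetical helper and not part of the gist.

```python
import re


def extract_words_loose(text, split_camel=True):
    """Hypothetical helper: collect letter runs even when they touch digits or underscores."""
    words = set()
    # No \b anchors, so "id_token" yields "id" and "token", and "user2" yields "user".
    for run in re.findall(r"[A-Za-z]+", text):
        words.add(run)
        if split_camel:
            # Split camelCase/PascalCase: "userName" -> "user", "Name";
            # an all-caps run before a capitalised word: "HTTPServer" -> "HTTP", "Server".
            words.update(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", run))
    return words
```

You could swap it in for the `re.findall` line as `wordset = extract_words_loose(data)`. This would produce a noticeably larger list. Every camelCase identifier also contributes its parts, and each part is then expanded with the prefixes and inflections.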