@seisvelas
Created November 24, 2020 07:57
Script to collect lots of quotes from cockbox.org (VPS site) quote section
import urllib.request
from bs4 import BeautifulSoup

quotes = []
for i in range(10_000):
    try:
        # Fetch the front page and decode it to text
        fp = urllib.request.urlopen("https://www.cockbox.org")
        cockbytes = fp.read()
        cockstr = cockbytes.decode("utf8")
        fp.close()
        # Take the leading text of the first <blockquote> on the page
        soup = BeautifulSoup(cockstr, features="html.parser")
        quote = soup.find('blockquote').contents[0].strip()
        # Keep only quotes that have not been seen yet
        if quote not in quotes:
            quotes.append(quote)
    except: # Just ignore errors :o :D <3
        pass
print(quotes)
seisvelas (author) commented Nov 24, 2020

TODO:

  • Sleep briefly between requests
  • Make the except clause catch specific errors
  • Use sqlite instead of list
  • Use hex chars to hide 'bad' word
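
A rough sketch of how the first three TODO items might look, not a finished rewrite. Everything named here is hypothetical and not part of the original script: the helpers `fetch_quote`, `open_db`, `save_quote`, and `collect`, the `quotes` table, the `quotes.db` file name, the 0.5 second delay, and the 10 second timeout. It also swaps `contents[0].strip()` for `get_text().strip()` so that a missing or empty blockquote yields `None` instead of an exception.

```python
import sqlite3
import time
import urllib.request

URL = "https://www.cockbox.org"


def fetch_quote(url=URL):
    """Fetch the page and return the text of its first <blockquote>, or None."""
    from bs4 import BeautifulSoup  # third-party; same dependency as the script above

    with urllib.request.urlopen(url, timeout=10) as fp:
        html = fp.read().decode("utf8")
    block = BeautifulSoup(html, features="html.parser").find("blockquote")
    if block is None:
        return None
    return block.get_text().strip() or None


def open_db(path="quotes.db"):
    """Open (or create) the database; the primary key keeps quotes unique."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS quotes (text TEXT PRIMARY KEY)")
    return conn


def save_quote(conn, quote):
    """Store a quote; a duplicate is silently skipped by INSERT OR IGNORE."""
    conn.execute("INSERT OR IGNORE INTO quotes (text) VALUES (?)", (quote,))
    conn.commit()


def collect(conn, attempts=10_000, delay=0.5, fetch=fetch_quote):
    """Call fetch() repeatedly, store each quote, and pause between requests."""
    for _ in range(attempts):
        try:
            quote = fetch()
        except (OSError, UnicodeDecodeError):
            # Network failures (URLError, timeouts) are OSError subclasses;
            # anything else, including Ctrl-C, is allowed to propagate.
            quote = None
        if quote:
            save_quote(conn, quote)
        time.sleep(delay)
```

To run it, something like `conn = open_db(); collect(conn)` followed by `print([row[0] for row in conn.execute("SELECT text FROM quotes")])` would reproduce the original output. Passing `fetch` as a parameter is only there so the loop can be tried out without hitting the site.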
