Created
April 14, 2013 15:44
Scrape all quotes from a Kwotes site. Kwotes is an open-source quote database with ranking and similar features, available here: http://sourceforge.net/projects/kwotes/
from pyquery import PyQuery


def grabber(i=0):
    # Fetch the listing page at offset i, print its quotes, then move on to the next page.
    base_url = "http://principiadiscordia.com/memebombs/"
    url = base_url + "kwotes.pl?action=list&m=501&so=reverse&o=date&s=" + str(i)
    PQ = PyQuery(url)
    BQ = PQ("blockquote")
    for x in BQ:
        try:
            print(x.text)
        except Exception:
            # this is not error handling... it's error ignoring.
            pass
    # Keep paging until a page comes back with no blockquotes.
    if len(BQ):
        grabber(i + 500)


grabber()
http://sourceforge.net/projects/kwotes/ hasn't been updated in 6 years, and it's a Perl script. I doubt there's a ton of these sites out there, but if you come across one and want to get all the quotes out of it, I hope this helps. To dump the quotes into a file, just do the standard unixy redirect:
python kwotes.py > quotes.txt
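If the recursion in the script above ever becomes a problem on a very large site, the same paging idea can be sketched as a loop. This is only a rough variant, not a tested replacement: the names `page_url`, `fetch_quotes` and `iter_quotes` are made up for illustration, and it assumes the `s=` parameter is a start offset that advances in steps of 500, as the original script implies.

```python
def page_url(offset, base_url="http://principiadiscordia.com/memebombs/"):
    """Build the same listing URL the script above uses, for a given offset."""
    return base_url + "kwotes.pl?action=list&m=501&so=reverse&o=date&s=" + str(offset)


def fetch_quotes(offset):
    """Return the text of every <blockquote> on one listing page."""
    # Imported here so the paging logic below can be exercised without pyquery.
    from pyquery import PyQuery
    doc = PyQuery(url=page_url(offset))
    # .text mirrors the original script; empty blockquotes become "".
    return [bq.text or "" for bq in doc("blockquote")]


def iter_quotes(fetch=fetch_quotes, step=500):
    """Yield quotes page by page until a page comes back empty."""
    offset = 0
    while True:
        quotes = fetch(offset)
        if not quotes:
            return
        for quote in quotes:
            yield quote
        offset += step
```

With that in place, `for quote in iter_quotes(): print(quote)` should behave much like the original, and passing a different `fetch` function makes it possible to try the paging logic without hitting the site.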