@TVFlash
Last active August 29, 2015 14:26
from bs4 import BeautifulSoup
from urllib2 import urlopen, Request
from subprocess import check_output
import click


def scrapeSub(say):
    """Prompt for a subreddit, print its front page, and optionally speak each title."""
    subreddit = raw_input("Subreddit: ")
    endpoint = "http://reddit.com/r/%s"
    # Send a descriptive User-Agent so the request identifies this script.
    hdr = {'User-Agent': 'CommandLineReddit: Just scraping by'}
    req = Request(endpoint % subreddit, headers=hdr)
    f = urlopen(req)
    html = f.read()
    f.close()

    soup = BeautifulSoup(html, 'html.parser')
    # Each post on the listing page sits in a <div class="entry">.
    posts = soup.find_all('div', {'class': 'entry'})
    print "----------Displaying top of /r/%s----------" % subreddit
    for post in posts:
        title = post.find('a', {'class': 'title'}).text
        time = post.find('time', {'class': 'live-timestamp'}).text
        poster = post.find('a', {'class': 'author'}).text
        # ANSI escape codes render the comment count in bold.
        comments = "[ \033[1m" + post.find('a', {'class': 'comments'}).text + "\033[0m ]"
        print title, "\n\tsubmitted", time, "by", poster, comments, "\n"
        if say:
            # Pass the title as a separate argument (no shell) so quotes or
            # shell metacharacters in a post title cannot break the command.
            check_output(["say", title.encode("utf-8")])


if __name__ == '__main__':
    verbal = click.confirm('Would you like the posts to be verbalized?')
    response = True
    while response:
        scrapeSub(verbal)
        response = click.confirm('Another subreddit?')
    print "Get back to work!"
TVFlash commented Jul 30, 2015

This program pulls down the front page of a specified subreddit and, if requested, reads the post titles aloud using the macOS `say` command.
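For readers on Python 3, the "pull down the front page" step might look roughly like the sketch below. It only builds the request and does not send it. The helper name `build_request` and the exact User-Agent string are illustrative assumptions, not part of the original script, and the sketch assumes the script's custom header was meant to act as a User-Agent.

```python
from urllib.request import Request


# Hypothetical helper (not in the original script): builds the request
# for a subreddit's front page without sending it.
def build_request(subreddit, user_agent="CommandLineReddit: Just scraping by"):
    url = "http://reddit.com/r/%s" % subreddit
    # Assumption: the custom header is intended as a User-Agent string.
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("python")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the actual fetch; that call is left out here so the sketch runs without network access.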

Install

pip install beautifulsoup4 click

Running

python redditScraper.py
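The read-aloud step depends on the `say` command, which ships with macOS. A minimal Python 3 sketch of invoking it without a shell is shown below; `build_say_command` and `speak` are hypothetical helper names introduced here for illustration, and the platform check is an assumption about how one might guard the call on other systems.

```python
import subprocess
import sys


# Hypothetical helper: returns the argument list for the macOS `say` command.
# Keeping the title as its own list element avoids shell quoting problems.
def build_say_command(title):
    return ["say", title]


# Hypothetical helper: speaks the title on macOS and reports whether it ran.
def speak(title):
    if sys.platform != "darwin":
        return False  # `say` is assumed to be unavailable elsewhere
    subprocess.check_call(build_say_command(title))
    return True


print(build_say_command('A post with "quotes" in the title'))
```

Because no shell is involved, a title containing quotes or semicolons is passed to `say` as plain text rather than being interpreted as part of a command line.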
