@Usse
Created January 30, 2014 11:00
URL scrape and insert into MongoDB
from datetime import datetime, timezone
from urllib.request import urlopen

from bs4 import BeautifulSoup
from pymongo import MongoClient

# DB connection (MongoClient() defaults to localhost:27017)
client = MongoClient()
db = client.test
collection = db.hackernews

# Page parsing
html_page = urlopen("https://news.ycombinator.com/")
soup = BeautifulSoup(html_page, "html.parser")
titles = soup.select('td.title a')

# Start from an empty collection, then store one document per link
collection.drop()
for link in titles:
    collection.insert_one({
        "title": link.string,
        "link": link.get('href'),
        "date": datetime.now(timezone.utc)
    })

print(collection.count_documents({}), 'Objects inserted')
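
The parsing step above can be tried without network access or a running MongoDB. The following is a minimal sketch under stated assumptions, not part of the original script: it swaps BeautifulSoup for the standard library's html.parser, and SAMPLE_HTML, TitleLinkParser and build_documents are hypothetical names introduced only for illustration. It approximates the 'td.title a' selector on a made-up snippet and builds dictionaries of the same shape the script inserts.

```python
import datetime
from html.parser import HTMLParser

# Hypothetical sample markup; the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><td class="title"><a href="https://example.com/a">First story</a></td></tr>
  <tr><td class="subtext"><a href="user?id=x">x</a></td></tr>
  <tr><td class="title"><a href="item?id=2">Second story</a></td></tr>
</table>
"""


class TitleLinkParser(HTMLParser):
    """Collect (text, href) for <a> tags nested inside <td class="title">."""

    def __init__(self):
        super().__init__()
        self.td_stack = []      # one bool per open <td>: is it a title cell?
        self.in_anchor = False  # True while reading a matching <a>
        self.href = None        # href of the <a> currently being read
        self.text = []          # text fragments of the current <a>
        self.links = []         # collected (text, href) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            classes = (attrs.get("class") or "").split()
            self.td_stack.append("title" in classes)
        elif tag == "a" and any(self.td_stack):
            self.in_anchor = True
            self.href = attrs.get("href")
            self.text = []

    def handle_data(self, data):
        if self.in_anchor:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self.td_stack:
            self.td_stack.pop()
        elif tag == "a" and self.in_anchor:
            self.links.append(("".join(self.text), self.href))
            self.in_anchor = False


def build_documents(html, now=None):
    """Return one dict per matching link, shaped like the inserted documents."""
    parser = TitleLinkParser()
    parser.feed(html)
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return [{"title": t, "link": h, "date": now} for t, h in parser.links]


docs = build_documents(SAMPLE_HTML)
for doc in docs:
    print(doc["title"], "->", doc["link"])
```

Building the list of documents before touching the database also makes it easy to inspect what would be stored; assuming a reachable MongoDB, such a list could be passed to pymongo's insert_many in a single call.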