cvorland / blog_scraper.py
Last active July 19, 2023 16:01
Some ugly code to scrape blog posts and count how many there are, the total number of words, and the number of researchblogging.org references. The HTML attributes to match will differ depending on the theme and platform used (this was written for WordPress).
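The three counters described above can be sketched as a small HTML parser. This is a minimal sketch, not the script's own method: it uses Python 3's standard-library `html.parser` instead of the BeautifulSoup 3 / `mechanize` (Python 2) setup below, and it assumes a hypothetical theme that wraps each post body in an element with class `entry-content` (change `POST_CLASS` to match yours). It also assumes researchblogging.org references appear as links whose `href` contains `researchblogging.org`, and that the post markup is well-formed enough for simple nesting-depth tracking.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PostStats(HTMLParser):
    """Count posts, words in post bodies, and researchblogging.org links."""

    POST_CLASS = "entry-content"  # hypothetical: depends on your WordPress theme

    def __init__(self):
        super().__init__()
        self.posts = 0    # number of post bodies seen
        self.words = 0    # whitespace-separated words inside post bodies
        self.rb_refs = 0  # links to researchblogging.org inside post bodies
        self._depth = 0   # > 0 while inside a post body

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag not in VOID_TAGS:
            if self._depth:
                self._depth += 1  # nested element inside a post
            elif self.POST_CLASS in (attrs.get("class") or "").split():
                self.posts += 1   # entering a new post body
                self._depth = 1
        # Assumption: a reference is any link to researchblogging.org in a post.
        if self._depth and tag == "a" and "researchblogging.org" in (attrs.get("href") or ""):
            self.rb_refs += 1

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.words += len(data.split())


sample = """
<div class="entry-content"><p>Hello blog world.</p>
<a href="http://researchblogging.org/">RB</a><br></div>
<div class="sidebar">Not counted here</div>
<div class="entry-content"><p>Second post</p></div>
"""
stats = PostStats()
stats.feed(sample)
stats.close()
print(stats.posts, stats.words, stats.rb_refs)  # prints: 2 6 1
```

To total a whole blog, feed each fetched listing page to its own `PostStats` instance and sum the counters; link text inside a post (like "RB" above) is counted as words, which may or may not match how you want to count.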
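from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2 only)
import mechanize
import time
import re

pagenum = 1  # current page of the blog's post listing
url = "http://www.bloghomepage.com"  # placeholder: replace with your blog's home page
browser = mechanize.Browser()
page = browser.open(url)  # fetch the first page
postcount = 0  # running total of posts found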