@nsdevaraj
Created March 3, 2024 12:06
Blogspot scraper
"""
This script scrapes Blogspot posts from a list of links.
Usage:
1. Put the Blogspot links in link.txt, one per line.
2. Press CTRL-C when you want to stop it.
Note: Your IP address may be temporarily banned from the Blogger service if you over-use it.
Use at your own risk.
"""
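import requests
from bs4 import BeautifulSoup

# Read the post URLs, one per line; strip the trailing newline and skip blank lines.
with open("link.txt", "r") as link_file:
    links = [line.strip() for line in link_file if line.strip()]

print("Press CTRL-C to exit the program.")
# The counter numbers the output files: 1.html, 2.html, ...
for counter, link in enumerate(links, start=1):
    page = requests.get(link)
    soup = BeautifulSoup(page.content, "html.parser")
    # Collect every <div class="entry"> element on the page.
    entries = soup.find_all("div", {"class": "entry"})
    with open(str(counter) + ".html", "w", encoding="utf-8") as outputfile:
        # Write the HTML of each entry, not the repr of the result list.
        outputfile.write("\n".join(str(entry) for entry in entries))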
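
The note above warns that over-use may get your IP temporarily banned. One way to lower that risk is to pause between requests. The sketch below is not part of the original script: the helper names `clean_links` and `throttled`, the 2-second default delay, and the rule of skipping lines that start with `#` are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, Iterator, List


def clean_links(lines: Iterable[str]) -> List[str]:
    """Strip whitespace and drop blank lines and '#' comment lines (assumed convention)."""
    links = []
    for line in lines:
        url = line.strip()
        if url and not url.startswith("#"):
            links.append(url)
    return links


def throttled(items: Iterable[str], delay_seconds: float = 2.0,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield items one at a time, pausing delay_seconds between consecutive items."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # no pause before the first item
        yield item
```

To use it, the download loop could iterate over `throttled(clean_links(open("link.txt")))` instead of the raw lines. The right delay for Blogger is not documented in this gist, so treat the default as a placeholder.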