Simple scraping of a blog
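import requests
from bs4 import BeautifulSoup
from csv import writer

# Fetch the blog's front page and stop with an error on a 4xx/5xx response
response = requests.get('http://codedemos.com/sampleblog/')
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Each post summary sits in an element with the class "post-preview"
posts = soup.find_all(class_='post-preview')

# newline='' is what the csv module expects; without it, Windows gets blank rows
with open('posts.csv', 'w', newline='', encoding='utf-8') as csv_file:
    csv_writer = writer(csv_file)
    headers = ['Title', 'Link', 'Date']
    csv_writer.writerow(headers)

    for post in posts:
        # Title text with embedded newlines removed
        title = post.find(class_='post-title').get_text().replace('\n', '')
        # href of the first link inside the preview
        link = post.find('a')['href']
        # Text of the first element with the class "post-date"
        date = post.select('.post-date')[0].get_text()
        csv_writer.writerow([title, link, date])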
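The script above assumes that every `.post-preview` contains a title, a link and a date. If any of those is missing, the loop crashes with `AttributeError` (title), `TypeError` (link) or `IndexError` (date). The following is a minimal sketch, not the author's code, that moves the parsing into a hypothetical `parse_posts` function. Missing pieces become empty strings, and `href` values are resolved with `urllib.parse.urljoin`, on the assumption that the blog may use relative links. It also collapses all whitespace in the text rather than only removing newlines.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def clean_text(tag):
    """Collapse every run of whitespace (newlines included) to one space."""
    return ' '.join(tag.get_text().split()) if tag is not None else ''


def parse_posts(html, base_url):
    """Return (title, link, date) tuples for every .post-preview element.

    Hypothetical helper: missing pieces become '' instead of raising.
    """
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for post in soup.select('.post-preview'):
        # href=True skips <a> tags that have no href attribute
        link_tag = post.find('a', href=True)
        # urljoin turns a relative href like "/posts/1" into an absolute URL
        link = urljoin(base_url, link_tag['href']) if link_tag is not None else ''
        rows.append((
            clean_text(post.select_one('.post-title')),
            link,
            clean_text(post.select_one('.post-date')),
        ))
    return rows


rows = parse_posts(
    '<div class="post-preview"><a href="/posts/1">'
    '<h2 class="post-title">\n  First post\n</h2></a>'
    '<p class="post-date">Jan 1</p></div>',
    'http://example.com/blog/',
)
print(rows)  # [('First post', 'http://example.com/posts/1', 'Jan 1')]
```

Because `parse_posts` takes HTML rather than a URL, you can test it on a string, and its rows can go straight to `csv_writer.writerows(...)`.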
Hi, the webpage "http://codedemos.com/sampleblog/" is out of date and for sale. Have you moved the page, or could you provide another dummy page like example.com?
Thanks, you just simplified web scraping. You are the best!
For anyone requesting sample pages, your best bet is to put the HTML directly into the code yourself and scrape that. You will have to skip the requests part, but that's fine because it is the easy part. Treat whichever variable holds your HTML as the returned response. It works out the same, since this is just an example for testing.
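Following that suggestion, here is a minimal sketch that feeds hard-coded HTML through the same parsing loop. `SAMPLE_HTML` is made-up markup that only mimics the class names the script expects (`post-preview`, `post-title`, `post-date`), so the real blog's markup may have been different. A `types.SimpleNamespace` with a `.text` attribute stands in for the `requests` response. The CSV goes to an in-memory `io.StringIO` buffer rather than to `posts.csv`.

```python
import csv
import io
from types import SimpleNamespace

from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the class names the script expects
SAMPLE_HTML = """
<div class="post-preview">
  <a href="/first-post"><h2 class="post-title">First post</h2></a>
  <span class="post-date">Posted on May 1</span>
</div>
<div class="post-preview">
  <a href="/second-post"><h2 class="post-title">Second post</h2></a>
  <span class="post-date">Posted on May 2</span>
</div>
"""

# Stand-in for requests.get(...): the code below only reads response.text
response = SimpleNamespace(text=SAMPLE_HTML)

soup = BeautifulSoup(response.text, 'html.parser')
posts = soup.find_all(class_='post-preview')

# Write to an in-memory buffer instead of posts.csv, so nothing touches disk
buffer = io.StringIO()
csv_writer = csv.writer(buffer)
csv_writer.writerow(['Title', 'Link', 'Date'])
for post in posts:
    title = post.find(class_='post-title').get_text().replace('\n', '')
    link = post.find('a')['href']
    date = post.select('.post-date')[0].get_text()
    csv_writer.writerow([title, link, date])

print(buffer.getvalue())
```

When you later point the script at a live page, only the `response = ...` line changes back to `requests.get(...)`.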
For anyone looking for good test sites, webscraper.io should do the trick.
Thank you, Brad, for the ultimate awesomeness.
Hi, I am trying to scrape the Google Play Store with this, but I only get a CSV file containing the headers. I just changed the links and tags to match the page.
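A header-only CSV is what the script produces when `find_all` matches nothing. The header row is written before the loop, and with an empty `posts` list the loop body never runs. The following is a minimal sketch of a hypothetical helper, `find_or_explain`, that fails loudly instead of silently writing an empty file. The JavaScript hint in the message is one possible cause, not a diagnosis: `requests` downloads the raw HTML and does not run scripts, so content that a page builds in the browser will not be in `response.text`. Nothing here confirms how Google Play serves its pages.

```python
from bs4 import BeautifulSoup


def find_or_explain(html, css_class):
    """Return the elements with css_class, or raise if there are none.

    Hypothetical helper: an empty match list would otherwise produce
    a CSV that contains only the header row.
    """
    soup = BeautifulSoup(html, 'html.parser')
    matches = soup.find_all(class_=css_class)
    if not matches:
        raise ValueError(
            f"No elements with class {css_class!r} in the downloaded HTML. "
            "Print response.text and search it for that class: if it is "
            "missing, the page may build its content with JavaScript, "
            "which requests does not run."
        )
    return matches
```

Calling `posts = find_or_explain(response.text, 'post-preview')` in place of the `find_all` line turns a silent empty file into an error message that you can act on.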