@XIT07
Last active November 25, 2022 22:39
Python: scrape Redbubble's sitemap to get the most popular searches, etc.
# Scrape Redbubble's sitemap to get the most popular searches, etc.
# XIT07
# Python
import csv
from datetime import datetime

import requests
from bs4 import BeautifulSoup

today = datetime.now()
url = 'https://www.redbubble.com/sitemap/new_works_00001.xml'


def datas(url):
    """Return a {loc: lastmod} dict for every <url> entry in the sitemap."""
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
    soup = BeautifulSoup(r.text, 'lxml')
    data = soup.find_all('url')
    # lastmod is optional in a sitemap entry, so fall back to None
    return {i.find('loc').text: i.find('lastmod').text if i.find('lastmod') else None for i in data}


data = datas(url)
# %b (abbreviated month name) replaces %h, which is a platform-specific alias
with open(today.strftime('%b_%Y-%m-%d_%H-%M-%S') + '.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['url', 'lastmod'])
    for i in data:
        info = {'url': i, 'lastmod': data[i]}
        print(f'parsing {info}')
        writer.writerow(list(info.values()))
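
The loc/lastmod extraction can also be tried offline, without a request to Redbubble. The following is a minimal sketch, not the gist author's method: it swaps requests and BeautifulSoup for the standard library's xml.etree.ElementTree, and the function name parse_sitemap, the sample XML, and the example.com URLs are all made up for illustration. It assumes the sitemap uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed to apply here).
NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

# Made-up sample data for illustration, not real Redbubble URLs.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/a</loc>
    <lastmod>2022-11-25</lastmod>
  </url>
  <url>
    <loc>https://example.com/b</loc>
  </url>
</urlset>"""


def parse_sitemap(xml_text):
    """Return {loc: lastmod or None} for each <url> entry (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    result = {}
    for entry in root.findall('sm:url', NS):
        loc = entry.findtext('sm:loc', namespaces=NS)
        # findtext returns None when the optional <lastmod> element is absent
        lastmod = entry.findtext('sm:lastmod', namespaces=NS)
        result[loc] = lastmod
    return result


if __name__ == '__main__':
    for loc, lastmod in parse_sitemap(SAMPLE).items():
        print(loc, lastmod)
```

The script above, by contrast, needs the requests, beautifulsoup4, and lxml packages installed, and it writes its CSV file to the current working directory.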
@meedbahrii

How do I use it?
