@managedkaos
Created August 1, 2017 04:29
Scrape lyrics from azlyrics.com
import requests
from bs4 import BeautifulSoup

url = "http://www.azlyrics.com/lyrics/onyx/bacdafucup.html"

print("Default request (it will fail)...")
# make the default request, without a browser-like User-Agent
try:
    r = requests.get(url)
except requests.exceptions.RequestException as e:
    print(e)

print("User-Agent request (it will pass)...")
# act like a mac
headers = {'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30"}
# make a request for the data
r = requests.get(url, headers=headers)
# convert the response text to soup (the "lxml" parser needs the lxml package installed)
soup = BeautifulSoup(r.text, "lxml")
# get the goods: print the text of every <div> that has no class attribute
for goods in soup.find_all("div", {"class": None}):
    # skip empty divs
    if len(goods.text) == 0:
        continue
    print(goods.text)
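
The selection step above (keep only the text inside `<div>` elements that have no `class` attribute) can be tried offline, without hitting the site. The following is a minimal sketch that swaps in the standard library's `html.parser` for BeautifulSoup so it runs with no third-party packages. The names `ClasslessDivText` and `extract_classless_div_text` and the sample HTML are hypothetical, made up for illustration. Unlike `find_all`, it reports nested class-less divs once, as part of their outermost class-less parent.

```python
from html.parser import HTMLParser


class ClasslessDivText(HTMLParser):
    """Collect the text found inside <div> elements that have no class attribute."""

    def __init__(self):
        super().__init__()
        self.open_divs = []  # one bool per open <div>: True if it has no class
        self.blocks = []     # finished, non-empty text blocks
        self._buf = []       # text pieces of the block being collected

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.open_divs.append("class" not in dict(attrs))

    def handle_endtag(self, tag):
        if tag == "div" and self.open_divs:
            was_classless = self.open_divs.pop()
            # emit a block when the outermost class-less div closes
            if was_classless and not any(self.open_divs):
                text = "".join(self._buf).strip()
                if text:  # skip empty divs
                    self.blocks.append(text)
                self._buf = []

    def handle_data(self, data):
        # keep text only while inside at least one class-less div
        if any(self.open_divs):
            self._buf.append(data)


def extract_classless_div_text(html):
    """Return a list with the stripped text of each class-less <div> in html."""
    parser = ClasslessDivText()
    parser.feed(html)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    # hypothetical page fragment: one classed div, one class-less div, one empty div
    sample = (
        '<div class="ringtone">ad</div>\n'
        "<div>\nline one<br>\nline two\n</div>\n"
        "<div></div>"
    )
    for block in extract_classless_div_text(sample):
        print(block)
```

Passing `r.text` from the script above to `extract_classless_div_text` would be the equivalent of the final loop, assuming the page still keeps its lyrics in a class-less `<div>`.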
arsenikov commented Aug 9, 2021

@managedkaos
Author

Rock on! :D
