@szero
Last active June 6, 2021 22:08
Scrape a YouTube video page by going through the consent screen
from requests import Session
from bs4 import BeautifulSoup

UA = (
    "Mozilla/5.0 (Linux; cli) pyrequests/0.1 "
    "(python, like Gecko, like KHTML, like wget, like CURL) myscrapper/1.0"
)

req = Session()
req.headers.update({"User-Agent": UA})


def get_page_source(url):
    r = req.get(url).text
    # "itemprop" appears on video pages but not on the consent page.
    if "itemprop" in r:
        return r
    # Otherwise collect the consent form's inputs and submit them.
    post_builder = {}
    soup = BeautifulSoup(r, "html.parser")
    for i in soup.find_all("input"):
        try:
            post_builder.update({i["name"]: i["value"]})
        except KeyError:
            # Skip inputs that lack a name or a value.
            continue
    return req.post("https://consent.youtube.com/s", data=post_builder).text


print(get_page_source("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))

szero commented Jun 6, 2021

Previously, setting an appropriate User-Agent was enough to scrape a YouTube page, but now it seems you are always greeted with a page full of legal cookie nonsense. This script seems to get through that for now. The "itemprop" check works because that attribute appears in video pages but not on the consent page.
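
To try the two ideas above without touching the network, here is a minimal sketch that runs the "itemprop" check and the form-field collection against a canned HTML string. It swaps BeautifulSoup for the standard library's `HTMLParser` so it has no dependencies. The names `InputCollector`, `is_video_page`, `consent_form_fields`, and the `SAMPLE_CONSENT` markup are all made up for illustration; the real consent page's fields may look different.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from <input> tags that carry both attributes."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        value = attr_map.get("value")
        # Mirror the gist's KeyError skip: ignore inputs missing name or value.
        if name is not None and value is not None:
            self.fields[name] = value


def is_video_page(html):
    # Heuristic from the gist: "itemprop" shows up on video pages,
    # not on the consent page.
    return "itemprop" in html


def consent_form_fields(html):
    parser = InputCollector()
    parser.feed(html)
    return parser.fields


# Hypothetical, simplified consent markup (an assumption, not the real page).
SAMPLE_CONSENT = """
<form action="https://consent.youtube.com/s" method="POST">
  <input type="hidden" name="continue" value="https://www.youtube.com/watch?v=dQw4w9WgXcQ">
  <input type="hidden" name="gl" value="DE">
  <input type="submit" value="I agree">
</form>
"""

if __name__ == "__main__":
    print(is_video_page(SAMPLE_CONSENT))
    print(consent_form_fields(SAMPLE_CONSENT))
```

The submit button has no `name`, so it is left out of the collected fields, the same way the `try`/`except KeyError` in the gist skips it.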
