@ritiek
Last active December 22, 2017 07:01
Scrape links from google search
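#!/usr/bin/env python
import requests
from bs4 import BeautifulSoup

search_term = 'hello'
# Google's `start` parameter is a result offset:
# 0 = page 1
# 10 = page 2
# ...
number = 0
while True:
    search_url = 'https://www.google.com/search?q={}&start={}'.format(search_term, number)
    # fetch the search URL (not the bare search term)
    page = requests.get(search_url)
    soup = BeautifulSoup(page.text, 'html.parser')
    results = soup.find_all('h3', {'class': 'r'})
    # stop once a page has no result headings, otherwise the loop never ends
    if not results:
        break
    for x in results:
        raw_link = x.find('a')['href']
        # assumes hrefs look like '/url?q=<target>&sa=...'; [7:end] strips that wrapper
        end = raw_link.find('&sa=')
        # check validity: skip links that still carry a '?q=' query
        if raw_link[7:end].find('?q=') == -1:
            print(raw_link[7:end])
    number += 10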
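
The script recovers each result link by slicing the href between a fixed 7-character prefix and the first `&sa=`. That implies hrefs of the form `/url?q=<target>&sa=...`, which is an inference from the slicing and not something the script states. Under that assumption, a minimal sketch of the same extraction using the standard library's `urllib.parse` might look like the following. The helper name `extract_target_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def extract_target_url(raw_href):
    """Return the destination URL from a '/url?q=...' redirect href, or None.

    Assumes the redirect format implied by the script's slicing logic.
    """
    parsed = urlparse(raw_href)
    # Only '/url' hrefs are treated as result redirects; anything else is skipped.
    if parsed.path != '/url':
        return None
    # parse_qs splits the query on '&' and percent-decodes each value.
    targets = parse_qs(parsed.query).get('q')
    return targets[0] if targets else None


print(extract_target_url('/url?q=https://example.com/page&sa=U&ved=abc'))
```

Parsing the query string avoids depending on the prefix length and on `&sa=` being the first trailing parameter. It also percent-decodes the target, which plain slicing does not.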