@kozmonaut
Created January 30, 2015 08:12
Extract links with Beautiful Soup
import re

import requests
from bs4 import BeautifulSoup
# Ask for the URL to scrape (input() replaces Python 2's raw_input())
print("Paste the URL you want to scrape:")
url = input("-> ")
# Fetch the page
response = requests.get(url)
# Response body decoded as text
plain = response.text
# Make the soup; name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(plain, "html.parser")
# Type the term you want to search for
term = "someterm"
# Use a regex to keep only tags whose href contains the term
links = soup.find_all(href=re.compile(term))
# Open the output file and write one matching link per line;
# the with-block closes the file even if a write fails
with open("result.html", "w") as f:
    for link in links:
        full_link = link.get("href")
        f.write(full_link + "\n")
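The script writes each href exactly as it appears in the page, so relative links stay relative, and the search term is interpreted as a regular expression. If that is not what you want, a minimal sketch like the one below may help. It assumes you want the term matched literally and the links resolved against the page URL. The helper name `filter_and_resolve` and its parameters are hypothetical and not part of the original gist.

```python
import re
from urllib.parse import urljoin


def filter_and_resolve(hrefs, term, base_url):
    """Keep hrefs that contain `term` literally and make them absolute.

    hrefs    -- iterable of href strings (None entries are skipped)
    term     -- plain search text; re.escape() stops it acting as a regex
    base_url -- URL of the page the hrefs were taken from
    """
    pattern = re.compile(re.escape(term))
    return [
        urljoin(base_url, href)
        for href in hrefs
        if href and pattern.search(href)
    ]
```

With the `soup` and `url` from the script, the input list could be built as `[tag.get("href") for tag in soup.find_all(href=True)]` and passed in together with `url` as `base_url`.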