@echevemaster
Created January 16, 2014 06:30
Retrieve all the links from a given web page
# Retrieve all the links from a given page
from bs4 import BeautifulSoup
import urllib2


def fetch_url(url):
    # Download the page and parse it with BeautifulSoup
    response = urllib2.urlopen(url)
    content = response.read()
    soup = BeautifulSoup(content)
    # Collect every <a> tag that carries an href attribute
    links = soup.findAll("a", href=True)
    print "Found:", len(links), "links"
    for a in links:
        print a['href']


fetch_url("http://echevemaster.org")