Joseph Turian (@turian)

turian / extractors.py
Created August 5, 2012 02:11 — forked from osiloke/extractors.py
A Scrapy link extractor that uses BeautifulSoup.
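A link extractor of this kind finds each anchor on a page, resolves its `href` against the page URL, and (per the docstring below) keeps the anchor text. The sketch that follows illustrates that idea using only the Python 3 standard library. It is not the gist's implementation. It swaps `html.parser.HTMLParser` in for BeautifulSoup and uses a hypothetical `SimpleLink` namedtuple in place of `scrapy.link.Link`. The names `extract_links`, `_AnchorCollector`, and `SimpleLink` are illustrative assumptions.

```python
from collections import namedtuple
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-in for scrapy.link.Link: only the URL and the anchor text.
SimpleLink = namedtuple("SimpleLink", ["url", "text"])


class _AnchorCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._text).split())
            self.anchors.append((self._href, text))
            self._href = None


def extract_links(html, base_url):
    """Return one SimpleLink per anchor, with its URL made absolute."""
    parser = _AnchorCollector()
    parser.feed(html)
    parser.close()
    return [SimpleLink(urljoin(base_url, href), text)
            for href, text in parser.anchors]


if __name__ == "__main__":
    page = '<p><a href="/docs">Read the <b>docs</b></a></p>'
    for link in extract_links(page, "http://example.com/index.html"):
        print(link.url, "|", link.text)  # prints: http://example.com/docs | Read the docs
```

Text inside nested inline tags such as `<b>` is kept because `handle_data` appends every fragment seen while an anchor is open. In a real Scrapy project the extractor would return `scrapy.link.Link` objects instead, and `Link` already has a `text` attribute for the anchor text.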
"""
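Modified from https://gist.github.com/1142142
so that each extracted Link also carries the link's anchor text.
"""
import re
from scrapy.link import Link
try:
    from urlparse import urljoin  # Python 2
except ImportError:
    from urllib.parse import urljoin  # Python 3
try:
    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2 only)
except ImportError:
    from bs4 import BeautifulSoup  # BeautifulSoup 4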