@pythonizame
Created February 6, 2015 04:13
Simple Pythonizame Scrapy
# -*- coding: utf-8 -*-
"""
Basic example of using the Scrapy library to extract information from a web page.

## Requirements: ##
1. Install Scrapy (pip install scrapy)
2. Download the file pyme_scrapy.py
3. Run it as follows:
$ scrapy runspider pyme_scrapy.py -o scraped_data.json

Note: the results are written to scraped_data.json in JSON format.
"""
from scrapy import Field, Item, Spider


class Post(Item):
    """One blog post listed on the front page."""
    url = Field()
    title = Field()


class PythonizameSpider(Spider):
    name = 'PythonizameSpider'
    start_urls = ['http://pythoniza.me']

    def parse(self, response):
        # Every <h2> inside <div id="blog"> holds one post link.
        for heading in response.xpath('//div[@id="blog"]//h2'):
            post = Post()
            # .get() returns the first match as a string (or None),
            # instead of the one-element list that .extract() returns.
            post['title'] = heading.xpath('a/text()').get()
            post['url'] = heading.xpath('a/@href').get()
            # Yield items one at a time instead of building a list.
            yield post
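To check what `parse` picks out without installing Scrapy or reaching the network, the following sketch approximates the XPath expressions `//div[@id="blog"]//h2`, `a/text()` and `a/@href` with the standard library's `html.parser`. This is a swapped-in technique, not Scrapy's selector engine: `BlogHeadingParser` is a hypothetical name, it assumes reasonably well-formed markup and plain-text link labels, and it runs on a made-up HTML snippet rather than the real page.

```python
from html.parser import HTMLParser


class BlogHeadingParser(HTMLParser):
    """Rough stand-in for //div[@id="blog"]//h2 plus a/text() and a/@href.

    Hypothetical helper: only handles <a> tags that are direct children of an
    <h2> located inside <div id="blog">, and assumes the link text is plain text.
    """

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags as (tag, attrs) pairs
        self.posts = []      # collected {'title': ..., 'url': ...} dicts
        self.current = None  # post being filled while inside h2 > a

    def _inside_blog(self):
        # True if some open ancestor is <div id="blog">.
        return any(tag == 'div' and dict(attrs).get('id') == 'blog'
                   for tag, attrs in self.stack)

    def handle_starttag(self, tag, attrs):
        if (tag == 'a' and self.stack and self.stack[-1][0] == 'h2'
                and self._inside_blog()):
            # Like a/@href: None when the attribute is missing.
            self.current = {'title': None, 'url': dict(attrs).get('href')}
        self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        if tag == 'a' and self.current is not None:
            self.posts.append(self.current)
            self.current = None
        # Pop back to the most recent matching open tag.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Like a/text() with .get(): keep only the first text chunk.
        if self.current is not None and self.current['title'] is None:
            self.current['title'] = data


# Hypothetical markup standing in for the blog's front page.
sample = """
<div id="blog">
  <article><h2><a href="/post-1">First post</a></h2></article>
  <article><h2><a href="/post-2">Second post</a></h2></article>
</div>
<div id="sidebar"><h2><a href="/about">About</a></h2></div>
"""

parser = BlogHeadingParser()
parser.feed(sample)
print(parser.posts)
# [{'title': 'First post', 'url': '/post-1'}, {'title': 'Second post', 'url': '/post-2'}]
```

Each link under a blog `<h2>` becomes one dict, mirroring one `Post` item per heading; the sidebar link is skipped because none of its open ancestors is `<div id="blog">`.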
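Because the run command in the docstring writes a JSON feed, the results can be consumed with the standard `json` module. This is a minimal sketch: `load_posts` is a hypothetical helper, the feed contents are made up and written to a temporary directory so the snippet runs offline, and it accepts each field either as a plain string or as the one-element list that `.extract()` produces. One caveat when re-running the spider: `-o` appends to an existing file, and appending to a `.json` feed leaves it as invalid JSON, so delete the file first or, on Scrapy 2.4 and later, use `-O` to overwrite it.

```python
import json
import os
import tempfile


def load_posts(path):
    """Read a Scrapy .json feed and return a list of (title, url) tuples."""

    def first(value):
        # .extract() exports each field as a list; .get() exports a string.
        if isinstance(value, list):
            return value[0] if value else None
        return value

    with open(path, encoding='utf-8') as feed:
        items = json.load(feed)  # the whole feed is one JSON array
    return [(first(item.get('title')), first(item.get('url'))) for item in items]


# Hypothetical feed contents covering both field shapes.
sample_items = [
    {'title': 'First post', 'url': '/post-1'},
    {'title': ['Second post'], 'url': ['/post-2']},
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'scraped_data.json')
    with open(path, 'w', encoding='utf-8') as out:
        json.dump(sample_items, out)
    posts = load_posts(path)

for title, url in posts:
    print(title, url)
```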