# -*- coding: utf-8 -*-
import json
import requests
import re
import random
import urllib
import lxml.html
import bs4
import sys
reload(sys)  # Python 2 builtin; Python 3 moved it to importlib.reload

# Configure item pipelines
# See http://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'amazon.pipelines.AmazonPipeline': 300,
}

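The integer is a priority: per the Scrapy documentation, items pass through the enabled pipelines from lower to higher values, and the values are customarily kept in the 0-1000 range. The snippet below only models that ordering rule in plain Python (it is not Scrapy's own code); `PriceCleanupPipeline` is a hypothetical second pipeline added for illustration.

```python
# Illustrates Scrapy's documented ordering rule (lower values run first);
# this is a model of the rule, not Scrapy's implementation.
ITEM_PIPELINES = {
    "amazon.pipelines.AmazonPipeline": 300,
    "amazon.pipelines.PriceCleanupPipeline": 100,  # hypothetical second pipeline
}


def pipeline_run_order(pipelines):
    """Return pipeline import paths sorted by priority, lowest value first."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


print(pipeline_run_order(ITEM_PIPELINES))
```

With these values the hypothetical cleanup pipeline would see each item before `AmazonPipeline` does.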
[
{"product_category": "Electronics,Computers & Accessories,Data Storage,External Hard Drives", "product_sale_price": "$949.95", "product_name": "G-Technology G-SPEED eS PRO High-Performance Fail-Safe RAID Solution for HD/2K Production 8TB (0G01873)", "product_availability": "Only 1 left in stock."},
{"product_category": "Electronics,Computers & Accessories,Data Storage,USB Flash Drives", "product_sale_price": "", "product_name": "G-Technology G-RAID with Removable Drives High-Performance Storage System 4TB (Gen7) (0G03240)", "product_availability": "Available from these sellers."},
{"product_category": "Electronics,Computers & Accessories,Data Storage,USB Flash Drives", "product_sale_price": "$549.95", "product_name": "G-Technology G-RAID USB Removable Dual Drive Storage System 8TB (0G04069)", "product_availability": "Only 1 left in stock."},
{"product_category": "Electronics,Computers & Accessories,Data Storage,External Hard Drives", "product_sale_price": "$89.95", "product_name": "G-Technology G-DRIVE ev U

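The export above is cut off mid-record, so it will not parse as-is. A minimal sketch of loading such a feed with `json` and summarizing it, using the three complete records copied from it (product names dropped for brevity). It assumes, as in these records, that category names never contain commas themselves, since the category path is stored as one comma-joined string.

```python
import json
from collections import Counter

# The three complete records from the export above, product names omitted.
raw = """[
  {"product_category": "Electronics,Computers & Accessories,Data Storage,External Hard Drives",
   "product_sale_price": "$949.95", "product_availability": "Only 1 left in stock."},
  {"product_category": "Electronics,Computers & Accessories,Data Storage,USB Flash Drives",
   "product_sale_price": "", "product_availability": "Available from these sellers."},
  {"product_category": "Electronics,Computers & Accessories,Data Storage,USB Flash Drives",
   "product_sale_price": "$549.95", "product_availability": "Only 1 left in stock."}
]"""

products = json.loads(raw)

# The last element of the comma-joined path is the most specific category.
leaf_counts = Counter(p["product_category"].split(",")[-1] for p in products)

# An empty string means no sale price was scraped for that product.
priced = [p for p in products if p["product_sale_price"]]

print(leaf_counts)
print(len(priced), "of", len(products), "records have a sale price")
```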
# -*- coding: utf-8 -*-
# Define your item pipelines here
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

class AmazonPipeline(object):
    def process_item(self, item, spider):
        return item

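`AmazonPipeline` passes every item through unchanged. As a sketch of where cleanup could go, the hypothetical pipeline below converts the scraped price string into a float (or `None` when it is empty, as in one of the exported records) and trims the availability text. It assumes dict-like items with the field names seen in the JSON output; to run inside Scrapy it would also need its own entry in `ITEM_PIPELINES`.

```python
import re


class PriceCleanupPipeline(object):
    """Hypothetical pipeline: '$949.95' -> 949.95, '' -> None."""

    # A leading digit, optional thousands separators, optional decimal part.
    PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")

    def process_item(self, item, spider):
        raw = item.get("product_sale_price") or ""
        match = self.PRICE_RE.search(raw)
        # Drop thousands separators before converting, e.g. '1,299.00' -> 1299.0
        item["product_sale_price"] = float(match.group().replace(",", "")) if match else None
        item["product_availability"] = (item.get("product_availability") or "").strip()
        return item


pipeline = PriceCleanupPipeline()
cleaned = pipeline.process_item(
    {"product_sale_price": "$549.95", "product_availability": " Only 1 left in stock. "},
    spider=None,
)
print(cleaned)
```

Keeping the price as `None` rather than dropping the item preserves records like the "Available from these sellers." entry; raising Scrapy's `DropItem` instead would discard them.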
# -*- coding: utf-8 -*-
import scrapy
from amazon.items import AmazonItem


class AmazonProductSpider(scrapy.Spider):
    name = "AmazonDeals"
    allowed_domains = ["amazon.com"]
    # Use working product URL below
    start_urls = [

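The spider is truncated before its `parse` callback, so the selectors it uses are not shown. Whatever they are, XPath `text()` queries on a product page generally return lists of strings padded with whitespace and newlines (in recent Scrapy versions, via `response.xpath(...).getall()`). A minimal sketch of post-processing that would yield values shaped like the exported JSON; both helper names and the sample input are hypothetical.

```python
def join_breadcrumbs(texts):
    """Hypothetical helper: strip each breadcrumb text node, skip blank ones,
    and join the rest with commas (the product_category format in the output)."""
    return ",".join(t.strip() for t in texts if t.strip())


def first_text(texts, default=""):
    """Hypothetical helper: first stripped text node, or `default` when the
    selector matched nothing."""
    return texts[0].strip() if texts else default


# Hypothetical raw text nodes, as an XPath text() query might return them.
crumbs = ["\n   Electronics\n", "  ", "Computers & Accessories  ", "Data Storage"]
print(join_breadcrumbs(crumbs))
print(repr(first_text([])))
```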
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
import scrapy


class AmazonItem(scrapy.Item):

In [28]: top_stories = []
In [29]: for i in zip(news_stories, news_links):
   ....:     top_stories.append(i)
   ....:
In [30]: top_stories
Out[30]:

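The append loop can be collapsed into `list(zip(...))`. If some of the scraped `href` values are site-relative paths rather than full URLs, `urllib.parse.urljoin` can resolve them against the page URL. A small sketch with hypothetical headlines and links standing in for the XPath results. Note that `zip` stops at the shorter input, so if the two queries ever return lists of different lengths, headlines can end up paired with the wrong links.

```python
from urllib.parse import urljoin

# Hypothetical sample values standing in for the two XPath results.
news_stories = ["Headline A", "Headline B"]
news_links = ["/2017/01/01/world/headline-a/index.html", "http://example.com/headline-b/"]

# urljoin resolves site-relative paths and leaves absolute URLs unchanged.
top_stories = list(zip(news_stories, (urljoin("http://www.cnn.com", href) for href in news_links)))
print(top_stories)
```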
In [23]: for i in html_content.iterchildren():
   ....:     print(i)
   ....:
<Element head at 0x7f43a5737db8>
<Element body at 0x7f43a5737e10>
In [24]: news_stories = html_content.xpath('//h3[@data-analytics]/a/span/text()')
In [25]: news_links = html_content.xpath('//h3[@data-analytics]/a/@href')

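Running two separate queries, one for the text and one for `@href`, relies on both result lists lining up. Iterating over the matched `<a>` elements and reading the headline and link from each one keeps every pair together; with lxml that would be `html_content.xpath('//h3[@data-analytics]/a')` followed by a per-element `.xpath('span/text()')` and `.get('href')`. The self-contained sketch below uses the standard library's `xml.etree.ElementTree` instead, which supports only a subset of XPath (no `text()` or `@attr` steps) and requires well-formed markup, so it runs on a small hypothetical snippet rather than the live page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup shaped like the headlines the XPath targets.
SNIPPET = """
<html><body>
  <h3 data-analytics="1"><a href="/a.html"><span>Headline A</span></a></h3>
  <h3><a href="/skipped.html"><span>No data-analytics attribute</span></a></h3>
  <h3 data-analytics="2"><a href="/b.html"><span>Headline B</span></a></h3>
</body></html>
"""

root = ET.fromstring(SNIPPET.strip())

stories = []
# [@data-analytics] keeps only <h3> elements that carry the attribute.
for a in root.iterfind(".//h3[@data-analytics]/a"):
    span = a.find("span")
    if span is not None and span.text:
        # Headline and link come from the same <a>, so they cannot drift apart.
        stories.append((span.text, a.get("href")))

print(stories)
```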
In [21]: page = requests.get('http://www.cnn.com')
In [22]: html_content = html.fromstring(page.content)

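`requests.get` as called here sets no timeout, so a stalled connection can block indefinitely, and the HTTP status is never checked. With Requests, passing `timeout=` and calling `page.raise_for_status()` addresses both. The dependency-free sketch below does the same with `urllib.request`; the User-Agent string is a placeholder assumption, not a value from the original session. `fetch` returns bytes, which `lxml.html.fromstring` accepts just as it accepts `page.content`.

```python
import urllib.request

# Placeholder User-Agent; an assumption, not a value from the original session.
USER_AGENT = "Mozilla/5.0 (compatible; example-news-scraper/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Build a GET request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Return the response body as bytes.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, and `timeout`
    (seconds) applies to blocking operations such as the connection attempt.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()


# Usage (needs network access):
# html_content = lxml.html.fromstring(fetch("http://www.cnn.com"))
```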
In [19]: import requests
In [20]: from lxml import html