mkdir nairalandapi
cd nairalandapi
virtualenv venv
source venv/bin/activate
pip install scrapy
# XPath notation to extract the front page topics
/html/body/div[@class='body']/table[@class='boards'][2]/tbody/tr[2]/td[@class='featured w']
# XPath notation to extract the href links
/html/body/div[@class='body']/table[@class='boards'][2]/tbody/tr[2]/td[@class='featured w']/a/@href
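To make it concrete what those two expressions pick out, here is a minimal, dependency-free sketch. It does not use XPath at all: it swaps in Python's standard-library `html.parser` and simply collects the text and `href` of every `<a>` inside a `td` whose class is `featured w`. The sample markup, the relative URLs, and the `FeaturedLinks` class name are all made up for illustration; the real front page is larger and may be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified stand-in for the front-page markup.
SAMPLE = """
<table class="boards"><tr><td class="featured w">
<a href="/1/first-topic">First <b>Topic</b></a>
<a href="/2/second-topic">Second Topic</a>
</td></tr></table>
"""


class FeaturedLinks(HTMLParser):
    """Collect (title, href) pairs for links inside td.featured.w cells."""

    def __init__(self):
        super().__init__()
        self.in_cell = False  # True while inside <td class="featured w">
        self.href = None      # href of the <a> currently open, if any
        self.text = []        # text fragments seen inside the open <a>
        self.links = []       # collected (title, href) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td" and attrs.get("class") == "featured w":
            self.in_cell = True
        elif tag == "a" and self.in_cell:
            self.href = attrs.get("href")
            self.text = []

    def handle_data(self, data):
        # Only keep text that belongs to an open link.
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.links.append(("".join(self.text).strip(), self.href))
            self.href = None
        elif tag == "td":
            self.in_cell = False


parser = FeaturedLinks()
parser.feed(SAMPLE)
print(parser.links)
```

Inside a Scrapy spider the XPath expressions do this work directly, so a parser like this is only useful as a mental model or for a quick offline check.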
scrapy startproject nairalandapi
scrapy genspider nairaland nairaland.com
# -*- coding: utf-8 -*-
import scrapy


class NairalandSpider(scrapy.Spider):
    name = "nairaland"
    allowed_domains = ["nairaland.com"]
    start_urls = ['http://nairaland.com/']

    def parse(self, response):
        # The generated template leaves the callback empty.
        pass
import scrapy


class NairalandapiItem(scrapy.Item):
    # This items file holds just 3 fields.
    title = scrapy.Field()
    link = scrapy.Field()
    # Housekeeping field
    timestamp = scrapy.Field()
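The housekeeping `timestamp` field needs a value from somewhere. A minimal sketch of how one might produce it with the standard library follows; the `make_timestamp` helper and the `TIMESTAMP_FORMAT` constant are hypothetical names, and the format string is only inferred from the `"2017-05-20 08:43:10"` values that appear in the `results.json` sample in this post.

```python
import datetime

# Assumed format, inferred from the sample output ("2017-05-20 08:43:10").
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_timestamp(moment=None):
    """Format a datetime as a crawl timestamp; defaults to the current local time."""
    if moment is None:
        moment = datetime.datetime.now()
    return moment.strftime(TIMESTAMP_FORMAT)


print(make_timestamp())
```

Passing an explicit `moment` keeps the helper easy to test; a spider would normally call it with no arguments.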
# -*- coding: utf-8 -*-
import datetime

import scrapy
from scrapy.selector import Selector
from scrapy.loader import ItemLoader
# Note: newer Scrapy releases expose these processors as itemloaders.processors.
from scrapy.loader.processors import MapCompose, Join
from nairalandapi.items import NairalandapiItem


class NairalandapiSpider(scrapy.Spider):
    name = "nairaland"
scrapy crawl nairaland -o results.json
# This command crawls the Nairaland home page and saves the news items to a file named 'results.json'.
[
{"timestamp": ["2017-05-20 08:43:10"], "title": ["Retired Generals Warn Military To Stay Out Of Politics"]},
{"timestamp": ["2017-05-20 08:43:10"], "title": ["Throwback Photo", "Of", "Oyo First Lady, Mrs Florence Ajimobi", "In", "1966"]},
{"timestamp": ["2017-05-20 08:43:10"], "title": ["Why Does Naira Appreciate Whenever President Buhari Travels?"]},
{"timestamp": ["2017-05-20 08:43:10"], "title": ["Senator Dino Melaye\u2019s Father Prays On His Anti-Corruption Book", "(", "Photo", ")"]},
{"timestamp": ["2017-05-20 08:43:10"], "title": ["\"Why Northern Leaders Worked Against Jonathan\u2019s Government\" \u2013 Paul Unongo"]},
{"timestamp": ["2017-05-20 08:43:10"], "title": ["Nollywood Actor, Jim Iyke Visits Nnamdi Kanu", "(", "Photos", ")"]},
{"timestamp": ["2017-05-20 08:43:10"], "title": ["Governor Fayose", "Vs", "Governor Ambode: Comparison", "By", "APC London", "(", "Photos", ")"]},
{"timestamp": ["2017-05-20 08:43:10"], "title": ["Peter Obi Spotted Walking Home", "On", "Lagos Bridge After Church Servi
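Several titles in this output arrive as lists of fragments (`"Throwback Photo", "Of", ...`) rather than as one string, and every field is wrapped in a list. One way to tidy that up after the crawl is sketched here with the standard library only. The `join_title` and `flatten_items` helpers are hypothetical names, and joining on a single space is an assumption about how the fragments should be glued back together. Inside the spider, the `Join` processor that is already imported could likely serve the same purpose as an output processor.

```python
import re


def join_title(parts):
    """Collapse a list of text fragments into one string."""
    text = " ".join(p.strip() for p in parts if p.strip())
    # Tidy the spaces that joining leaves inside parentheses: "( Photo )" -> "(Photo)".
    text = re.sub(r"\(\s+", "(", text)
    text = re.sub(r"\s+\)", ")", text)
    return text


def flatten_items(items):
    """Return copies of the scraped items with each list-valued field joined into a string."""
    return [{key: join_title(value) for key, value in item.items()} for item in items]


sample = [
    {
        "timestamp": ["2017-05-20 08:43:10"],
        "title": ["Nollywood Actor, Jim Iyke Visits Nnamdi Kanu", "(", "Photos", ")"],
    }
]
print(flatten_items(sample))
```

Running `flatten_items` over the loaded JSON before serving it would give API consumers plain strings instead of one-element or multi-element lists.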
from flask import Flask, jsonify
import json
import os

app = Flask(__name__)


@app.route("/nairaland/api/v1.0/homepage", methods=["GET"])
def homepage():
    with app.open_resource("static/data/results.json") as f:
        data = json.load(f)
    return jsonify(data)
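Once the Flask app is running, the endpoint can be consumed from any HTTP client. The sketch below uses only the standard library. The base URL `http://127.0.0.1:5000` is an assumption (it is the Flask development server's default address), and `fetch_homepage` and its `opener` parameter are hypothetical names introduced so the function can be exercised without a live server.

```python
import json
from urllib.request import urlopen

# Route taken from the Flask app above.
HOMEPAGE_ROUTE = "/nairaland/api/v1.0/homepage"


def fetch_homepage(base_url="http://127.0.0.1:5000", opener=urlopen):
    """Fetch the scraped front-page items from the API and decode the JSON body."""
    url = base_url.rstrip("/") + HOMEPAGE_ROUTE
    with opener(url) as response:
        return json.load(response)
```

With the server running locally, `fetch_homepage()` should return the same list of items stored in `static/data/results.json`.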