import scrapy
from scrapy.loader.processors import MapCompose, TakeFirst
from w3lib.html import remove_tags


def remove_whitespace(value):
    return value.strip()


class JokeItem(scrapy.Item):
    joke_text = scrapy.Field(
        input_processor=MapCompose(remove_tags, remove_whitespace),

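The item definition above is cut off after its input processor. As a rough illustration of what `MapCompose(remove_tags, remove_whitespace)` does to a scraped value, and what a `TakeFirst` output processor would then return, here is a dependency-free sketch. `strip_tags`, `map_compose`, and `take_first` are hypothetical stand-ins written for this example, not Scrapy or w3lib APIs. The regex only approximates `remove_tags`. That the item uses `TakeFirst` as its output processor is an assumption based on the import.

```python
import re


def strip_tags(value):
    # Crude stand-in for w3lib.html.remove_tags (assumes simple, well-formed tags).
    return re.sub(r"<[^>]+>", "", value)


def remove_whitespace(value):
    return value.strip()


def map_compose(*functions):
    # Hypothetical, simplified imitation of MapCompose: apply each function
    # to every value in turn and drop values that come back as None.
    def process(values):
        for func in functions:
            values = [out for out in (func(v) for v in values) if out is not None]
        return values
    return process


def take_first(values):
    # Hypothetical imitation of TakeFirst: first value that is not None or "".
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Made-up sample input, shaped like the <p> element the spider extracts.
raw = ["<p>  Why did the chicken cross the road?  </p>"]
cleaned = map_compose(strip_tags, remove_whitespace)(raw)
joke_text = take_first(cleaned)
print(joke_text)
```

The input processor runs on every extracted value, while the output processor collapses the resulting list into the single value stored on the item.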
import scrapy
from demo_project.items import JokeItem
from scrapy.loader import ItemLoader


class JokesSpider(scrapy.Spider):
    name = 'jokes'
    allowed_domains = ['www.laughfactory.com']
    start_urls = [

class JokeItem(scrapy.Item):
    joke_text = scrapy.Field()

next_page = response.xpath("//li[@class='next']/a/@href").extract_first()
if next_page is not None:
    next_page_link = response.urljoin(next_page)
    yield scrapy.Request(url=next_page_link, callback=self.parse)

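The pagination step above relies on `response.urljoin` to turn the extracted `href` into an absolute URL. For a page without a `<base>` tag, this should behave much like the standard library's `urljoin` applied to the page URL. The sketch below shows that resolution on its own. `next_page_url` is a hypothetical helper and the `href` values are made up; the real site's links may look different.

```python
from urllib.parse import urljoin

# Page the spider is currently parsing (taken from start_urls).
page_url = "http://www.laughfactory.com/jokes/family-jokes"


def next_page_url(current_url, href):
    # Mirror the `if next_page is not None` guard: no link means no next page.
    if href is None:
        return None
    return urljoin(current_url, href)


# Hypothetical href values, used only for illustration.
absolute_path = next_page_url(page_url, "/jokes/family-jokes/2")
relative_path = next_page_url(page_url, "family-jokes/2")
last_page = next_page_url(page_url, None)

print(absolute_path)
print(relative_path)
print(last_page)
```

Both hypothetical links resolve to the same absolute URL, which is why the spider can pass the joined result straight to `scrapy.Request` regardless of how the site writes its links.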
def parse(self, response):
    for joke in response.xpath("//div[@class='jokes']"):
        yield {
            'joke_text': joke.xpath(".//div[@class='joke-text']/p").extract_first()
        }

import scrapy


class JokesSpider(scrapy.Spider):
    name = 'jokes'
    allowed_domains = ['www.laughfactory.com']
    start_urls = [
        'http://www.laughfactory.com/jokes/family-jokes'
    ]

    def parse(self, response):

<p class='someClass'>Paragraph 1</p>
<p id='someId'>Paragraph 2</p>

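To see how attribute predicates pick out one of these two paragraphs, the following sketch runs equivalent path expressions with the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The `<body>` wrapper is an assumption added so the fragment parses as a single document. In a Scrapy spider the same predicates would be passed to `response.xpath`.

```python
import xml.etree.ElementTree as ET

# The two paragraphs, wrapped in a hypothetical <body> root so they parse.
markup = (
    "<body>"
    "<p class='someClass'>Paragraph 1</p>"
    "<p id='someId'>Paragraph 2</p>"
    "</body>"
)
root = ET.fromstring(markup)

# [@attr='value'] keeps only elements whose attribute matches exactly.
by_class = [p.text for p in root.findall(".//p[@class='someClass']")]
by_id = [p.text for p in root.findall(".//p[@id='someId']")]

# Without a predicate, every <p> descendant is selected.
all_paragraphs = [p.text for p in root.findall(".//p")]

print(by_class, by_id, all_paragraphs)
```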
<div>
    <a href='www.example.com'>Link</a>
</div>

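The link above is the kind of element the spider's `@href` expression targets. Below is a small standard-library sketch that pulls out the link's attribute and text. `xml.etree.ElementTree` cannot select attributes with an `@href` path step, so the attribute is read with `.get()` instead. In Scrapy the rough equivalent would be an expression such as `//div/a/@href` passed to `response.xpath`.

```python
import xml.etree.ElementTree as ET

markup = "<div><a href='www.example.com'>Link</a></div>"
root = ET.fromstring(markup)  # root is the <div> element

# "./a" selects the <a> child of the <div>.
link = root.find("./a")

# Read the attribute and the visible text separately.
href = link.get("href")
text = link.text

print(href, text)
```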
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Xpath Syntax</title>
</head>