Julia Medina curita

curita / testing-goodreads-book-urls.txt
Last active October 17, 2023 15:08
Testing crawl_source file for Goodreads
https://www.goodreads.com/book/show/22837718-qualia-the-purple
https://www.goodreads.com/book/show/57916643-the-year-s-midnight
https://www.goodreads.com/book/show/31312596-letters-from-a-shipwreck-in-the-sea-of-suns-and-moons
https://www.goodreads.com/book/show/44539716-the-nothing-within
https://www.goodreads.com/book/show/60286274-the-reyes-incident
https://www.goodreads.com/book/show/42348385-the-narrows
https://www.goodreads.com/book/show/56135545-the-spark
https://www.goodreads.com/book/show/55962500-legacy-of-the-brightwash
https://www.goodreads.com/book/show/33965336-seek-the-throat-from-which-we-sing
https://www.goodreads.com/book/show/60214731-into-the-fire
https://www.imdb.com/title/tt26907957/
https://www.imdb.com/title/tt10638522/
https://www.imdb.com/title/tt15837338/
https://www.boxofficemojo.com/title/tt26907957/
https://www.boxofficemojo.com/title/tt10638522/
https://www.boxofficemojo.com/title/tt15837338/
curita / gist:a45abcfc2e19d7474f3bff0ab36ad478
Created October 27, 2023 19:53
Ad-hoc JustWatch job (Oct 27 2023)
https://www.justwatch.com/us/tv-show/the-bear
https://www.justwatch.com/us/tv-show/the-boys
https://www.justwatch.com/us/tv-show/the-wheel-of-time
https://www.justwatch.com/us/movie/no-one-will-save-you
https://www.justwatch.com/us/tv-show/family-guy
https://www.justwatch.com/us/tv-show/wilderness
https://www.justwatch.com/us/tv-show/what-we-do-in-the-shadows
"""
Jobs can sometimes fail so abruptly that Spidermon (or any other extension
that runs when Scrapy shuts down) never gets to run.
In these situations nobody is alerted: Spidermon did not run, so it generates
no alerts, and Scrapy Cloud does not warn about the failure either.

This script helps identify those jobs.
To use it (either locally or in Scrapy Cloud), add the following script
to your project:

.. code-block:: python