This little script lets you define the pieces of data you want from a web page (using CSS selectors), then crawl the site until you've collected them all.
It's based on the way Kimono works, but it's much simpler and places no limit on the number of results you can collect. It also uses your auth tokens from the browser, so it's exactly as secure as your browser session (which you should still be suspicious of).
Paste the script into your browser's developer console and run it. If you use Chrome, I highly recommend saving it as a snippet for easy reuse. To start scraping a site, create a Scraper instance with your desired options:
var scraper = new Scraper({
  container: 'li.person', // Selector for the repeated element that wraps each result.
  targets: { // The pieces of data you want from each container.
    first_name: { // A name for the piece of data you're scraping.
      selector: '.profile span:first-child', // Query selector, within the container, for the element holding the data.
      parser: function (el) { return el.innerText; } // Function run on the matched element to extract the data.
    }
  },
  next: '.pagination.next-page' // Query selector for the pagination link, if applicable.
});
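As a further sketch, here's what a two-target configuration might look like. Every selector and name below (`profile_url`, `a.profile-link`) is hypothetical; swap in selectors that match your actual page. Note that a parser can read any property of the matched element, not just its text:

```javascript
// Hypothetical options for scraping a profile directory.
// All selectors here are made up; replace them with ones matching your page.
var options = {
  container: 'li.person',
  targets: {
    first_name: {
      selector: '.profile span:first-child',
      parser: function (el) { return el.innerText; } // extract text
    },
    profile_url: {
      selector: 'a.profile-link',               // hypothetical selector
      parser: function (el) { return el.href; } // parsers can read attributes too
    }
  },
  next: '.pagination.next-page'
};
// Then pass it to the scraper: new Scraper(options)
```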
Once that's set up, just start the scraper:
scraper.start();
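To make the container/targets/parser model concrete, here is a minimal sketch of what a single scrape pass does, using plain objects in place of DOM elements (the real script would use `document.querySelectorAll` and `container.querySelector`; the `scrapePage` helper and mock data below are my own illustration, not part of the script):

```javascript
// Sketch of one scrape pass: for each container element, apply every
// target's lookup and parser, producing one result object per container.
function scrapePage(containers, targets) {
  return containers.map(function (container) {
    var row = {};
    Object.keys(targets).forEach(function (name) {
      var t = targets[name];
      var el = t.select(container); // stand-in for container.querySelector(t.selector)
      row[name] = el ? t.parser(el) : null;
    });
    return row;
  });
}

// Mock data mimicking two <li class="person"> elements:
var people = [
  { profile: { innerText: 'Ada' } },
  { profile: { innerText: 'Grace' } }
];

var results = scrapePage(people, {
  first_name: {
    select: function (c) { return c.profile; },
    parser: function (el) { return el.innerText; }
  }
});
// results → [{ first_name: 'Ada' }, { first_name: 'Grace' }]
```

When a `next` selector is configured, the script repeats this pass for each page until the pagination link stops matching.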
At any point, you can inspect the current data set via the results property, e.g. scraper.results. Hint: to copy it to your clipboard in Chrome's console, use copy(scraper.results).
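Outside of Chrome's copy() helper, the results are just an array of plain objects, so you can serialize them yourself. A small sketch (the sample data here is made up):

```javascript
// Hypothetical results, standing in for scraper.results:
var results = [
  { first_name: 'Ada' },
  { first_name: 'Grace' }
];

// Serialize to pretty-printed JSON for pasting into a file, POSTing, etc.
var json = JSON.stringify(results, null, 2);
```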
That's it! You can create multiple Scraper instances and run them all simultaneously. If you find interesting uses for this, I'd love to hear about them :D