@astur
Last active June 9, 2016 15:10
Example of using icrawler to scrape data from Ferra.ru
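var icrawler = require('icrawler');
var fs = require('fs');

// Start page: the Ferra.ru news listing.
var URL = 'http://www.ferra.ru/ru/techlife/news/';

var opts = {
    errorsFirst: true,
    concurrency: 10,
    saveOnFinish: false,
    saveOnCount: 500,
    asyncParse: true, // parse() signals completion through _.cb()
    file: './data.json'
};

icrawler(URL, opts, function(url, $, _, res){
    // No navigation block means the page did not load as expected:
    // report an error for this URL and stop parsing it.
    if ($('div.b-option-nav').length < 1) {
        return _.cb(true);
    }

    // Save only articles whose info block names 'Алексей Козлов' (Alexey Kozlov).
    if ($('.b_infopost').contents().eq(2).text().trim().slice(0, -1) === 'Алексей Козлов') {
        _.save({
            title: $('h1').text(),
            date: $('.b_infopost>.date').text(),
            href: url,
            size: $('.newsbody').text().length // article length in characters
        });
        _.step();
    }

    // Queue every article linked from the listing.
    $('.b_rewiev p>a').each(function() {
        _.push($(this).attr('href'));
    });

    // Queue the first "next page" link, if there is one.
    $('.bpr_next>a').slice(0, 1).each(function() {
        _.push($(this).attr('href'));
    });

    _.cb();
}, function(result){
    // Write the collected results as pretty-printed JSON.
    fs.writeFileSync('./data.json', JSON.stringify(result, null, 4));
    console.log('Results saved');
});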
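
Once data.json has been written, the saved records can be post-processed with plain JavaScript. The sketch below is not part of icrawler or of the original gist: `summarize` is a hypothetical helper, the sample records are made up, and it assumes the saved data is an array of objects shaped like the ones passed to `_.save()` above (`title`, `date`, `href`, `size`).

```javascript
// Hypothetical helper (not part of icrawler): summarizes records shaped like
// the objects passed to _.save() in the crawler ({title, date, href, size}).
function summarize(records) {
    // Sum the character counts stored in each record's `size` field.
    var totalSize = records.reduce(function (sum, r) {
        return sum + r.size;
    }, 0);
    return {
        count: records.length,
        totalSize: totalSize,
        // Guard against division by zero when nothing was saved.
        averageSize: records.length ? totalSize / records.length : 0
    };
}

// Sample data invented for illustration only.
var sample = [
    {title: 'A', date: '01.01.2016', href: 'http://example.com/a', size: 100},
    {title: 'B', date: '02.01.2016', href: 'http://example.com/b', size: 300}
];

console.log(summarize(sample));
```

With real output, the sample array could be replaced by `JSON.parse(fs.readFileSync('./data.json', 'utf8'))`, assuming the file holds the array written by the final callback.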