@hechen0
Created December 1, 2013 09:10
Crawl two URLs using a for loop (a poor approach)
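var http = require('http');
var cheerio = require('cheerio');

// Fetch a URL and pass the whole response body to callback (null on error).
function download(url, callback){
  http.get(url, function(res){
    var data="";
    // Decode chunks as strings so multi-byte characters are not split across chunks.
    res.setEncoding('utf8');
    res.on('data', function(chunk){
      data += chunk;
    });
    res.on('end', function(){
      callback(data);
    });
  }).on('error', function(){
    callback(null);
  });
}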
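var urls=["http://renren.com", "http://www.sina.com.cn"];
// Matches absolute http(s) URLs such as http://example.com/path.
var urlPattern = /^(https?:\/\/)([\da-z.-]+)\.([a-z.]{2,10})([\/\w.-]*)\/?$/;
// forEach gives each callback its own index, so the label matches the page being printed.
urls.forEach(function(pageUrl, index){
  download(pageUrl, function(data){
    // download() passes null when the request fails.
    if(data === null){
      console.log("----url----"+index+" failed");
      return;
    }
    var $ = cheerio.load(data);
    console.log("----url----"+index);
    $("a").each(function(j, e){
      var href = $(e).attr("href");
      if(urlPattern.test(href))
        console.log(href);
    });
  });
});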
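
A minimal sketch of how a loop index behaves inside asynchronous callbacks, which decides what the `----url----` label prints for each page. It uses no network: `fakeDownload`, `flush`, `indicesWithVarLoop` and `indicesWithForEach` are hypothetical helpers introduced only for this illustration, and the queue simply delays each callback until the loop has finished, roughly as a real HTTP response would.

```javascript
// Hypothetical stand-in for download(): it queues the callback instead of
// touching the network, so the callback runs only after the loop has ended.
var pending = [];
function fakeDownload(url, callback) {
  pending.push(function () {
    callback('<a href="' + url + '">link</a>');
  });
}

// Run every queued callback, simulating the responses arriving.
function flush() {
  while (pending.length > 0) {
    pending.shift()();
  }
}

var urls = ['http://renren.com', 'http://www.sina.com.cn'];

// A `var` loop: every callback shares the single function-scoped `i`,
// which already equals urls.length by the time the callbacks run.
function indicesWithVarLoop() {
  var seen = [];
  for (var i = 0; i < urls.length; i++) {
    fakeDownload(urls[i], function () {
      seen.push(i);
    });
  }
  flush();
  return seen;
}

// forEach: each iteration gets its own `index` binding.
function indicesWithForEach() {
  var seen = [];
  urls.forEach(function (url, index) {
    fakeDownload(url, function () {
      seen.push(index);
    });
  });
  flush();
  return seen;
}

console.log(indicesWithVarLoop());  // [ 2, 2 ]
console.log(indicesWithForEach());  // [ 0, 1 ]
```

Passing the URL itself into the callback would work just as well as passing the index. The point is only that the value has to be bound per iteration and not read from a shared counter.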