Hacker News scraper artoo.js bookmarklet

artoo.js Hacker News Bookmarklet

The intention of this gist is merely to show what can be done with artoo.js and how to create a simple bookmarklet that addresses your own needs.

This gist therefore contains a basic artoo.js project with a package.json, a gulpfile.js and an index.js file.


To manually build the bookmarklet, download this gist, enter its folder and run the following:

# Assuming you have installed gulp ([sudo] npm install -g gulp)
npm install
gulp

This should create a build/hacker_news.bookmark.js file containing the bookmarklet and automatically copy it to your clipboard, so you can install it in your browser without further ado.
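For readers new to bookmarklets, here is a minimal sketch of the general idea: a script is wrapped in a function and stored as a `javascript:` URL that the browser runs when the bookmark is clicked. The helper name `toBookmarklet` is hypothetical, and this is not how gulp-artoo builds its output; the real wrapping is generated by the plugin.

```javascript
// Hypothetical helper (not part of artoo or gulp-artoo): turns a script
// into a javascript: URL that can be saved as a browser bookmark.
function toBookmarklet(source) {
  // An immediately invoked function keeps the script's variables
  // out of the page's global scope.
  var wrapped = '(function(){' + source + '})();';

  // Percent-encode the script so it survives as a single URL.
  return 'javascript:' + encodeURIComponent(wrapped);
}

var bookmarklet = toBookmarklet("console.log('Hello from a bookmarklet');");
console.log(bookmarklet);
```

Pasting the resulting string into a bookmark's URL field is the manual equivalent of what the build step automates.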


  • .gitignore: Do you really need an explanation?
  • gulpfile.js: contains the gulp task that compiles the bookmarklet.
  • index.js: the Hacker News scraper, which depends on artoo.
  • package.json: the npm project definition, registering the project's dependencies.
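Before reading index.js, it may help to see the string clean-up it applies to each post in isolation. The sketch below copies the logic of the `domain`, `score` and `nb_comments` methods into plain functions that run without a page or jQuery. The function names and the sample strings are assumptions made for illustration; they are not taken from the gist.

```javascript
// Hypothetical helpers mirroring the clean-up logic used in index.js.

// Strips surrounding whitespace and the parentheses around the domain.
function cleanDomain(text) {
  return text.trim().replace(/[\(\)]/g, '');
}

// Drops the ' points' suffix and coerces the rest to a number.
function parseScore(text) {
  return +text.replace(' points', '');
}

// Same idea for comments, falling back to 0 when the text is not numeric.
function parseCommentCount(text) {
  var nb = +text.replace(' comments', '');
  return isNaN(nb) ? 0 : nb;
}

// Sample inputs are assumed, not scraped from a real page.
console.log(cleanDomain(' (github.com) '));
console.log(parseScore('42 points'));
console.log(parseCommentCount('17 comments'));
console.log(parseCommentCount('discuss'));
```

One consequence of this logic worth knowing: any label that does not end in the exact plural suffix, such as a singular "1 comment", fails the numeric conversion and is reported as 0.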
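gulpfile.js

var gulp = require('gulp'),
    clipboard = require('gulp-clipboard'),
    uglify = require('gulp-uglify'),
    rename = require('gulp-rename'),
    artoo = require('gulp-artoo');

// Default task: compile index.js into a bookmarklet.
// Only the gulp.src() call survives in this copy of the file; the steps below
// are an assumed reconstruction from the required plugins and the
// build/hacker_news.bookmark.js output described in the README.
gulp.task('default', function() {
  return gulp.src('./index.js')
    .pipe(uglify())
    .pipe(rename('hacker_news.bookmark.js'))
    .pipe(artoo())
    .pipe(clipboard())
    .pipe(gulp.dest('./build'));
});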

index.js

;(function($, undefined) {

  // Specifications to scrape one page's posts
  var scraper = {

    // We iterate on Hacker News posts
    iterator: 'tr tr:has(td.title:has(a)):not(:last)',

    // The following object represents the data we want to retrieve.
    // The scrape method, like a lot of artoo's methods, is really polymorphic
    // and the same thing may be expressed in a great variety of ways.
    // Just use the way that fits your coding style the most.
    data: {

      // For the title, a simple subselector suffices (the text of the element is taken by default)
      title: {sel: '.title a'},

      // Same for the url, except that we request the 'href' attribute
      url: {sel: '.title a', attr: 'href'},

      // The following are trickier as we need to process the data a little bit
      domain: {

        // The sel parameter here is the same as $(currentIteratedEl).find('.comhead')
        sel: '.comhead',
        method: function($) {

          // $(this) is therefore $(currentIteratedEl).find('.comhead')
          // artoo follows the jQuery paradigm whenever it can
          return $(this).text().trim().replace(/[\(\)]/g, '');
        }
      },

      // But if you prefer to use a function right away, help yourself
      score: function($) {
        return +$(this).find('+ tr [id^=score]').text().replace(' points', '');
      },

      // Note that the 'method' function takes artoo's jQuery reference as argument.
      // This is so you can access your desired version of jQuery without having to force it
      // into the global scope.
      user: {
        sel: '+ tr a[href^=user]',
        method: function($) {
          return $(this).length ? $(this).text() : null;
        }
      },
      nb_comments: {
        sel: '+ tr a[href^=item]',
        method: function($) {
          var nb = +$(this).text().replace(' comments', '');
          return isNaN(nb) ? 0 : nb;
        }
      }
    }
  };

  // Function to retrieve the next page's url
  function nextUrl($page) {
    return $page.find('td.title:last > a').attr('href');
  }

  // We start the scraper and scrape the first page so we don't need to
  // get by ajax what we already have
  artoo.log.debug('Starting the scraper...');
  var frontpage = artoo.scrape(scraper);

  // Then we launch the ajax spider
  artoo.ajaxSpider(

    // This function is an iterator that returns the next page url
    // It stops the spider if it returns false, else you'll need a limit param
    function(i, $data) {
      return nextUrl(!i ? $(document) : $data);
    },

    // This is a configuration object passed to the spider
    {

      // We only want to fetch two more pages, to total three with the first one.
      limit: 2,

      // We want to scrape the HTML retrieved by ajax
      scrape: scraper,

      // We want to concat new elements in the spider's accumulator so we have
      // a flat list at the end
      concat: true,

      // This is the final callback of the spider
      // We tell the user that the wait is over and we download the data
      done: function(data) {
        artoo.log.debug('Finished retrieving data. Downloading...');
        artoo.savePrettyJson(
          frontpage.concat(data),
          {filename: 'hacker_news.json'}
        );
      }
    }
  );
}).call(this, artoo.$);
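
The pagination logic above can be hard to follow without a live page, so here is a rough, synchronous simulation of it that runs in plain JavaScript. It is only a sketch of the control flow described in the comments (an iterator returning the next URL, a `limit`, a flat accumulator when `concat` is true, and a final `done` callback). The `spider` function and the in-memory `pages` object are hypothetical stand-ins, not artoo's implementation, which works asynchronously on real HTML.

```javascript
// Hypothetical in-memory "site": each URL maps to its items and the
// URL of the page that follows it.
var pages = {
  'news?p=2': {items: ['d', 'e'], next: 'news?p=3'},
  'news?p=3': {items: ['f'], next: 'news?p=4'},
  'news?p=4': {items: ['g'], next: null}
};

// Simplified, synchronous stand-in for a paginating spider.
function spider(nextUrl, params) {
  var acc = [],
      page = null;

  for (var i = 0; i < params.limit; i++) {
    // Ask the iterator for the next URL, giving it the last fetched page.
    var url = nextUrl(i, page);

    // A falsy URL stops the spider before the limit is reached.
    if (!url) break;

    page = pages[url];
    var scraped = params.scrape(page);

    // With concat, results are flattened into one list;
    // otherwise each page's results are kept as a separate entry.
    acc = params.concat ? acc.concat(scraped) : acc.concat([scraped]);
  }

  params.done(acc);
}

// The page already loaded in the browser, scraped without any request.
var frontpage = {items: ['a', 'b', 'c'], next: 'news?p=2'};
var result;

spider(
  function(i, page) {
    // On the first call nothing has been fetched yet, so start from the front page.
    return (i === 0 ? frontpage : page).next;
  },
  {
    limit: 2,
    concat: true,
    scrape: function(page) { return page.items; },
    done: function(data) { result = frontpage.items.concat(data); }
  }
);

console.log(result);
```

With `limit: 2`, the fourth page in the fake site is never requested, which matches the gist's goal of three pages in total: the front page plus two fetched ones.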

package.json

{
  "name": "hackernews-scraper",
  "version": "0.1.0",
  "description": "A little artoo.js bookmarklet to scrape and download the first three pages of the famous Hacker News.",
  "main": "index.js",
  "author": "Yomguithereal",
  "license": "MIT",
  "dependencies": {
    "gulp": "~3.8.7",
    "gulp-uglify": "~0.3.1",
    "gulp-artoo": "0.0.1",
    "gulp-clipboard": "~0.1.1",
    "gulp-rename": "~1.2.0"
  }
}

Assuming you have installed gulp, not grunt? (error in

Hi there,

What if I want to use webpack instead?

