Skip to content

Instantly share code, notes, and snippets.

View ananich's full-sized avatar

Anthony Ananich ananich

View GitHub Profile
@mrflip
mrflip / datasets.md
Created August 9, 2012 20:01
Overview of Datasets

== Overview of Datasets ==

The examples in this book use the "Chimpmark" datasets: a set of freely-redistributable datasets, converted to simple standard formats, with traceable provenance and documented schema. They are the same datasets as used in the upcoming Chimpmark Challenge big-data benchmark. The datasets are:

  • Wikipedia English-language Article Corpus (wikipedia_corpus; 38 GB, 619 million records, 4 billion tokens): the full text of every English-language wikipedia article, in

  • Wikipedia Pagelink Graph (wikipedia_pagelinks; ) --

  • Wikipedia Pageview Stats (wikipedia_pageviews; 2.3 TB, about 250 billion records (FIXME: verify num records)) -- hour-by-hour pageview

@chilts
chilts / alexa.js
Created October 30, 2013 09:27
Getting the Alexa top 1 million sites directly from the server, unzipping it, parsing the csv and getting each line as an array.
var request = require('request');
var unzip = require('unzip');
var csv2 = require('csv2');
request.get('http://s3.amazonaws.com/alexa-static/top-1m.csv.zip')
.pipe(unzip.Parse())
.on('entry', function (entry) {
entry.pipe(csv2()).on('data', console.log);
})
;
@nickcernis
nickcernis / functions.php
Last active September 6, 2020 06:34
Genesis Simple Share Shortcode
<?php
// Adds a [social-icons] shortcode to output Genesis Simple Share icons in posts
// https://wordpress.org/plugins/genesis-simple-share/
// Add the code below to your active theme's functions.php file,
// or use in a site-specific plugin.
// The shortcode takes no attributes; change your icon settings via Genesis → Simple Share.
add_shortcode( 'social-icons', 'gss_shortcode' );