
common_crawl_hostname_count.rb
#!/usr/bin/env ruby
# a quick, simple script to partially parse output from https://github.com/trivio/common_crawl_index/blob/master/bin/remote_read
# and output subdomains in order of count
url_counts = {}
total_urls = 0
File.readlines(ARGV[0]).each do |line|
url = line.split(' ').first
reverse_hostname = url.split('/').first
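The preview above stops partway through the loop. Below is a minimal sketch of the counting idea the description gives: tally hostnames and list them in order of count. It assumes each input line begins with an index key whose first "/"-separated segment is a reversed hostname (for example com.example.www/index.html:http). That key format and the helper names reverse_hostname_counts and hostnames_by_count are assumptions, not the gist's actual code.

```ruby
# Hypothetical sketch: count reversed hostnames from index lines and list
# them by descending count. Assumes each line starts with a key like
# "com.example.www/path:http" (reversed hostname, then "/" and the path).

def reverse_hostname_counts(lines)
  counts = Hash.new(0)
  lines.each do |line|
    url = line.split(' ').first
    next if url.nil? # skip blank lines
    reverse_hostname = url.split('/').first
    counts[reverse_hostname] += 1
  end
  counts
end

# Un-reverse each hostname ("com.example.www" -> "www.example.com") and sort
# by count, highest first, breaking ties alphabetically.
def hostnames_by_count(lines)
  reverse_hostname_counts(lines)
    .map { |rev, n| [rev.split('.').reverse.join('.'), n] }
    .sort_by { |host, n| [-n, host] }
end

if __FILE__ == $PROGRAM_NAME && ARGV[0]
  hostnames_by_count(File.readlines(ARGV[0])).each do |host, n|
    puts "#{n}\t#{host}"
  end
end
```

Run as ruby sketch.rb index_output.txt to print one tab-separated count and hostname per line.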
jronallo / NC-HB2-ids.sh
Created Feb 16, 2017 — forked from cazzerson/NC-HB2-ids.sh
Extracting HB2 Tweet IDs from multiple twarc datasets
# This script requires the jq utility
# https://stedolan.github.io/jq/
# Datasets created with twarc
# https://github.com/DocNow/twarc
mkdir -p NCHB2-ids
rm -f NCHB2-ids/NCHB2*
touch NCHB2-ids/NCHB2-ids-with-dupes.txt
# Create a more relevant subset of the "North Carolina" search
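The shell script does its filtering with jq over twarc's line-delimited JSON. As a rough Ruby counterpart, here is a sketch that collects unique tweet IDs whose text mentions HB2. It assumes Twitter v1.1-style tweet objects with id_str and full_text (or text) fields, one per line, as twarc 1.x writes them. The HB2 regular expression and the hb2_tweet_ids helper are illustrative assumptions, not the gist's actual jq filter.

```ruby
require 'json'

# Assumed filter: "HB2", "hb2", or "HB 2" as a standalone token.
HB2_PATTERN = /\bhb\s?2\b/i

# Hypothetical helper: parse each JSON line, keep tweets whose text matches,
# and return their IDs with duplicates removed (datasets can overlap).
def hb2_tweet_ids(jsonl_lines)
  ids = jsonl_lines.each_with_object([]) do |line, acc|
    next if line.strip.empty?
    tweet = JSON.parse(line)
    text = tweet['full_text'] || tweet['text'] || ''
    acc << tweet['id_str'] if text.match?(HB2_PATTERN)
  end
  ids.uniq
end
```

Keeping IDs as strings (id_str) rather than integers avoids precision problems in tools that store JSON numbers as floating point.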
dabblet.css
/**
* "Google Now" Card
*/
body {
background: #e1e1e1;
min-height: 100%;
margin: auto;
}
ul.gNow {
width: 450px;
jronallo / dzslides2pdf.rb
Created Jul 18, 2013
Ruby script that uses capybara-webkit and ImageMagick (specifically its convert command) to turn a dzslides HTML slideshow into a PDF.
#! /usr/bin/env ruby
# dzslides2pdf.rb
# Usage: dzslides2pdf.rb http://localhost/presentation_root presentation.html
require 'capybara/dsl'
require 'capybara-webkit'
# require 'capybara/poltergeist'
require 'fileutils'
include Capybara::DSL
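The description says the slides are captured with capybara-webkit and then combined with ImageMagick's convert. Here is a sketch of only the combining step. It assumes one PNG screenshot per slide saved under a zero-padded name such as slide-001.png; that naming scheme and the helper names are hypothetical, not taken from the gist.

```ruby
# Hypothetical naming scheme: dir/slide-001.png, dir/slide-002.png, ...
def slide_image_paths(dir, slide_count)
  (1..slide_count).map { |i| File.join(dir, format('slide-%03d.png', i)) }
end

# Argument vector for ImageMagick: convert img1 img2 ... output.pdf
def build_convert_args(image_paths, pdf_path)
  ['convert', *image_paths, pdf_path]
end

# Runs convert without a shell (so paths with spaces are safe) and raises
# if the command is missing or exits with a failure status.
def images_to_pdf(image_paths, pdf_path)
  args = build_convert_args(image_paths, pdf_path)
  system(*args) or raise "convert failed: #{args.join(' ')}"
end
```

Zero-padding keeps the files in slide order if they are later listed or globbed alphabetically, and convert writes one PDF page per input image.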
jronallo / item.json
Created Mar 26, 2013
elasticsearch example document
{
"type": [
"http:\/\/schema.org\/Organization"
],
"properties": {
"name": [
"Riverdale"
],
"url": [
"http:\/\/d.lib.ncsu.edu\/collections\/catalog?f%5Bnames_facet%5D%5B%5D=Riverdale"
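In this document "type" and every property value are arrays, as in the HTML microdata JSON conversion, even when there is only one value. Here is a small sketch that parses such an item and unwraps single-element arrays for display. The flatten_item helper is hypothetical, and since Elasticsearch accepts arrays for any field, the unwrapping is optional rather than required for indexing.

```ruby
require 'json'

# Hypothetical helper: take the first type URI and unwrap any property whose
# value array holds exactly one element; multi-valued properties stay arrays.
def flatten_item(json_string)
  item = JSON.parse(json_string)
  flat = { 'type' => Array(item['type']).first }
  item.fetch('properties', {}).each do |key, values|
    flat[key] = (values.is_a?(Array) && values.size == 1) ? values.first : values
  end
  flat
end
```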
jronallo / tesse
Created Mar 13, 2013
A toy command-line utility for OCRing and cleaning OCR output.
#!/usr/bin/env ruby
# tesse: command-line tool for reviewing tesseract OCR output and cleaning it up
# Besides the gems required below, it needs the following Linux programs:
# eog: for viewing the images
# wmctrl: for resizing and positioning the image viewing window
require 'tesseract'
require 'ffi/aspell'
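The preview shows only the requirements. As one illustration of what cleaning OCR output can involve, here is a sketch that rejoins words hyphenated across line breaks, collapses repeated spaces, and drops lines that are mostly non-letter noise. The clean_ocr name and the 0.5 letter-ratio threshold are assumptions; the gist also loads ffi/aspell, presumably for spell-checking, which this sketch leaves out.

```ruby
# Hypothetical cleanup pass, not the gist's actual logic.
def clean_ocr(text, min_letter_ratio: 0.5)
  # Rejoin "exam-\nple" as "example". Note this also drops the hyphen from
  # genuine compounds (e.g. "well-known") that happen to break at a line end.
  joined = text.gsub(/(\p{L})-\n(\p{L})/, '\1\2')
  # Collapse runs of spaces/tabs and trim each line.
  lines = joined.lines.map { |line| line.gsub(/[ \t]+/, ' ').strip }
  kept = lines.select do |line|
    next false if line.empty?
    letters = line.scan(/\p{L}/).length
    visible = line.delete(' ').length
    # Keep the line only if enough of its visible characters are letters.
    letters.fdiv(visible) >= min_letter_ratio
  end
  kept.join("\n")
end
```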
jronallo / get_and_process_webdatacommons_data.sh
Last active Dec 11, 2015
scripts for outputting some reports from the Web Data Commons NQuads
#!/usr/bin/env bash
# Downloading the data set with these steps will take a long time.
# First, get the list of available NQuad files to download.
wget http://webdatacommons.org/2012-08/stats/files.list
# We're only interested in the microdata set for now, since that seems to be where schema.org/Book is used more often, so create a list of just those files.
grep html-microdata files.list > microdata_files.list
# OK, this will take a while depending on your connection. Let it run overnight.
wget -i microdata_files.list
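Once the microdata files are downloaded (and decompressed, if needed), a report such as how often schema.org/Book appears per site can be built by scanning the N-Quads line by line. This sketch assumes the usual N-Quads layout of subject, predicate, object, and graph followed by a period, with the graph holding the URL of the page the data came from. book_counts_by_host is a hypothetical helper.

```ruby
require 'uri'

RDF_TYPE = '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>'.freeze
SCHEMA_BOOK = '<http://schema.org/Book>'.freeze

# Hypothetical report: count "x rdf:type schema:Book" statements per host of
# the graph (page) URL, most frequent hosts first.
def book_counts_by_host(nquad_lines)
  counts = Hash.new(0)
  nquad_lines.each do |line|
    _subject, predicate, object, graph = line.split(' ', 5)
    next unless predicate == RDF_TYPE && object == SCHEMA_BOOK && graph
    host = URI(graph.delete('<>')).host rescue nil # skip unparseable graphs
    counts[host] += 1 if host
  end
  counts.sort_by { |host, n| [-n, host] }
end
```

Splitting on whitespace is safe here only because rdf:type statements have IRI objects with no spaces; lines with literal objects (which may contain spaces) fail the predicate check before their later fields are used.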
jronallo / code4lib-vote
Created Nov 16, 2015
Quick Ruby script to get libnotify desktop notifications of the current vote tally of your talk
#!/usr/bin/env ruby
# To run this from cron through the RVM Ruby wrapper script, add a crontab entry like this:
# */15 8-17 * * 1-5 env DISPLAY=:0.0 /home/jnronall/.rvm/wrappers/ruby-2.1.1/ruby /home/jnronall/bin/code4lib-vote > $HOME/tmp/code4lib-vote-cron.log 2>&1
require 'httpclient'
require 'json'
require 'date'
require 'libnotify'
require 'slop'
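The crontab entry in this script runs it every 15 minutes from 08:00 to 17:45, Monday through Friday. The vote API's response format is not shown in the preview, so the sketch below invents a simple JSON shape ({"talks": [{"title": ..., "votes": ...}]}) purely to illustrate turning a tally into a one-line notification message. vote_summary and the JSON shape are hypothetical.

```ruby
require 'json'

# Hypothetical: assumes a response like {"talks":[{"title":"...","votes":N}]}.
# Ranks talks by votes (highest first) and reports where my_title stands.
# Tied vote counts may be ordered arbitrarily.
def vote_summary(json_string, my_title)
  talks = JSON.parse(json_string).fetch('talks')
  ranked = talks.sort_by { |t| -t['votes'] }
  index = ranked.index { |t| t['title'] == my_title }
  return "#{my_title}: not found" unless index
  "#{my_title}: #{ranked[index]['votes']} votes, rank #{index + 1} of #{ranked.length}"
end
```

The resulting string could then be passed as the body of a desktop notification via the libnotify gem the script requires.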
rdfa_prototypes.html
<!DOCTYPE html>
<html>
<head>
<base href="http://d.lib.ncsu.edu/collections/catalog/mc00096-001-ff0155-000-001_0001" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
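The base element makes the page's relative URLs, including those in RDFa attributes such as about and resource, resolve against the catalog record's URL. Here is a sketch of that resolution using Ruby's standard URI.join, which performs RFC 3986 reference resolution. The sibling record ID used in the example is hypothetical.

```ruby
require 'uri'

# The <base href> from the page above.
BASE = 'http://d.lib.ncsu.edu/collections/catalog/mc00096-001-ff0155-000-001_0001'

# Resolve a relative reference against the base URL.
def resolve(ref, base = BASE)
  URI.join(base, ref).to_s
end

puts resolve('')        # empty reference (e.g. about=""): the record URL itself
puts resolve('#image')  # fragment-only reference: the record URL plus #image
puts resolve('mc00096-001-ff0155-000-001_0002') # hypothetical sibling record
```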
jronallo / markdown.xml
Created Dec 2, 2012 — forked from lg0/markdown.xml
Markdown Syntax Highlighting for Sublime Text 2
<!-- copy this into YOUR_THEME.tmTheme -->
<dict>
<key>name</key>
<string>diff: deleted</string>
<key>scope</key>
<string>markup.deleted</string>
<key>settings</key>
<dict>
<key>background</key>
<string>#EAE3CA</string>