Eric Rochester erochest

# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu-13.10"
  config.vm.box_url = "http://cloud-images.ubuntu.com/vagrant/saucy/current/saucy-server-cloudimg-amd64-vagrant-disk1.box"
end
erochest / Vagrantfile
Created October 21, 2013 20:59
Bare-bones vagrant files for using Ubuntu 13.10 (Saucy Server) as a basebox.
erochest / xml_to_corpus.py
Last active December 25, 2015 02:29
Pull the text from Perseus TEI.
#!/usr/bin/env python
import codecs
import os
import lxml.etree as ET
## CHANGE THIS:

Notes

This contains the files I used to perform the timings, as well as the timings themselves.

The timings cover processing two bags: one with 60,000 small files and one with a single large (10 GB) file. Scripts for the bag with many files are named like *-lots, and scripts for the bag with one large file are named like *-large.
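
As a rough illustration of how wall-clock timings like these could be collected, here is a minimal Python sketch. It is not one of the scripts from this gist: `time_command` is a hypothetical helper, and the command passed to it is only a placeholder for whichever *-lots or *-large script is being measured.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command to completion and return its wall-clock time in seconds.

    Hypothetical helper for illustration; argv is the command and its
    arguments as a list, e.g. the path to a *-lots or *-large script.
    """
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero,
    # so a failed run is never recorded as a valid timing.
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command: replace with the script under test.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print("elapsed: %.3f s" % elapsed)
```

Running each script several times and recording every result, rather than a single run, gives a better sense of the variation between runs.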

What I'm Timing

Ruby

<address class="vcard" vocab="http://www.w3.org/2006/vcard/ns#" resource="http://scholarslab.org/" typeof="Organization">
<span class="org fn">
<a class="url organization-name" href="http://scholarslab.org/">
<span property="formattedName">Scholars’ Lab</span>
</a>
<a class="organization-unit extended-address" href="http://lib.virginia.edu/" property="hasOrganizationName" resource="http://lib.virginia.edu/" typeof="Organization">
<span property="formattedName">University of Virginia Library</span>
</a>
</span>
<span property="hasAddress" typeof="Work">
erochest / gist:5853420
Last active December 18, 2015 22:19
Ruby snippets
[1, 1, 2, 3, 5].each { |x| x * 2 }
erochest / base.pp
Last active December 17, 2015 14:08
Some puppet files for setting up my personal config under a Vagrant-managed VM.
# A palate cleanser for apt.
exec { 'apt-get update':
  path => ['/usr/bin'],
}
## These two use this module: https://github.com/erochest/puppet-omeka
## Use this to automate getting that set up: https://github.com/erochest/omeka-vm
class { 'omeka':
erochest / lein.bat
Created February 19, 2013 16:32
A current Leiningen batch file script for Windows that downloads Leiningen 2.0.0.
@echo off
setLocal EnableExtensions EnableDelayedExpansion
set LEIN_VERSION=2.0.0
if "%LEIN_VERSION:~-9%" == "-SNAPSHOT" (
    set SNAPSHOT=YES
) else (
    set SNAPSHOT=NO
(function(window, $, undefined) {
  var $window = $(window);
  /**
   * Show or hide the button depending on the scroll position.
   */
  function animateButton() {
    var button = $('#back-to-top');
    var scrollPosition = $window.scrollTop();
    if (scrollPosition > 400) {