@anarchivist
anarchivist / marc2xml-at.py
Created December 31, 2009 04:36
create individual MARCXML files for archivists toolkit from a MARC21 file
"""creates individual MARCXML files for archivists toolkit from a MARC21 file"""
import pymarc
import os
import sys
header = u"""<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="MARC21slim2HTML.xsl" ?>
<collection xmlns="http://www.loc.gov/MARC21/slim"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
@atomotic
atomotic / h3-new-job
Last active September 23, 2015 18:58 — forked from anonymous/h3-new-job
#!/bin/bash
. heritrix.conf
if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: $0 jobname seedsfile"
    exit 1
fi
JOB=$1
anonymous
anonymous / stream.js
Created September 26, 2012 17:55
/**
* To run this fill in the config, and `npm install ntwitter`
*/
var twitter = require('ntwitter');
var config = {
"access_token_key": "",
"access_token_secret": "",
"consumer_secret": "",

Some Friendly Advice for Data Curators

From Dorothea Salo at the Digital Humanities Winter Institute Data Curation Seminar ... shamelessly quoted by a student (Ed Summers).

  • Pick Software Last
  • Don't Chase the Shiny
  • Know Where the Exits Are (especially in the Cloud)
  • Keep Your Options Open
@cobyism
cobyism / gh-pages-deploy.md
Last active June 12, 2024 20:14
Deploy to `gh-pages` from a `dist` folder on the master branch. Useful with [yeoman](http://yeoman.io).

Deploying a subfolder to GitHub Pages

Sometimes you want to have a subdirectory on the master branch be the root directory of a repository’s gh-pages branch. This is useful for things like sites developed with Yeoman, or if you have a Jekyll site contained in the master branch alongside the rest of your code.

For the sake of this example, let’s pretend the subfolder containing your site is named dist.

Step 1

Remove the dist directory from the project’s .gitignore file (it’s ignored by default by Yeoman).
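Step 1 amounts to deleting the `dist` entry from `.gitignore`. A minimal Python sketch of that edit, assuming the entry is written as `dist`, `dist/`, `/dist`, or `/dist/`; the helper names `unignore` and `unignore_file` are hypothetical and not part of the original gist:

```python
from pathlib import Path


def unignore(lines, name="dist"):
    """Return the .gitignore lines with entries for `name` removed.

    Assumption: the entry appears as name, name/, /name or /name/.
    Comments, negations and other patterns are left untouched.
    """
    targets = {name, name + "/", "/" + name, "/" + name + "/"}
    return [line for line in lines if line.strip() not in targets]


def unignore_file(path=".gitignore", name="dist"):
    """Rewrite the ignore file in place, dropping entries for `name`."""
    p = Path(path)
    kept = unignore(p.read_text().splitlines(), name)
    p.write_text("\n".join(kept) + "\n")
```

Calling `unignore_file()` from the project root would rewrite `.gitignore` in place; editing the file by hand works just as well.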

<?xml version='1.0' encoding='utf-8'?>
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<!--
Author: Rod Page
Source: http://iphylo.blogspot.com/2011/07/correcting-ocr-using-hocr-firefox.html#comment-400434491
-->
<xsl:output method='html' version='1.0' encoding='utf-8' indent='yes'/>
<xsl:variable name="scale" select="800 div //page/@width" />

Displaying images in the terminal with tput and echo

Requires ImageMagick, easily available from your favorite package manager. Tested on Linux and OS X.
convert image.png -resize 40 txt:-|sed -E 's/://;s/\( ? ?//;s/, ? ?/,/g;s/\)//;s/([0-9]+,[0-9]+,[0-9]+),[0-9]+/\1/g;s/255/254/g;/mage/d'|awk '{print $1,$2}'|sed -E 's/^0,[0-9]+ /print "echo;tput setaf "\;/;s/^[0-9]+,[0-9]+ /print "tput setaf ";/;s/(.+),(.+),(.+)/\1\/42.5*36+\2\/42.5*6+\3\/42.5+16/'|bc|sed 's/$/;echo -n "  ";/'|tr '\n' ' '|sed 's/^/tput rev;/;s/; /;/g;s/$/tput sgr0;echo/'|bash
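The arithmetic that the pipeline hands to `bc` appears to map each RGB pixel onto the 6x6x6 color cube of the xterm 256-color palette (indices 16 to 231). A rough Python sketch of that mapping, assuming `bc` runs with its default `scale` of 0 so that each division truncates; the function name `rgb_to_xterm` is made up for illustration:

```python
def rgb_to_xterm(r, g, b):
    """Map 8-bit RGB values to an xterm 256-color cube index (16..231)."""
    def level(c):
        # The one-liner rewrites 255 as 254 so that c / 42.5 never reaches 6;
        # clamping to 254 has the same effect here.
        return int(min(c, 254) / 42.5)

    # Same expression as the sed stage builds: r*36 + g*6 + b + 16,
    # where each channel has first been reduced to a level from 0 to 5.
    return level(r) * 36 + level(g) * 6 + level(b) + 16
```

The resulting index is the value the pipeline passes to `tput setaf` for each pixel.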
@atomotic
atomotic / Readme.md
Last active September 9, 2022 09:39
Internet Archive Save Page Now
@ruebot
ruebot / gccaedits IP ranges
Last active June 3, 2020 04:44
These are the IP ranges used for @gccaedits. These IP addresses come from here: https://en.wikipedia.org/wiki/Wikipedia:Blocking_IP_addresses. If you know of more that should be added, please contact @ruebot, and/or comment here.
"ranges": {
"Government of Canada": [
["192.139.201.0", "192.139.201.255"],
["192.139.202.0", "192.139.202.255"],
["192.139.203.0", "192.139.203.255"],
["192.139.204.0", "192.139.204.255"],
["192.197.77.0", "192.197.77.255"],
["192.197.78.0", "192.197.78.255"],
["192.197.80.0", "192.197.80.255"],
["192.197.84.0", "192.197.84.255"],