#!/usr/bin/env python
import sys
import os
import datetime
import time
import pysrt
from internetarchive import get_item
@jjjake
jjjake / convert_xls_to_utf8_csv.py
Last active March 19, 2022 15:54
This script converts a Microsoft Excel spreadsheet to a UTF-8 CSV file.
#!/usr/bin/env python
"""Convert a Microsoft Excel spreadsheet to a UTF-8 csv.
Usage:
# Make sure requirements are installed.
$ sudo pip install xlrd backports.csv
# Run script.
$ python convert_xls_to_utf8_csv.py <spreadsheet>
@jjjake
jjjake / audit_gb_shipment.py
Last active March 16, 2017 23:15
Audit GB Shipment
import json
def get_gb_counts(tsv):
    counts = dict()
    for line in open(tsv):
        barcode = line.split('\t')[0].lower()
        # Skip header row.
        if barcode == 'barcode':
            continue
@jjjake
jjjake / iamine.go
Last active May 11, 2016 01:04
iamine in golang
package main
import (
	"sync"
	"bytes"
	"time"
	"bufio"
	"fmt"
	"github.com/sethgrid/pester"
	"os"
#!/usr/bin/env python
"""Parse audfprint .out files.
example input:
Fri Jan 8 00:07:47 2016 Reading hash table /1/2015/db-dem3/dem3-debate-aa.db
NOMATCH precomp/1/2015/mp3s/ALJAZAM_20151219_000000_News.afpt 3659.9 sec 299066 raw hashes
Matched 2.9 s starting at 35.1 s in precomp/1/2015/mp3s/ALJAZAM_20151220_040000_Weekend_News.afpt to time 0.8 s in /1/2015/dem3-mp4/2015-12-19-D-Debate-0050.mp4 with 76 of 1264 common hashes at rank 5
"""
@jjjake
jjjake / ia-mine.go
Created October 29, 2015 18:01
An Archive.org metadata miner written in Go.
package main
import (
	"os"
	"bufio"
	"crypto/tls"
	"net/http"
	"io/ioutil"
	"fmt"
	"time"

Downloading and Syncing Archive.org Collections

The following instructions show how to use the Internet Archive command-line tool, "ia", to download a collection from Archive.org and keep it synced. The only requirement is Python 2 installed on a Unix-like operating system (e.g. Mac OS X or Linux).
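
For readers who prefer scripting the same workflow, here is a minimal sketch using the `internetarchive` Python library (the library behind the "ia" tool) rather than the binary described in these instructions. It assumes the library is installed; the function name `sync_collection` and its `search`/`fetch` parameters are hypothetical names introduced for illustration, and the collection name in the usage note is a placeholder.

```python
def sync_collection(collection, destdir=".", search=None, fetch=None):
    """Download every item in `collection` and return the identifiers visited.

    `search` and `fetch` default to internetarchive.search_items and
    internetarchive.download; they are parameters only so the loop can be
    exercised without network access (an assumption of this sketch).
    """
    if search is None or fetch is None:
        # Imported lazily so the function can be defined without the library.
        from internetarchive import download, search_items
        search = search or search_items
        fetch = fetch or download

    synced = []
    # Each search result is a dict that carries the item's identifier.
    for result in search("collection:{}".format(collection)):
        identifier = result["identifier"]
        # checksum=True skips files whose local copy already matches the
        # remote checksum, so re-running the script only fetches changes.
        fetch(identifier, destdir=destdir, checksum=True)
        synced.append(identifier)
    return synced
```

Calling `sync_collection("some_collection", destdir="/path/to/mirror")` on a schedule (for example from cron) would approximate keeping a local copy in sync, since unchanged files are skipped on each run.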

Downloading and Configuring the ia Command-Line Tool

  1. Download the latest binary of the ia command-line tool:

The Internet Archive

Bits in Bits Out

The users and contributors of the Internet Archive are what make Archive.org what it is today. Without contributions from our users, we would have nothing, and without users accessing our digital materials, it would mean nothing.

This document gives a brief overview of how to get data into, and out of, Archive.org.

#!/bin/bash
function get_identifier() {
    # Get ia identifier using youtube json file.
    # Format is title-id, with title being limited to 80 chars.
    #stitle=$(cat "$json_file" | jq -r '.title' | tr -cd '[[:alnum:]]_-' | cut -c 1-80)
    stitle=$(cat "$json_file" | jq -r '.title' | gsed -e 's/^./\U&/g; s/ ./\U&/g' | tr -cd '[[:alnum:]]_-' | cut -c 1-80)
    ytid=$(cat "$json_file" | jq -r '.id')