Ryan Baumann ryanfb

<?xml version='1.0' encoding='utf-8'?>
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<!--
Author: Rod Page
Source: http://iphylo.blogspot.com/2011/07/correcting-ocr-using-hocr-firefox.html#comment-400434491
-->
<xsl:output method='html' version='1.0' encoding='utf-8' indent='yes'/>
<xsl:variable name="scale" select="800 div //page/@width" />
@ryanfb
ryanfb / README.md
Last active December 31, 2015 21:29
Lace hOCR + PDF recombination

Use the lace branch of my fork of HocrConverter: https://github.com/ryanfb/HocrConverter/tree/lace (make sure you run git pull to get the latest changes).

Download and compile jbig2enc in your script path. Modify pdf.py to use 300 dpi instead of 72 dpi.

Example run:

./lace2pdf.sh xenophon04xeno
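
To see why the dpi setting matters, here is a small sketch of the underlying arithmetic. It assumes the dpi value is what maps image pixel dimensions onto the PDF page size (PDF units are points, 1/72 inch). The `px_to_pt` helper and the 2550-pixel page width are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: px_to_pt and the sample width are illustrative assumptions.

# Convert a pixel dimension to PDF points (1 pt = 1/72 inch).
# $1 = size in pixels, $2 = image resolution in dpi
px_to_pt() {
    echo $(( $1 * 72 / $2 ))
}

px_to_pt 2550 300   # prints 612: an 8.5 inch wide page
px_to_pt 2550 72    # prints 2550: the same scan comes out over 35 inches wide
```

If the page images were scanned at 300 dpi, leaving the value at 72 would make every page roughly four times too large in each dimension.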

@acairns
acairns / publish
Created January 5, 2014 11:58
Bash script to publish Jekyll post from _drafts into _posts
#!/bin/sh
# Move a draft into _posts/, prefixing its filename with today's date.
if [ -z "$1" ]
then
    echo "No draft file found"
    exit 1
fi
mv "$1" "_posts/$(date +"%Y-%m-%d")-$(basename "$1")"
@allanmac
allanmac / sha256.cu
Last active November 10, 2023 01:26
A CUDA SHA-256 subroutine using macro expansion
// -*- compile-command: "nvcc -m 32 -arch sm_35 -Xptxas=-v,-abi=no -cubin sha256.cu"; -*-
//
// Copyright 2013 Allan MacKinnon <allanmac@alum.mit.edu>
//
// Permission is hereby granted, free of charge, to any person obtaining
// a copy of this software and associated documentation files (the
// "Software"), to deal in the Software without restriction, including
// without limitation the rights to use, copy, modify, merge, publish,
// distribute, sublicense, and/or sell copies of the Software, and to
@acdha
acdha / ocr-file.py
Created March 17, 2014 22:49
Fragment of code used to process images with Tesseract OCR
def ocr_file(filename, languages, output_base, temp_dir):
    log.info("Launching tesseract on %s", filename)
    output = subprocess.check_output(['tesseract', filename, output_base,
                                      '-l', '+'.join(languages), TESSERACT_CONFIG],
                                     cwd=temp_dir,
                                     stderr=subprocess.STDOUT)
    with OCR_STORAGE.open('%s/%s/%s.log' % (item_id, group, index), 'w') as log_f:
        log_f.write(output)
@mcewand
mcewand / PennInHand_grabber.py
Last active August 29, 2015 14:02
Quick script to help students in the Rare Book School M-95 course.
#!/usr/bin/env python
# 1. Save this file to your local machine where you want to save the files
# 2. Change 'manuscript' to the one you want
# 3. Change 'lastItem' to the last numbered item of the text section of the manuscript
# 4. Add a directory 'jpgs' that the images will be stored in
# 5. Run "python PennInHand_grabber.py"
import sys
import urllib
anonymous
anonymous / bad-usenet-post-extractor.rb
Created September 3, 2014 03:48
Takes a directory full of disordered, misnamed, obfuscated multi-volume RAR files and extracts them. For those Usenet full TV season backlog posts that get picked up by Sick Beard and have millions of .0 .1 .2 files you can't extract and throw "There are no videofiles in folder" in the post-processor. See e.g. http://sickbeard.com/forums/viewtop…
#!/usr/bin/env ruby
require 'pty'
require 'expect'
$rar_divider = "-------------------------------------------------------------------------------\n"
def check_rar_contents(file)
  $stderr.puts "Checking RAR contents for: #{file}"
  rar_contents = `unrar l -v "#{file}" 2>&1`.split($rar_divider)
diff --git classify/trainingsampleset.cpp classify/trainingsampleset.cpp
index afbf3f4..6121395 100644
--- classify/trainingsampleset.cpp
+++ classify/trainingsampleset.cpp
@@ -693,6 +693,8 @@ void TrainingSampleSet::ComputeCanonicalSamples(const IntFeatureMap& map,
fcinfo.canonical_sample = fcinfo.samples[0];
fcinfo.canonical_dist = 0.0f;
for (int i = 0; i < fcinfo.samples.size(); ++i) {
+ #pragma omp parallel
+ {
# Endpoint http://vocab.getty.edu/sparql
select ?t ?name (count(*) as ?c) {
?x gvp:placeType ?t. ?t gvp:prefLabelGVP/xl:literalForm ?name
} group by ?t ?name
# 0. Inspired by https://twitter.com/paregorios/status/568513448130187264
# 1. This includes place types AND cultures/styles (eg "religious center" and "Maya")
# 2. Exploring the AAT hierarchy above these types could also be interesting
# 3. We also provide TGN counts as of Mar 2015
# 4. The query is a bit expensive (1.2M places, 2.7M type instances), so be nice and use the attached TSV
@bruce30262
bruce30262 / ARMDebianUbuntu.md
Last active June 12, 2023 11:43 — forked from Liryna/ARMDebianUbuntu.md
Emulating ARM on Debian/Ubuntu

You might want to read this to get an introduction to armel vs armhf.

If the below is too much, you can try Ubuntu-ARMv7-Qemu but note it contains non-free blobs.

Running ARM programs under Linux (without starting a QEMU VM!)

First, cross-compile user programs with the GCC ARM toolchain. Then install qemu-arm-static so that you can run ARM executables directly on Linux.

If there's no qemu-arm-static in the package list, install qemu-user-static instead.
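
As a rough sketch of the two steps above, the commands might look like the following on a recent Debian/Ubuntu system. The package name `gcc-arm-linux-gnueabihf`, the source file `hello.c`, and the `-static` flag are assumptions that do not come from the text above. Static linking is used here so that no ARM shared libraries are needed at run time. The `run` wrapper only prints each command unless `RUN=1` is set, so nothing is installed by accident.

```shell
#!/bin/sh
# Sketch only: package names and hello.c are illustrative assumptions.

# Print the command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

arm_hello() {
    # 1. Install an ARM (armhf) cross-compiler and the user-mode QEMU emulator.
    run sudo apt-get install -y gcc-arm-linux-gnueabihf qemu-user-static
    # 2. Cross-compile a statically linked ARM executable.
    run arm-linux-gnueabihf-gcc -static -o hello hello.c
    # 3. Run the ARM executable directly on the host through qemu-arm-static.
    run qemu-arm-static ./hello
}
```

Calling `arm_hello` prints the three commands for review. With `RUN=1` set in the environment, the same call executes them.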