Gabe Fierro gtfierro

@gtfierro
gtfierro / csv_reader.py
Created September 11, 2013 17:29
Simplifies the process for reading in unicode CSV files
#!/usr/bin/env python
"""
Simplifies the process for reading in unicode CSV files
"""
import csv
from unicodedata import normalize
import codecs
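The preview above stops at the imports, so the gist's actual reader is not shown. As a rough sketch of the general idea only, assuming Python 3 (where `csv` handles text natively and `codecs` is not needed), a reader that normalizes every cell could look like the following. The function name `read_unicode_csv` and the default `NFKC` form are hypothetical choices, not taken from the gist.

```python
import csv
import io
from unicodedata import normalize


def read_unicode_csv(stream, form="NFKC"):
    """Yield each CSV row as a list of Unicode-normalized strings.

    `stream` is any iterable of text lines, e.g. a file opened with
    open(path, encoding="utf-8", newline="").
    """
    for row in csv.reader(stream):
        yield [normalize(form, cell) for cell in row]


# In-memory sample using combining accents (hypothetical data).
sample = io.StringIO("name,city\nJose\u0301,Zu\u0308rich\n")
rows = list(read_unicode_csv(sample))
```

Normalizing on read means that visually identical strings, such as a precomposed "é" and an "e" followed by a combining accent, compare equal downstream.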
@gtfierro
gtfierro / run_ocr.sh
Last active May 11, 2022 16:57
Quick shell script for parallel OCR on PDFs using ghostscript and tesseract
#!/bin/bash
# requires ghostscript (http://www.ghostscript.com/)
# requires ImageMagick
# requires tesseract (https://code.google.com/p/tesseract-ocr/)
# requires GNU parallel (https://www.gnu.org/software/parallel/)
# all of these are typically available through yum/apt/brew/etc.
# number of cores over which the process will be parallelized
num_cores=$1
#!/bin/sh
# Converts a mysqldump file into a Sqlite 3 compatible file. It also extracts the MySQL `KEY xxxxx` from the
# CREATE block and creates them in separate commands _after_ all the INSERTs.
# Awk is chosen because it's fast and portable. You can use gawk, original awk or even the lightning fast mawk.
# The mysqldump file is traversed only once.
# Usage: $ ./mysql2sqlite mysqldump-opts db-name | sqlite3 database.sqlite
# Example: $ ./mysql2sqlite --no-data -u root -pMySecretPassWord myDbase | sqlite3 database.sqlite
@gtfierro
gtfierro / install_ec2_tesseract.sh
Last active January 3, 2016 22:49
Install/configure tesseract for EC2 instance
#!/bin/bash
# for Ubuntu 12.04/12.10
sudo apt-get update
sudo apt-get -y install autoconf automake make build-essential
sudo apt-get -y install tesseract-ocr tesseract-ocr-eng imagemagick
# install parallel
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
# convert PDF to TIFF
package main
import (
"crypto/sha1"
"fmt"
"io/ioutil"
"log"
"os"
"strconv"
"time"
@gtfierro
gtfierro / miner
Last active January 4, 2016 07:09
GITCOIN
#!/bin/bash
set -eu
if [ "$#" != 2 ]; then
echo >&2 "Usage: $0 <clone_url> <public_username>
A VERY SLOW mining implementation. This should give you an idea of
where to start, but it probably won't successfully mine you any
Gitcoins.
@gtfierro
gtfierro / build.sh
Last active January 4, 2016 13:19
stripe-ctf level 3
#!/bin/sh
set -e
# Add or modify any build steps you need here
cd "$(dirname "$0")"
pip install --user pandas flask requests grequests
@gtfierro
gtfierro / gist:8806226
Created February 4, 2014 15:49
Answers for Python, Week 2
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
#!/usr/bin/env python
"""
simple example script for running notebooks and reporting exceptions.
Usage: `checkipnb.py foo.ipynb [bar.ipynb [...]]`
Each cell is submitted to the kernel, and checked for errors.
"""
import os,sys,time
@gtfierro
gtfierro / gist:9220108
Created February 25, 2014 23:17
Download PTAB
import requests
from bs4 import BeautifulSoup
url = 'http://e-foia.uspto.gov/Foia/DispatchBPAIServlet?RetrieveRecent=30'
html = requests.get(url).content
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid bs4's "no parser specified" warning
all_download_links = soup.findAll('a', {'target': '_self'})
for i, link in enumerate(all_download_links):
download = 'http://e-foia.uspto.gov/Foia/'+link['href']
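The preview is cut off inside the loop, so what the gist does with each `download` URL is not shown. The link-collection step alone can be sketched with only the standard library, which is a swapped-in technique rather than the `requests` and `BeautifulSoup` calls used above. The class `SelfTargetLinks`, the function `download_urls`, and the sample HTML are hypothetical names and data for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://e-foia.uspto.gov/Foia/"


class SelfTargetLinks(HTMLParser):
    """Collect the href of every <a> tag whose target is "_self"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("target") == "_self" and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def download_urls(html, base=BASE_URL):
    """Return absolute download URLs for the matching links in `html`."""
    parser = SelfTargetLinks()
    parser.feed(html)
    return [urljoin(base, href) for href in parser.hrefs]


# Hypothetical sample page: only the first link carries target="_self".
sample_html = '<a target="_self" href="a.pdf">A</a><a href="skip.pdf">B</a>'
urls = download_urls(sample_html)
```

Unlike the plain string concatenation in the listing, `urljoin` leaves an already absolute `href` unchanged, which may or may not matter for this site.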