Benjamin Bengfort bbengfort

@bbengfort
bbengfort / .gitconfig
Last active August 29, 2015 13:56
My system level git configuration.
[core]
editor = mvim -f
excludesfile = /Users/benjamin/.gitignore
[color]
ui = true
[user]
name = Benjamin Bengfort
email = benjamin@bengfort.com
[diff]
external = /Users/benjamin/bin/godiff.sh
[credential]
@bbengfort
bbengfort / maxbound.py
Last active August 29, 2015 13:56
There exists some number in [1, infinity) that, if exceeded, causes failure, e.g. the number of concurrent users that will bring down a server. To discover this number, use exponential bounding to quickly find an upper bound, then use a halving approach to narrow in on the exact number.
#!/usr/bin/env python
import time
def search_steps(Z, factor=10, verbose=True):
    """
    Finds the number you're looking for.
    """
    # Initialize internal variables
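The preview above is cut off, so here is a minimal sketch of the approach the description outlines, not the gist's actual body. The function name `find_max_bound` and the `fails` callback are hypothetical; the sketch assumes failure is monotonic (once a value fails, every larger value fails too) and that 1 itself does not fail.

```python
def find_max_bound(fails, factor=10):
    """Return the largest integer n >= 1 for which fails(n) is False.

    `fails` is a hypothetical callback, e.g. "does the server go down
    with n concurrent users?". It is assumed to be monotonic.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    if fails(1):
        raise ValueError("expected the threshold to lie in [1, infinity)")

    # Phase 1: exponential bounding. Grow the upper bound by `factor`
    # until it fails; the previous value is a known-good lower bound.
    lo, hi = 1, factor
    while not fails(hi):
        lo, hi = hi, hi * factor

    # Phase 2: halving. Invariant: lo passes, hi fails.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return lo


if __name__ == "__main__":
    # Hypothetical threshold, used only to exercise the search.
    print(find_max_bound(lambda n: n > 1234))
```

The number of probes grows with the logarithm of the threshold in both phases, which is why the exponential step is useful when no upper bound is known in advance.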
@bbengfort
bbengfort / postgres_bakup.sh
Created February 19, 2014 19:19
Backup to S3. Usage: `$ postgres_bakup.sh YYYYMMDD.backup.sql.gz`
#!/bin/bash
USER=$PGUSER
HOST=$PGHOST
NAME=$PGDATABASE
BUCKET="s3://cobrainabsolem/datadumps/productdb/"
main() {
    if [ "$#" -eq 0 ]; then
        echo "Please supply a filename for the dump"
@bbengfort
bbengfort / baby_search.py
Last active August 29, 2015 13:56
PSQL queries with psycopg2 to search for incorrectly age-labeled data in the CPS. Compares the total number of matches for the pattern with the number incorrectly marked as ADULT.
#!/usr/bin/env python
# baby_search
# Executes regular expression queries on CPS
#
# Author: Benjamin Bengfort <benjamin@bengfort.com>
# Created: Mon Mar 03 09:56:09 2014 -0500
# Requires: psycopg2
#
# ID: baby_search.py [] benjamin@bengfort.com $
@bbengfort
bbengfort / buckets.sql
Created March 31, 2014 20:13
Compute the number of items in different probability buckets.
select
round(100.0 * (SUM(CASE WHEN probability > 0.95 THEN 1 ELSE 0 END) / count(id)::float)::numeric, 3) as pcnt_95,
round(100.0 * (SUM(CASE WHEN probability > 0.90 THEN 1 ELSE 0 END) / count(id)::float)::numeric, 3) as pcnt_90,
round(100.0 * (SUM(CASE WHEN probability > 0.85 THEN 1 ELSE 0 END) / count(id)::float)::numeric, 3) as pcnt_85,
round(100.0 * (SUM(CASE WHEN probability > 0.80 THEN 1 ELSE 0 END) / count(id)::float)::numeric, 3) as pcnt_80,
round(100.0 * (SUM(CASE WHEN probability > 0.75 THEN 1 ELSE 0 END) / count(id)::float)::numeric, 3) as pcnt_75,
round(100.0 * (SUM(CASE WHEN probability > 0.70 THEN 1 ELSE 0 END) / count(id)::float)::numeric, 3) as pcnt_70,
round(100.0 * (SUM(CASE WHEN probability > 0.65 THEN 1 ELSE 0 END) / count(id)::float)::numeric, 3) as pcnt_65,
round(100.0 * (SUM(CASE WHEN probability > 0.60 THEN 1 ELSE 0 END) / count(id)::float)::numeric, 3) as pcnt_60
from annotated_products;
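For checking the query against a small sample, the same bucket computation can be sketched in Python. The function name `bucket_percentages` is hypothetical, and the input is assumed to be a plain list of probability values (with `None` standing in for SQL NULL, which never satisfies the `>` comparison).

```python
def bucket_percentages(probabilities,
                       thresholds=(0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60)):
    """Percentage of items whose probability exceeds each threshold.

    Mirrors the SUM(CASE ...) / count(id) pattern of the query: every
    item counts toward the total, and an item is counted in a bucket
    only if its probability is strictly greater than the threshold.
    """
    total = len(probabilities)
    if total == 0:
        raise ValueError("no items to bucket")

    result = {}
    for threshold in thresholds:
        above = sum(1 for p in probabilities if p is not None and p > threshold)
        key = "pcnt_%d" % round(threshold * 100)
        result[key] = round(100.0 * above / total, 3)
    return result


if __name__ == "__main__":
    # Made-up values, used only to exercise the function.
    print(bucket_percentages([0.99, 0.92, 0.5, 0.61]))
```

Note that the buckets are cumulative: an item above 0.95 is also counted in every lower bucket, exactly as in the SQL.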
@bbengfort
bbengfort / cloze_analyze.py
Last active August 29, 2015 14:00
Cloze Analysis using NLTK
#!/usr/bin/env python
import os
import sys
import nltk
import argparse
import unicodecsv as csv
from operator import itemgetter
PATH = os.path.normpath(os.path.join(os.path.dirname(__file__), 'cloze_output.txt'))
@bbengfort
bbengfort / calories.csv
Last active August 29, 2015 14:00
Simple (incorrect) parsing of a csv file.
Food Measure Weight (g) kCal Fat (g) Carbo(g) Protein (g)
Source:USDA Nutrient Database for Standard Reference Release 12 Pocket v1.1 www.nal.usda.gov/fnic/foodcomp
1000 Island,Salad Drsng,Local 1 Tbsp 15 25 2 2 0
1000 Island,Salad Drsng,Reglr 1 Tbsp 16 60 6 2 0
40% Bran Flakes,Kellogg's 1 oz 28.35 90 1 22 4
40% Bran Flakes,Post 1 oz 28.35 90 0 22 3
Alfalfa Seeds,Sprouted,Raw 1 Cup 33 10 0 1 1
All-Bran Cereal 1 oz 28.35 70 1 21 4
Almonds,Slivered 1 Cup 135 795 70 28 27
Almonds,Whole 1 oz 28.35 165 15 6 6
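The food names above contain commas, which is one way a naive `line.split(",")` goes wrong on this file. A minimal sketch of a more careful parse follows, assuming the columns are tab-separated in the original file (the rendered preview does not show the delimiter, so this is an assumption). The function name `parse_foods` and the output keys are hypothetical; the sample rows are copied from the preview.

```python
import csv
import io

# Sample rows from the preview, with tabs assumed as the column delimiter.
SAMPLE = (
    "Food\tMeasure\tWeight (g)\tkCal\tFat (g)\tCarbo(g)\tProtein (g)\n"
    "Source:USDA Nutrient Database for Standard Reference Release 12 "
    "Pocket v1.1 www.nal.usda.gov/fnic/foodcomp\n"
    "1000 Island,Salad Drsng,Local\t1 Tbsp\t15\t25\t2\t2\t0\n"
    "Almonds,Whole\t1 oz\t28.35\t165\t15\t6\t6\n"
)


def parse_foods(text):
    """Parse tab-delimited food rows, skipping lines without data columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        # The "Source:" line has a single field, so the remaining
        # columns come back as None; skip such rows.
        if row["kCal"] is None:
            continue
        rows.append({
            "food": row["Food"],          # commas stay inside the name
            "measure": row["Measure"],
            "weight_g": float(row["Weight (g)"]),
            "kcal": float(row["kCal"]),
        })
    return rows


if __name__ == "__main__":
    for food in parse_foods(SAMPLE):
        print(food)
```

Splitting the first data row on commas instead would break the food name into three pieces and leave the numeric columns fused into the last one.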
@bbengfort
bbengfort / cpsexport.sh
Created May 21, 2014 22:29
mongo export
mongoexport --db cps --collection products --journal | gzip -9 > fixtures/products.json.gz
@bbengfort
bbengfort / redfox.sql
Last active August 29, 2015 14:01
CPS to Redfox Query
SELECT product.id, product.name, product.description,
coalesce(product.image, product.affiliate_image, product.thumbnail, product.affiliate_thumbnail) as image,
product.gender, product.age, category.name as original_category,
'' as label, '' as probability, '' as sublabel, '' as subprobability,
product.created, product.updated
FROM product
LEFT JOIN category on category.id = product.category_id
--LIMIT 10
;
@bbengfort
bbengfort / storemongo
Created June 17, 2014 18:12
Backup mongo to s3
#!/bin/bash
# Dumps the Mongo Database and exports it to S3
# Make sure to run this script in the background! (nohup)
WORKING_DIRECTORY="/mnt/vol-604be929/tmp/data"
if [ ! -d "$WORKING_DIRECTORY" ]; then
    mkdir -p "$WORKING_DIRECTORY"
fi