
@mdagost
mdagost / emr.py
Created September 13, 2012 22:48 — forked from josephwilk/emr.py
Wrapper around elastic-mapreduce to make it easier to use
#!/usr/bin/env python
import datetime
import os

EMR_COMMAND = os.path.expanduser('~/elastic-mapreduce/elastic-mapreduce')
EMR_LOGGING_DIR = "s3://songkick/emr-logs"
def create_pig_job_flow(pigscript, num_instances=1, extraArguments=[]):
    jobname = "Pig_Daily_" + datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
    print "Creating pig job flow", jobname, pigscript
    args = [EMR_COMMAND,
            "--create",
@mdagost
mdagost / The Technical Interview Cheat Sheet.md
Created September 25, 2015 15:20 — forked from tsiege/The Technical Interview Cheat Sheet.md
This is my technical interview cheat sheet. Feel free to fork it or do whatever you want with it. PLEASE let me know if there are any errors or if anything crucial is missing. I will add more links soon.

Studying for a Tech Interview Sucks, so Here's a Cheat Sheet to Help

This list is meant to be both a quick guide and a reference for further research into these topics. It's basically a summary of that comp sci course you never took or forgot about, so there's no way it can cover everything in depth. It also will be available as a gist on GitHub for everyone to edit and add to.

Data Structure Basics

### Array

#### Definition:

  • Stores data elements based on a sequential, most commonly 0-based, index (see the short example below).
  • Based on tuples from set theory.
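A tiny Python sketch of those two points (a fixed sequence of elements addressed by a 0-based index); illustrative only:
votes = [3, 9, 11, 6]          # an array (Python list) of four elements
print(votes[0])                # 0-based: the first element is at index 0 -> 3
print(votes[len(votes) - 1])   # the last element is at index len - 1 -> 6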
#!/usr/local/bin/Rscript
'usage: my_prog.R [-a -r -m <msg>]

options:
 -a         Add
 -r         Remote
 -m <msg>   Message' -> doc

library(docopt)
opts <- docopt(doc)  # parse command-line arguments against the usage string above
@chrishamant
chrishamant / s3_multipart_upload.py
Created January 3, 2012 19:29
Example of Parallelized Multipart upload using boto
#!/usr/bin/env python
"""Split large file into multiple pieces for upload to S3.
S3 only supports 5Gb files for uploading directly, so for larger CloudBioLinux
box images we need to use boto's multipart file support.
This parallelizes the task over available cores using multiprocessing.
Usage:
s3_multipart_upload.py <file_to_transfer> <bucket_name> [<s3_key_name>]
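The preview stops at the usage line. Below is a minimal sketch of the core multipart flow with classic boto, without the multiprocessing layer; it assumes AWS credentials are available in the environment, and the part size is illustrative.
import math
import os
import boto

def upload_multipart(path, bucket_name, key_name, part_size=50 * 1024 * 1024):
    # Sketch only: upload `path` to S3 in fixed-size parts via boto's multipart API.
    file_size = os.path.getsize(path)
    num_parts = int(math.ceil(file_size / float(part_size)))
    conn = boto.connect_s3()
    bucket = conn.lookup(bucket_name)
    mp = bucket.initiate_multipart_upload(key_name)
    with open(path, 'rb') as f:
        for i in range(num_parts):
            # Each part reads at most part_size bytes from the file's current offset.
            mp.upload_part_from_file(f, part_num=i + 1,
                                     size=min(part_size, file_size - i * part_size))
    mp.complete_upload()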
library(XML)
library(uuid)
library(stringr)
library(plyr)
library(reshape2)
library(ggplot2)
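# Per-state electoral college vote counts, from Chris Taylor's USElection repo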
f <- "https://raw.githubusercontent.com/chris-taylor/USElection/master/data/electoral-college-votes.csv"
electoral.college <- read.csv(f, header=FALSE)
names(electoral.college) <- c("state", "electoral_votes")
# See official docs at https://dash.plotly.com
# pip install dash pandas
from dash import Dash, dcc, html, Input, Output
import plotly.express as px
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv')
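The preview ends after loading the data. A minimal sketch of how such an app might continue, following the pattern in the Dash docs; the layout, component IDs, and callback below are illustrative, not part of the original snippet.
app = Dash(__name__)

app.layout = html.Div([
    dcc.Graph(id='life-exp-graph'),
    dcc.Slider(df['year'].min(), df['year'].max(), step=None,
               value=df['year'].min(),
               marks={str(y): str(y) for y in df['year'].unique()},
               id='year-slider'),
])

@app.callback(Output('life-exp-graph', 'figure'), Input('year-slider', 'value'))
def update_figure(selected_year):
    # Re-plot GDP per capita vs. life expectancy for the selected year.
    filtered = df[df.year == selected_year]
    return px.scatter(filtered, x='gdpPercap', y='lifeExp', size='pop',
                      color='continent', hover_name='country', log_x=True)

if __name__ == '__main__':
    app.run(debug=True)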
@barnes7td
barnes7td / sublime_setup.md
Last active March 28, 2024 17:59
Sublime Terminal Setup

Set up Terminal for the Sublime Shortcut "subl":

Open terminal and type:

1. Create a directory at ~/bin:

mkdir ~/bin

2. Copy sublime executable to your ~/bin directory:
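The listing cuts off before the command for this step. A common approach (assuming Sublime Text 2 is installed in /Applications) is to symlink the bundled subl binary into ~/bin rather than copy it:

ln -s "/Applications/Sublime Text 2.app/Contents/SharedSupport/bin/subl" ~/bin/subl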

@jasonrudolph
jasonrudolph / 00-about-search-api-examples.md
Last active April 30, 2024 19:21
5 entertaining things you can find with the GitHub Search API
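The gist body isn't shown in this listing. As a flavor of the kind of query it describes, here is a small illustrative sketch against the repository-search endpoint (the query string is made up; unauthenticated calls work but are heavily rate-limited):
import requests

resp = requests.get(
    "https://api.github.com/search/repositories",
    params={"q": "cheat sheet in:name", "sort": "stars"},
    headers={"Accept": "application/vnd.github+json"},
)
for item in resp.json().get("items", [])[:5]:
    print(item["full_name"], item["stargazers_count"])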
@rgreenjr
rgreenjr / postgres_queries_and_commands.sql
Last active May 14, 2024 11:33
Useful PostgreSQL Queries and Commands
-- show running queries (pre 9.2)
SELECT procpid, age(clock_timestamp(), query_start), usename, current_query
FROM pg_stat_activity
WHERE current_query != '<IDLE>' AND current_query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
-- show running queries (9.2)
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;