
@p5k6
p5k6 / scraper.rb
Last active August 29, 2015 13:57
NHL scraper... still pretty basic
require 'wombat'
require 'pry'
# Turn off Pry's pager so long result lists print straight to the console
Pry.config.pager = false

set1 = Wombat.crawl do
  base_url "http://www.nhl.com"
  path "/ice/news.htm?id=675589"
  # Each :list property returns an array of every node its XPath matches
  to_team 'xpath=//*[@id="cmstable_7607"]/tbody[1]/tr/td[3]//img/@src', :list
  to_players 'xpath=//*[@id="cmstable_7607"]/tbody[1]/tr/td[2]', :list
  from_team 'xpath=//*[@id="cmstable_7607"]/tbody[1]/tr/td[5]//img/@src', :list
end
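All three XPath expressions walk the rows of the first `tbody` in the table with id `cmstable_7607` and take one column each: the cell text of column 2, and the `src` of any `img` nested in columns 3 and 5. To make that selection logic concrete, here is a rough Python sketch using the standard library's ElementTree on a small hand-made XHTML fragment. The `column_values` helper and the sample markup are hypothetical; real nhl.com pages are HTML rather than well-formed XML, so a live scraper would need an HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the table on the real page
SAMPLE = """
<html><body>
<table id="cmstable_7607">
  <tbody>
    <tr><td>Mar 5</td><td>Player A</td><td><img src="/img/team_a.png"/></td>
        <td>from</td><td><img src="/img/team_b.png"/></td></tr>
    <tr><td>Mar 5</td><td>Player B</td><td><span><img src="/img/team_c.png"/></span></td>
        <td>from</td><td><img src="/img/team_d.png"/></td></tr>
  </tbody>
  <tbody><tr><td>second tbody, ignored</td></tr></tbody>
</table>
</body></html>
"""


def column_values(root, table_id, col, img_src=False):
    """Collect values from column `col` (1-based, like XPath td[N]) of each row
    in the first <tbody> of the element whose id is `table_id`.

    With img_src=True, return the src of every <img> nested anywhere in the
    cell (like td[N]//img/@src); otherwise return the cell's stripped text.
    """
    table = root.find(f".//*[@id='{table_id}']")
    tbody = table.find("tbody")  # find() returns the first match, like tbody[1]
    values = []
    for row in tbody.findall("tr"):
        cells = row.findall("td")
        if len(cells) < col:
            continue  # XPath td[N] simply matches nothing on short rows
        cell = cells[col - 1]
        if img_src:
            values.extend(
                img.get("src") for img in cell.iter("img") if img.get("src") is not None
            )
        else:
            values.append("".join(cell.itertext()).strip())
    return values


root = ET.fromstring(SAMPLE.strip())
print(column_values(root, "cmstable_7607", 2))                # to_players
print(column_values(root, "cmstable_7607", 3, img_src=True))  # to_team
print(column_values(root, "cmstable_7607", 5, img_src=True))  # from_team
```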
p5k6 / example_sql_test.sql
Created April 3, 2014 20:38
sql-style test cases
create table report1 as
select
    person_id,
    count(1) as purch_cnt
from
    purchases
where
    purchase_date > '2014-03-01'
group by
    person_id
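The preview stops at the query that builds `report1`, but one way to write a "sql-style test case" for it is to load a few known rows into a scratch database, build the report, and compare the result with counts worked out by hand. A minimal sketch with Python's built-in `sqlite3`; the `build_report` helper and the sample rows are hypothetical, and SQLite stands in for whatever database the original targeted.

```python
import sqlite3

# Same query as report1 above; ISO-8601 date strings compare correctly as text
REPORT_SQL = """
create table report1 as
select
    person_id,
    count(1) as purch_cnt
from
    purchases
where
    purchase_date > '2014-03-01'
group by
    person_id
"""


def build_report(rows):
    """Load (person_id, purchase_date) rows into an in-memory DB and build report1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table purchases (person_id integer, purchase_date text)")
    conn.executemany("insert into purchases values (?, ?)", rows)
    conn.execute(REPORT_SQL)
    report = dict(conn.execute("select person_id, purch_cnt from report1"))
    conn.close()
    return report


# Hand-computed expectation: the strict '>' excludes the 2014-03-01 purchase,
# and person 3 only bought before the cutoff, so has no row at all
rows = [
    (1, "2014-03-02"),
    (1, "2014-03-15"),
    (2, "2014-03-01"),
    (2, "2014-04-01"),
    (3, "2014-02-28"),
]
assert build_report(rows) == {1: 2, 2: 1}
print("report1 matches expected counts")
```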
p5k6 / spawn.rb
Last active August 29, 2015 13:58
Using Ruby to spawn multiple processes, communicate with them and wait/join the threads
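arr = []
10.times do
  # Ping localhost 1-10 times at random so the children finish at different moments
  cmd = "ping -c #{rand(1..10)} 127.0.0.1"
  pid = spawn(cmd, out: "/dev/null")
  puts pid
  # Process.detach returns a thread that reaps the child once it exits
  arr << Process.detach(pid)
end
# Thread#value joins the watcher thread and returns the child's Process::Status
# (Thread#status would just be false for a finished thread)
arr.each { |t| puts "#{t.value.exitstatus}: #{t.pid}" }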
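For comparison, the same start-everything-then-wait pattern with Python's standard `subprocess` module, where each `Popen.wait()` call plays the role of joining a Ruby `Process.detach` thread. A rough sketch: `spawn_and_wait` is a hypothetical helper, and the demo runs short Python children instead of `ping` so it behaves the same on any OS.

```python
import subprocess
import sys


def spawn_and_wait(commands):
    """Start every command without blocking, then wait for each in launch order.

    Returns a list of (pid, exit_code) tuples.
    """
    procs = []
    for cmd in commands:
        # Discard child stdout, like out: "/dev/null" in the Ruby version
        proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
        print(proc.pid)
        procs.append(proc)
    # All children are now running concurrently; wait() blocks and reaps each one
    return [(proc.pid, proc.wait()) for proc in procs]


if __name__ == "__main__":
    # Portable stand-in for ping: child i just exits with status i
    cmds = [[sys.executable, "-c", f"import sys; sys.exit({i})"] for i in range(3)]
    for pid, code in spawn_and_wait(cmds):
        print(f"{code}: {pid}")
```

Waiting in launch order is fine here because every child is already running. If each result should be handled as soon as any child finishes, `os.wait()` (POSIX only) returns whichever child exits first.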
p5k6 / windows
Created January 9, 2015 17:30
I think this would work?
select * from (
    -- Number each show's upcoming episodes by air date; r = 1 is the next one
    select s.*, row_number() over (partition by e.show_id order by e.airs_on) as r
    from episodes e
    join shows s on s.id = e.show_id
    where e.airs_on >= now()
) a
where r = 1
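The window-function query above keeps, for each show, only the earliest episode airing on or after now. One way to try the same pattern locally is Python's `sqlite3` module (window functions need SQLite 3.25 or later). Assumptions in this sketch: SQLite has no `NOW()`, so the cutoff is a bound parameter; the `title` column and the sample rows are hypothetical; and the selected columns are listed explicitly rather than using `s.*`.

```python
import sqlite3

NEXT_EPISODE_SQL = """
select show_id, title, airs_on from (
    select
        s.id as show_id,
        s.title,
        e.airs_on,
        row_number() over (partition by e.show_id order by e.airs_on) as r
    from episodes e
    join shows s on s.id = e.show_id
    where e.airs_on >= ?
) a
where r = 1
order by show_id
"""


def next_episodes(conn, now):
    """Return (show_id, title, airs_on) for each show's first episode at or after `now`."""
    return conn.execute(NEXT_EPISODE_SQL, (now,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
create table shows (id integer primary key, title text);
create table episodes (id integer primary key, show_id integer, airs_on text);
insert into shows values (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
insert into episodes (show_id, airs_on) values
    (1, '2015-01-05'), (1, '2015-01-12'), (1, '2015-01-19'),
    (2, '2015-01-20'), (2, '2015-01-13'),
    (3, '2015-01-02');
""")
# Show C has no episode on or after the cutoff, so it drops out entirely
for row in next_episodes(conn, "2015-01-09"):
    print(row)
```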
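require 'socket'
require 'timeout'

# Give up if the metastore doesn't answer within 30 seconds
Timeout.timeout(30) do
  socket = TCPSocket.new(@hive_metastore_server, @hive_metastore_port)
  begin
    socket.write("hello")
    socket.close_write
    x = socket.read
    # not sure when this would happen: IO#read with no length returns "" at EOF, not nil
    if x.nil?
      @monitored_app_state = :unknown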
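The Ruby snippet is cut off in this preview, but it reads as a liveness probe: connect to the Hive metastore port, send a few bytes, half-close the socket, and read whatever comes back, all under a 30-second timeout. A rough Python equivalent, assuming that is the intent; `probe` is a hypothetical helper, and returning `None` for both timeouts and refused connections is a simplification of the `:unknown` state above.

```python
import socket


def probe(host, port, timeout=30.0, payload=b"hello"):
    """Send payload, half-close, and return everything the server replies.

    Returns None if the connection fails or any socket operation exceeds
    `timeout` seconds. Unlike Ruby's Timeout.timeout, this limit applies to
    each operation separately, not to the whole exchange.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(payload)
            # Like socket.close_write in Ruby: signal EOF but keep reading
            sock.shutdown(socket.SHUT_WR)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # b"" means the peer closed; recv never returns None
                    break
                chunks.append(data)
            return b"".join(chunks)
    except OSError:  # covers ConnectionRefusedError and socket.timeout
        return None
```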
p5k6 / .gitconfig
Created January 7, 2016 22:51
gitconfig for CLI
[user]
	name = Josh Stanfield
	email = p5k6@yahoo.com
[core]
	editor = vim
[alias]
	lol = log --graph --decorate --pretty=oneline --abbrev-commit --all
	l10 = log --pretty=oneline --abbrev-commit --all --decorate --graph --max-count=10
	# Aliases starting with ! run in a shell from the repo root; plain aliases can't use pipes
	add-modified = !git diff --name-only --diff-filter=M | xargs git add
[color]

Keybase proof

I hereby claim:

  • I am p5k6 on github.
  • I am p5k6 (https://keybase.io/p5k6) on keybase.
  • I have a public key whose fingerprint is D844 1A65 94A4 665F CC35 BEA0 7DA7 7D09 8F4A 4E03

To claim this, I am signing this object:

"""
ETL step wrapper to extract data from MySQL to S3, compressing it along the way
"""
import dataduct
from dataduct.config import Config
from dataduct.steps.etl_step import ETLStep
from dataduct.pipeline import CopyActivity
from dataduct.pipeline import MysqlNode
from dataduct.pipeline import PipelineObject
from dataduct.pipeline import Precondition
p5k6 / transform_with_precondition.py
Created May 19, 2016 15:54
custom step to allow a precondition on a transform step - see lines 75-82 and 139
"""
ETL step wrapper for a shell command activity that can be executed on EC2 / EMR, with a precondition
"""
from dataduct.pipeline import S3Node
from dataduct.pipeline import ShellCommandActivity
from dataduct.pipeline import Precondition
from dataduct.s3 import S3Directory
from dataduct.s3 import S3File
from dataduct.s3 import S3Path
from dataduct.utils import constants as const
name: qa-test-transform
frequency: daily
load_time: 08:00
description: |
    testing out new transform step with precondition
steps:
-   step_type: extract-s3
    name: ops-input