Kepler EMEA infectious
@infectious
infectious / gist:7657599
Last active December 29, 2015 10:29
Example flash code
<script>(function()
{
var flashAd='<OBJECT id="0" data="http://cdn.adnxs.com/p/22/8d/1e/99/228d1e996f864b393b49976c5db81a12.swf" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" WIDTH="300" HEIGHT="250" flashvars="clickTAG=http%3A%2F%2Fnym1.ib.adnxs.com%2Fclick%3FAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA___________NmJRSAAAAAAEAAAAAAAAAAAAAAAAAAABiuI4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgUAAAAAAAEA9QsTUwAAAAA.%2Fclickenc%3Dhttp%253A%252F%252Fwww.johnlewis.com%253Ftmad%253Dc%2526tmcampid%253D77%2526s_dsuid%253D%2526s_dscid%253DDSINF_P9_OUTERO_300x250_C_%2524%257BCP_ID%257D&clickTAG1=http%3A%2F%2Fnym1.ib.adnxs.com%2Fclick%3FAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA___________NmJRSAAAAAAEAAAAAAAAAAAAAAAAAAABiuI4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAgUAAAAAAAEA9QsTUwAAAAA.%2Fclickenc%3Dhttp%253A%252F%252Fwww.johnlewis.com%252Fmen%252Fmen%255C%2527s-jackets-coats%252Fc600001512%253Ftmad%253Dc%2526tmcampid%253D77%2526s_dsuid%253D%2526s_dscid%253DDSINF_P9_OUTERO_3
infectious / ISOToUnix
Created August 14, 2013 13:37
Piggybank ISOToUnix
register /home/hadoop/.versions/pig-0.11.1.1/lib/pig/piggybank-0.11.1.1-amzn.jar;
define ISOToUnix org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix();
data = load 'data_file.txt' using PigStorage('\t') as (user_id:int, datetime:chararray);
data = limit data 10000;
pdata = foreach data generate
user_id,
## Question: I want to use this same ETL for several different dimensions. How can I use a parameter (cols) to set the 'columns' option in the load block (lines 28/29 below)?
parameter :today do
8.days.ago.to_date
end
parameter :dimension do
"placement"
end
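A minimal sketch of one approach, in plain Ruby: keep a lookup table from dimension name to column list and derive `cols` from the `dimension` parameter. The `DIMENSION_COLUMNS` table, its column names and the `columns_for` helper are all hypothetical, and how the ETL DSL's `load` block consumes its `columns` option is assumed rather than confirmed.

```ruby
# Hypothetical mapping from dimension name to the columns to load.
# Dimension names and column lists are illustrative only.
DIMENSION_COLUMNS = {
  "placement"  => %w[placement_id placement_name site_id],
  "advertiser" => %w[advertiser_id advertiser_name]
}.freeze

# Returns the column list for a dimension, failing loudly on an unknown name
# instead of silently loading no columns.
def columns_for(dimension)
  DIMENSION_COLUMNS.fetch(dimension) do
    raise ArgumentError, "no columns configured for dimension #{dimension.inspect}"
  end
end

cols = columns_for("placement")
puts cols.inspect
```

Inside the ETL file this might become `parameter :cols do columns_for(dimension) end`, with the load block passing `cols` to `columns`; that assumes parameter blocks can call helpers and read other parameters, which depends on the DSL and is untested here.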
Sample of 100 rows that are extracted when the select condition is absent but excluded when the condition is applied
(taken from the difference between the tables loaded with and without the condition).
Yet all of these segments have active=1, obsolete=0,
i.e. these rows are being excluded but shouldn't be.
Hour: 2013_05_24_00
period  user_id              segment_id
380376  4712986594389400462  444291
380376  7074347584413598423  380725
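The "difference between tables" check can be sketched in Ruby as a row-level difference. Rows are `[period, user_id, segment_id]` arrays; the first two rows are the sample above, while the third row and the `excluded_rows` helper are hypothetical, added only so that something survives the condition.

```ruby
# Rows present in the load without the condition but absent from the load
# with it, i.e. the rows the condition excluded. Array#- compares rows by
# value, so each [period, user_id, segment_id] triple is matched as a whole.
def excluded_rows(without_condition, with_condition)
  without_condition - with_condition
end

all_rows = [
  [380376, 4712986594389400462, 444291],
  [380376, 7074347584413598423, 380725],
  [380376, 1111111111111111111, 999999] # hypothetical row that passes the condition
]
filtered_rows = [[380376, 1111111111111111111, 999999]]

excluded_rows(all_rows, filtered_rows).each { |row| puts row.join("\t") }
```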
Problem:
Drastically fewer segment rows since 22 May.
Some facts:
1. The raw AppNexus segment files we download from the AppNexus Data Siphon are still about the same size as before.
2. The extracted segment files are now much smaller (about 1/1000th of their former size).
3. Running the extract on older files also produces much smaller extracted files.
=> The extract is causing the problem.
(https://github.com/infectious/etl/blob/master/etl/apn/data_siphon/segments/extract.rb)
# Fetch a list of siphons for download from AppNexus
extract :DWAPI do
get 'Siphon'
limit 2
resolve do |response|
rows = []
response.each do |input|
name = input[:name]
ETL question:
How do I transform an extract so that it produces multiple output rows for each input row? For example, in the following there is an array within each extracted row, and I want each array member to contribute a separate row.
For example, here is one row of an extract. The array called 'splits' within it has two members (each a hash).
{:name=>"segment_feed", :hour=>"2013_02_05_19", :timestamp=>"20130205204446",
:splits=>[{"part"=>"0", "status"=>"new", "checksum"=>"3980ec0b30f78e15782df5dc29ec89e4"},
{"part"=>"1", "status"=>"new", "checksum"=>"fec249e666448b236ea6a4367563ccd6"}]}
I want the following two rows in the load (as many rows as there are split parts):
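One way to get one row per split, sketched in plain Ruby: `flat_map` over the extracted rows, copying the parent's scalar fields into each split. The `explode_splits` helper is hypothetical; assuming the DSL's `resolve` block returns the rows to load (as the siphon extract above suggests), the same `flat_map` could sit inside that block.

```ruby
# One extracted row, copied from the example above.
row = {
  name: "segment_feed", hour: "2013_02_05_19", timestamp: "20130205204446",
  splits: [
    { "part" => "0", "status" => "new", "checksum" => "3980ec0b30f78e15782df5dc29ec89e4" },
    { "part" => "1", "status" => "new", "checksum" => "fec249e666448b236ea6a4367563ccd6" }
  ]
}

# Emits one output row per split: the parent's scalar fields plus the
# split's own fields, with the split's string keys symbolized to match.
def explode_splits(rows)
  rows.flat_map do |input|
    parent = input.reject { |key, _| key == :splits }
    input.fetch(:splits, []).map do |split|
      parent.merge(split.transform_keys(&:to_sym))
    end
  end
end

explode_splits([row]).each { |r| p r }
```

A row with no `:splits` key yields no output rows; use `fetch(:splits, [{}])` instead if such rows should still pass through once.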
-- This is PSEUDOCODE for calculating the bid price from events, e.g. clicks/conversions or whatever IM deem an event (except impressions/auctions)
-- This code has not been TESTED and should be used only as a guideline
-- Anyone using or deploying this code in dev or prod does so at their own risk :)
-- Ant!
--- Equation in the documentation
-- Bde = 1000 x Event_rate x deal_goal x lookback_window_price
local XB = 100000 -- give up bid threshold = 10p
local BL = 1
local ad = tonumber(ARGV[1])
local weight = 0
-- Check if there is a weight
if tonumber(redis.call('zscore', 'ad:weights', ad)) == nil then
return nil
end
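The documented equation, Bde = 1000 x Event_rate x deal_goal x lookback_window_price, can be sketched as a plain Ruby function. The `bid_for_event` name and argument names are hypothetical; the units of each term are not given in the source, and the factor of 1000 presumably scales a per-impression value to a CPM bid, which is an assumption. The give-up threshold (`XB`) is left out because its exact use is not shown.

```ruby
# Bid per the documented equation:
#   Bde = 1000 x Event_rate x deal_goal x lookback_window_price
# Arguments mirror the equation's terms; no units are assumed.
def bid_for_event(event_rate, deal_goal, lookback_window_price)
  1000.0 * event_rate * deal_goal * lookback_window_price
end

puts bid_for_event(0.002, 5.0, 1.0)
```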