Kepler EMEA infectious

# Load 30 days of ip files into RDW
parameter :today do
  2.day.ago.to_date
end
execute do
  files = NOP(:ARC).join("appnexus/**/ips_*.txt.gz")
  sources = files.reject do |file|
require 'ipaddr'
parameter :today do
  2.day.ago.to_date
end
helper :new_ips do
  NOP(:RDW).from(:"logs__#{today.ymd}").select_map { distinct(:ip) }
end
require 'ipaddr'
parameter :today do
  2.day.ago.to_date
end
helper :logs_union do
  ((today - 30.days)..today).map do |date|
"""
SELECT ip
local ad = tonumber(ARGV[1])
local BL = 1
local weight = 0
-- bid price constant will be 50p : 500000 /1000 = 500
-- give up bid price constant will be 10p : 100000/1000 = 100
local give_up_bid = 100
-- Check if there is a weight
if tonumber(redis.call('zscore', 'ad:weights', ad)) == nil then
  return nil
end
local XB = 100000 -- give up bid threshold = 10p
local BL = 1
local ad = tonumber(ARGV[1])
local weight = 0
-- Check if there is a weight
if tonumber(redis.call('zscore', 'ad:weights', ad)) == nil then
  return nil
end
-- This is PSEUDO code for calculating the bid price according to events, e.g. clicks/conversions or whatever IM deems an event (except impressions/auctions)
-- This code has not been TESTED and should be treated as a guideline
-- Using this code for dev or prod is at the risk of whoever is using/deploying it :)
-- Ant!
--- Equation in the documentation
--Bde = 1000 x Event_rate x deal_goal x lookback_window_price
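As a rough illustration only, the equation above can be transcribed literally into Ruby. The method name `event_bid` and the sample input values are hypothetical, and the units of each term are not stated in the source, so treat this as a sketch of the arithmetic rather than the production bid logic.

```ruby
# Literal transcription of:
#   Bde = 1000 x Event_rate x deal_goal x lookback_window_price
# event_bid is a hypothetical helper name; units of the inputs are assumed,
# not taken from the documentation.
def event_bid(event_rate, deal_goal, lookback_window_price)
  1000 * event_rate * deal_goal * lookback_window_price
end

# Hypothetical sample values, chosen only to exercise the formula.
bid = event_bid(0.002, 5.0, 1.0)
puts bid
```

An event rate of zero yields a bid of zero under this formula, which is where the give-up threshold in the scripts above would presumably come into play.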
ETL question:
How do I transform an extract so that each input row produces multiple output rows? For example, in the following there is an array within each extracted row, and I want each array member to contribute a separate row.
E.g., here is one row of an extract. The array called 'splits' within it has two members (each a hash).
{:name=>"segment_feed", :hour=>"2013_02_05_19", :timestamp=>"20130205204446",
:splits=>[{"part"=>"0", "status"=>"new", "checksum"=>"3980ec0b30f78e15782df5dc29ec89e4"},
{"part"=>"1", "status"=>"new", "checksum"=>"fec249e666448b236ea6a4367563ccd6"}]}
I want the following two rows in the load (as many rows as there are split parts):
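One possible approach, sketched in plain Ruby rather than in the ETL DSL: `Enumerable#flat_map` emits one output row per member of `splits`. The helper name `expand_splits` is hypothetical, and copying `part`, `status` and `checksum` onto the parent fields is an assumption about the desired row shape, not something the question specifies.

```ruby
# Input row copied from the example in the question.
row = {
  :name => "segment_feed",
  :hour => "2013_02_05_19",
  :timestamp => "20130205204446",
  :splits => [
    { "part" => "0", "status" => "new", "checksum" => "3980ec0b30f78e15782df5dc29ec89e4" },
    { "part" => "1", "status" => "new", "checksum" => "fec249e666448b236ea6a4367563ccd6" }
  ]
}

# expand_splits is a hypothetical helper, not part of the ETL framework.
# It returns one flat row per entry in :splits; rows without splits yield nothing.
def expand_splits(rows)
  rows.flat_map do |r|
    # Parent fields shared by every output row.
    base = r.reject { |key, _| key == :splits }
    (r[:splits] || []).map do |split|
      # Assumed output shape: parent fields plus the split's own fields.
      base.merge(
        :part => split["part"],
        :status => split["status"],
        :checksum => split["checksum"]
      )
    end
  end
end

expanded = expand_splits([row])
expanded.each { |r| puts r.inspect }
```

Because `flat_map` concatenates the per-row arrays, the number of output rows equals the total number of split parts across all input rows.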
Problem:
Drastically fewer segment rows since 22nd May.
Some facts:
1. The raw AppNexus Segment files we download from the AppNexus data siphon are still about the same size as before.
2. The extracted Segments files are now much smaller (1/1000th of former size).
3. Re-running the extract on older raw files now also produces much smaller extracted files.
=> The extract is causing the problem.
(https://github.com/infectious/etl/blob/master/etl/apn/data_siphon/segments/extract.rb)
Sample of 100 rows that are extracted when the select condition is absent but excluded when the condition is applied
(taken from the difference between tables loaded with and without the condition).
Yet all of these segments have active=1, obsolete=0,
i.e. these are the rows being excluded, but they shouldn't be.
Hour: 2013_05_24_00
period user_id segment_id
380376 4712986594389400462 444291
380376 7074347584413598423 380725