Alexander Dean alexanderdean

/*
* Copyright (c) 2015 Tim Harper.
*/
import sbt._
import Keys._
import xerial.sbt.Pack._
object SamzaTasks {
---
# ^^^ YAML documents must begin with the document separator "---"
#
#### Example docblock, I like to put a descriptive comment at the top of my
#### playbooks.
#
# Overview: Playbook to bootstrap a new host for configuration management.
# Applies to: production
# Description:
# Ensures that a host is configured for management with Ansible.
alexanderdean / redshift-bug
Created December 17, 2013 17:13
Redshift bug when working with JSONs and UNIONs
-- 1. Setup
DROP table bug_table cascade;
CREATE TABLE bug_table (
some_json varchar(200),
some_flag boolean
);
CREATE VIEW bug_view_1 AS

Custom unstructured event and context functionality: draft specification

0. Introduction

This draft specification covers the enrichment and storage processes for:

  1. Custom unstructured events
  2. Custom unstructured context

Custom unstructured events are well documented as part of the Snowplow Tracker Protocol. Custom unstructured context is less well documented; essentially, it looks like this:

alexanderdean / gist:7009360
Last active December 25, 2015 16:58
Bad data
fd9f13d4-e7f9-46b1-9f03-b39121de1aa2 2013-10-15 21:12:35.261 30 budweiser 17 10 300 250 f 0 0 0 f f f f f f f f http://delivery.sblk.io/tests... https://www.google.com/search?q=business news https://www.google.com/search?q=business news f https www.google.com /search q=business news 0 f f t t 32 80 t t t t t t t t Mozilla/5.0 (unknown-x86_64-linux-gnu) Siege/3.0.4 172.31.25.17 f f 1366 376
46e2cd90-392c-4da0-88e1-5f916d68a109 2013-10-15 21:17:07.635 30 budweiser 17 10 300 250 f 0 0 0 f f f f f f f f http://delivery.sblk.io/tests...
https://www.google.com/search?q=business news https://www.google.com/search?q=business news f https www.google.com /search q=business news 0 f f t t 32 80 t t t t t t t t Mozilla/5.0 (unknown-x86_64-linux-gnu) Siege/3.0.4 172.31.25.17 f f 1366 376
949e886d-352c-452f-a92d-762931d23f65 2013-10-15 21:17:07.649 30 budweiser 17 10 300 250 f 0 0 0 f f f f f f f f http://delivery.sb
alexanderdean / gist:6783011
Created October 1, 2013 18:38
Commenting out the Postgres VACUUM for Snowplow's StorageLoader
# status = execute_queries(target, [ "VACUUM FULL ANALYZE #{target[:table]};" ] )
# unless status == []
# raise DatabaseLoadError, "#{status[1]} error executing #{status[0]}: #{status[2]}"
# end
-- Quick workaround script because Redshift view definitions are tied to
-- table IDs, not table names. Means if a table is swapped out for a new
-- one, the views will all point to the old table.
-- To use this:
-- 1. Change your viewowner to whoever created your views
-- 2. Change your table names (if necessary)
-- 3. Run the script against your database
-- 4. Paste the output into your SQL client and execute
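
The preview above stops before the script itself. As a rough illustration of the "generate statements, then paste and run them" workflow it describes, here is a minimal Python sketch. It is not the gist's script: the function name `build_recreate_statements` and the `(schema, view, definition)` row shape are assumptions made for this example, and the view definitions are expected to be supplied by the caller with tables referenced by their current names.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shape for this sketch: (schema name, view name, view SQL).
# The caller is assumed to supply definitions that name the intended tables.
ViewRow = Tuple[str, str, str]


def build_recreate_statements(rows: Iterable[ViewRow]) -> List[str]:
    """Return one CREATE OR REPLACE VIEW statement per supplied view row."""
    statements = []
    for schema, view, definition in rows:
        # Normalise the definition so each statement ends in a single ';'.
        body = definition.strip().rstrip(";")
        statements.append(f"CREATE OR REPLACE VIEW {schema}.{view} AS\n{body};")
    return statements


if __name__ == "__main__":
    # Print the generated statements so they can be pasted into a SQL client.
    for statement in build_recreate_statements(
        [("public", "example_view", "SELECT some_flag FROM example_table")]
    ):
        print(statement)
```

Generating text rather than executing it directly mirrors step 4 above: the output can be reviewed before anything is run against the database.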
#!/usr/bin/env ruby
# == Simple Daemon
#
# A simple ruby daemon that you copy and change as needed.
#
# === How does it work?
#
# All this program does is fork the current process (creates a copy of
# itself) then exits, the fork (child process) then goes on to run your
# daemon code. In this example we are just running a while loop with a
alexanderdean / Specs2-Scalding problems
Created March 26, 2013 11:40
Updated console showing errors between Specs2 and Scalding
<snip>
[info] IdentityTest
[info]
[info] + The identity function should work for any pair of Strings
[info]
[info] Total for specification IdentityTest
[info] Finished in 0 ms
[info] 1 example, 100 expectations, 0 failure, 0 error
[info]
13/03/26 11:37:19 INFO flow.Flow: [com.snowplowanalytics....] starting
alexanderdean / gist:5046364
Created February 27, 2013 08:42
Specs2 parallel test issues with Scalding 0.8.3
╭─alex@nasqueron ~/Development/SnowPlow/snowplow/3-etl/hadoop-etl ‹feature/scalding-etl›
╰─$ sbt
Detected sbt version 0.12.1
Starting sbt: invoke with -help for other options
[info] Loading global plugins from /home/alex/.sbt/plugins
[info] Loading project definition from /home/alex/Development/SnowPlow/snowplow/3-etl/hadoop-etl/project
[info] Set current project to snowplow-hadoop-etl (in build file:/home/alex/Development/SnowPlow/snowplow/3-etl/hadoop-etl/)
snowplow-hadoop-etl > test-only com.snowplowanalytics.snowplow.hadoop.etl.jobs.CorruptedCfLinesTest
13/02/27 08:40:20 INFO property.AppProps: using app.id: 83265FFB1D12AEF2BE02B7B711912163
13/02/27 08:40:20 INFO util.Version: Concurrent, Inc - Cascading 2.0.7