rosner/dwh_modelling.md Secret

## dwh_modelling.md

      
    Raw
  

              dwh_modelling.md
            
          
    DWH modelling

For now our Redshift cluster is our main source for people to access information. In this position you might end up writing a SQL query every now and then. Please go through the following questions and provide your answer in a *.sql file somewhere in the repo.
Given the following table definition
CREATE TABLE IF NOT EXISTS dwh.user_first_install_fact (
  user_id VARCHAR(36),
  install_id VARCHAR(36),
  device_id  VARCHAR(36),
  installed_at TIMESTAMP,
  date_sk INT,
  client_sk INT,                      
  channel_sk INT,                     
  country_code VARCHAR(7),
  network_name VARCHAR(256),
  campaign_name VARCHAR(256),
  adgroup_name VARCHAR(256),
  creative_name VARCHAR(256),
  campaign_id VARCHAR(256),
  adgroup_id VARCHAR(256),
  creative_id VARCHAR(256),
  ip_address VARCHAR(39)
);

CREATE TABLE client_dim (
  client_sk   INTEGER,
  os_name     VARCHAR(256),
  app_name    VARCHAR(256),
  app_version VARCHAR(256),
  device_name VARCHAR(256),
  PRIMARY KEY (client_sk)
);

CREATE TABLE IF NOT EXISTS dwh.channel_dim (
  channel_sk INT,
  channel_name TEXT,
  channel_group TEXT,
  channel_label TEXT,
  PRIMARY KEY (channel_sk)
);

Question 1

Write a query, using Redshift ANSI SQL, to find which user_id has duplicated installs (more than one install) for yesterday.
Question 2

Write a query, using Redshift ANSI SQL, to know which are the top 5 channels with the most installs coming in from Android.

  
## stripe_webhook_consumption.md

      
    Raw
  

              stripe_webhook_consumption.md
            
          
    Background

For most of the changes happening Stripe is providing webhooks. In 8fit we need to store relevant transactional information (Charges in Stripe) to generate insights for various stakeholders:

User acquisition wants to understand how much money they recovered after spending amount X on ads.
Is a user still pro at a specific point in time?
What’s the life time value of a user?
How do renewal cohorts look like for specific plans?

Your task

Your task is to build a python 3.6.X based ETL pipeline using Lambda, StepFunctions and API-Gateway so that an incoming webhook is triggering a lambda function executor which will execute a StepFunction. The layout of the StepFunction can have different designs and is up to you. But two things should happen within the StepFunction:

The posted JSON body (looking like this) should be persisted in a Postgres DB hosted by RDS. The schema you use to do this is up to you. You might get some ideas from the use cases mentioned above.
From the posted JSON body you need to extract the source object and store this in CSV format as an object in S3

Requirements

Please provide your solution in a Github repository. Add documentation and tests as you prefer. Please use CloudFormation and the serverless framework for managing the infrastructure. Ideally we can run your code and infrastructure and connect against Stripe directly.