Skip to content

Instantly share code, notes, and snippets.

@WillianFuks
WillianFuks / example_dataproc_customers_interaction.sql
Last active October 31, 2017 14:49
Query used in example dataproc post
#standardSQL
CREATE TEMP FUNCTION process_sku(sku STRING) AS (
CASE WHEN (CHAR_LENGTH(sku) - CHAR_LENGTH(REGEXP_REPLACE(sku, r'-', '')) = 3) OR (CHAR_LENGTH(sku) - CHAR_LENGTH(REGEXP_REPLACE(sku, r'-', '')) = 1) THEN REGEXP_EXTRACT(sku, r'(.*)-[0-9A-Z]+')
ELSE sku END
);
SELECT
user,
productSku,
@WillianFuks
WillianFuks / example_dataproc_simulated_query.sql
Last active October 31, 2017 15:20
Simulated data used in unit testing
(
SELECT '1' AS fullvisitorid, ARRAY< STRUCT<ecommerceAction STRUCT<action_type STRING>, product ARRAY<STRUCT<productSKU STRING> > > > [STRUCT(STRUCT("2" AS action_type) AS ecommerceAction, [STRUCT("sku0" AS productSKU), STRUCT("sku1" AS productSKU)] AS product), STRUCT(STRUCT("1" AS action_type) AS ecommerceAction, [STRUCT("sku0" AS productSKU), STRUCT("sku1" AS productSKU)] AS product), STRUCT(STRUCT("3" AS action_type) AS ecommerceAction, [STRUCT("sku0" AS productSKU)] AS product), STRUCT(STRUCT("6" AS action_type) AS ecommerceAction, [STRUCT("sku0" AS productSKU)] AS product)] AS hits UNION ALL
SELECT '1' AS fullvisitorid, ARRAY< STRUCT<ecommerceAction STRUCT<action_type STRING>, product ARRAY<STRUCT<productSKU STRING> > > > [STRUCT(STRUCT("2" AS action_type) AS ecommerceAction, [STRUCT("sku3" AS productSKU)] AS product)] AS hits UNION ALL
SELECT '2' AS fullvisitorid, ARRAY< STRUCT<ecommerceAction STRUCT<action_type STRING>, product ARRAY<STRUCT<productSKU STRING> > > > [STRUCT(STRUCT("
@WillianFuks
WillianFuks / example_dataproc_test_query.py
Created October 31, 2017 15:23
Testing Script to run system test against BigQuery
import re
import unittest
from google.cloud.bigquery import Client
bc = Client()
class TestQueriesResults(unittest.TestCase):
def test_queries(self):
@WillianFuks
WillianFuks / example_dataproc_main.py
Created October 31, 2017 19:25
Main application to receive CRON requests
import datetime
from flask import Flask, request
from google.appengine.api import taskqueue
from utils import process_url_date
app = Flask(__name__)
@app.route("/export_customers")
def export_customers():
@WillianFuks
WillianFuks / nox_system_testing.py
Created October 31, 2017 19:53
Example Nox file for system testing
import os
def session_system_gae(session):
"""Runs integration tests. As this runs a real query against BigQuery,
the environemnt must have ``GOOGLE_APPLICATION_CREDENTIALS`` set pointing
to ``/key.json`` where the secrets service json must be located.
"""
session.interpreter = 'python2.7'
session.virtualenv_dirname = 'system-gae'
@WillianFuks
WillianFuks / example_dataproc_worker.py
Last active October 31, 2017 21:07
Worker file to handle queued tasks
import utils as utils
from flask import Flask, request
from config import config
from connector.gcp import GCPService
app = Flask(__name__)
bq_service = GCPService('bigquery')
@app.route("/queue_export", methods=['POST'])
@WillianFuks
WillianFuks / example_dataproc_connector.py
Created October 31, 2017 21:16
Connector used to interact with GCP
import time
from google.auth import app_engine
import googleapiclient.discovery as disco
from google.oauth2 import service_account
class GCPService(object):
def __init__(self, name, credentials=None):
"""Builds a connector to interact with Google Cloud tools.
@WillianFuks
WillianFuks / app.yaml
Created October 31, 2017 21:32
app yaml file
service: exporter
runtime: python27
api_version: 1
threadsafe: true
handlers:
- url: /.*
script: main.app
login: admin
@WillianFuks
WillianFuks / example_dataproc_create_cluster.sh
Created November 2, 2017 00:14
Creating Clusters in Dataproc
#!/usr/bin/env bash
set -e
function usage {
echo "Creates a Dataproc cluster with a Jupyter interface."
echo "usage $0: [-h] [-n=name] [-b=bucket]"
echo " -h display help"
echo " -n=name name of cluster to create"
echo " -b=bucket name of bucket in GCS for persistence"
exit 1
@WillianFuks
WillianFuks / example_dataproc_launch_jupyter.sh
Created November 2, 2017 00:35
Launching a Jupyter Notebook through SSH connection to our Dataproc Cluster
#!/usr/bin/env bash
set -e
DIR="${BASH_SOURCE%/*}"
[[ ! -d "$DIR" ]] && DIR="$PWD"
source "utils.sh"
function usage {
echo "Creates an SSH tunnel and socks proxy and launches Chrome, using the environment "