@WillianFuks
WillianFuks / example_dataproc_utils.sh
Created November 2, 2017 00:36
Utility functions for the launch_jupyter.sh script
#!/usr/bin/env bash
function_exists () {
    declare -f -F "$1" > /dev/null
    return $?
}
throw () {
    echo "$*" >&2
    echo
WillianFuks / example_dataproc_twitter_naive.py
Last active December 8, 2017 02:17
Naive Cosines in Map-Reduce
import operator
import math
import time
from base import JobsBase
from pyspark.sql import SparkSession
from pyspark.sql import types as stypes
class NaiveJob(JobsBase):
    def run(self, sc, args):
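
The preview above is cut off before the body of `run`, so the actual Spark job is not visible here. As a rough illustration of what a naive cosine computation in map-reduce style can look like, here is a minimal pure-Python sketch. It assumes the input is a list of `(user, {item: score})` pairs; the function name `naive_cosines` and the input layout are hypothetical and not taken from the gist.

```python
import math
from collections import defaultdict
from itertools import combinations


def naive_cosines(rows):
    """Return {(item_a, item_b): cosine} for items that share at least one user.

    `rows` is a list of (user, {item: score}) pairs (an assumed layout).
    """
    # Pass 1 ("map" each score to its square, "reduce" by item):
    # accumulate the squared norm of every item column.
    sq_norms = defaultdict(float)
    for _, items in rows:
        for item, score in items.items():
            sq_norms[item] += score * score

    # Pass 2 ("map" each user to all of its item pairs, "reduce" by pair):
    # accumulate the dot product of every pair of item columns.
    dots = defaultdict(float)
    for _, items in rows:
        for a, b in combinations(sorted(items), 2):
            dots[(a, b)] += items[a] * items[b]

    # Combine both results into cosine similarities.
    return {
        (a, b): dot / (math.sqrt(sq_norms[a]) * math.sqrt(sq_norms[b]))
        for (a, b), dot in dots.items()
    }


rows = [("u1", {"a": 1.0, "b": 1.0}), ("u2", {"a": 1.0})]
similarities = naive_cosines(rows)
```

The approach is "naive" in the sense that every user emits all of its item pairs, which grows quadratically with the number of items per user.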
WillianFuks / example_dataproc_twitter_worker_bq.py
Created December 8, 2017 19:38
Worker Sample for BigQuery Export
import utils
from flask import Flask, request
from config import config
from connector.gcp import GCPService
from scheduler import SchedulerJob
app = Flask(__name__)
gcp_service = GCPService()
scheduler = SchedulerJob()
WillianFuks / example_dataproc_twitter_main_handler.py
Created December 8, 2017 23:31
Main Route for Cron Requests
import utils
import base_utils
from config import config
from flask import Flask, request, jsonify
from factory import JobsFactory
from google.appengine.ext import ndb
import time
app = Flask(__name__)
service: dataproc-twitter
runtime: python27
api_version: 1
threadsafe: true
handlers:
- url: /.*/
  script: main.app
  login: admin
WillianFuks / example_dataproc_twitter_worker.yaml
Created December 8, 2017 23:44
YAML definition for workers
runtime: python27
api_version: 1
threadsafe: true
service: worker
handlers:
- url: /.*
  script: worker.app
  login: admin
WillianFuks / queue.yaml
Created December 8, 2017 23:52
Queue yaml
queue:
- name: default
  rate: 1/m
cron:
- description: daily export job from BigQuery to Cloud Storage
  url: /run_job/export_customers_from_bq/?url=/export_customers&target=worker
  target: dataproc-twitter
  schedule: every day 01:05
- description: Runs DIMSUM job in Dataproc
  url: /run_job/run_dimsum/?url=/dataproc_dimsum&target=worker&extended_args=--days_init=30,--days_end=1,--threshold=0.1&force=no
  target: dataproc-twitter
  schedule: every day 01:15
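
The cron URLs above pack the job name into the path and the task details (`url`, `target`, `extended_args`, `force`) into the query string. How the main handler actually decodes them is not shown in the previews, so the following is only a sketch of one way to unpack such a URL. It uses the Python 3 standard library (the services themselves target the `python27` runtime), and the helper name `parse_cron_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parse_cron_url(cron_url):
    """Split a cron URL into its job name and query parameters."""
    parts = urlsplit(cron_url)
    query = parse_qs(parts.query)

    # The job name is assumed to be the last path segment,
    # e.g. "/run_job/run_dimsum/" -> "run_dimsum".
    job_name = parts.path.strip("/").split("/")[-1]

    # extended_args is a comma-separated list of command-line style flags.
    extended = query.get("extended_args", [""])[0]

    return {
        "job_name": job_name,
        "url": query.get("url", [None])[0],
        "target": query.get("target", [None])[0],
        "extended_args": [arg for arg in extended.split(",") if arg],
        "force": query.get("force", [None])[0],
    }


dimsum = parse_cron_url(
    "/run_job/run_dimsum/?url=/dataproc_dimsum&target=worker"
    "&extended_args=--days_init=30,--days_end=1,--threshold=0.1&force=no"
)
```

Because each query pair is split at its first `=` only, flags such as `--days_init=30` survive intact inside `extended_args`.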