
Scott Hoover githoov

@githoov
githoov / redshift_metadata.md
Last active September 21, 2022 01:57
Redshift Metadata

Redshift Tables Used

pg_table_def, stl_query, stl_querytext, stl_tr_conflict, stl_explain, stl_alert_event_log, stl_ddltext, stl_scan, stl_save, stl_hashjoin, stl_hash, stl_plan_info, stl_return, and information_schema.table_constraints.

Queries to Extract Features

  • execution time
select datediff(ms, starttime, endtime) as execution_time_in_ms
from stl_query
where query = QUERY_ID;
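  • query text

The full SQL statement lives in stl_querytext in 200-character chunks keyed by sequence, so the pieces have to be stitched back together. A sketch of that, using the same QUERY_ID placeholder as above:

select listagg(text) within group (order by sequence) as query_text
from stl_querytext
where query = QUERY_ID
group by query;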
@githoov
githoov / looker.md
Last active July 17, 2021 18:20
A Quick Walkthrough of Looker and Accessing its APIs

Looker Overview

As a reminder, Looker has three core components: first, it provides a physical modeling layer (LookML), which one uses to abstract SQL generation and define complex transforms; second, Looker consumes the LookML model and generates dialect-specific SQL, which it then issues to the database via JDBC; third, it provides a web-based UI that acts both as a LookML IDE and as the primary means of exploring data.

The LookML Model

Let's consider a simple schema containing two tables, job and warehouse, which we'll use as an example. In Looker, our representation of these tables would look like this:

job definition

- view: job
  sql_table_name: public.job
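To make the second component concrete, the SQL Looker generates for an explore of job joined to warehouse would look roughly like the query below. This is only a sketch: the join key, the warehouse.name dimension, and the job count measure are hypothetical and depend on how the model is fleshed out.

-- hypothetical generated SQL; column names and the join key are illustrative
select
  warehouse.name as "warehouse.name",
  count(distinct job.id) as "job.count"
from public.job as job
left join public.warehouse as warehouse
  on job.warehouse_id = warehouse.id
group by 1
order by 2 desc
limit 500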
@githoov
githoov / gondola.png
Last active March 24, 2018 20:20
gondola
foo
@githoov
githoov / nlp.md
Last active April 20, 2017 13:54
NLP Blog

Introduction

I recently had the opportunity to showcase Snowflake at JOIN, Looker's first user conference. I used my time to highlight a few Snowflake features that I find particularly useful, as someone who does analytics. The presentation demonstrated simultaneous workloads that share the same data as well as analytically intensive SQL patterns against large-scale, semi-structured data.

I thought I'd refactor my presentation as a series of blog entries to share some useful insights and interesting patterns with a broader audience. I'm going to step through how to incorporate sentiment analysis as well as tweet similarity into an interactive model using both Looker and Snowflake. (Note: if you haven't read our previous blog on sentiment analysis using Naïve Bayes, I highly recommend you do so.)

Part 1 - Simultaneous Workloads, Shared Data

N/A

Part 2 - Tweet Similarity

Overview

@githoov
githoov / lambda_example.py
Created December 14, 2016 23:46
Lambda Example
# preliminaries
import re
import json
import logging
import boto3
import urllib
import base64
# fire up logger
logger = logging.getLogger()
/*
create a JavaScript UDF to find regexp matches and throw them into an array
*/
create or replace function regexp_matches(TEXT string, PATTERN string)
returns variant
language javascript
as '
var re = new RegExp(PATTERN, "g");
var res = [];
var m;
while (m = re.exec(TEXT)) {
  res.push(m[0]);
}
return res;
';
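Calling the UDF is then just a matter of passing a string and a pattern. A hypothetical example (the literal and the pattern are made up; backslashes are doubled because Snowflake treats the backslash as an escape character in string literals):

select regexp_matches('call 555-1234 or 555-9876', '\\d{3}-\\d{4}') as matches;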
@githoov
githoov / tweet_simiarity.sql
Last active October 17, 2016 17:36
Tweet Similarity
/*
this is a basic first pass at tweet similarity
using a modified cosine-similarity approach. the
modification in question weights words based on their
rarity across the corpus. that means that more common
words that aren't quite stop words get a lower weight
than less common words—the method is called inverse
document frequency.
*/
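Fleshed out, the query takes roughly the shape below. This is only a sketch, not the gist's actual query: it assumes a hypothetical tweet_words(tweet_id, word) table with one row per word occurrence in each tweet.

-- hypothetical input: tweet_words(tweet_id, word), one row per word occurrence
with totals as (
  select count(distinct tweet_id) as n_tweets
  from tweet_words
),
idf as (
  -- inverse document frequency: rarer words get a higher weight
  select w.word,
         ln(t.n_tweets * 1.0 / count(distinct w.tweet_id)) as weight
  from tweet_words as w
  cross join totals as t
  group by w.word, t.n_tweets
),
vectors as (
  -- idf-weighted term vector for each tweet
  select w.tweet_id, w.word, count(*) * i.weight as score
  from tweet_words as w
  join idf as i on i.word = w.word
  group by w.tweet_id, w.word, i.weight
),
norms as (
  select tweet_id, sqrt(sum(score * score)) as norm
  from vectors
  group by tweet_id
)
select a.tweet_id as tweet_a,
       b.tweet_id as tweet_b,
       sum(a.score * b.score) / (na.norm * nb.norm) as cosine_similarity
from vectors as a
join vectors as b
  on a.word = b.word
 and a.tweet_id < b.tweet_id
join norms as na on na.tweet_id = a.tweet_id
join norms as nb on nb.tweet_id = b.tweet_id
group by a.tweet_id, b.tweet_id, na.norm, nb.norm
order by cosine_similarity desc;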
  1. make sure an inbound rule is set up for ssh on port 22 from anywhere
  2. sudo yum update
  3. sudo yum install -y aws-kinesis-agent
  4. vi /etc/aws-kinesis/agent.json and enter the following:
    {
      "cloudwatch.emitMetrics": true,
      "kinesis.endpoint": "",
      "firehose.endpoint": "https://firehose.us-east-1.amazonaws.com",
    
@githoov
githoov / generate.r
Last active September 1, 2016 17:22
R Script to Create a Survival Plot and to Generate a Sample Data Set
# preliminaries
library("ggplot2")
library("zoo")
set.seed(111)
# generate plot of survival curve
x <- sort(dexp(seq(0, 1, 0.01)), decreasing = TRUE)
ggplot(data.frame(x = c(0, 5)), aes(x)) +
  stat_function(fun = dexp, args = list(rate = 1)) +
  scale_x_continuous(labels = c(expression(t["0"], t["1"], t["2"], t["3"], t["4"], t["5"]))) +
  labs(x = "Time", y = expression(P(T > t["i"])), title = "Survival Function")
# simulate subscription data
#!/usr/bin/env ruby
require 'json'
require 'optparse'
@file_path = ''
@separator = ''
@table_name = ''
OptionParser.new do |opts|