Scott Hoover (githoov)
function_parse.sql
/*
create a JavaScript UDF to find regexp matches and throw them into an array
*/
create or replace function regexp_matches(TEXT string, PATTERN string)
returns variant
language javascript
as '
var re = new RegExp(PATTERN, "g");
var res = [];
var m;
// collect every match of PATTERN in TEXT into an array
while ((m = re.exec(TEXT)) !== null) {
  res.push(m[0]);
}
return res;
';
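A quick hedged sanity check of the UDF (the sample string is arbitrary; the expected output is shown as a comment):

select regexp_matches('a1b2c3', '[0-9]') as matches;
-- expected: ["1", "2", "3"]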
@githoov
githoov / looker.md
Last active Nov 13, 2016
A Quick Walkthrough of Looker and Accessing its APIs

Looker Overview

As a reminder, Looker has three core components: first, it provides a physical modeling layer (LookML), which one uses to abstract SQL generation and define complex transforms; second, Looker consumes the LookML model and generates dialect-specific SQL, which it then issues to the database via JDBC; third, it provides a web-based UI that acts as both a LookML IDE and the primary means of exploring data.

The LookML Model

Let's consider a simple schema containing two tables, job and warehouse, which we'll use as an example. In Looker, our representation of these tables would look like this:

job definition

- view: job
  sql_table_name: public.job

warehouse definition

- view: warehouse
  sql_table_name: public.warehouse
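To see the second component in action, exploring the job view might compile to SQL along these lines (a hypothetical sketch; the status column and aliasing conventions are assumptions, not Looker's verbatim output):

select
  job.status as "job.status",
  count(*) as "job.count"
from public.job as job
group by 1
order by 2 desc;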
@githoov
githoov / nlp.md
Last active Apr 20, 2017
NLP Blog

Introduction

I recently had the opportunity to showcase Snowflake at JOIN, Looker's first user conference. I used my time to highlight a few Snowflake features that I find particularly useful, as someone who does analytics. The presentation demonstrated simultaneous workloads that share the same data as well as analytically intensive SQL patterns against large-scale, semi-structured data.

I thought I'd refactor my presentation as a series of blog entries to share some useful insights and interesting patterns with a broader audience. I'm going to step through how to incorporate sentiment analysis as well as tweet similarity into an interactive model using both Looker and Snowflake. (Note: if you haven't read our previous blog on sentiment analysis using Naïve Bayes, I highly recommend you do so.)

Part 1 - Simultaneous Workloads, Shared Data

N/A

Part 2 - Tweet Similarity

Overview

@githoov
githoov / tweet_simiarity.sql
Last active Oct 17, 2016
Tweet Similarity
/*
this is a basic first pass at tweet similarity using a
modified cosine-similarity approach. the modification
weights words by their rarity across the corpus: common
words that aren't quite stop words get a lower weight
than rarer words. this weighting is known as inverse
document frequency.
*/
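As a sketch of that approach (not the gist's actual query), assuming a hypothetical tweet_words(tweet_id, word) table with one row per token:

with doc_freq as (
    -- number of distinct tweets containing each word
    select word, count(distinct tweet_id) as df
    from tweet_words
    group by word
), corpus as (
    select count(distinct tweet_id) as n_docs
    from tweet_words
), weights as (
    -- inverse document frequency: rarer words score higher
    select d.word, ln(c.n_docs / d.df) as idf
    from doc_freq d
    cross join corpus c
), vectors as (
    -- per-tweet term weight: term frequency times idf
    select t.tweet_id, t.word, count(*) * w.idf as weight
    from tweet_words t
    join weights w on w.word = t.word
    group by t.tweet_id, t.word, w.idf
), norms as (
    select tweet_id, sqrt(sum(weight * weight)) as norm
    from vectors
    group by tweet_id
)
select a.tweet_id as tweet_a,
       b.tweet_id as tweet_b,
       sum(a.weight * b.weight) / (na.norm * nb.norm) as similarity
from vectors a
join vectors b on b.word = a.word and b.tweet_id > a.tweet_id
join norms na on na.tweet_id = a.tweet_id
join norms nb on nb.tweet_id = b.tweet_id
group by a.tweet_id, b.tweet_id, na.norm, nb.norm
order by similarity desc;

Computing the norms in their own CTE matters here: taking them over only the words two tweets share would inflate the similarity of pairs with little overlap.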
kinesis.md
  1. make sure an inbound rule is set up for ssh on port 22 from anywhere
  2. sudo yum update
  3. sudo yum install -y aws-kinesis-agent
  4. vi /etc/aws-kinesis/agent.json and enter the following (the delivery stream name and file pattern are placeholders):
    {
      "cloudwatch.emitMetrics": true,
      "kinesis.endpoint": "",
      "firehose.endpoint": "https://firehose.us-east-1.amazonaws.com",
      "flows": [
        { "filePattern": "/tmp/app.log*", "deliveryStream": "your-delivery-stream" }
      ]
    }
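With the config saved, and assuming the agent's standard service name, the remaining step is to start it:

  5. sudo service aws-kinesis-agent start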
@githoov
githoov / redshift_metadata.md
Last active Sep 14, 2018
Redshift Metadata

Redshift Tables Used

pg_table_def, stl_query, stl_querytext, stl_tr_conflict, stl_explain, stl_alert_event_log, stl_ddltext, stl_scan, stl_save, stl_hashjoin, stl_hash, stl_plan_info, stl_return, and information_schema.table_constraints.

Queries to Extract Features

  • execution time
select datediff(ms, starttime, endtime) as execution_time_in_ms
from stl_query
where query = QUERY_ID;
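In the same spirit, a hedged sketch for reassembling a statement's full SQL from stl_querytext, whose text column stores the query in 200-character segments ordered by sequence:

select query,
       listagg(text) within group (order by sequence) as full_sql
from stl_querytext
where query = QUERY_ID
group by query;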
generate_create.rb
#!/usr/bin/env ruby
require 'json'
require 'optparse'
@file_path = ''
@separator = ''
@table_name = ''
# parse command-line flags into the variables above (flag spellings are assumed)
OptionParser.new do |opts|
  opts.on('-f', '--file FILE', 'path to the input file') { |f| @file_path = f }
  opts.on('-s', '--separator SEP', 'field separator') { |s| @separator = s }
  opts.on('-t', '--table NAME', 'target table name') { |t| @table_name = t }
end.parse!
@githoov
githoov / scalikejdbc_adhoc.scala
Last active Jun 2, 2016
Ad Hoc Querying with ScalikeJDBC
/*
in app/models/Query.scala
*/
package models
import scalikejdbc._
import play.api.libs.json._

// ad hoc query helper (a sketch: the method name and JSON shape are assumptions)
object Query {
  // execute an arbitrary SQL string and return each row as a JSON object
  def run(statement: String): List[JsObject] = DB readOnly { implicit session =>
    SQL(statement).map(_.toMap).list.apply().map { row =>
      JsObject(row.map { case (k, v) => k -> JsString(String.valueOf(v)) }.toSeq)
    }
  }
}
chat_sentiment_analysis.py
# preliminaries
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.stem.porter import PorterStemmer
from nltk import tokenize, pos_tag
import csv
import re
# read in data (Python 2: csv wants binary mode)
comments = csv.reader(open("/Users/scott/Downloads/issue_comments.csv", "rb"))
# score each comment with VADER (assumes the text lives in the first column)
analyzer = SentimentIntensityAnalyzer()
for row in comments:
    print(analyzer.polarity_scores(row[0]))
@githoov
githoov / benchmarks.sql
Last active May 21, 2016
Redshift Benchmark Queries
/*
${TABLE} is a placeholder for the table variants
you will be benchmarking with this script. Leave
${TABLE} in, as it will be used to dynamically substitute
your table names at the command line.
*/
\timing on
\o /dev/null
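A hedged example of the kind of statement that would follow, with ${TABLE} substituted before execution (the aggregate is illustrative, not from the original script):

-- simple full-scan benchmark against the substituted table
select count(*)
from ${TABLE};

At the shell, something like sed 's/${TABLE}/my_table/g' benchmarks.sql | psql performs the substitution for each table variant.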