Skip to content

Instantly share code, notes, and snippets.

View ian-whitestone's full-sized avatar
🐍
Exit code 143

Ian Whitestone ian-whitestone

🐍
Exit code 143
View GitHub Profile
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@ian-whitestone
ian-whitestone / row_num.py
Created August 21, 2018 13:53
Pandas equivalent of SQL's row_number()
# Let's add an row number to indicate the first message per app & microservice
# This code is analagous to the SQL: row_number() over (partition by id, topic order by msg_ts asc)
df['row_num'] = df.sort_values(['id', 'msg_ts'], ascending=True).groupby(['id', 'topic']).cumcount() + 1
with
warehouse_periods as (
select
warehouse_name,
timestamp as valid_from,
lead(timestamp) over (partition by warehouse_name order by timestamp asc) as valid_to,
event_name = 'RESUME_WAREHOUSE' as is_active
from snowflake.account_usage.warehouse_events_history
where
-- double check these names, can't remember exact values
@ian-whitestone
ian-whitestone / simulations.ipynb
Last active December 20, 2023 03:53
Code for the choosing your randomization unit post - https://ianwhitestone.work/choosing-randomization-unit/
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@ian-whitestone
ian-whitestone / zappa_package_cleaner.py
Last active June 2, 2023 06:51
Remove additional files and/or directories from Zappa deployment package via zip callback https://ianwhitestone.work/Zappa-Zip-Callbacks/
"""
Read accompanying blog post: https://ianwhitestone.work/Zappa-Zip-Callbacks
"""
import os
import re
import shutil
import tarfile
import zipfile
@ian-whitestone
ian-whitestone / example_cte_query.sql
Created March 18, 2023 12:29
Example Snowflake CTE query
with sample_data as (
select *
from snowflake_sample_data.tpch_sf1.customer
),
nation_14_customers as (
select *
from sample_data
where c_nationkey = 14
),
@ian-whitestone
ian-whitestone / notes.md
Last active March 1, 2023 01:45
Best practices for presto sql

Presto Specific

  • Don’t SELECT *, Specify explicit column names (columnar store)
  • Avoid large JOINs (filter each table first)
    • In PRESTO tables are joined in the order they are listed!!
    • Join small tables earlier in the plan and leave larger fact tables to the end
    • Avoid cross joins or 1 to many joins as these can degrade performance
  • Order by and group by take time
    • only use order by in subqueries if it is really necessary
  • When using GROUP BY, order the columns by the highest cardinality (that is, most number of unique values) to the lowest.
@ian-whitestone
ian-whitestone / query_tags.sql
Created February 23, 2023 14:11
Query tag with no node refs
{% macro set_query_tag() -%}
{# Start with any model-configured dict #}
{% set tag_dict = config.get('query_tag', default={}) %}
{# Regardless of resource type, we can always access the config via the 'model' variable #}
{%- do tag_dict.update(
dbt_snowflake_query_tags_version='1.1.3',
app='dbt',
dbt_version=dbt_version,
project_name=project_name,
@ian-whitestone
ian-whitestone / mode_confettis.js
Created January 17, 2023 16:42
Give the people confetti 🎉
<script src="https://cdn.jsdelivr.net/npm/canvas-confetti@1.4.0/dist/confetti.browser.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
<div id="bottom_of_page">Thanks for scrolling to the bottom 🙌</div>
<div>Enjoy some confetti 🎉</div>
<script>
@ian-whitestone
ian-whitestone / great_expecations_examples.py
Created January 12, 2020 22:32
Quickstart examples for getting up and running with great expectations
## Pandas
import great_expectations as ge
# Build up expectations on a sample dataset and save them
train = ge.read_csv("data/npi.csv")
train.expect_column_values_to_not_be_null("NPI")
train.save_expectation_suite("npi_csv_expectations.json")
# Load in a new dataset and test them
test = ge.read_csv("data/npi_new.csv")