- Don't `SELECT *`; specify explicit column names (Presto uses a columnar store, so reading fewer columns is cheaper)
- Avoid large JOINs: filter each table first
- In Presto, tables are joined in the order they are listed!
- Join small tables earlier in the plan and leave the larger fact tables to the end
- Avoid cross joins or one-to-many joins, as these can degrade performance
- ORDER BY and GROUP BY take time:
    - Only use ORDER BY in subqueries if it is really necessary
    - When using GROUP BY, order the columns from the highest cardinality (that is, the most unique values) to the lowest
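
A few of these tips combined into one sketch. The table and column names (`dim_country`, `fact_orders`, etc.) are hypothetical, invented for illustration:

```sql
-- Hypothetical tables: dim_country (small dimension), fact_orders (large fact)
with filtered_orders as (
    -- Filter the big fact table first, and select only the columns needed
    select country_id, order_id, amount
    from fact_orders
    where order_date >= date '2021-01-01'
)
select
    o.order_id,       -- highest-cardinality column first in the GROUP BY
    c.country_name,   -- lowest-cardinality column last
    sum(o.amount) as total_amount
from dim_country c    -- per the tip above: small table earlier,
join filtered_orders o on o.country_id = c.country_id  -- large fact table at the end
group by o.order_id, c.country_name
```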
```python
# Let's add a row number to indicate the first message per app & microservice.
# This code is analogous to the SQL:
#   row_number() over (partition by id, topic order by msg_ts asc)
df['row_num'] = df.sort_values(['id', 'msg_ts'], ascending=True).groupby(['id', 'topic']).cumcount() + 1
```
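
A minimal, self-contained sketch of the pattern: the `id`/`topic`/`msg_ts` column names match the snippet, but the data is invented for illustration:

```python
import pandas as pd

# Hypothetical message log: two apps, out-of-order timestamps for app_a
df = pd.DataFrame({
    "id": ["app_a", "app_a", "app_b"],
    "topic": ["orders", "orders", "orders"],
    "msg_ts": pd.to_datetime(["2021-01-02", "2021-01-01", "2021-01-01"]),
})

# Equivalent of: row_number() over (partition by id, topic order by msg_ts asc)
df["row_num"] = (
    df.sort_values(["id", "msg_ts"], ascending=True)
    .groupby(["id", "topic"])
    .cumcount() + 1
)

# app_a's earlier timestamp gets row_num 1 even though it appears second
first_msgs = df[df["row_num"] == 1]
```

Because `cumcount` assigns results by the original index, the new column lines up with the unsorted frame, so no re-sorting of `df` is needed afterwards.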
```sql
with
warehouse_periods as (
    select
        warehouse_name,
        timestamp as valid_from,
        lead(timestamp) over (partition by warehouse_name order by timestamp asc) as valid_to,
        event_name = 'RESUME_WAREHOUSE' as is_active
    from snowflake.account_usage.warehouse_events_history
    where
        -- double check these names, can't remember exact values
```
""" | |
Read accompanying blog post: https://ianwhitestone.work/Zappa-Zip-Callbacks | |
""" | |
import os | |
import re | |
import shutil | |
import tarfile | |
import zipfile | |
```sql
with sample_data as (
    select *
    from snowflake_sample_data.tpch_sf1.customer
),
nation_14_customers as (
    select *
    from sample_data
    where c_nationkey = 14
),
```
```html
<script src="https://cdn.jsdelivr.net/npm/canvas-confetti@1.4.0/dist/confetti.browser.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
<div id="bottom_of_page">Thanks for scrolling to the bottom 🙌</div>
<div>Enjoy some confetti 🎉</div>
<script>
```
## Pandas

```python
import great_expectations as ge

# Build up expectations on a sample dataset and save them
train = ge.read_csv("data/npi.csv")
train.expect_column_values_to_not_be_null("NPI")
train.save_expectation_suite("npi_csv_expectations.json")

# Load in a new dataset and test them
test = ge.read_csv("data/npi_new.csv")
```
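
Without great_expectations installed, the core of the `expect_column_values_to_not_be_null` check above can be sketched in plain pandas. The data here is invented, standing in for the new dataset being tested:

```python
import pandas as pd

# Invented sample standing in for the new dataset under test
test = pd.DataFrame({"NPI": ["1234567890", None, "0987654321"]})

# Core of expect_column_values_to_not_be_null("NPI"):
# the expectation passes only if no values in the column are null
null_count = test["NPI"].isnull().sum()
expectation_passed = null_count == 0
```

Here the expectation fails (one null NPI), which is exactly the kind of regression the saved suite is meant to catch on new data.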