Alec Barrett-Wilsdon alecbw

alecbw / [Oneliner] Print CSV Info
Created September 12, 2020 06:50
In one command, open a Python shell, find every CSV and XLSX file in the local directory, and print each file's name, row/col count, and column names
python3
import pandas as pd; import os; files = [f for f in os.listdir('.') if os.path.isfile(f) and os.path.getsize(f) != 0 and f.endswith((".csv", ".xlsx"))]; print(files); df_tuples = [(f, pd.read_excel(f) if f.endswith(".xlsx") else pd.read_csv(f)) for f in files]; [print(name, df.shape, df.columns, "\n") for name, df in df_tuples]
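The one-liner is handy at a prompt but hard to tweak. Below is a minimal multi-line sketch of the same idea. The function names find_tabular_files and print_tabular_summary are hypothetical, and unlike the one-liner it matches extensions case-insensitively. Reading .xlsx files assumes openpyxl is installed alongside pandas.

```python
import os


def find_tabular_files(directory="."):
    """Return paths of non-empty .csv/.xlsx files directly inside `directory`."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip subdirectories, empty files, and anything that isn't CSV/XLSX
        if (os.path.isfile(path)
                and os.path.getsize(path) > 0
                and name.lower().endswith((".csv", ".xlsx"))):
            found.append(path)
    return found


def print_tabular_summary(directory="."):
    """Print name, (rows, cols), and column names for each tabular file."""
    import pandas as pd  # imported here so find_tabular_files works without pandas

    for path in find_tabular_files(directory):
        # read_csv cannot parse .xlsx, so pick the reader by extension
        if path.lower().endswith(".xlsx"):
            df = pd.read_excel(path)
        else:
            df = pd.read_csv(path)
        print(os.path.basename(path), df.shape, list(df.columns))
```

Call print_tabular_summary() from the directory you want to inspect, or pass a path to it.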
alecbw / break-sls-deploy-if-env-vars-missing.yml
Created September 18, 2020 01:21
A script snippet that you can drop at the bottom of your serverless.yml; it evaluates env vars when the config is resolved at deploy time and will break sls deploys if they are missing
custom:
  scripts:
    commands:
      hello: This breaks the deploy if env vars aren't set. ${env:FOOBAR}
alecbw / Track NPS Survey
Last active September 24, 2020 06:37
A JS script that listens for clicks on a series of buttons and makes an API call to increment a counter
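const feedbackButtons = document.getElementsByClassName("feedbackButton");
for (let i = 0; i < feedbackButtons.length; i++) {
  feedbackButtons[i].addEventListener("click", trackClick);
}
function trackClick(event) {
  const page = window.location.pathname;
  // currentTarget is the button the listener is attached to, even if a child element was clicked
  const button = event.currentTarget.id;
  // Encode the values so characters like & or ? in them can't break the query string
  fetch('https://foobar.execute-api.us-west-1.amazonaws.com/prod/feedback?page=' + encodeURIComponent(page) + '&button=' + encodeURIComponent(button));
}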
alecbw / Garbage_Collect_DataFrames.py
Created September 24, 2020 21:46
Deletes every Pandas DataFrame bound to a global variable, then runs the garbage collector
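import gc
import pandas as pd
# Snapshot globals() with list() so the dict can't change size while it is being iterated
to_delete = [name for name, value in list(globals().items()) if isinstance(value, pd.DataFrame)]
for item in to_delete:
    print(item)
    # `del item` would only unbind the loop variable; delete the global name itself
    del globals()[item]
gc.collect()  # reclaim memory for DataFrames no longer referenced anywhere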
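The same cleanup can be wrapped in a reusable helper. This is a sketch with a hypothetical delete_instances function that works on any namespace dict and any class, so at module level you would call delete_instances(globals(), pd.DataFrame). Memory is only actually freed if nothing else (a list, a dict, IPython's output history) still references the objects.

```python
import gc


def delete_instances(namespace, cls):
    """Delete every name in `namespace` bound to an instance of `cls`.

    Returns the list of deleted names.
    """
    # Collect names first so the dict isn't mutated while it is being iterated
    doomed = [name for name, value in list(namespace.items()) if isinstance(value, cls)]
    for name in doomed:
        del namespace[name]
    # Free objects that are no longer referenced anywhere else
    gc.collect()
    return doomed
```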
alecbw / Facebook Ads Lookup API call.py
Last active September 29, 2020 19:15
Returns the ads that were in paused campaigns between time_start and time_end
import os
import requests
time_start = '2020-09-01'
time_end = '2020-09-01'
fields = ['website_ctr','reach','adset_name','frequency','action_values','campaign_name','unique_actions','unique_clicks','video_avg_percent_watched_actions','video_p75_watched_actions','spend','cpc','video_p25_watched_actions','canvas_avg_view_time','canvas_avg_view_percent','campaign_id','video_p50_watched_actions','ctr','cpm','cpp','unique_ctr','video_avg_time_watched_actions','ad_name','impressions','labels','video_p95_watched_actions','cost_per_10_sec_video_view','ad_id','adset_id','clicks','website_purchase_roas','location','actions','cost_per_unique_click']
fields = "['" + "','".join(fields) + "']"
api_url = 'https://graph.facebook.com/v8.0/' + os.environ['FB_ACCOUNT_ID'] + "/insights?"
api_url += "level=ad"
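A minimal sketch of assembling the same ad-level insights URL with urlencode, so the field list and dates are encoded properly. Assumptions: build_insights_url is a hypothetical helper, account_id may be passed with or without the act_ prefix that Marketing API ad account nodes use, fields are sent as the Graph API's comma-separated list, and the date window goes in a time_range JSON object. The paused-campaign filter and the access token from the original gist are not reproduced, since that part of the snippet is cut off.

```python
import json
from urllib.parse import urlencode


def build_insights_url(account_id, fields, since, until, api_version="v8.0"):
    """Build (but do not send) a Graph API ad-level insights URL."""
    # Assumption: ad account nodes are addressed as act_<ID>
    node = account_id if account_id.startswith("act_") else "act_" + account_id
    params = {
        "level": "ad",
        "fields": ",".join(fields),  # Graph API takes a comma-separated field list
        "time_range": json.dumps({"since": since, "until": until}),
    }
    return f"https://graph.facebook.com/{api_version}/{node}/insights?" + urlencode(params)
```

The resulting URL can then be passed to requests.get together with your access token.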
alecbw / Get Row Count of CSV in S3
Last active December 6, 2020 04:12
Uses S3 Select, so the count is computed server-side rather than by downloading the file. Up to 15x faster locally
import boto3

def get_row_count_of_s3_csv(bucket_name, path):
    sql_stmt = """SELECT count(*) FROM s3object """
    req = boto3.client('s3').select_object_content(
        Bucket=bucket_name,
        Key=path,
        ExpressionType="SQL",
        Expression=sql_stmt,
        InputSerialization = {"CSV": {"FileHeaderInfo": "Use", "AllowQuotedRecordDelimiter": True}},
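The preview stops before the response is read. Here is a sketch of the full round trip under a few assumptions: count_csv_rows_with_s3_select is a hypothetical name, the S3 client is passed in (e.g. boto3.client('s3')) so it can be swapped for a stub, and the result is requested as CSV via OutputSerialization, which select_object_content requires. The response's Payload is an event stream, and only its Records events carry query output.

```python
def count_csv_rows_with_s3_select(s3_client, bucket_name, key):
    """Count data rows (header excluded) in a CSV object using S3 Select."""
    resp = s3_client.select_object_content(
        Bucket=bucket_name,
        Key=key,
        ExpressionType="SQL",
        Expression="SELECT count(*) FROM s3object",
        # USE treats the first line as a header, so it is not counted
        InputSerialization={"CSV": {"FileHeaderInfo": "USE", "AllowQuotedRecordDelimiter": True}},
        OutputSerialization={"CSV": {}},
    )
    # Stats/Progress/End events carry no rows; only Records events hold query output
    chunks = [
        event["Records"]["Payload"].decode("utf-8")
        for event in resp["Payload"]
        if "Records" in event
    ]
    # The single result value may arrive split across several Records events
    return int("".join(chunks).strip())
```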
alecbw / mailchimp_modify_user_tag.py
Last active December 16, 2020 01:07
MailChimp Modify User's Tag (add, update, deactivate) - The Python implementation that doesn't use their SDK
import hashlib
import requests
import os
import json
"""
Docs: https://mailchimp.com/developer/guides/organize-contacts-with-tags/#label-a-contact-with-a-tag
Emails must be lowercased and MD5-hashed before making the call (the hashing is done below)
The API returns 204 No Content whether the input is valid or invalid
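A sketch of the two pieces that snippet needs, assuming the Marketing API v3 member-tags endpoint from the linked docs. subscriber_hash and build_tag_request are hypothetical helper names, data_center (the usNN suffix of your API key) and list_id are placeholders, and a tag status of "inactive" is assumed to remove the tag while "active" adds it.

```python
import hashlib
import json


def subscriber_hash(email):
    """MD5 hex digest of the lowercased email, used to address the member."""
    return hashlib.md5(email.lower().encode("utf-8")).hexdigest()


def build_tag_request(data_center, list_id, email, tag_name, active=True):
    """Return (url, json_body) for a member-tags POST; sending it is up to the caller."""
    url = (f"https://{data_center}.api.mailchimp.com/3.0/lists/{list_id}"
           f"/members/{subscriber_hash(email)}/tags")
    body = {"tags": [{"name": tag_name, "status": "active" if active else "inactive"}]}
    return url, json.dumps(body)
```

Assuming basic auth with any username and the API key as the password, the request could then be sent with requests.post(url, data=body, auth=("anystring", api_key)). Since the endpoint returns 204 either way, check the member's tags afterwards if you need confirmation.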
alecbw / IAM Role AttachedPolicy for awswrangler writes to Athena-table-linked S3 Data Lakes.json
Created December 29, 2020 20:16
A policy you can attach to a role to enable the caller to write to S3 buckets (and corresponding Athena tables). NOTE: awswrangler's overwrite and overwrite_partitions modes require additional permissions.
{
    "Statement": [
        {
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::bucket-your-data-is-in",
alecbw / IAM Role AttachedPolicy for reading from Athena Tables.json
Created December 29, 2020 20:22
A policy you can attach to a role to enable the caller to read from Athena tables. s3:PutObject is required to put the query results CSV in the query result bucket.
{
    "Statement": [
        {
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Effect": "Allow",
            "Resource": [
alecbw / awswrangler Athena+Glue+Redshift Example Functions.py
Created December 29, 2020 20:26
Useful snippets from the docs with some commentary in the comments
import awswrangler as wr
import pandas as pd
# Get the current IAM role/user
name = wr.sts.get_current_identity_name()
arn = wr.sts.get_current_identity_arn()
# Reading files
df = wr.s3.read_csv("s3://sample-bucket/sample.csv")  # you can optionally select a subset of columns with usecols=['col_name1'] and parse date cols with parse_dates=['col_name2']
df = wr.s3.read_json("s3://sample-bucket/sample.json")