@elliottcordo
elliottcordo / package_tldextract_athena.sh
Created February 21, 2024 17:38
Example of how to package a Python package for Athena Spark
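mkdir tldextract
cd tldextract
mkdir unpacked
# install the package and its dependencies into ./unpacked
pip install -t "$PWD/unpacked" tldextract
cd unpacked
# zip the contents (not the folder itself); zip appends .zip, giving ../tldextract.zip
zip -r9 ../tldextract *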
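The same archive can be built from Python, which is handy on machines without the zip CLI. This is a minimal sketch with a hypothetical helper, zip_dir_contents. It stores paths relative to the unpacked directory so the packages sit at the archive root, as running zip * from inside unpacked does.

```python
import os
import zipfile


def zip_dir_contents(src_dir, zip_path):
    """Hypothetical helper: zip everything under src_dir at the archive root.

    Mirrors `cd unpacked && zip -r9 ../tldextract *`: paths are stored
    relative to src_dir, and ZIP_DEFLATED at compresslevel=9 matches -9.
    Unlike the shell glob *, os.walk also picks up top-level dotfiles.
    Returns the archive member names for inspection.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # arcname relative to src_dir keeps packages at the zip root
                zf.write(full, arcname=os.path.relpath(full, src_dir))
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

For example, zip_dir_contents("unpacked", "tldextract.zip") should list entries such as tldextract/__init__.py at the top level rather than under unpacked/.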
elliottcordo / athena_runner.py
Created February 21, 2024 17:02
Simple Athena Spark Runner
from time import sleep
import boto3
client = boto3.client('athena')
calculation_response = client.start_session(
    Description='job_session',
    WorkGroup='aws-meetup',
    EngineConfiguration={
        'CoordinatorDpuSize': 1,
        'DefaultExecutorDpuSize': 1,
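The runner is cut off in this listing. It imports sleep, which suggests it polls Athena until work finishes. A minimal sketch of such a loop, assuming a hypothetical wait_for_state helper that accepts any zero-argument state getter so it can be exercised without AWS:

```python
import time

# Terminal states of an Athena Spark calculation (GetCalculationExecutionStatus).
TERMINAL_CALC_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def wait_for_state(get_state, terminal_states, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Hypothetical helper: poll get_state() until it returns a terminal state.

    Raises TimeoutError once `timeout` seconds of sleeping have elapsed.
    `sleep` is injectable so tests need not actually wait.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        if waited >= timeout:
            raise TimeoutError(f"still {state} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

With boto3, the getter might wrap client.get_session_status(SessionId=...)["Status"]["State"] (waiting for {"IDLE", "FAILED", "TERMINATED"} before calling start_calculation_execution) or client.get_calculation_execution_status(CalculationExecutionId=...)["Status"]["State"] with TERMINAL_CALC_STATES.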
import networkx as nx

Graphtype = nx.Graph()
with open('result.csv', 'r') as data:
    # skip the header row
    next(data, None)
    # parse the remaining comma-separated rows into an undirected graph
    G = nx.parse_edgelist(data, delimiter=',', create_using=Graphtype)
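If result.csv holds one matched pair per row (an assumption; the file is not shown), a natural next step is grouping records into clusters, which networkx provides as nx.connected_components(G). For illustration without the networkx dependency, here is a stdlib union-find sketch with hypothetical helpers read_edges and connected_components:

```python
import csv


def read_edges(f):
    """Hypothetical helper: read (a, b) pairs from a CSV file object."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header, as the snippet does
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def connected_components(edges):
    """Hypothetical stdlib stand-in for nx.connected_components on an undirected graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # union the two endpoints of every edge
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # collect nodes by their root
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```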
use database "CRM_POC_DB";
use schema "CUST_CLEANUP";
drop table if exists match_results;
create temporary table match_results as
with cust as (
    select
        ACCT_CP,
        COMPANY_CUST,
        COMPANY_CODE,
elliottcordo / Translate-a-tron.py
Last active January 27, 2020 20:01
Translate-a-tron.py
import boto3
import os
import slack
from pprint import pprint
SLACK_API_TOKEN = os.environ["SLACK_API_TOKEN"]
BOT_ID = '<@UT4N8STGX>'
FLAG_MAP = {
    ':flag-mx:': 'es',
    ':flag-cn:': 'zh',
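The rest of Translate-a-tron is truncated. Given BOT_ID and FLAG_MAP, one plausible flow is: check that the message mentions the bot, collect the flag emoji, and map each one to a target language. A minimal sketch with a hypothetical parse_request helper and a copy of only the two FLAG_MAP entries shown:

```python
import re

# Copy of the two visible FLAG_MAP entries; the gist's full map is truncated.
FLAG_MAP = {":flag-mx:": "es", ":flag-cn:": "zh"}
BOT_ID = "<@UT4N8STGX>"
FLAG_RE = re.compile(r":flag-[a-z]{2}:")


def parse_request(text):
    """Hypothetical helper: return (message_text, [language codes]) or None.

    None means the message does not mention the bot; flags missing
    from FLAG_MAP are ignored.
    """
    if BOT_ID not in text:
        return None
    langs = [FLAG_MAP[f] for f in FLAG_RE.findall(text) if f in FLAG_MAP]
    # strip the bot mention and flag markup, then collapse whitespace
    body = FLAG_RE.sub("", text.replace(BOT_ID, ""))
    return " ".join(body.split()), langs
```

Each code could then go to Amazon Translate, e.g. boto3.client("translate").translate_text(Text=body, SourceLanguageCode="auto", TargetLanguageCode=lang)["TranslatedText"]; that call is not part of the visible gist.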
elliottcordo / emr_spark_thrift_on_yarn
Created December 15, 2014 22:21
EMR spark thrift server
# on the cluster: start the Spark thrift server
/spark/sbin/start-thriftserver.sh --master yarn-client
# ssh tunnel: forward unused local port 8157 to the thrift server's port 10000
ssh -i ~/caserta-1.pem -N -L 8157:ec2-54-221-27-21.compute-1.amazonaws.com:10000 hadoop@ec2-54-221-27-21.compute-1.amazonaws.com
#see this for JDBC config on client http://blogs.aws.amazon.com/bigdata/post/TxT7CJ0E7CRX88/Using-Amazon-EMR-with-SQL-Workbench-and-other-BI-Tools
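Before pointing a JDBC client at the tunnel, it can help to confirm the forwarded port answers. A small sketch with hypothetical helper names: the Spark thrift server speaks the HiveServer2 protocol, so clients connect with a jdbc:hive2:// URL on the local end of the tunnel, port 8157 here.

```python
import socket


def tunnel_is_up(host="localhost", port=8157, timeout=3.0):
    """Hypothetical helper: True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def hive2_jdbc_url(host="localhost", port=8157, database="default"):
    """Hypothetical helper: HiveServer2 JDBC URL for the local end of the tunnel."""
    return f"jdbc:hive2://{host}:{port}/{database}"
```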
elliottcordo / redshift_connection_killer.py
Last active June 23, 2016 18:59
Redshift Connection Killer
#!/usr/bin/env python
import sys, os
import json
import argparse
from classes.Config import Config
from classes.pg_tools import PGInteraction
from classes.Logger import Logger as l
from pprint import pprint
from datetime import datetime, timedelta
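Only the imports of the connection killer are shown, and Config, PGInteraction and Logger are the author's own classes, so their APIs are not reproduced here. A sketch of the core selection logic, assuming session rows come from Redshift's STV_SESSIONS table (process, user_name, starttime) and stale sessions are ended with pg_terminate_backend. The names stale_pids and kill_statements are hypothetical, and treating rdsdb (Redshift's internal user) as protected is an assumption:

```python
from datetime import datetime, timedelta

# Rows shaped like: SELECT process, user_name, starttime FROM stv_sessions;


def stale_pids(sessions, now: datetime, max_age: timedelta = timedelta(hours=1),
               protected_users=("rdsdb",)):
    """Hypothetical helper: pids of sessions older than max_age.

    user_name is a padded char column in Redshift, hence .strip().
    """
    return [
        pid
        for pid, user, started in sessions
        if user.strip() not in protected_users and now - started > max_age
    ]


def kill_statements(pids):
    """Hypothetical helper: one pg_terminate_backend call per pid."""
    return [f"SELECT pg_terminate_backend({int(pid)});" for pid in pids]
```

You would also want to skip your own session, whose pid pg_backend_pid() returns.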
elliottcordo / crontab.md
Last active May 18, 2016 11:51
crontab for @reboot actions

Make sure you set up your crontab as root (sudo -s, below). Crontabs are user-specific, and you don't want the same jobs scheduled twice!

sudo -s
export VISUAL=nano; crontab -e

Add your entries. When done, press Ctrl-X, answer Y to save the changes, and press Enter to confirm the file name.

0 23   *   *   *    /sbin/shutdown -r +5
@reboot python /usr/local/robopager/robopager/robopager.py
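The first entry fires at minute 0 of hour 23 on every day, and shutdown -r +5 then reboots five minutes later; the @reboot entry restarts robopager after boot. A simplified matcher makes the five-field order concrete. It is a hypothetical helper that handles only * and single numbers, not ranges, lists, steps, or special strings like @reboot:

```python
from datetime import datetime


def field_matches(field, value):
    # Simplified: only "*" or a single integer.
    return field == "*" or int(field) == value


def cron_matches(expr, when: datetime):
    """Hypothetical helper: does a five-field cron expression fire at `when`?

    Fields are minute, hour, day of month, month, day of week. Real cron
    ORs day-of-month and day-of-week when both are restricted; this does not.
    """
    minute, hour, dom, month, dow = expr.split()[:5]
    cron_dow = (when.weekday() + 1) % 7  # cron: Sunday=0; Python: Monday=0
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day)
        and field_matches(month, when.month)
        and field_matches(dow, cron_dow)
    )
```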
elliottcordo / gist:59d3c90b158331fe6ed7
Created August 13, 2014 20:21
python-redshift-pandas-statistics
import sys
import logging
import psycopg2
import pandas as pd
import pandas.io.sql as sqlio
import ConfigParser
import argparse
import statistics
from pandas import pivot_table, crosstab
from datetime import datetime
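Only the imports survive in this listing; they suggest pulling rows from Redshift with psycopg2 and summarizing them. A stdlib sketch using the statistics module, assuming rows of (key, value) pairs such as cursor.fetchall() might return; summarize is a hypothetical name. In pandas the equivalent is roughly a groupby with .agg(["count", "mean", "median", "std"]), except that pandas reports NaN rather than 0.0 for the std of a single-row group.

```python
import statistics
from collections import defaultdict


def summarize(rows):
    """Hypothetical helper: per-key count, mean, median and sample stdev."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(float(value))
    return {
        key: {
            "n": len(vals),
            "mean": statistics.mean(vals),
            "median": statistics.median(vals),
            # sample stdev needs at least two points
            "stdev": statistics.stdev(vals) if len(vals) > 1 else 0.0,
        }
        for key, vals in groups.items()
    }
```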
[general]
recording_length=10
wait_time=120
archive_path=/Users/elliottcordo/projects/pinary/archive
[email]
from_email=xxx
to_email=xxx
password=xxx
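Files in this format can be read with Python's configparser module (named ConfigParser in Python 2). A Python 3 sketch; load_settings is a hypothetical name, and only the [general] section is copied in:

```python
import configparser

# The [general] section from the config above, inlined so the sketch runs standalone.
CONFIG_TEXT = """\
[general]
recording_length=10
wait_time=120
archive_path=/Users/elliottcordo/projects/pinary/archive
"""


def load_settings(text):
    """Hypothetical helper: parse INI text and return typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)  # use parser.read(path) for a file on disk
    general = parser["general"]
    return {
        "recording_length": general.getint("recording_length"),
        "wait_time": general.getint("wait_time"),
        "archive_path": general.get("archive_path"),
    }
```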