Graham Thomson graham-thomson

  • United States
# coding: utf-8
# ## Plotting COVID19 Data in the US

# In[ ]:
%%bash

COVID_DATA_DIR=./covid-19-data/

if [ ! -d ${COVID_DATA_DIR} ]; then
    git clone https://github.com/nytimes/covid-19-data.git
else
    cd ${COVID_DATA_DIR} && git pull
fi
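The bash cell above clones or updates the NYT covid-19-data repo; the plotting cells themselves did not survive this export. As a minimal sketch of the plotting step, assuming pandas and the published `us-states.csv` schema (`date`, `state`, `cases` columns) — not the notebook's original code:

```python
import pandas as pd

def cumulative_cases_by_state(df: pd.DataFrame, states: list) -> pd.DataFrame:
    """Pivot NYT-format rows (date, state, cases) into one cumulative-cases
    column per requested state, indexed by date, ready for .plot()."""
    subset = df[df["state"].isin(states)]
    return subset.pivot(index="date", columns="state", values="cases")

# Usage, assuming the repo cloned above:
# df = pd.read_csv("covid-19-data/us-states.csv", parse_dates=["date"])
# cumulative_cases_by_state(df, ["New York", "Massachusetts"]).plot()
```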
@graham-thomson
graham-thomson / jupyter_kernel_cmds.sh
Created February 5, 2020 23:54
helpful jupyter kernel commands
#!/bin/bash

# list all jupyter kernels
jupyter kernelspec list

# remove a kernel
jupyter kernelspec uninstall ${kernel_name}

# add a venv to kernels
# prereq: pip install ipykernel inside the activated venv
python -m ipykernel install --user --name=${kernel_name}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrame
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.expressions.UserDefinedFunction

object FeatureVectorQuantiles {
  // Simple helper to convert vector to array<double>
  val vecToArray: UserDefinedFunction = udf((v: Vector) => v.toArray)
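The Scala object above is truncated in this listing; it converts Spark ML vectors to plain arrays so quantiles can be computed per feature. As a rough illustration of that per-feature quantile step outside Spark, assuming plain equal-length vectors and using numpy (not the original implementation):

```python
import numpy as np

def feature_quantiles(vectors, probs=(0.25, 0.5, 0.75)):
    """Per-feature quantiles across a list of equal-length feature vectors.
    Rows are samples, columns are features; quantiles are taken per column."""
    arr = np.asarray(vectors, dtype=float)
    return np.quantile(arr, probs, axis=0)
```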
@graham-thomson
graham-thomson / combine_fill_null_feature_vectors.ipynb
Last active May 21, 2019 19:00
Spark DF way to fill null feature rows and combine multiple feature vectors.
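The notebook body did not render in this listing. As a rough stand-in for the idea its description names — replacing null feature rows with zero vectors, then concatenating multiple vector columns into one — here is a minimal numpy/pandas sketch; the function name and `dims` parameter are hypothetical, and this is not the original Spark DataFrame code:

```python
import numpy as np
import pandas as pd

def fill_and_combine(df, vec_cols, dims):
    """Replace null entries in each vector column with a zero vector of the
    right length (dims maps column name -> vector length), then concatenate
    the per-row vectors into a single 'features' column."""
    def fixed(col):
        return df[col].apply(
            lambda v: np.zeros(dims[col]) if v is None else np.asarray(v, dtype=float)
        )
    cols = {c: fixed(c) for c in vec_cols}
    combined = [np.concatenate([cols[c][i] for c in vec_cols]) for i in df.index]
    return df.assign(features=combined)
```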
import datetime as dt
import urllib.request
import json
import pandas as pd

def boston_hourly_weather(start_time, end_time):
    lat, long = 42.3603, -71.0583
    key = ''
    temp_data = []
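The function above is cut off in this listing; judging from its signature, it walks hourly timestamps between `start_time` and `end_time` and collects one reading per hour. A minimal sketch of just that hourly loop (the API endpoint and key are elided in the original and are omitted here too):

```python
import datetime as dt

def hourly_timestamps(start_time, end_time):
    """Yield one datetime per hour from start_time (inclusive) up to
    end_time (exclusive)."""
    t = start_time
    while t < end_time:
        yield t
        t += dt.timedelta(hours=1)
```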
@graham-thomson
graham-thomson / aws_helpers.py
Last active April 8, 2019 16:42
always growing list of aws boto3 helper functions in python3
import boto3
import urllib.request
import re
from urllib.error import URLError
from subprocess import run, PIPE

def get_instance_id(timeout=5):
    try:
        return urllib.request.urlopen(
            "http://169.254.169.254/latest/meta-data/instance-id",
            timeout=timeout
        ).read().decode()
    except URLError:
        return None
import org.apache.spark.ml.linalg.{SparseVector, Vectors}
import org.apache.spark.ml.feature.StandardScaler
import org.apache.spark.sql.SparkSession

object censusAggregation {
  val usage = """
    Usage: censusAggregation pathToCensus outputPath
  """
@graham-thomson
graham-thomson / get_alexa_sites.py
Created May 11, 2018 20:36
script to grab alexa top domains and stats
import argparse
import requests
import pandas as pd
import datetime as dt
from bs4 import BeautifulSoup

def get_site_divs(category):
    alexa_base_url = "https://www.alexa.com/topsites/category/Top/"
    if not category:
        # default to global top sites
        category = ""
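The rest of `get_site_divs` is cut off above; the gist fetches the listing page with requests and extracts site divs with BeautifulSoup. As a dependency-free sketch of that extraction step, here is the same idea with the stdlib `html.parser` instead — note the `site-name` class is hypothetical, standing in for the real Alexa markup:

```python
from html.parser import HTMLParser

class SiteDivParser(HTMLParser):
    """Collect the text inside <div class="site-name"> elements.
    (The class name is hypothetical, not taken from Alexa's actual markup.)"""
    def __init__(self):
        super().__init__()
        self.sites = []
        self._in_site = False

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "site-name") in attrs:
            self._in_site = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_site = False

    def handle_data(self, data):
        if self._in_site and data.strip():
            self.sites.append(data.strip())
```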
SELECT
event_time,
--user_id,
advertiser_id,
campaign_id,
ad_id,
rendering_id,
creative_version,
site_id_dcm,
placement_id,