Evan Burke (evan-burke)
evan-burke / schema.py
Last active Aug 16, 2019
Fastest way to generate a SQL schema from raw data and/or insert data to the table
# Use Pandas & SQLAlchemy.
# https://stackoverflow.com/questions/23103962/how-to-write-dataframe-to-postgres-table
# Note this will create a new table; see the 'faster option' at the above link for a method using 'copy' to an existing table.
# However, 'copy' cannot do an upsert; that requires inserting to a temp table, then upserting from the temp table to the destination table.
# This will lack PKs and FKs and indexes, of course, so if used naively you may see data duplication.
# Also the df.to_sql command can do an append (but not upsert), using the if_exists param:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
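# A minimal sketch of the approach described above (connection string, DataFrame
# contents, and table name are placeholders, not taken from the gist):
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/mydb")
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Creates the table if it doesn't exist (schema inferred from dtypes);
# if_exists="append" adds rows to an existing table instead of failing.
df.to_sql("my_table", engine, index=False, if_exists="append")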
evan-burke / dnspython.py
Last active Jun 27, 2019
quick dnspython usage reference
# https://github.com/rthalley/dnspython
# pip install dnspython
import dns.resolver
import dns.exception

def lookup(domain, lookup_type):
    try:
        result = dns.resolver.query(domain, lookup_type)
    except dns.exception.DNSException as e:
        # The preview is truncated here; returning None on NXDOMAIN and
        # re-raising anything else is one reasonable completion.
        if isinstance(e, dns.resolver.NXDOMAIN):
            return None
        raise
    return [str(r) for r in result]
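# Quick usage, assuming the completion above:
print(lookup("example.com", "A"))   # list of record strings, or None if the domain does not exist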
evan-burke / sqlitelib.py
Last active Jan 16, 2019
simple python3 sqlite3 wrapper
""" This is pretty basic but makes it a little eaiser to use sqlite3 for common tasks.
Example usage:
db = sqlitelib(dbfile)
result = db.execute("select * from mytable")
or:
with sqlitelib(dbfile) as db:
result = db.execute("select * from mytable")
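# The preview ends before the class body; a minimal sketch of a wrapper matching
# the usage above (an assumption, not the gist's exact implementation):
import sqlite3

class sqlitelib:
    def __init__(self, dbfile):
        self.conn = sqlite3.connect(dbfile)
        self.conn.row_factory = sqlite3.Row   # rows accessible by column name

    def execute(self, query, params=()):
        cur = self.conn.execute(query, params)
        rows = cur.fetchall()
        self.conn.commit()
        return rows

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.conn.close()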
evan-burke / retry.php
Last active Sep 14, 2018 — forked from orottier/retry.php
Retry function for PHP with linear backoff
<?php
/*
* Retry function for e.g. external API calls
*
 * Tries the risky API call, and retries with a linearly increasing delay if it fails.
 * Throws the latest error if $maxRetries is reached;
 * otherwise returns the value returned from the closure.
*
*/
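# The PHP body is cut off in the preview; here is the same idea sketched in Python
# (the function name and defaults are mine, not the gist's):
import time

def retry(func, max_retries=5, base_delay=1):
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise                              # out of retries: surface the latest error
            time.sleep(base_delay * attempt)       # linear backoff: 1s, 2s, 3s, ...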
evan-burke / postgres bulk percentiles calculation
Last active Aug 13, 2018
Postgres bulk percentile calculation with generate_series() and percentile_cont()
This will return the percentiles in order. With a 0.2 step, generate_series() yields the fractions 0, 0.2, 0.4, 0.6, 0.8, and 1.0, so the query returns six rows: the minimum, the 20th/40th/60th/80th percentiles, and the maximum of SIZE.
-- #1
select unnest(
    percentile_cont(
        (select array_agg(s) from generate_series(0, 1, 0.2) as s)
    ) WITHIN GROUP (ORDER BY SIZE))
from mytable
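# To get the same result from Python, a minimal psycopg2 sketch
# (connection string, column, and table names are placeholders):
import psycopg2

conn = psycopg2.connect("dbname=EDITME host=EDITME")
with conn.cursor() as cur:
    cur.execute("""
        select unnest(
            percentile_cont(
                (select array_agg(s) from generate_series(0, 1, 0.2) as s)
            ) within group (order by size))
        from mytable
    """)
    print([row[0] for row in cur.fetchall()])   # six values, smallest fraction first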
evan-burke / bfg
Last active Apr 7, 2018
cleaning sensitive data from a repo
- Start on a machine with a Java runtime and the Git command line. Git should be configured with a valid SSH key for the repo.
- Create and push a new commit reverting any changes you want to wipe. This ensures the data you want removed is not in the 'protected' latest commit (BFG leaves the current HEAD commit untouched by default).
- Create and cd into a temporary dir if you want.
- Download the .jar file from here - search for 'downloadable jar':
  https://rtyley.github.io/bfg-repo-cleaner/
- Clone the repo (with GitHub, use the SSH URL, not HTTPS) with --mirror; typical commands for this and the remaining steps are sketched below.
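The commands below follow the standard workflow from the BFG documentation rather than the gist preview, which is truncated here (OWNER/REPO and replacements.txt are placeholders):
  git clone --mirror git@github.com:OWNER/REPO.git
  java -jar bfg.jar --replace-text replacements.txt REPO.git
  cd REPO.git
  git reflog expire --expire=now --all && git gc --prune=now --aggressive
  git push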
evan-burke / dremio-ubuntu
Last active Oct 16, 2019 — forked from jcaristy/info.sh
[DREMIO: Install dremio on Ubuntu] #dremio
Installing Dremio 1.4 on Ubuntu 16
## Install links / references
https://www.dremio.com/tutorials/recommender-scikit-learn-dremio-postgres-mongodb/
https://www.dremio.com/tutorials/dremio-oracle-aws/
https://docs.dremio.com/deployment/standalone-tarball.html
## First, install Java 1.8+, 64 bit.
sudo apt-get update
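# The preview stops at the apt update; on Ubuntu 16 the Java step would typically be
# something like the following (an assumption, not the gist's exact commands):
sudo apt-get install -y openjdk-8-jdk
java -version    # should report 1.8.x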
evan-burke / psycopg2_update_execute_values.py
Last active Dec 3, 2017
update query using psycopg2 execute_values
# Updates are a little tricky using psycopg2.extras.execute_values(), and documentation is a little sparse.
# http://initd.org/psycopg/docs/extras.html
db_connection_string = "dbname=EDITME host=EDITME"
# uuids, as strings
data = ['6ef0f42a-63da-4edb-9a11-5e146cb337ac','e7b1e961-0a68-4c4f-a716-e0959593f27d','1f82c9a5-00c3-4bd8-8c50-0ede441b4e91']
query = """update my_table t
set bool_field = true
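# The preview cuts the query off; below is a sketch of how the full pattern typically
# looks, using the FROM (VALUES %s) idiom shown in the psycopg2 extras docs. Table and
# column names follow the preview above; the uuid cast and tuple wrapping are assumptions.
import psycopg2
import psycopg2.extras

query = """update my_table t
              set bool_field = true
             from (values %s) as v(id)
            where t.id = v.id::uuid"""

conn = psycopg2.connect(db_connection_string)   # db_connection_string and data as defined above
with conn.cursor() as cur:
    # execute_values expects a sequence of row tuples, so wrap each uuid string
    psycopg2.extras.execute_values(cur, query, [(d,) for d in data])
conn.commit()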
evan-burke / insert_paginator.py
Last active Jun 19, 2018
psycopg2 execute_values wrapper for accurate row counts
# One of the fastest ways to insert bulk data into Postgres (at least, aside from COPY) is using the psycopg2 extras function execute_values.
# However, this doesn't return an accurate row count value - instead, it just returns the row count for the last page inserted.
# This wraps the execute_values function with its own pagination to return an accurate count of rows inserted.
# Performance is approximately equivalent to the underlying execute_values function - within 5-10% or so in my brief tests.
import psycopg2
import psycopg2.extras
import math
db_connection_string = "dbname=EDITME host=EDITME"
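# The preview ends before the wrapper itself; a minimal sketch of the idea described
# above (function name and signature are mine, not the gist's), reusing the imports above:
def execute_values_with_rowcount(cursor, query, rows, page_size=1000):
    # Chunk the rows ourselves so each execute_values call is exactly one page,
    # then sum cursor.rowcount across pages for an accurate total.
    total = 0
    n_pages = math.ceil(len(rows) / page_size)
    for i in range(n_pages):
        page = rows[i * page_size:(i + 1) * page_size]
        psycopg2.extras.execute_values(cursor, query, page, page_size=page_size)
        total += cursor.rowcount
    return total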