Skip to content

Instantly share code, notes, and snippets.

View evan-burke's full-sized avatar

Evan Burke evan-burke

  • Oakland, CA
View GitHub Profile
@evan-burke
evan-burke / insert_paginator.py
Last active April 15, 2020 22:02
psycopg2 execute_values wrapper for accurate row counts
# One of the fastest ways to insert bulk data into Postgres (at least, aside from COPY) is using the psycopg2 extras function execute_values.
# However, this doesn't return an accurate row count value - instead, it just returns the row count for the last page inserted.
# This wraps the execute_values function with its own pagination to return an accurate count of rows inserted.
# Performance is approximately equivalent to underlying execute_values function - within 5-10% or so in my brief tests.
import psycopg2
import psycopg2.extras
import math
db_connection_string = "dbname=EDITME host=EDITME"
@evan-burke
evan-burke / psycopg2_update_execute_values.py
Last active June 3, 2020 23:45
update query using psycopg2 execute_values
# Updates are a little tricky using psycopg2.extras.execute_values(), and documentation is a little sparse.
# http://initd.org/psycopg/docs/extras.html
db_connection_string = "dbname=EDITME host=EDITME"
# uuids, as strings
my_data = ['6ef0f42a-63da-4edb-9a11-5e146cb337ac','e7b1e961-0a68-4c4f-a716-e0959593f27d','1f82c9a5-00c3-4bd8-8c50-0ede441b4e91']
query = """update my_table t
set bool_field = true
@evan-burke
evan-burke / dremio-ubuntu
Last active July 10, 2021 01:01 — forked from jcaristy/info.sh
[DREMIO: Install dremio on Ubuntu] #dremio
Installing Dremio 1.4 on Ubuntu 16
### NOTE: this is significantly out of date since I last edited it in Jan 2018.
# See the comments on the gist for suggested changes for more recent versions.
##Install links / references
https://www.dremio.com/tutorials/recommender-scikit-learn-dremio-postgres-mongodb/
https://www.dremio.com/tutorials/dremio-oracle-aws/
https://docs.dremio.com/deployment/standalone-tarball.html
@evan-burke
evan-burke / bfg
Last active April 7, 2018 00:19
cleaning sensitive data from a repo
- Start somewhere with Java runtime and a Git command line. Git should be configured with a valid SSH key for the repo.
- Create and push a new commit reverting any changes you want to wipe. This is done to ensure the commit is not 'protected'.
- Create & cd into temporary dir if you want.
- Download .jar file from here - search for 'downloadable jar'.
https://rtyley.github.io/bfg-repo-cleaner/
- Clone repo (with github, use ssh URL, not https) with --mirror:
@evan-burke
evan-burke / postgres bulk percentiles calculation
Last active March 30, 2020 13:16
Postgres bulk percentile calculation with generate_series() and percentile_cont()
This will return the percentiles in order.
-- #1
select unnest(
percentile_cont(
(select array_agg(s) from generate_series(0, 1, 0.2) as s)
) WITHIN GROUP (ORDER BY SIZE))
from mytable
@evan-burke
evan-burke / retry.php
Last active September 14, 2018 04:51 — forked from orottier/RetryTest.php
Retry function for PHP with linear backoff
<?php
/*
* Retry function for e.g. external API calls
*
* Will try the risky API call, and retries with an ever increasing delay if it fails
* Throws the latest error if $maxRetries is reached,
* otherwise it will return the value returned from the closure.
*
*/
@evan-burke
evan-burke / sqlitelib.py
Last active December 23, 2021 16:39
simple python3 sqlite3 wrapper
""" This is pretty basic but makes it a little eaiser to use sqlite3 for common tasks.
Example usage:
db = sqlitelib(dbfile)
result = db.execute("select * from mytable")
or:
with sqlitelib(dbfile) as db:
result = db.execute("select * from mytable")
@evan-burke
evan-burke / dnspython.py
Last active November 29, 2021 13:00
quick dnspython usage reference
# https://github.com/rthalley/dnspython
# pip install dnspython
# NOTE reverse DNS lookups on Docker seem to be failing with Docker versions between ~18.6 and ~19.03.8, as of 2020-05-13
# This generates errors like:
# `The DNS response does not contain an answer to the question: 9.8.7.6.in-addr.arpa. IN PTR`
# Workaround: start containers with external DNS where possible using the --dns flag in 'docker run'
import dns.resolver
import sys
@evan-burke
evan-burke / schema.py
Last active July 5, 2022 16:08
Fastest way to generate a SQL schema from raw data and/or insert data to the table
# Use Pandas & SQLAlchemy.
# https://stackoverflow.com/questions/23103962/how-to-write-dataframe-to-postgres-table
# Note this will create a new table; see the 'faster option' at the above link for a method using 'copy' to an existing table.
# However, 'copy' cannot do an upsert; that requires inserting to a temp table, then upserting form temp table to destination table.
# This will lack PKs and FKs and indexes, of course, so if used naively you may see data duplication.
# Also the df.to_sql command can do an append (but not upsert), using the if_exists param:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
@evan-burke
evan-burke / fabric_ssh.py
Created January 9, 2020 23:29
ssh with Fabric using an SSH key
# Fabric can be used to run commands on a remote system over SSH.
# Sadly, its docuentation is a bit short for connecting using a private key file.
# Other options exist too, like this one using an SSH config file - https://gist.github.com/aubricus/5157931
import fabric
# Openssh formatted private key:
keyfile = "/path/to/your/privkey"
host = "fqdn"