Stefan Thoss stefanthoss

stefanthoss / pbzip2.Dockerfile
Last active Jul 11, 2019
Install pbzip2 in an Alpine Linux Docker image
FROM alpine:3.10
RUN apk add --no-cache \
bzip2-dev \
g++ \
RUN cd /tmp/ && \
wget -q && \
tar -xzf pbzip2-1.1.13.tar.gz && \
stefanthoss /
Created Jun 19, 2019
Export/import a PySpark schema to/from a JSON file
import json
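from pyspark.sql.types import IntegerType, StringType, StructField, StructType
# Define the schema
schema = StructType(
    [StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
)
# Write the schema
with open("schema.json", "w") as f:
    # The preview is cut off here; dumping schema.jsonValue() is an assumed completion.
    json.dump(schema.jsonValue(), f)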
stefanthoss /
Created Apr 29, 2016
Linux command to export a PostgreSQL table from a remote server to a local CSV file.
psql -h hostname -U username -W -d database -t -A -F "," -c "SELECT * FROM table" > file.csv
# Explanation of the options used:
# -h Specifies the host name of the machine on which the server is running.
# -U Connect to the database as a specific user.
# -W Force psql to prompt for a password before connecting to a database.
# -d Specifies the name of the database to connect to.
# -t Turn off printing of column names and result row count footers, etc.
# -A Switches to unaligned output mode.
# -F Use separator as the field separator for unaligned output.
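The same option set can be assembled from a script, which keeps each flag next to its meaning. This is a minimal sketch, not part of the original gist: the function names `build_psql_export_argv` and `export_to_csv` are hypothetical, and the host, user, database, and query are the placeholder values from the command above.

```python
import shlex
import subprocess


def build_psql_export_argv(host, user, database, query, separator=","):
    """Return the psql argument list for an unaligned, tuples-only export."""
    return [
        "psql",
        "-h", host,       # host name of the machine running the server
        "-U", user,       # connect as this database user
        "-W",             # force a password prompt before connecting
        "-d", database,   # database to connect to
        "-t",             # no column names or row-count footer
        "-A",             # unaligned output mode
        "-F", separator,  # field separator for unaligned output
        "-c", query,      # run this single command and exit
    ]


def export_to_csv(argv, csv_path):
    """Run psql and redirect its standard output to csv_path (like `> file.csv`)."""
    with open(csv_path, "w") as out:
        subprocess.run(argv, stdout=out, check=True)


# Placeholder values copied from the shell command above.
argv = build_psql_export_argv("hostname", "username", "database", "SELECT * FROM table")
print(shlex.join(argv))  # shell-quoted form of the command, for inspection
```

Calling `export_to_csv(argv, "file.csv")` would run the export. Because `-t` suppresses column names, the resulting file has no header row.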
stefanthoss /
Last active Apr 9, 2016
Import data from a local CSV file to a PostgreSQL database table using pandas and psycopg2. 'null' values in the CSV file get replaced by real NULL values.
import pandas as pd
import numpy as np
import psycopg2
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://<user>:<password>@<host>[:<port>]/<dbname>')
df = pd.read_csv('local-file.csv', sep=',').replace(to_replace='null', value=np.nan)
df.to_sql('dbtable', engine, schema='dbschema', if_exists='replace')
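The null-handling step can be tried without a database server. The sketch below is a stand-in rather than the gist's method: it swaps pandas for the standard library's `csv` module and PostgreSQL for an in-memory SQLite database. The function name `load_csv_with_nulls` and the sample data are hypothetical. Dropping and recreating the table is meant to mirror `if_exists='replace'`.

```python
import csv
import io
import sqlite3


def load_csv_with_nulls(conn, table, csv_text, null_token="null"):
    """Recreate `table` from CSV text, storing null_token cells as SQL NULL."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Drop and recreate the table, similar in spirit to if_exists='replace'.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # None is bound as a real NULL by the sqlite3 driver.
    rows = [[None if cell == null_token else cell for cell in row] for row in reader]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
sample = "name,age\nalice,30\nbob,null\n"  # made-up data for illustration
count = load_csv_with_nulls(conn, "dbtable", sample)
null_ages = conn.execute('SELECT COUNT(*) FROM "dbtable" WHERE age IS NULL').fetchone()[0]
print(count, null_ages)  # 2 1
```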
stefanthoss /
Created Mar 21, 2016
Shell script to automatically commit all new/modified/deleted files in a Git repository. Can be used as a backup tool with cron.
cd /path/to/git/repo/ || exit 1
git add -A
git commit -m "Backup on $(date)"
git push origin
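The same backup steps can be sketched in Python, which makes it easy to skip the commit and push when nothing has changed. This is an illustration under assumptions, not the original script: the names `commit_message` and `run_backup` are hypothetical, and the timestamp format is chosen here instead of the default output of `date`.

```python
import subprocess
from datetime import datetime


def commit_message(now=None):
    """Build a commit message like 'Backup on 2016-03-21 12:00:00'."""
    now = now or datetime.now()
    return f"Backup on {now:%Y-%m-%d %H:%M:%S}"


def run_backup(repo_path):
    """Stage everything, then commit and push only if something is staged."""
    subprocess.run(["git", "add", "-A"], cwd=repo_path, check=True)
    # `git diff --cached --quiet` exits with 0 when the index matches HEAD.
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo_path)
    if staged.returncode == 0:
        return False  # nothing to back up
    subprocess.run(["git", "commit", "-m", commit_message()], cwd=repo_path, check=True)
    subprocess.run(["git", "push", "origin"], cwd=repo_path, check=True)
    return True
```

Calling `run_backup("/path/to/git/repo/")` from a scheduled job would perform one backup run. It returns `False` when there was nothing new to commit.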
stefanthoss /
Last active Aug 8, 2019
Import data from a MySQL database table into a Pandas DataFrame using the pymysql package.
import pandas as pd
import pymysql
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://<user>:<password>@<host>[:<port>]/<dbname>')
df = pd.read_sql_query('SELECT * FROM table', engine)
stefanthoss /
Last active Jul 24, 2019 — forked from dougvk/
Build a (stock ticker -> CIK) dictionary from SEC EDGAR and print it to stdout
import re
import requests
URL = "{}&Find=Search&owner=exclude&action=getcompany"
CIK_RE = re.compile(r".*CIK=(\d{10}).*")
cik_dict = {}
for ticker in DEFAULT_TICKERS:
    results = CIK_RE.findall(requests.get(URL.format(ticker)).content.decode("ascii"))
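The preview stops before the results are stored, so the sketch below shows one way the regular expression could be used to fill the dictionary. It works offline. The helper `extract_cik`, the sample page fragments, the ticker names, and the CIK value are all made up for illustration and are not taken from the original gist or from EDGAR.

```python
import re

CIK_RE = re.compile(r".*CIK=(\d{10}).*")


def extract_cik(page_text):
    """Return the CIK from the first line that contains one, or None."""
    # findall yields at most one match per line; the leading greedy `.*`
    # makes it the last CIK on that line.
    matches = CIK_RE.findall(page_text)
    return matches[0] if matches else None


# Hypothetical response bodies standing in for requests.get(...).content.
pages = {
    "abcd": '<a href="/cgi-bin/browse-edgar?action=getcompany&CIK=0001234567">link</a>',
    "zzzz": "No matching ticker symbol.",
}

cik_dict = {}
for ticker, text in pages.items():
    cik = extract_cik(text)
    if cik is not None:  # skip tickers with no CIK in the response
        cik_dict[ticker.upper()] = cik

print(cik_dict)  # {'ABCD': '0001234567'}
```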