Stefan Thoss (stefanthoss)

@stefanthoss
stefanthoss / pbzip2.Dockerfile
Last active Jul 11, 2019
Install pbzip2 in an Alpine Linux Docker image
FROM alpine:3.10

RUN apk add --no-cache \
    bzip2-dev \
    g++ \
    make

# Download, build, and install pbzip2 from source
RUN cd /tmp/ && \
    wget -q https://launchpad.net/pbzip2/1.1/1.1.13/+download/pbzip2-1.1.13.tar.gz && \
    tar -xzf pbzip2-1.1.13.tar.gz && \
    cd pbzip2-1.1.13 && \
    make && make install
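A hedged usage sketch for the finished image; the pbzip2-alpine tag and the bind-mounted working directory are illustrative, not part of the gist:
# Build the image from this Dockerfile, then compress a local file with pbzip2
docker build -f pbzip2.Dockerfile -t pbzip2-alpine .
docker run --rm -v "$PWD":/data -w /data pbzip2-alpine pbzip2 -k large-file.txt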
@stefanthoss
stefanthoss / export-pyspark-schema-to-json.py
Created Jun 19, 2019
Export/import a PySpark schema to/from a JSON file
import json
from pyspark.sql.types import *

# Define the schema
schema = StructType(
    [StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
)

# Write the schema to a JSON file
with open("schema.json", "w") as f:
    json.dump(schema.jsonValue(), f)

# Read the schema back into a StructType
with open("schema.json") as f:
    schema_from_file = StructType.fromJson(json.load(f))
@stefanthoss
stefanthoss / export-postgresql-table.sh
Created Apr 29, 2016
Linux command to export a PostgreSQL table from a remote server to a local CSV file.
psql -h hostname -U username -W -d database -t -A -F "," -c "SELECT * FROM table" > file.csv
# Explanation of the used options:
# -h Specifies the host name of the machine on which the server is running.
# -U Connect to the database as a specific user.
# -W Force psql to prompt for a password before connecting to a database.
# -d Specifies the name of the database to connect to.
# -t Turn off printing of column names and result row count footers, etc.
# -A Switches to unaligned output mode.
# -F Use separator as the field separator for unaligned output.
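Because -W forces an interactive password prompt, a scripted export can supply the password through libpq's PGPASSWORD environment variable (or a ~/.pgpass file) instead; a hedged sketch with the same placeholder values:
# Non-interactive variant: the password comes from the environment rather than a prompt
PGPASSWORD='secret' psql -h hostname -U username -d database -t -A -F "," -c "SELECT * FROM table" > file.csv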
@stefanthoss
stefanthoss / postgres-csv-import.py
Last active Apr 9, 2016
Import data from a local CSV file to a PostgreSQL database table using pandas and psycopg2. 'null' values in the CSV file get replaced by real NULL values.
import pandas as pd
import numpy as np
import psycopg2
from sqlalchemy import create_engine

# SQLAlchemy uses psycopg2 as the driver named in the connection URL
engine = create_engine('postgresql+psycopg2://<user>:<password>@<host>[:<port>]/<dbname>')

# Read the CSV and replace the string 'null' with real NULL (NaN) values
df = pd.read_csv('local-file.csv', sep=',').replace(to_replace='null', value=np.nan)
df.to_sql('dbtable', engine, schema='dbschema', if_exists='replace')
@stefanthoss
stefanthoss / git_backup_script.sh
Created Mar 21, 2016
Shell script to automatically commit all new/modified/deleted files in a Git repository. Can be used as a backup tool with cron.
#!/bin/bash
# Commit and push everything in the repository as a dated backup
cd /path/to/git/repo/ || exit 1
git add -A
git commit -m "Backup on $(date)"
git push origin
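As the description notes, the script can be driven by cron; a hedged example crontab entry, assuming the script is saved as /path/to/git_backup_script.sh and marked executable (the schedule and log path are illustrative):
# Run the Git backup every night at 02:00
0 2 * * * /path/to/git_backup_script.sh >> /var/log/git_backup.log 2>&1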
@stefanthoss
stefanthoss / mysql-pandas-import.py
Last active Aug 8, 2019
Import data from a MySQL database table into a Pandas DataFrame using the pymysql package.
import pandas as pd
import pymysql
from sqlalchemy import create_engine

# SQLAlchemy loads pymysql as the driver named in the connection URL
engine = create_engine('mysql+pymysql://<user>:<password>@<host>[:<port>]/<dbname>')

# Load the query result into a DataFrame and preview the first rows
df = pd.read_sql_query('SELECT * FROM table', engine)
df.head()
@stefanthoss
stefanthoss / cik_dict.py
Last active Jul 24, 2019 — forked from dougvk/cik_dict.py
(stock ticker -> CIK) dictionary built from SEC EDGAR, printed to stdout
import re
import requests

DEFAULT_TICKERS = ["BBRY", "VOD", "T", "S"]
URL = "http://www.sec.gov/cgi-bin/browse-edgar?CIK={}&Find=Search&owner=exclude&action=getcompany"
CIK_RE = re.compile(r".*CIK=(\d{10}).*")

cik_dict = {}
for ticker in DEFAULT_TICKERS:
    # Fetch the EDGAR company page and extract the zero-padded CIK from the HTML
    results = CIK_RE.findall(requests.get(URL.format(ticker)).content.decode("ascii"))
    if results:
        cik_dict[ticker] = results[0]

print(cik_dict)