Skip to content

Instantly share code, notes, and snippets.

View stefanthoss's full-sized avatar
🏠
Working from home

Stefan Thoss stefanthoss

🏠
Working from home
View GitHub Profile
@stefanthoss
stefanthoss / freebsd-check-disk-standby-status.sh
Last active April 25, 2020 04:32
This script checks the standby status of disks in a FreeBSD system (e.g. FreeNAS). It can be used to check whether disks are spinning down properly. For each SATA and IDE hard drive, it will return either standby (0x00) or active/idle (0xff). The script needs to be run as root.
#!/usr/bin/env bash
DEVICES=$(camcontrol devlist | awk -F '[,)]' '{print $2}' | awk '/ad/{print}')
for DEV in $DEVICES; do
    printf "%s: " "$DEV"
    camcontrol epc "$DEV" -c status -P
done
@stefanthoss
stefanthoss / covid-19-bay-area-data.ipynb
Last active July 19, 2021 04:47
COVID-19 in the Bay Area. This Jupyter notebook loads the New York Times data files with cumulative counts of coronavirus cases in the United States from https://github.com/nytimes/covid-19-data and visualizes them for the San Francisco Bay Area.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@stefanthoss
stefanthoss / search-pandas-columns-for-string.py
Last active November 12, 2019 05:43
List all columns of a DataFrame that contain a certain character at least once.
import pandas as pd
# Input:
#            a     b        c    d    e
# 0  Text       Text  NaN      0.0  5
# 1  Text       NaN   1.1.1.1  0.0  55
# 2  Text.Text  Text  Text     0.4  555
data = [
    {"a": "Text", "b": "Text", "d": 0, "e": 5},
@stefanthoss
stefanthoss / find-postgres-tables-by-column.sql
Created October 9, 2019 20:30
Find PostgreSQL tables that have columns which match a certain regex.
SELECT
t.table_schema,
t.table_name,
c.column_name
FROM
information_schema.tables t
INNER JOIN information_schema.columns c ON c.table_name = t.table_name
AND c.table_schema = t.table_schema
WHERE
c.column_name ~ 'regex'
@stefanthoss
stefanthoss / advanced-dataframe-union.py
Last active January 28, 2020 19:34
Returns a new PySpark DataFrame containing the union of two DataFrames. This more advanced version works even when the two DataFrames have different columns and a different order of columns. If a column does not exist in either DataFrame, its fields will be empty.
def advanced_dataframe_union(df1, df2):
df1_fields = set((f.name, f.dataType) for f in df1.schema)
df2_fields = set((f.name, f.dataType) for f in df2.schema)
df2 = df2.select(
df2.columns
+ [
F.lit(None).cast(datatype).alias(name)
for name, datatype in df1_fields.difference(df2_fields)
]
@stefanthoss
stefanthoss / pbzip2.Dockerfile
Last active March 26, 2024 11:29
Install pbzip2 in an Alpine Linux Docker image
FROM alpine:3.10
RUN apk add --no-cache \
bzip2-dev \
g++ \
make
RUN cd /tmp/ && \
wget -q https://launchpad.net/pbzip2/1.1/1.1.13/+download/pbzip2-1.1.13.tar.gz && \
tar -xzf pbzip2-1.1.13.tar.gz && \
@stefanthoss
stefanthoss / export-pyspark-schema-to-json.py
Created June 19, 2019 22:16
Export/import a PySpark schema to/from a JSON file
import json
from pyspark.sql.types import *
# Define the schema
schema = StructType(
[StructField("name", StringType(), True), StructField("age", IntegerType(), True)]
)
# Write the schema
with open("schema.json", "w") as f:
@stefanthoss
stefanthoss / export-postgresql-table.sh
Created April 29, 2016 20:59
Linux command to export a PostgreSQL table from a remote server to a local CSV file.
psql -h hostname -U username -W -d database -t -A -F "," -c "SELECT * FROM table" > file.csv
# Explanation of the used options:
# -h Specifies the host name of the machine on which the server is running.
# -U Connect to the database as a specific user.
# -W Force psql to prompt for a password before connecting to a database.
# -d Specifies the name of the database to connect to.
# -t Turn off printing of column names and result row count footers, etc.
# -A Switches to unaligned output mode.
# -F Use separator as the field separator for unaligned output.
@stefanthoss
stefanthoss / postgres-csv-import.py
Last active May 5, 2020 05:27
Import data from a local CSV file to a PostgreSQL database table using pandas and psycopg2. 'null' values in the CSV file get replaced by real NULL values.
import pandas as pd
import numpy as np
import psycopg2
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://USER:PASSWORD@HOST:PORT/DBNAME')
df = pd.read_csv('local-file.csv', sep=',').replace(to_replace='null', value=np.NaN)
df.to_sql('dbtable', engine, schema='dbschema', if_exists='replace')
@stefanthoss
stefanthoss / git_backup_script.sh
Created March 21, 2016 21:45
Shell script to automatically commit all new/modified/deleted files in a Git repository. Can be used as a backup tool with cron.
#!/bin/bash
cd /path/to/git/repo/
git add -A
git commit -m "Backup on `date`"
git push origin