Skip to content

Instantly share code, notes, and snippets.

View stephanGarland's full-sized avatar

Stephan Garland stephanGarland

View GitHub Profile
@stephanGarland
stephanGarland / zfs_file_life.txt
Last active April 1, 2023 15:30
Examining ZFS storage of files and snapshots with ZDB
# create a simple dataset with a small record size, and chown it
❯ sudo zfs create -o recordsize=512 tank/foobar && sudo chown $YOUR_USER:$YOUR_GROUP tank/foobar
❯ cd tank/foobar
# make a 1K file filled with hex FF (pull from /dev/zero, then use tr to translate to FF, which is 377 in octal)
# if it's just zeros, there isn't much to look at with zdb
❯ dd if=/dev/zero bs=1k count=1 | tr "\000" "\377" >file.txt
1+0 records in
1+0 records out

I ran some experiments with varying recordsizes, filesizes, and compression. The files were .csv, representative of a simple schema:

full_name,external_id,last_modified
'Past, Gabrielle',40605,'2006-07-09 23:17:20' 
'Vachil, Corry',44277,'1996-09-05 05:12:44'

The files were all generated on an ext4 filesystem. There were three sets of five files, with 75, 100,000, and 1,000,000 rows each, resulting in the following sizes:

❯ find . -name '*small*.csv' -exec du -bc {} + | \

awk 'END {printf "%s %.2f %s\n", "Average file size:", ($1 / (NR-1) / 1024), "KiB"}'

@stephanGarland
stephanGarland / mkjup.sh
Last active July 1, 2023 01:39
A zsh function to create an ephemeral or persistent venv and and Jupyter Notebook kernel.
# Requirements:
# jupyter-notebook: https://jupyter.org/install
# Python3: https://www.python.org/downloads
# virtualenv: https://pypi.org/project/virtualenv
# virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/latest/install.html
# zsh
# Tested with Jupyter Notebook 6.5.3, Python 3.11.4, and zsh 5.8
mkjup() {
# This is a base64-encoded skeleton of an .ipynb file, with two variables which are filled via envsubst later
@stephanGarland
stephanGarland / bench_mysql_innodb_myisam.sh
Created September 30, 2023 18:06
Benchmarking MySQL bulk load into InnoDB and MyISAM
#!/usr/bin/env bash
function create_tables() {
tbl_1=$(cat <<EOF
CREATE TABLE IF NOT EXISTS test_innodb (
user_id CHAR(32) NOT NULL PRIMARY KEY,
user_email VARCHAR(254) NOT NULL,
created_at DATETIME NOT NULL,
tags JSON NOT NULL DEFAULT ('{}'),
shared_with JSON NOT NULL DEFAULT ('{}'),
@stephanGarland
stephanGarland / bench_mysql_innodb_myisam.py
Last active September 30, 2023 23:31
Benchmarking MySQL INSERTs into InnoDB and MyISAM
import csv
import json
import os
import pymysql
import statistics
import time
def prepare_values(row):
new_row = []
@stephanGarland
stephanGarland / indexing_bools_low_cardinality.md
Created October 5, 2023 11:40
Demonstrating how indexing booleans is often not helpful

System Specifications

  • OS: Debian Bullseye 5.10.0-23-amd64
  • Virtualized: Yes (Proxmox)
  • CPU: E5-2650 v2 @ 2.60GHz
  • Allocated Core Count: 16
  • Allocated RAM: 64 GiB PC3-12800R
  • Disk: Samsung PM983 1.92 TiB via Ceph
  • Filesystem: XFS
  • Mount Options: defaults,noatime
  • Postgres Version: 15.3
@stephanGarland
stephanGarland / mysql_binary_truncation.sql
Created December 23, 2023 00:35
Testing silent truncation of zeros in BINARY columns in MySQL
mysql> CREATE TABLE foo(id TINYINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, bar BINARY(16), baz CHAR(36));
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9-a129-11ee-95fb-0242ac110000');
Query OK, 1 row affected (0.02 sec)
mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9a12911ee95fb0242ac110000');
Query OK, 1 row affected (0.01 sec)
mysql> SHOW BINARY LOGS;
@stephanGarland
stephanGarland / calc_innodb_buf_pool.py
Last active April 19, 2024 11:53
Designed for use with Ansible, but works on its own as well – calculate the tricky parts of InnoDB's buffer pool
#!/usr/bin/env python3
"""
Calculates various parameters for the InnoDB buffer pool based on a simple input.
The InnoDB engine is extremely strict about how its buffer pool can be sized.
Ref: https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_buffer_pool_chunk_size
Other than size restrictions (which are unlikely to be violated in normal practice), there are
complicated relationships between the three main variables that must be met, plus some
best practices to be observed. Getting these three correct can be challenging, hence this calculator.
@stephanGarland
stephanGarland / tree_insert.sql
Created February 6, 2024 22:13
Example of a tree-like structure in SQL (specifically Postgres)
SET search_path TO 'example';
INSERT INTO genre (name) VALUES ('Rock'), ('Classic Rock'), ('Metal'), ('Progressive Metal'), ('Arena Rock'), ('Alternative Metal'), ('Grunge');
INSERT INTO category (name) VALUES ('Book'), ('Music');
INSERT INTO format (name) VALUES ('Compact Disc'), ('Vinyl');
INSERT INTO band (year_formed, name) VALUES (1985, 'Majesty'), (1988, 'Dream Theater'), (1990, 'Tool'), (1970, 'Queen'), (1987, 'Alice in Chains');
INSERT INTO artist (first_name, last_name) VALUES ('John', 'Petrucci'), ('John', 'Myung'), ('James', 'LaBrie'), ('Jordan', 'Ruddess'), ('Mike', 'Portnoy'), ('Mike', 'Mangini');
INSERT INTO artist (first_name, last_name, prefix, suffix) VALUES ('Brian', 'May', 'Sir', 'CBE');
@stephanGarland
stephanGarland / postgres_clustering.md
Created February 9, 2024 23:37
Demonstrating Postgres' clustering operation

This table has 1,000,000 rows, consisting of a UUIDv4 PK, a random int of range (1,1000000), and ~1 KiB of Lorem Ipsum text.

postgres=# \d+ uuid_pk
                                          Table "public.uuid_pk"
 Column  |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
---------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 id      | uuid    |           | not null |         | plain    |             |              |
 user_id | integer |           | not null |         | plain    |             |              |
 lorem   | text    |           | not null |         | extended |             |              |