Stephan Garland stephanGarland

## pg_bench_jsonb_indices.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                stephanGarland
                / pg_bench_jsonb_indices.md
            
            
              Last active
              March 29, 2024 17:10
            
              
                Comparing Postgres access speeds on JSONB with various indices
              
          
    Environment

System


OS: Debian 12 VM on Proxmox VE 8
CPUs: 8x Xeon E5-2620 v2
RAM: 8 GiB DDR3
HDD: Samsung PM863 NVMe in Ceph over Mellanox Infiniband 56 Gbps IPoIB

Virtualized via scsi-virtio-single, aio=native, io_threads=enabled


FS: ext4 (rw,noatime)


## comparing_distinct_limit.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                stephanGarland
                / comparing_distinct_limit.md
            
            
              Created
              February 22, 2024 18:35
            
              
                Demonstrating DISTINCT with LIMIT on MySQL and Postgres
              
          
    Introduction

MySQL's documentation states:

When combining LIMIT row_count with DISTINCT, MySQL stops as soon as it finds row_count unique rows.

The question was asked, how is this deterministic / accurate? My hypothesis is that it's due to MySQL's use of a clustering
index, as opposed to Postgres' heap storage. I think that, given a monotonic PK such as an AUTO_INCREMENT (or perhaps
any index), it's able to use that to guarantee determinism.

  
## postgres_clustering.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                stephanGarland
                / postgres_clustering.md
            
            
              Created
              February 9, 2024 23:37
            
              
                Demonstrating Postgres' clustering operation
              
          
    This table has 1,000,000 rows, consisting of a UUIDv4 PK, a random int of range (1,1000000), and ~1 KiB of Lorem Ipsum text.
postgres=# \d+ uuid_pk
                                          Table "public.uuid_pk"
 Column  |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
---------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 id      | uuid    |           | not null |         | plain    |             |              |
 user_id | integer |           | not null |         | plain    |             |              |
 lorem   | text    |           | not null |         | extended |             |              |


## tree_insert.sql
SET search_path TO 'example';

INSERT INTO genre (name) VALUES ('Rock'), ('Classic Rock'), ('Metal'), ('Progressive Metal'), ('Arena Rock'), ('Alternative Metal'), ('Grunge');
INSERT INTO category (name) VALUES ('Book'), ('Music');
INSERT INTO format (name) VALUES ('Compact Disc'), ('Vinyl');

INSERT INTO band (year_formed, name) VALUES (1985, 'Majesty'), (1988, 'Dream Theater'), (1990, 'Tool'), (1970, 'Queen'), (1987, 'Alice in Chains');

INSERT INTO artist (first_name, last_name) VALUES ('John', 'Petrucci'), ('John', 'Myung'), ('James', 'LaBrie'), ('Jordan', 'Ruddess'), ('Mike', 'Portnoy'), ('Mike', 'Mangini');
INSERT INTO artist (first_name, last_name, prefix, suffix) VALUES ('Brian', 'May', 'Sir', 'CBE');

## calc_innodb_buf_pool.py
#!/usr/bin/env python3

"""
Calculates various parameters for the InnoDB buffer pool based on a simple input.

The InnoDB engine is extremely strict about how its buffer pool can be sized.
Ref: https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_buffer_pool_chunk_size
Other than size restrictions (which are unlikely to be violated in normal practice), there are
complicated relationships between the three main variables that must be met, plus some
best practices to be observed. Getting these three correct can be challenging, hence this calculator.

## mysql_binary_truncation.sql
mysql> CREATE TABLE foo(id TINYINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, bar BINARY(16), baz CHAR(36));
Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9-a129-11ee-95fb-0242ac110000');
Query OK, 1 row affected (0.02 sec)

mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9a12911ee95fb0242ac110000');
Query OK, 1 row affected (0.01 sec)

mysql> SHOW BINARY LOGS;

## indexing_bools_low_cardinality.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              0 stars
            
          
                stephanGarland
                / indexing_bools_low_cardinality.md
            
            
              Created
              October 5, 2023 11:40
            
              
                Demonstrating how indexing booleans is often not helpful
              
          
    System Specifications


OS: Debian Bullseye 5.10.0-23-amd64
Virtualized: Yes (Proxmox)
CPU: E5-2650 v2 @ 2.60GHz
Allocated Core Count: 16
Allocated RAM: 64 GiB PC3-12800R
Disk: Samsung PM983 1.92 TiB via Ceph
Filesystem: XFS
Mount Options: defaults,noatime
Postgres Version: 15.3


## bench_mysql_innodb_myisam.py
import csv
import json
import os
import pymysql
import statistics
import time


def prepare_values(row):
    new_row = []

## bench_mysql_innodb_myisam.sh
#!/usr/bin/env bash

function create_tables() {
    tbl_1=$(cat <<EOF
    CREATE TABLE IF NOT EXISTS test_innodb (
        user_id CHAR(32) NOT NULL PRIMARY KEY,
        user_email VARCHAR(254) NOT NULL,
        created_at DATETIME NOT NULL,
        tags JSON NOT NULL DEFAULT ('{}'),
        shared_with JSON NOT NULL DEFAULT ('{}'),

## mkjup.sh
# Requirements:
# jupyter-notebook: https://jupyter.org/install
# Python3: https://www.python.org/downloads
# virtualenv: https://pypi.org/project/virtualenv
# virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/latest/install.html
# zsh
# Tested with Jupyter Notebook 6.5.3, Python 3.11.4, and zsh 5.8

mkjup() {
  # This is a base64-encoded skeleton of an .ipynb file, with two variables which are filled via envsubst later
	SET search_path TO 'example';

	INSERT INTO genre (name) VALUES ('Rock'), ('Classic Rock'), ('Metal'), ('Progressive Metal'), ('Arena Rock'), ('Alternative Metal'), ('Grunge');
	INSERT INTO category (name) VALUES ('Book'), ('Music');
	INSERT INTO format (name) VALUES ('Compact Disc'), ('Vinyl');

	INSERT INTO band (year_formed, name) VALUES (1985, 'Majesty'), (1988, 'Dream Theater'), (1990, 'Tool'), (1970, 'Queen'), (1987, 'Alice in Chains');

	INSERT INTO artist (first_name, last_name) VALUES ('John', 'Petrucci'), ('John', 'Myung'), ('James', 'LaBrie'), ('Jordan', 'Ruddess'), ('Mike', 'Portnoy'), ('Mike', 'Mangini');
	INSERT INTO artist (first_name, last_name, prefix, suffix) VALUES ('Brian', 'May', 'Sir', 'CBE');
	#!/usr/bin/env python3

	"""
	Calculates various parameters for the InnoDB buffer pool based on a simple input.

	The InnoDB engine is extremely strict about how its buffer pool can be sized.
	Ref: https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_buffer_pool_chunk_size
	Other than size restrictions (which are unlikely to be violated in normal practice), there are
	complicated relationships between the three main variables that must be met, plus some
	best practices to be observed. Getting these three correct can be challenging, hence this calculator.
	mysql> CREATE TABLE foo(id TINYINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, bar BINARY(16), baz CHAR(36));
	Query OK, 0 rows affected (0.03 sec)

	mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9-a129-11ee-95fb-0242ac110000');
	Query OK, 1 row affected (0.02 sec)

	mysql> INSERT INTO foo (bar, baz) VALUES (UUID_TO_BIN('dc23a9b9-a129-11ee-95fb-0242ac110000'), 'dc23a9b9a12911ee95fb0242ac110000');
	Query OK, 1 row affected (0.01 sec)

	mysql> SHOW BINARY LOGS;
	import csv
	import json
	import os
	import pymysql
	import statistics
	import time


	def prepare_values(row):
	new_row = []
	#!/usr/bin/env bash

	function create_tables() {
	tbl_1=$(cat <<EOF
	CREATE TABLE IF NOT EXISTS test_innodb (
	user_id CHAR(32) NOT NULL PRIMARY KEY,
	user_email VARCHAR(254) NOT NULL,
	created_at DATETIME NOT NULL,
	tags JSON NOT NULL DEFAULT ('{}'),
	shared_with JSON NOT NULL DEFAULT ('{}'),
	# Requirements:
	# jupyter-notebook: https://jupyter.org/install
	# Python3: https://www.python.org/downloads
	# virtualenv: https://pypi.org/project/virtualenv
	# virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/latest/install.html
	# zsh
	# Tested with Jupyter Notebook 6.5.3, Python 3.11.4, and zsh 5.8

	mkjup() {
	# This is a base64-encoded skeleton of an .ipynb file, with two variables which are filled via envsubst later