Why?
- Localize your database into one single folder of your choosing.
- No sudo / root requirement.
- Run multiple Postgres instances at the same time on the same machine.
- Full bare metal performance.
- Postgres is a great starter before moving to "slow single machine, but horizontal scale" databases: CockroachDB, ScyllaDB, etc.
Sadly, as of this writing there is no suitable binary distribution, but this works well.
- Get source: https://www.postgresql.org/ftp/source/
- Probably needed on Ubuntu:
sudo apt install libreadline-dev
- Configure with your install folder as the prefix, then build:
./configure --prefix=/home/$USER/Desktop/postgres
make -j32
make install
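- If the build worked, the installed binaries should report their version (paths assume the prefix chosen above):
/home/$USER/Desktop/postgres/bin/postgres --version
/home/$USER/Desktop/postgres/bin/psql --version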
cd ~/Desktop/postgres
./bin/initdb -D ./data
- If desired (explicit superuser name and locale):
./bin/initdb -D ./data -U $USER --locale=en_US.UTF-8
- Edit ./data/postgresql.conf to increase limits:
max_connections = 10000
work_mem = 64MB
shared_buffers = 10GB
max_wal_size = 100GB
autovacuum_max_workers = 10
autovacuum_vacuum_scale_factor = 0.01
- Raise the shell's open file limit before starting Postgres:
ulimit -n 100000
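- Once the server is running (next step), you can sanity-check that the settings took effect; for example:
./bin/psql postgres -h 127.0.0.1 -c "SHOW max_connections;"
./bin/psql postgres -h 127.0.0.1 -c "SHOW shared_buffers;"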
- Start postgres daemon:
./bin/pg_ctl -D ./data -l ./logs start
- Create new database:
./bin/createdb test -h 127.0.0.1 -U $USER
- Login to Postgres:
./bin/psql test -h 127.0.0.1 -U $USER
- Edit ./data/postgresql.conf:
listen_addresses = '*'
- Edit ./data/pg_hba.conf
- LAN only:
host all all 10.0.0.0/24 trust
- Anyone (use firewall):
host all all 0.0.0.0/0 trust
- Restart postgres (required):
./bin/pg_ctl -D ./data -l ./logs restart
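- To confirm LAN access works, connect from another machine with any Postgres client (10.0.0.5 below is a placeholder for your server's address):
psql "host=10.0.0.5 dbname=test user=$USER"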
- Useful commands inside psql (./bin/psql DATABASE -h 127.0.0.1):
\q       Quit
\?       Help
\l       Display databases (🐬 SHOW DATABASES;)
\d       Display tables (🐬 SHOW TABLES;)
\d test  Display table schema (🐬 SHOW CREATE TABLE test;)
\du      Display users (🐬 SHOW USERS;)
SELECT * FROM pg_stat_activity;  (🐬 SHOW PROCESSLIST;)
Prevent pager from folding lines: \setenv PAGER 'less -S'
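- These preferences can live in ~/.psqlrc so every session picks them up (a minimal sketch; all three are standard psql meta-commands):
\setenv PAGER 'less -S'
\x auto
\timing on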
🐬 MySQL
🔵 psql
⚪ SQL
🟢 pg_dump
Currently, these DDL and admin commands in Postgres are a scattered mess. CockroachDB aliases all of these; Postgres is just behind here.
- 🐬 SHOW DATABASES;
  - 🔵 \l
  - ⚪ SELECT DISTINCT table_schema FROM information_schema.tables;
- 🐬 SHOW TABLES;
  - 🔵 \d
  - ⚪ SELECT * FROM information_schema.tables WHERE table_schema = 'public';
- 🐬 SHOW CREATE TABLE test;
  - 🔵 \d test
  - ⚪ SELECT * FROM information_schema.columns WHERE table_name = 'test'; ... total mess.
  - 🟢 ./bin/pg_dump test -h 127.0.0.1 -p 5432 --schema-only -t test ... great guys, great DX.
  - ⚪ select master_get_table_ddl_events('test'); Good, but Citus only! See bottom of this page for select show_create_table('test');
  - 💬 Thread to get decent SHOW CREATE TABLE, DDL features
- 🐬 SHOW USERS;
  - 🔵 \du
  - ⚪ SELECT * FROM pg_user;
- 🐬 SHOW PROCESSLIST;
  - ⚪ SELECT * FROM pg_stat_activity;
- 🐬 MySQL backticks (identifier quoting): Postgres uses "double quotes" for identifiers, and $$dollar quoting$$ for string literals. Could be worse...
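- A quick sketch of the quoting differences (table and column names are hypothetical):
-- MySQL: SELECT `name` FROM `users`;
-- Postgres: double quotes for identifiers, dollar quoting for awkward strings:
SELECT "name" FROM "users" WHERE note = $$it's easy: no '' escaping$$;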
Case-insensitive and accent-insensitive collations do not currently exist in Postgres, but you can alias unaccent(lower('blah')) to flat('blah') and use a computed column or index. Unfortunate, because you have to manage these indexes / columns yourself, whereas in MySQL it's automatic.
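- A minimal sketch of that flat() alias (assumes the bundled unaccent extension; the users.name column is hypothetical; marking the wrapper IMMUTABLE is the usual shortcut to make it indexable, even though unaccent's output can change if its dictionary does):
CREATE EXTENSION IF NOT EXISTS unaccent;
-- Wrapper so the expression can be indexed:
CREATE OR REPLACE FUNCTION flat(text) RETURNS text AS $$
  SELECT unaccent(lower($1))
$$ LANGUAGE sql IMMUTABLE;
CREATE INDEX users_name_flat ON users (flat(name));
-- Case- and accent-insensitive lookup that can use the index:
SELECT * FROM users WHERE flat(name) = flat('Ångström');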
- 📦 Backup:
./bin/pg_basebackup -h 127.0.0.1 -D ./backup -P --checkpoint=fast
- Verify backup:
./bin/pg_verifybackup ./backup
You'll want to actually test them once in a while too.
- S3 or cluster? https://github.com/wal-g/wal-g
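- A nightly base backup can be a single cron entry (illustrative paths and schedule; pg_basebackup needs an empty target, hence the dated directory, and % must be escaped in crontab):
0 3 * * * /home/$USER/Desktop/postgres/bin/pg_basebackup -h 127.0.0.1 -D /home/$USER/backups/$(date +\%F) --checkpoint=fast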
- Replication: https://www.youtube.com/watch?v=GIsD1BgFnWc
- HA / High Availability: how governor / patroni works
- Monitor / Observability: pg_stat_statements, pgmetrics, pgdash
- PgCat: https://postgresml.org/blog/scaling-postgresml-to-1-million-requests-per-second
- Citus: https://gigaom.com/report/transaction-processing-price-performance-testing/
- Remember though, Citus gets only 1x replication in that benchmark while the others run 3x.
- Restore: The directory created by pg_basebackup is just your ./data directory.
- Schedule jobs: https://github.com/citusdata/pg_cron
- ⏫ Upgrades: https://www.postgresql.org/docs/current/pgupgrade.html
  - pg_upgrade --link is the fastest. No pg_dump.
  - Online upgrades require logical replication. See: reference
  - Stop old db
  - Rename datadir to datadir_old
  - mkdir datadir
  - pg_upgrade .... --old-datadir datadir_old --new-datadir datadir
- Export (too slow for backup, but useful to migrate to another database):
./bin/pg_dump test -h 127.0.0.1 >backup.sql
- ♻️ Check dead tuples:
select n_live_tup, n_dead_tup, relname from pg_stat_all_tables;
- Shouldn't normally need to do this if vacuum is set up correctly.
- Easiest way to see if your vacuum settings are keeping up!
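- If dead tuples are piling up, a manual pass shows what autovacuum should be doing (table name is hypothetical):
VACUUM (VERBOSE, ANALYZE) test;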
- Easy replication with DDL: https://github.com/xataio/pgstream
- Easy streaming replication
- You ideally want LOGICAL replication: resistant to failures and WAL corruption. (A minimal setup sketch follows this list.)
- Better to lose a row than invalidate the database from 1 corrupt row.
- Starting to get there in Postgres 16.
- ❌ Still doesn't replicate DDL / Schema changes in 16! WTF. No replication of migrations.
  - ⚠️ The initial schema can be copied by hand using pg_dump --schema-only.
  - ⚠️ Also does not replicate large objects or sequences (TRUNCATE does replicate since Postgres 11).
  - Less annoying but still bad DX.
  - https://www.postgresql.org/docs/current/logical-replication-restrictions.html
- ❌ With Postgres shipping WAL (streaming replication), if WAL corrupts on the master, that corruption goes straight to all the replicas. Take frequent backups.
- Comparison: https://postgresql.org/docs/current/different-replication-solutions.html
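- A minimal logical replication sketch, assuming a primary at 10.0.0.5 (hypothetical) with wal_level = 'logical', and a hypothetical replicator user:
-- On the primary:
CREATE PUBLICATION my_pub FOR ALL TABLES;
-- On the replica, after loading the schema with pg_dump --schema-only:
CREATE SUBSCRIPTION my_sub CONNECTION 'host=10.0.0.5 dbname=test user=replicator' PUBLICATION my_pub;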
- pgpool (statement-based replication) may be the only good answer for replicating everything, DDL included. Obviously, avoid SQL like NOW(), as that will not replicate deterministically.
- Could go CockroachDB the moment you need to do replication? 10x hardware though...
- Postgres is less suitable for heavy writes using UPDATE, particularly with lots of indexes. (See the counters sketch after this list.)
  - Every write = new row until VACUUM.
  - ❌ All indexes are re-written on UPDATE because the ctid (internal row id) changes. WTF. Write amplification!
    - Even if you do not update an indexed column, a new entry must still be created per index for the new ctid (unless the change qualifies as a HOT update: no indexed column changed and the new tuple fits on the same page).
  - Write amplification: https://www.uber.com/en-CA/blog/postgres-to-mysql-migration/
  - Example workload: tracking car location for Uber. Real-time data.
  - OrioleDB and InnoDB (MySQL, MariaDB) avoid this by using only an undo log, and they have no ctid.
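- To see how often your updates dodge the index re-write (HOT), compare these counters (test is whatever table you write to):
SELECT relname, n_tup_upd, n_tup_hot_upd FROM pg_stat_user_tables WHERE relname = 'test';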
- Sharding: https://github.com/citusdata/citus (formerly pg_shard)
- Compatibility with CockroachDB, Spanner (and BigTable derivatives). See the key sketch after this list.
  - Use BIGINT or UUID for primary keys and indexes.
    - Do not use SERIAL. Do not use AUTO INCREMENT. https://gist.github.com/gnat/774bc540c0b300146cdedb0744bad081
  - Use TEXT for char data (STRING is the alias elsewhere).
  - Use JSONB for table-in-table data (JSON elsewhere).
  - Types
- In depth comparison: https://wiki.postgresql.org/wiki/Scaleout_Design
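- A sketch of portable key choices (the vehicles table is hypothetical; gen_random_uuid() is built in since Postgres 13):
-- Portable auto-increment, instead of SERIAL / AUTO INCREMENT:
CREATE TABLE vehicles (id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, plate TEXT);
-- Or UUID keys for cross-cluster uniqueness:
-- CREATE TABLE vehicles (id UUID PRIMARY KEY DEFAULT gen_random_uuid(), plate TEXT);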
- More or less typical upgrade process for a Postgres cluster (Adyen); super complex, but it works: https://www.adyen.com/blog/updating-a-50-terabyte-postgresql-database
  - App engineered to be able to continue to buffer transactions during upgrades. FFS.
- Must read (MVCC): https://ottertune.com/blog/the-part-of-postgresql-we-hate-the-most/
- Must read (tuning): https://www.citusdata.com/blog/2016/11/04/autovacuum-not-the-enemy/
- Must read: https://www.enterprisedb.com/blog/tuning-maxwalsize-postgresql
- Adjusting autovacuum settings is important because:
  - Postgres is copy-on-write:
    - A modified record is copied to a new page when written, and the old record is left in place until a vacuum.
    - (aka write amplification on UPDATE)
  - Postgres must update every index if a change to the row updates an indexed column.
  - All databases based on Postgres are affected by this: RDS, Aurora, and Aurora Serverless.
- Postgres doesn't automatically create indexes for foreign keys. MySQL InnoDB does. (A sketch of the fix follows below.)
- Thread: https://news.ycombinator.com/item?id=35918886
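- The index you must remember to add yourself (tables hypothetical):
CREATE TABLE orders (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  user_id BIGINT REFERENCES users (id)  -- Postgres creates no index here
);
CREATE INDEX orders_user_id ON orders (user_id);  -- add it manually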
- Present issues:
  - UPDATE creates a new row version, and both UPDATE and DELETE leave dead rows behind until VACUUM runs (classic table bloat issue, see Uber).
  - Replication: logical replication does not replicate schema changes (ALTER, CREATE).
  - Need to monkey with user-defined CREATE COLLATION to get case-insensitive and accent-insensitive collations. MySQL has this set up by default.
  - Permission management is a lot less easy vs MySQL.
  - DX: ease-of-use commands like SHOW CREATE TABLE are missing.
  - DX: column re-ordering is not possible in the current Postgres engine.
  - More table locking when ALTERing. Stuff that blocks in Postgres:
  - Haven't tested: PK range-scan queries are much slower, apparently? (InnoDB uses a clustered index.)
  - Haven't tested: unexpectedly negative query plan adjustments without proactive hint usage, apparently?
- No longer a big deal:
  - Can only reasonably upgrade between minor versions without pg_dump.
  - Handling several thousand connections per second without needing a proxy or pool (MySQL's connection model is thread/conn vs Postgres using process/conn). → Not a problem in Postgres 15+. You won't need a proxy until you hit large scale.
  - Handle very high-volume OLTP workloads using direct I/O, since InnoDB's buffer pool design is completely independent of filesystem/OS caching. → Not sure if this is a big issue in Postgres 15+?
  - Achieve best-in-industry compression by using the MyRocks storage engine (MySQL's pluggable storage engine design has a lot of tradeoffs, but it is inherently what makes this even possible). → Cool, but not a game changer.
  - Use UNSIGNED int types to store twice as high a max value in the same number of bytes, if you know negative numbers are not going to be present. → Cool, but not a game changer. Stay away from UNSIGNED for cross-database compatibility.
  - A tooling ecosystem which includes multiple battle-tested external online schema change tools, for safely making alterations of any type to tables with billions of rows. → Cool, but not a game changer.
- Get source: https://github.com/citusdata/citus
- Probably needed on Ubuntu:
sudo apt install libcurl4-openssl-dev liblz4-dev libzstd-dev
- Point the build at your Postgres install (adjust these paths accordingly!):
export PG_CONFIG=/home/$USER/Desktop/postgres/bin/pg_config; ./configure --prefix=/home/$USER/Desktop/citus
make -j32
make install
- Finish the Run step at the top of this page before continuing! Then:
echo "shared_preload_libraries = 'citus'" >> /home/$USER/Desktop/postgres/data/postgresql.conf
echo "wal_level = 'logical'" >> /home/$USER/Desktop/postgres/data/postgresql.conf
....
CREATE EXTENSION citus;
SELECT citus_version();
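- From here, tables can be spread across workers; create_distributed_table is the standard Citus call (the id distribution column is hypothetical):
SELECT create_distributed_table('test', 'id');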
- Use:
select master_get_table_ddl_events('test');
- Or:
CREATE OR REPLACE FUNCTION pg_catalog.show_create_table(text) RETURNS SETOF text LANGUAGE c STRICT ROWS 100 AS '$libdir/citus', $function$master_get_table_ddl_events$function$;
-- Lets you run...
SELECT show_create_table('test');
Try using make -j$(nproc) instead; it will scale to the number of threads the host OS has, and since nproc is part of GNU coreutils, it's available on virtually any GNU/Linux system.