@saurabhnanda
Last active August 2, 2023 06:06
Tuning Postgres + ZFS

Tuning ZFS + Postgres to outperform EXT4 + Postgres

Please refer to ZFS 2-3x slower than EXT4 to see how ZFS defaults + Postgres defaults severely underperform EXT4 defaults + Postgres defaults (and also to know more about the system on which these benchmarks were performed). This page documents how to tune ZFS + Postgres to give better performance for the tpcb-like benchmark.
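For context, the tpcb-like benchmark is pgbench's built-in default script. As a rough sketch of how such a run is invoked (the scale factor, duration, and database name below are illustrative assumptions, not the actual values used for these numbers):

# one-time initialization of the pgbench tables at scale factor 1000
pgbench -i -s 1000 benchdb
# run the default tpcb-like script: 24 clients, 8 worker threads, 5 minutes
pgbench -c 24 -j 8 -T 300 benchdb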

BIG FAT WARNING

Please do not copy these settings blindly, because I myself am not clear on why/how these settings had the impact they did. For example, I cannot explain why full_page_writes=off on its own did not give much of a boost, and neither did an optimized PG configuration on its own. However, putting the two together gave a 2-4x boost over the baseline numbers.

Benchmark results

You'd want to compare the numbers in the first row and the last row to get a sense of the maximum performance gain that I could achieve.

+------------------------+-------+--------+--------+--------+--------+--------+--------+
| client                 | 1     | 4      | 8      | 12     | 24     | 48     | 96     |
+------------------------+-------+--------+--------+--------+--------+--------+--------+
| EXT4 defaults          | 719   | 1816   | 2835   | 3576   | 4531   | 5485   | 6876   |
| cold (warm)            | (763) | (1982) | (3286) | (3889) | (5071) | (5604) | (6540) |
+------------------------+-------+--------+--------+--------+--------+--------+--------+
| ZFS benchmarks                                                                       |
+--------------------------------------------------------------------------------------+
| PG defaults            | 302   | 808    | 1138   | 1287   | 1619   | 2575   | 4040   |
| cold (warm)            | (309) | (836)  | (1140) | (1307) | (1651) | (2621) | (4301) |
+------------------------+-------+--------+--------+--------+--------+--------+--------+
| Optimized PG conf [1]  | 327   | 1365   | 2291   |        |        |        |        |
| full_page_writes=on    | (334) | (1331) | (2049) |        |        |        |        |
| cold (warm)            |       |        |        |        |        |        |        |
+------------------------+-------+--------+--------+--------+--------+--------+--------+
| PG defaults            |       | 878    |        | 1334   |        |        |        |
| + kernel tuning [2]    |       | (882)  |        | (1363) |        |        |        |
| cold (warm)            |       |        |        |        |        |        |        |
+------------------------+-------+--------+--------+--------+--------+--------+--------+
| PG defaults            |       | 957    |        | 1372   | 2130   |        |        |
| + full_page_writes=off |       | (959)  |        | (1399) | (2154) |        |        |
| + kernel tuning [2]    |       |        |        |        |        |        |        |
| cold (warm)            |       |        |        |        |        |        |        |
+------------------------+-------+--------+--------+--------+--------+--------+--------+
| Optimized PG conf [1]  |       | 2335   |        | 4751   | 6969   | 7627   | 8429   |
| + full_page_writes=off |       | (2488) |        | (5072) | (5667) | (6485) | (7720) |
| + kernel tuning [2]    |       |        |        |        |        |        |        |
| cold (warm)            |       |        |        |        |        |        |        |
+------------------------+-------+--------+--------+--------+--------+--------+--------+

Explanation of the benchmark results

ZFS creation and settings

# ashift=12 = 4K sectors (2^12), appropriate for most modern NVMe drives
zpool create -f -o ashift=12 -m /firstzfs firstzfs /dev/nvme1n1p2

# recordsize=8k matches Postgres's 8kB block size; lz4 compression is cheap on CPU;
# logbias=throughput optimizes for bandwidth over commit latency
zfs create \
  -o mountpoint=/firstzfs/postgres \
  -o atime=off \
  -o canmount=on \
  -o compression=lz4 \
  -o quota=100G \
  -o recordsize=8k \
  -o logbias=throughput \
  firstzfs/postgres
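After creation, it's worth confirming the dataset actually picked up the intended properties; for example:

# confirm the tuning properties took effect on the new dataset
zfs get recordsize,compression,logbias,atime,quota firstzfs/postgres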

Optimized PG configuration

  • Give more memory to PG for shared_buffers
  • Tweak the IO parameters for SSDs
max_connections = 100
shared_buffers = 16GB
effective_cache_size = 48GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 41943kB
min_wal_size = 1GB
max_wal_size = 2GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
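Most of these settings take effect on a reload, but shared_buffers (among others) needs a full restart. A minimal sketch, assuming a systemd-managed Postgres (the service name may differ on your distro):

# restart so that shared_buffers takes effect, then sanity-check the running values
systemctl restart postgresql
psql -c "SHOW shared_buffers;" -c "SHOW random_page_cost;"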

Disabling full page writes on Postgres

To be honest, I don't completely understand what this really does, but apparently it has a big positive impact for Postgres + ZFS. (The usual reasoning is that full_page_writes protects against torn pages from partial writes, and ZFS's copy-on-write design already rules out torn pages, which makes the protection redundant there.)

full_page_writes=off
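This parameter can be changed without a restart; a reload is enough. One way to do it, sketched with ALTER SYSTEM (which writes to postgresql.auto.conf):

psql -c "ALTER SYSTEM SET full_page_writes = off;"
psql -c "SELECT pg_reload_conf();"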

Kernel tuning

  • Set the CPU governor to performance. I noticed that multiple machines had this set to powersave instead, which reduces the CPU speed during periods of inactivity and raises it again when required, but only after a slight delay. In an old benchmark (not reported above), I think I saw an improvement of ~5% from this tweak alone.
cpufreq-set -g performance
  • Change the drive read-ahead to 4096. Warning: this is recommended in the book PostgreSQL 10 High Performance, but I am not sure how relevant it is for ZFS, because ZFS has 10+ (really!) tunable parameters of its own for prefetching data.
blockdev --setra 4096 /dev/sda
  • Change swappiness. Caveat: I haven't even configured swap on the benchmarking system, but I did this anyway for the sake of completeness. Again, another recommendation from PostgreSQL 10 High Performance.
echo 0 > /proc/sys/vm/swappiness
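None of the three tweaks above survives a reboot on its own. As a sketch of one way to persist the swappiness change (the file name here is an arbitrary choice; the governor and read-ahead settings would similarly need a boot-time unit or udev rule):

# persist vm.swappiness across reboots, then reload all sysctl settings now
echo 'vm.swappiness = 0' > /etc/sysctl.d/99-pgbench.conf
sysctl --system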
@thoro commented Feb 1, 2019

Read up on the postgres options here:
https://www.postgresql.org/docs/9.5/runtime-config-wal.html

With full_page_writes=off you are effectively reducing the amount of data that has to be committed to ZFS, therefore you get faster commit performance (i.e. the WAL files are smaller, so there is less fsync traffic).

For Postgres you can anyway take data out of the ARC entirely by setting primarycache=metadata, since Postgres's own read cache is VERY effective.

For the sake of completeness, please add 3 additional benchmarks if possible (the ZFS side of these settings is sketched after the list):

full_page_writes=off + ext4

sync=disabled + full_page_writes=on
sync=disabled + full_page_writes=off
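For reference, the ZFS side of those suggestions would be applied to the dataset from the gist roughly like so (note that sync=disabled trades away crash safety, so treat it as benchmark-only):

zfs set primarycache=metadata firstzfs/postgres  # keep only metadata in the ARC
zfs set sync=disabled firstzfs/postgres          # skip synchronous writes: fast, but unsafe on power loss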

@unicolet commented Feb 1, 2019

@saurabhnanda it would be interesting to see whether setting xattr=sa affects the results

@ryao commented Feb 1, 2019

@unicolet PostgreSQL should open files and then keep them open. There is basically zero opportunity for xattr=sa to improve things. It would not hurt to test it, but I would be very surprised if it was found to improve things.

@unicolet commented Feb 1, 2019

@ryao unfortunately I don't have data to back my guess and I'm OOO now, but from my experience Postgres creates quite a lot of files during operation; this doc has a list: https://www.postgresql.org/docs/11/storage-file-layout.html

It would be quite cool if there were a test suite that could be used to simulate different scenarios so we could compare results, something like https://github.com/elastic/rally

@carlesmateo

For the record, have you tried different recordsizes, like 128K and 1M?
And have you tried leaving ashift on auto? Are you sure about the physical sector size of your NVMe device?
My output:
carles@xeon:~$ sudo fdisk /dev/nvme0n1

Welcome to fdisk (util-linux 2.31.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): p
Disk /dev/nvme0n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos

What kind of data are you pushing to PostgreSQL? If your data compresses a lot, you'll write a lot of data with few IOPS and your performance will be great, but if PostgreSQL is pushing data that cannot be compressed (encrypted, blobs that are already compressed, etc.), then you'll be wasting CPU.
With sync=always you are safe against power outages, but performance pays a price.
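For what it's worth, a quick way to check whether compression is actually paying off on the dataset used in this gist:

zfs get compressratio firstzfs/postgres  # a ratio near 1.00x means the lz4 CPU cost buys nothing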

@kraduk commented Mar 25, 2021

A bit late to the party I know, but it's worth mentioning that the ZFS recordsize refers to the maximum record size, not the actual size; this is a very common misconception. Setting it to 32K vs 128K vs 1M just restricts or expands the block-size options ZFS has to play with, and is unlikely to result in read amplification. E.g. you can see below the spread of blocks on a pool that has only 128K set as its max recordsize.

root@4:~# zdb -S rpool
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    1.65M    198G    115G    115G    1.65M    198G    115G    115G
     2     344K   40.9G   8.11G   8.15G     694K   82.1G   16.3G   16.4G
     4    1.01K   31.5M   12.8M   14.3M    4.56K    140M   57.9M   64.7M
     8      110   5.29M   4.22M   4.35M    1.09K   52.9M   40.9M   42.2M
    16       19    566K     93K    112K      388   12.4M   1.91M   2.27M
    32       28   2.18M    504K    524K    1.50K    135M   30.5M   31.3M
    64       22    298K     45K     88K    2.03K   25.2M   4.01M   8.11M
   128       51     51K     51K    204K    8.92K   8.74M   8.74M   35.7M
 Total    1.99M    239G    123G    124G    2.35M    281G    132G    132G
