@saurabhnanda
Last active March 10, 2024 09:36
ZFS underperforms EXT4 significantly

ZFS 1.5-3.1x slower than EXT4


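Benchmark results for Postgres on ZFS vs EXT4

All figures are pgbench transactions per second (TPS) for the built-in tpcb-like script at scale 2000. Higher is better.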

+--------+-----------------------------+-----------------------------+-----------------------------+----------------+
|        |       Postgres on ZFS       |       Postgres on EXT4      |       Postgres on ZFS       | ZFS Slowdown   |
|        |           0.7.5-1           |                             |           0.7.12-1          |                |
|        |    (Ubuntu 18.04 default)   |                             |    (compiled from source)   | 0.7.12 vs EXT4 |
+--------+-----------------------------+-----------------------------+-----------------------------+                +
| client | First run    | Second run   | First run    | Second run   | First run    | Second run   |                |
|        | (cold cache) | (warm cache) | (cold cache) | (warm cache) | (cold cache) | (warm cache) |                |
+--------+--------------+--------------+--------------+--------------+--------------+--------------+----------------+
| 1      | 393          | 386          | 719          | 763          | 302          | 309          | 2.5x slower    |
+--------+--------------+--------------+--------------+--------------+--------------+--------------+----------------+
| 4      | 844          | 867          | 1816         | 1982         | 807          | 836          | 2.4x slower    |
+--------+--------------+--------------+--------------+--------------+--------------+--------------+----------------+
| 8      | 1142         | 1168         | 2835         | 3286         | 1138         | 1140         | 2.8x slower    |
+--------+--------------+--------------+--------------+--------------+--------------+--------------+----------------+
| 12     | 1275         | 1296         | 3576         | 3889         | 1288         | 1307         | 3.0x slower    |
+--------+--------------+--------------+--------------+--------------+--------------+--------------+----------------+
| 24     | 1603         | 1671         | 4531         | 5071         | 1618         | 1651         | 3.1x slower    |
+--------+--------------+--------------+--------------+--------------+--------------+--------------+----------------+
| 48     | 2690         | 2687         | 5485         | 5604         | 2574         | 2621         | 2.1x slower    |
+--------+--------------+--------------+--------------+--------------+--------------+--------------+----------------+
| 96     | 3676         | 3832         | 6876         | 6540         | 4040         | 4301         | 1.5x slower    |
+--------+--------------+--------------+--------------+--------------+--------------+--------------+----------------+
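The slowdown column matches the ratio of the EXT4 and ZFS 0.7.12 warm-cache (second run) numbers, rounded to one decimal place. For example, 763 / 309 ≈ 2.47 is shown as 2.5x. The sketch below recomputes the column under that assumption. It reproduces every row except client=8, where 3286 / 1140 ≈ 2.88 rounds to 2.9 rather than the 2.8 shown. `slowdown` is a hypothetical helper name.

```bash
#!/bin/bash
# Ratio of EXT4 TPS to ZFS TPS, rounded to one decimal place.
# Assumes both arguments are warm-cache (second run) TPS values from the table.
slowdown() {
  local ext4_tps="$1" zfs_tps="$2"
  awk -v a="$ext4_tps" -v b="$zfs_tps" 'BEGIN { printf "%.1f\n", a / b }'
}

# Rows: client, EXT4 warm-cache TPS, ZFS 0.7.12 warm-cache TPS.
while read -r client ext4 zfs; do
  echo "client=$client slowdown=$(slowdown "$ext4" "$zfs")x"
done <<'EOF'
1 763 309
4 1982 836
8 3286 1140
12 3889 1307
24 5071 1651
48 5604 2621
96 6540 4301
EOF
```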

System information

Output of uname

Linux pxnvme 4.15.0-38-generic #41-Ubuntu SMP Wed Oct 10 10:59:38 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Contents of /etc/lsb-release

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"

Postgres version

benchmarking=# select version();
                                                             version                                                              
----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.1 (Ubuntu 11.1-3.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0, 64-bit
(1 row)

ZFS version

Benchmarks were run on two ZFS versions:

  • 0.7.5-1ubuntu16.4 which ships with Ubuntu 18.04
  • 0.7.12-1 which was installed manually

How ZFS 0.7.12-1 was installed

  • Install various dependencies required to compile ZFS from source
apt-get install build-essential zlib1g zlib1g-dev libuuid1 uuid-dev libblkid1 libblkid-dev
  • Install spl-0.7.12 from source
cd /tmp && \
wget https://github.com/zfsonlinux/zfs/releases/download/zfs-0.7.12/spl-0.7.12.tar.gz && \
tar -zxf spl-0.7.12.tar.gz && \
cd /tmp/spl-0.7.12  && \
./configure && \
make && \
make install
  • Install zfs-0.7.12 from source
cd /tmp && \
wget https://github.com/zfsonlinux/zfs/releases/download/zfs-0.7.12/zfs-0.7.12.tar.gz && \
tar -zxf zfs-0.7.12.tar.gz && \
cd /tmp/zfs-0.7.12  && \
./configure && \
make && \
make install
  • Change depmod configuration to pick up the new zfs/spl modules

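PS: By default, modprobe loads the stock ZFS/SPL modules that Ubuntu ships under /lib/modules/4.15.0-38-generic/kernel. It prefers these over the new ones the source build installs under /lib/modules/4.15.0-38-generic/extra, because extra is not part of the depmod search order in /etc/depmod.d/ubuntu.conf. Putting extra first in the search order makes depmod prefer the new modules.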

echo "search extra updates ubuntu built-in" > /etc/depmod.d/ubuntu.conf
depmod
  • Reboot the machine and verify that the new modules are loaded
# cat /sys/module/zfs/version
0.7.12-1

# cat /sys/module/spl/version
0.7.12-1
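The depmod change can be checked before rebooting. `modinfo -n` (kmod's `--filename` option) shows which file modprobe will load for a module name. After rebooting, the sysfs version files read above confirm what is actually loaded.

A minimal sketch follows. `module_file`, `is_extra_module` and `check_module_version` are hypothetical helper names, and the `*/extra/*` test assumes the install location described above.

```bash
#!/bin/bash
# Print the on-disk file modprobe would load for a module (e.g. zfs, spl).
module_file() {
  modinfo -n "$1"
}

# Succeed if a module path lives under an "extra" directory, which is where
# the source build installed the new zfs/spl modules.
is_extra_module() {
  case "$1" in
    */extra/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the version reported by a loaded module with an expected string.
# The sysfs root is a parameter (default /sys/module) so it can be tested.
check_module_version() {
  local module="$1" expected="$2" sysfs="${3:-/sys/module}" actual
  if ! actual="$(cat "$sysfs/$module/version" 2>/dev/null)"; then
    echo "$module: not loaded" >&2
    return 1
  fi
  if [ "$actual" != "$expected" ]; then
    echo "$module: found $actual, expected $expected" >&2
    return 1
  fi
  echo "$module: $actual"
}

# Before rebooting:
#   is_extra_module "$(module_file zfs)" && echo "zfs will load from extra/"
# After rebooting:
#   check_module_version zfs 0.7.12-1 && check_module_version spl 0.7.12-1
```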

fdisk output

# fdisk -l /dev/nvme1n1
Disk /dev/nvme1n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x7e7ea303

Device         Boot     Start       End   Sectors  Size Id Type
/dev/nvme1n1p1           2048 419432447 419430400  200G 83 Linux        <==== EXT4 filesystem
/dev/nvme1n1p2      419432448 838862847 419430400  200G 83 Linux        <==== ZFS filesystem
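Each partition is 419430400 sectors of 512 bytes, which is exactly 200 GiB. Both partitions also start on a 4 KiB boundary. This matters because the `ashift=12` used for the pool below makes ZFS use 2^12 = 4096-byte blocks.

The arithmetic check below copies its sector numbers from the fdisk listing. The function names are illustrative only.

```bash
#!/bin/bash
# Logical sector size reported by fdisk for /dev/nvme1n1.
SECTOR_BYTES=512

# Partition size in GiB from a sector count.
sectors_to_gib() {
  echo $(( $1 * SECTOR_BYTES / 1024 / 1024 / 1024 ))
}

# Succeed if a partition's start offset is a multiple of align_bytes
# (default 4096, matching ashift=12).
is_aligned() {
  local start_sector="$1" align_bytes="${2:-4096}"
  (( start_sector * SECTOR_BYTES % align_bytes == 0 ))
}

# name, start sector, sector count (from the fdisk output above)
for part in "nvme1n1p1 2048 419430400" "nvme1n1p2 419432448 419430400"; do
  read -r name start sectors <<<"$part"
  if is_aligned "$start"; then aligned=yes; else aligned=no; fi
  echo "$name: $(sectors_to_gib "$sectors") GiB, 4 KiB aligned: $aligned"
done
```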

Creating the filesystems

Creating & mounting EXT4

mkfs.ext4 /dev/nvme1n1p1
mkdir -p /pg-ext4
mount -o noatime,data=ordered /dev/nvme1n1p1 /pg-ext4

Creating ZFS

zpool create -f -o ashift=12 -m /firstzfs firstzfs /dev/nvme1n1p2

zfs create \
  -o mountpoint=/firstzfs/postgres \
  -o atime=off \
  -o canmount=on \
  -o compression=lz4 \
  -o quota=100G \
  -o recordsize=8k \
  -o logbias=throughput \
  firstzfs/postgres
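To confirm the dataset actually got these properties, run `zfs get -H -p -o property,value`. It prints tab-separated exact values, so recordsize appears as 8192 and quota appears in bytes.

The sketch below compares that output against the expected settings. `verify_zfs_props` is a hypothetical helper, and it needs bash 4 for associative arrays.

```bash
#!/bin/bash
# Read "property<TAB>value" lines on stdin and compare them with the
# property=value pairs given as arguments. Reports mismatches and missing
# properties on stderr; returns non-zero if any are found.
verify_zfs_props() {
  local -A want=()
  local pair prop value rc=0
  for pair in "$@"; do
    want["${pair%%=*}"]="${pair#*=}"
  done
  while IFS=$'\t' read -r prop value; do
    [ -n "${want[$prop]+set}" ] || continue   # ignore properties we did not ask about
    if [ "${want[$prop]}" != "$value" ]; then
      echo "$prop: got $value, want ${want[$prop]}" >&2
      rc=1
    fi
    unset "want[$prop]"
  done
  for prop in "${!want[@]}"; do               # anything left was never printed
    echo "$prop: missing" >&2
    rc=1
  done
  return "$rc"
}

# Usage (quota=100G is 107374182400 bytes):
#   zfs get -H -p -o property,value atime,compression,recordsize,logbias,quota firstzfs/postgres |
#     verify_zfs_props atime=off compression=lz4 recordsize=8192 logbias=throughput quota=107374182400
```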

Creating the two databases and benchmarking user

pg_createcluster -d /pg-ext4/11/ext4 -p 5433 -e UTF8 11 ext4
pg_createcluster -d /firstzfs/postgres/11/zfs -p 5434 -e UTF8 11 zfs

createuser -p 5433 -s -l -P benchmarking
PGPASSWORD="benchmarking" createdb -U benchmarking -h 127.0.0.1 -p 5433 benchmarking
createuser -p 5434 -s -l -P benchmarking
PGPASSWORD="benchmarking" createdb -U benchmarking -h 127.0.0.1 -p 5434 benchmarking

Note: Only one PG cluster at a time was running. All others were shut down.
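Only one cluster ran at a time, so switching between them means stopping the other cluster and starting the one under test with postgresql-common's `pg_ctlcluster`. The sketch below works under those assumptions. `switch_cluster` and `port_for_cluster` are hypothetical helpers. The `PG_CTLCLUSTER` override exists only so the logic can be exercised without a real Postgres install.

```bash
#!/bin/bash
# Map the cluster names created above to their ports.
port_for_cluster() {
  case "$1" in
    ext4) echo 5433 ;;
    zfs)  echo 5434 ;;
    *)    echo "unknown cluster: $1" >&2; return 1 ;;
  esac
}

# Stop every other version-11 cluster, start the target, print its port.
switch_cluster() {
  local target="$1" ctl="${PG_CTLCLUSTER:-pg_ctlcluster}" port c
  port="$(port_for_cluster "$target")" || return 1
  for c in ext4 zfs; do
    if [ "$c" != "$target" ]; then
      # Stopping a cluster that is already stopped may exit non-zero; ignore it.
      "$ctl" 11 "$c" stop >/dev/null 2>&1 || true
    fi
  done
  # Send start output to stderr so stdout carries only the port.
  "$ctl" 11 "$target" start >&2 || return 1
  echo "$port"
}

# Usage (as root), before running the benchmarking script:
#   PORT="$(switch_cluster zfs)" && export PORT
```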

Common benchmarking script

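  • To keep benchmarks for different concurrency levels from interfering with each other, the pgbench tables in the benchmarking database are re-created (pgbench --initialize) for each concurrency level.
  • The first run uses a cold cache: dirty pages are synced and the kernel page cache is dropped beforehand.
  • The second run uses a warm cache: it starts immediately after the first run.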
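#!/bin/bash
# Run as root (dropping the page cache needs it), with PORT set to the
# cluster under test, e.g.: PORT=5433 ./benchmark.sh  (5433 = ext4, 5434 = zfs)
set -eu

: "${PORT:?set PORT to the port of the cluster under test}"
export PGPASSWORD="benchmarking"

for script in tpcb-like; do
  for client in 1 4 8 12 24 48 96; do
    # Re-create the pgbench tables from scratch: d=drop, t=create tables,
    # g=generate data, p=primary keys, f=foreign keys.
    pgbench \
      --initialize \
      --init-steps=dtgpf \
      --scale=2000 \
      --host=127.0.0.1 \
      --port="$PORT" \
      --username=benchmarking \
      benchmarking

    # First run: flush dirty pages, then drop the page cache, dentries and inodes.
    nocache_out="output-$script-$client-nocache.txt"
    echo "run=nocache | script=$script | client=$client" >> "$nocache_out"
    sync
    echo 3 > /proc/sys/vm/drop_caches
    pgbench \
      --builtin="$script" \
      --client="$client" \
      --jobs=6 \
      --time=600 \
      --host=127.0.0.1 \
      --port="$PORT" \
      --username=benchmarking \
      benchmarking >> "$nocache_out"

    # Second run: identical workload, starting with the caches warmed by the first run.
    withcache_out="output-$script-$client-withcache.txt"
    echo "run=withcache | script=$script | client=$client" >> "$withcache_out"
    pgbench \
      --builtin="$script" \
      --client="$client" \
      --jobs=6 \
      --time=600 \
      --host=127.0.0.1 \
      --port="$PORT" \
      --username=benchmarking \
      benchmarking >> "$withcache_out"
  done
done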
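pgbench appends a summary to each output file. pgbench 11 reports throughput on two lines, one "including" and one "excluding connections establishing". Which of the two the table uses is not stated.

The sketch below takes the "excluding" figure and rounds it to a whole number. `extract_tps` is a hypothetical helper. It assumes the v11 wording, which differs in newer pgbench releases.

```bash
#!/bin/bash
# Print the rounded TPS from pgbench 11 output (file arguments or stdin),
# using the "tps = N (excluding connections establishing)" line.
extract_tps() {
  awk '/^tps = .*excluding connections establishing/ { printf "%.0f\n", $3 }' "$@"
}

# Usage: one line per concurrency level for the warm-cache result files.
#   for client in 1 4 8 12 24 48 96; do
#     echo "$client $(extract_tps "output-tpcb-like-$client-withcache.txt")"
#   done
```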

Important DB settings common for all benchmarks

postgres=# select '    ' || name || '=' || setting || ' ' || coalesce(unit, '') from pg_settings where name in ('checkpoint_completion_target', 'default_statistics_target', 'effective_io_concurrency', 'max_parallel_workers', 'max_parallel_workers_per_gather', 'max_wal_size', 'max_worker_processes', 'min_wal_size', 'random_page_cost', 'shared_buffers', 'wal_buffers', 'work_mem', 'huge_pages', 'max_connection') order by name;
                ?column?
----------------------------------------
     checkpoint_completion_target=0.5
     default_statistics_target=100
     effective_io_concurrency=1
     huge_pages=try   
     max_parallel_workers=8
     max_parallel_workers_per_gather=2
     max_wal_size=1024 MB
     max_worker_processes=8
     min_wal_size=80 MB
     random_page_cost=4
     shared_buffers=16384 8kB
     wal_buffers=512 8kB
     work_mem=4096 kB
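pg_settings reports memory settings in their native unit. So shared_buffers=16384 8kB means 16384 pages of 8 kB, which is 128 MB, and wal_buffers=512 8kB is 4 MB.

The small conversion helper below (hypothetical `setting_to_mb`) returns whole MB and covers only the units that appear in the output above.

```bash
#!/bin/bash
# Convert a pg_settings value/unit pair to whole megabytes.
setting_to_mb() {
  local value="$1" unit="$2" kb
  case "$unit" in
    8kB) kb=$(( value * 8 )) ;;      # value is a count of 8 kB pages
    kB)  kb=$value ;;
    MB)  kb=$(( value * 1024 )) ;;
    *)   echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
  echo $(( kb / 1024 ))
}

# Memory-related values from the output above.
for row in "shared_buffers 16384 8kB" "wal_buffers 512 8kB" "work_mem 4096 kB" "max_wal_size 1024 MB"; do
  read -r name value unit <<<"$row"
  echo "$name = $(setting_to_mb "$value" "$unit") MB"
done
```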

System hardware

Hardware summary

Actual machine: https://www.hetzner.com/dedicated-rootserver/px61-nvme

  • two 512 GB NVMe Gen3 x4 SSDs (Toshiba)
  • Intel® Xeon® E3-1275 v6 Quad-Core Kaby Lake processor (4c/8t)
  • 64 GB DDR4 ECC RAM
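The exact CPU model is quickest to confirm from /proc/cpuinfo. The one-function sketch below (hypothetical `cpu_model`) prints the first `model name` entry. On this machine lshw reports "Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz", so that is presumably what it would print here.

```bash
#!/bin/bash
# Print the first "model name" entry from a cpuinfo-format file
# (default /proc/cpuinfo; a saved copy can be passed instead).
cpu_model() {
  awk -F': ' '/^model name/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Usage:
#   cpu_model
```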

Output of lshw

pxnvme                      
    description: Desktop Computer
    vendor: FUJITSU
    width: 64 bits
    capabilities: smbios-3.0 dmi-3.0 smp vsyscall32
    configuration: administrator_password=disabled boot=normal chassis=desktop family=CELSIUS-FTS power-on_password=disabled uuid=3D27F167-2009-2245-B638-CAAA9BF3C468
  *-core
       description: Motherboard
       product: D3417-B2
       vendor: FUJITSU
       physical id: 0
       version: S26361-D3417-B2
       serial: 56574833
     *-firmware
          description: BIOS
          vendor: FUJITSU // American Megatrends Inc.
          physical id: 0
          version: V5.0.0.12 R1.19.0.SR.1 for D3417-B2x
          date: 08/24/2018
          size: 64KiB
          capacity: 15MiB
          capabilities: pci upgrade shadowing cdboot bootselect edd int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
     *-memory
          description: System Memory
          physical id: 11
          slot: System board or motherboard
          size: 64GiB
          capabilities: ecc
          configuration: errordetection=ecc
        *-bank:0
             description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2667 MHz (0.4 ns)
             product: 18ASF2G72AZ-2G6D1
             vendor: Micron Technology
             physical id: 0
             serial: 1BC2771A
             slot: DIMM CHA3
             size: 16GiB
             width: 64 bits
             clock: 2667MHz (0.4ns)
        *-bank:1
             description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2667 MHz (0.4 ns)
             product: 18ASF2G72AZ-2G6D1
             vendor: Micron Technology
             physical id: 1
             serial: 1BC276FF
             slot: DIMM CHA1
             size: 16GiB
             width: 64 bits
             clock: 2667MHz (0.4ns)
        *-bank:2
             description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2667 MHz (0.4 ns)
             product: 18ASF2G72AZ-2G6D1
             vendor: Micron Technology
             physical id: 2
             serial: 1BC27DBD
             slot: DIMM CHB4
             size: 16GiB
             width: 64 bits
             clock: 2667MHz (0.4ns)
        *-bank:3
             description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2667 MHz (0.4 ns)
             product: 18ASF2G72AZ-2G6D1
             vendor: Micron Technology
             physical id: 3
             serial: 1BC27CE2
             slot: DIMM CHB2
             size: 16GiB
             width: 64 bits
             clock: 2667MHz (0.4ns)
     *-cache:0
          description: L1 cache
          physical id: 17
          slot: L1 Cache
          size: 256KiB
          capacity: 256KiB
          capabilities: synchronous internal write-back unified
          configuration: level=1
     *-cache:1
          description: L2 cache
          physical id: 18
          slot: L2 Cache
          size: 1MiB
          capacity: 1MiB
          capabilities: synchronous internal write-back unified
          configuration: level=2
     *-cache:2
          description: L3 cache
          physical id: 19
          slot: L3 Cache
          size: 8MiB
          capacity: 8MiB
          capabilities: synchronous internal write-back unified
          configuration: level=3
     *-cpu
          description: CPU
          product: Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz
          vendor: Intel Corp.
          physical id: 1a
          bus info: cpu@0
          version: Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz
          serial: To Be Filled By O.E.M.
          slot: CPU1
          size: 4GHz
          capacity: 4200MHz
          width: 64 bits
          clock: 100MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp flush_l1d cpufreq
          configuration: cores=4 enabledcores=4 threads=8
     *-pci
          description: Host bridge
          product: Intel Corporation
          vendor: Intel Corporation
          physical id: 100
          bus info: pci@0000:00:00.0
          version: 05
          width: 32 bits
          clock: 33MHz
          configuration: driver=ie31200_edac
          resources: irq:0
        *-display UNCLAIMED
             description: VGA compatible controller
             product: Intel Corporation
             vendor: Intel Corporation
             physical id: 2
             bus info: pci@0000:00:02.0
             version: 04
             width: 64 bits
             clock: 33MHz
             capabilities: pciexpress msi pm vga_controller bus_master cap_list
             configuration: latency=0
             resources: memory:ee000000-eeffffff memory:d0000000-dfffffff ioport:f000(size=64) memory:c0000-dffff
        *-usb
             description: USB controller
             product: Sunrise Point-H USB 3.0 xHCI Controller
             vendor: Intel Corporation
             physical id: 14
             bus info: pci@0000:00:14.0
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi xhci bus_master cap_list
             configuration: driver=xhci_hcd latency=0
             resources: irq:124 memory:ef220000-ef22ffff
           *-usbhost:0
                product: xHCI Host Controller
                vendor: Linux 4.15.0-38-generic xhci-hcd
                physical id: 0
                bus info: usb@1
                logical name: usb1
                version: 4.15
                capabilities: usb-2.00
                configuration: driver=hub slots=16 speed=480Mbit/s
           *-usbhost:1
                product: xHCI Host Controller
                vendor: Linux 4.15.0-38-generic xhci-hcd
                physical id: 1
                bus info: usb@2
                logical name: usb2
                version: 4.15
                capabilities: usb-3.00
                configuration: driver=hub slots=10 speed=5000Mbit/s
        *-generic
             description: Signal processing controller
             product: Sunrise Point-H Thermal subsystem
             vendor: Intel Corporation
             physical id: 14.2
             bus info: pci@0000:00:14.2
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list
             configuration: driver=intel_pch_thermal latency=0
             resources: irq:18 memory:ef23b000-ef23bfff
        *-communication:0 UNCLAIMED
             description: Communication controller
             product: Sunrise Point-H CSME HECI #1
             vendor: Intel Corporation
             physical id: 16
             bus info: pci@0000:00:16.0
             version: 31
             width: 64 bits
             clock: 33MHz
             capabilities: pm msi cap_list
             configuration: latency=0
             resources: memory:ef23a000-ef23afff
        *-communication:1
             description: Serial controller
             product: Sunrise Point-H KT Redirection
             vendor: Intel Corporation
             physical id: 16.3
             bus info: pci@0000:00:16.3
             version: 31
             width: 32 bits
             clock: 66MHz
             capabilities: msi pm 16550 bus_master cap_list
             configuration: driver=serial latency=0
             resources: irq:19 ioport:f0a0(size=8) memory:ef239000-ef239fff
        *-storage
             description: SATA controller
             product: Sunrise Point-H SATA controller [AHCI mode]
             vendor: Intel Corporation
             physical id: 17
             bus info: pci@0000:00:17.0
             version: 31
             width: 32 bits
             clock: 66MHz
             capabilities: storage msi pm ahci_1.0 bus_master cap_list
             configuration: driver=ahci latency=0
             resources: irq:125 memory:ef234000-ef235fff memory:ef238000-ef2380ff ioport:f090(size=8) ioport:f080(size=4) ioport:f060(size=32) memory:ef237000-ef2377ff
        *-pci:0
             description: PCI bridge
             product: Sunrise Point-H PCI Express Root Port #5
             vendor: Intel Corporation
             physical id: 1c
             bus info: pci@0000:00:1c.0
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:122 memory:ef100000-ef1fffff
           *-storage
                description: Non-Volatile memory controller
                product: Toshiba America Info Systems
                vendor: Toshiba America Info Systems
                physical id: 0
                bus info: pci@0000:01:00.0
                version: 00
                width: 64 bits
                clock: 33MHz
                capabilities: storage pciexpress pm msi msix nvm_express bus_master cap_list
                configuration: driver=nvme latency=0
                resources: irq:16 memory:ef100000-ef103fff
        *-pci:1
             description: PCI bridge
             product: Sunrise Point-H PCI Express Root Port #9
             vendor: Intel Corporation
             physical id: 1d
             bus info: pci@0000:00:1d.0
             version: f1
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:123 memory:ef000000-ef0fffff
           *-storage
                description: Non-Volatile memory controller
                product: Toshiba America Info Systems
                vendor: Toshiba America Info Systems
                physical id: 0
                bus info: pci@0000:02:00.0
                version: 00
                width: 64 bits
                clock: 33MHz
                capabilities: storage pciexpress pm msi msix nvm_express bus_master cap_list
                configuration: driver=nvme latency=0
                resources: irq:16 memory:ef000000-ef003fff
        *-isa
             description: ISA bridge
             product: Sunrise Point-H LPC Controller
             vendor: Intel Corporation
             physical id: 1f
             bus info: pci@0000:00:1f.0
             version: 31
             width: 32 bits
             clock: 33MHz
             capabilities: isa bus_master
             configuration: latency=0
        *-memory UNCLAIMED
             description: Memory controller
             product: Sunrise Point-H PMC
             vendor: Intel Corporation
             physical id: 1f.2
             bus info: pci@0000:00:1f.2
             version: 31
             width: 32 bits
             clock: 33MHz (30.3ns)
             capabilities: bus_master
             configuration: latency=0
             resources: memory:ef230000-ef233fff
        *-serial UNCLAIMED
             description: SMBus
             product: Sunrise Point-H SMBus
             vendor: Intel Corporation
             physical id: 1f.4
             bus info: pci@0000:00:1f.4
             version: 31
             width: 64 bits
             clock: 33MHz
             configuration: latency=0
             resources: memory:ef236000-ef2360ff ioport:f040(size=32)
        *-network
             description: Ethernet interface
             product: Ethernet Connection (2) I219-LM
             vendor: Intel Corporation
             physical id: 1f.6
             bus info: pci@0000:00:1f.6
             logical name: enp0s31f6
             version: 31
             serial: 4c:52:62:0e:05:43
             size: 1Gbit/s
             capacity: 1Gbit/s
             width: 32 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
             configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=3.2.6-k duplex=full firmware=0.8-4 ip=95.216.75.174 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
             resources: irq:128 memory:ef200000-ef21ffff
  *-power UNCLAIMED
       description: To Be Filled By O.E.M.
       product: To Be Filled By O.E.M.
       vendor: To Be Filled By O.E.M.
       physical id: 1
       version: To Be Filled By O.E.M.
       serial: To Be Filled By O.E.M.
       capacity: 32768mWh

Output of arc_summary while the client=96 benchmark was running (ZFS 0.7.5)

------------------------------------------------------------------------                                                                                   
ZFS Subsystem Report                            Thu Jan 31 05:28:19 2019
ARC Summary: (HEALTHY) 
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                25.72m
        Mutex Misses:                           353
        Evict Skips:                            6.39m

ARC Size:                               37.89%  11.88   GiB
        Target Size: (Adaptive)         38.07%  11.94   GiB
        Min Size (Hard Limit):          6.25%   1.96    GiB
        Max Size (High Water):          16:1    31.35   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       83.18%  9.24    GiB
        Frequently Used Cache Size:     16.82%  1.87    GiB

ARC Hash Breakdown:
        Elements Max:                           4.47m
        Elements Current:               49.81%  2.23m
        Collisions:                             30.70m
        Chain Max:                              7
        Chains:                                 243.87k

ARC Total accesses:                                     517.32m
        Cache Hit Ratio:                98.13%  507.63m
        Cache Miss Ratio:               1.87%   9.69m
        Actual Hit Ratio:               92.68%  479.47m

        Data Demand Efficiency:         90.87%  102.66m
        Data Prefetch Efficiency:       99.99%  28.20m

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             5.52%   28.01m
          Most Recently Used:           34.22%  173.72m
          Most Frequently Used:         60.23%  305.75m
          Most Recently Used Ghost:     0.02%   113.98k
          Most Frequently Used Ghost:   0.01%   28.77k

        CACHE HITS BY DATA TYPE:
          Demand Data:                  18.38%  93.28m
          Prefetch Data:                5.55%   28.19m
          Demand Metadata:              76.04%  386.01m
          Prefetch Metadata:            0.03%   147.84k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  96.73%  9.38m
          Prefetch Data:                0.02%   1.75k
          Demand Metadata:              2.88%   279.10k
          Prefetch Metadata:            0.38%   36.57k

        
DMU Prefetch Efficiency:                                        427.75m
        Hit Ratio:                      20.40%  87.25m
        Miss Ratio:                     79.60%  340.50m

        
        
ZFS Tunable:
        dbuf_cache_hiwater_pct                            10
        dbuf_cache_lowater_pct                            10
        dbuf_cache_max_bytes                              104857600
        dbuf_cache_max_shift                              5
        dmu_object_alloc_chunk_shift                      7
        ignore_hole_birth                                 1
        l2arc_feed_again                                  1
        l2arc_feed_min_ms                                 200
        l2arc_feed_secs                                   1
        l2arc_headroom                                    2
        l2arc_headroom_boost                              200
        l2arc_noprefetch                                  1
        l2arc_norw                                        0
        l2arc_write_boost                                 8388608
        l2arc_write_max                                   8388608
        metaslab_aliquot                                  524288
        metaslab_bias_enabled                             1
        metaslab_debug_load                               0
        metaslab_debug_unload                             0
        metaslab_fragmentation_factor_enabled             1
        metaslab_lba_weighting_enabled                    1
        metaslab_preload_enabled                          1
        metaslabs_per_vdev                                200
        send_holes_without_birth_time                     1
        spa_asize_inflation                               24
        spa_config_path                                   /etc/zfs/zpool.cache
        spa_load_verify_data                              1
        spa_load_verify_maxinflight                       10000
        spa_load_verify_metadata                          1
        spa_slop_shift                                    5
        zfetch_array_rd_sz                                1048576
        zfetch_max_distance                               8388608
        zfetch_max_streams                                8
        zfetch_min_sec_reap                               2
        zfs_abd_scatter_enabled                           1
        zfs_abd_scatter_max_order                         10
        zfs_admin_snapshot                                1
        zfs_arc_average_blocksize                         8192
        zfs_arc_dnode_limit                               0
        zfs_arc_dnode_limit_percent                       10
        zfs_arc_dnode_reduce_percent                      10
        zfs_arc_grow_retry                                0
        zfs_arc_lotsfree_percent                          10
        zfs_arc_max                                       0
        zfs_arc_meta_adjust_restarts                      4096
        zfs_arc_meta_limit                                0
        zfs_arc_meta_limit_percent                        75
        zfs_arc_meta_min                                  0
        zfs_arc_meta_prune                                10000
        zfs_arc_meta_strategy                             1
        zfs_arc_min                                       0
        zfs_arc_min_prefetch_lifespan                     0
        zfs_arc_p_aggressive_disable                      1
        zfs_arc_p_dampener_disable                        1
        zfs_arc_p_min_shift                               0
        zfs_arc_pc_percent                                0
        zfs_arc_shrink_shift                              0
        zfs_arc_sys_free                                  0
        zfs_autoimport_disable                            1
        zfs_compressed_arc_enabled                        1
        zfs_dbgmsg_enable                                 0
        zfs_dbgmsg_maxsize                                4194304
        zfs_dbuf_state_index                              0
        zfs_deadman_checktime_ms                          5000
        zfs_deadman_enabled                               1
        zfs_deadman_synctime_ms                           1000000
        zfs_dedup_prefetch                                0
        zfs_delay_min_dirty_percent                       60
        zfs_delay_scale                                   500000
        zfs_delete_blocks                                 20480
        zfs_dirty_data_max                                4294967296
        zfs_dirty_data_max_max                            4294967296
        zfs_dirty_data_max_max_percent                    25
        zfs_dirty_data_max_percent                        10
        zfs_dirty_data_sync                               67108864
        zfs_dmu_offset_next_sync                          0
        zfs_expire_snapshot                               300
        zfs_flags                                         0
        zfs_free_bpobj_enabled                            1
        zfs_free_leak_on_eio                              0
        zfs_free_max_blocks                               100000
        zfs_free_min_time_ms                              1000
        zfs_immediate_write_sz                            32768
        zfs_max_recordsize                                1048576
        zfs_mdcomp_disable                                0
        zfs_metaslab_fragmentation_threshold              70
        zfs_metaslab_segment_weight_enabled               1
        zfs_metaslab_switch_threshold                     2
        zfs_mg_fragmentation_threshold                    85
        zfs_mg_noalloc_threshold                          0
        zfs_multihost_fail_intervals                      5
        zfs_multihost_history                             0
        zfs_multihost_import_intervals                    10
        zfs_multihost_interval                            1000
        zfs_multilist_num_sublists                        0
        zfs_no_scrub_io                                   0
        zfs_no_scrub_prefetch                             0
        zfs_nocacheflush                                  0
        zfs_nopwrite_enabled                              1
        zfs_object_mutex_size                             64
        zfs_pd_bytes_max                                  52428800
        zfs_per_txg_dirty_frees_percent                   30
        zfs_prefetch_disable                              0
        zfs_read_chunk_size                               1048576
        zfs_read_history                                  0
        zfs_read_history_hits                             0
        zfs_recover                                       0
        zfs_resilver_delay                                2
        zfs_resilver_min_time_ms                          3000
        zfs_scan_idle                                     50
        zfs_scan_min_time_ms                              1000
        zfs_scrub_delay                                   4
        zfs_send_corrupt_data                             0
        zfs_sync_pass_deferred_free                       2
        zfs_sync_pass_dont_compress                       5
        zfs_sync_pass_rewrite                             2
        zfs_sync_taskq_batch_pct                          75
        zfs_top_maxinflight                               32
        zfs_txg_history                                   0
        zfs_txg_timeout                                   5
        zfs_vdev_aggregation_limit                        131072
        zfs_vdev_async_read_max_active                    3
        zfs_vdev_async_read_min_active                    1
        zfs_vdev_async_write_active_max_dirty_percent     60
        zfs_vdev_async_write_active_min_dirty_percent     30
        zfs_vdev_async_write_max_active                   10
        zfs_vdev_async_write_min_active                   2
        zfs_vdev_cache_bshift                             16
        zfs_vdev_cache_max                                16384
        zfs_vdev_cache_size                               0
        zfs_vdev_max_active                               1000
        zfs_vdev_mirror_non_rotating_inc                  0
        zfs_vdev_mirror_non_rotating_seek_inc             1
        zfs_vdev_mirror_rotating_inc                      0
        zfs_vdev_mirror_rotating_seek_inc                 5
        zfs_vdev_mirror_rotating_seek_offset              1048576
        zfs_vdev_queue_depth_pct                          1000
        zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3 avx2
        zfs_vdev_read_gap_limit                           32768
        zfs_vdev_scheduler                                noop
        zfs_vdev_scrub_max_active                         2
        zfs_vdev_scrub_min_active                         1
        zfs_vdev_sync_read_max_active                     10
        zfs_vdev_sync_read_min_active                     10
        zfs_vdev_sync_write_max_active                    10
        zfs_vdev_sync_write_min_active                    10
        zfs_vdev_write_gap_limit                          4096
        zfs_zevent_cols                                   80
        zfs_zevent_console                                0
        zfs_zevent_len_max                                128
        zfs_zil_clean_taskq_maxalloc                      1048576
        zfs_zil_clean_taskq_minalloc                      1024
        zfs_zil_clean_taskq_nthr_pct                      100
        zil_replay_disable                                0
        zil_slog_bulk                                     786432
        zio_delay_max                                     30000
        zio_dva_throttle_enabled                          1
        zio_requeue_io_start_cut_in_line                  1
        zio_taskq_batch_pct                               75
        zvol_inhibit_dev                                  0
        zvol_major                                        230
        zvol_max_discard_blocks                           16384
        zvol_prefetch_bytes                               131072
        zvol_request_sync                                 0
        zvol_threads                                      32
        zvol_volmode                                      1
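Several of the `zfs_vdev_async_write_*` tunables in the listing above work together. When dirty (not yet synced) data sits between `zfs_vdev_async_write_active_min_dirty_percent` (30) and `zfs_vdev_async_write_active_max_dirty_percent` (60) of `zfs_dirty_data_max`, the number of concurrently active async writes per vdev ramps linearly from `zfs_vdev_async_write_min_active` (2) to `zfs_vdev_async_write_max_active` (10). The sketch below models that ramp in percent terms using the values listed. It is a simplification: the kernel works in integer bytes and may have special cases this ignores, and `max_async_writes` is a hypothetical helper name, not a ZFS function.

```python
# Values copied from the tunables listing above.
ASYNC_WRITE_MIN_ACTIVE = 2   # zfs_vdev_async_write_min_active
ASYNC_WRITE_MAX_ACTIVE = 10  # zfs_vdev_async_write_max_active
MIN_DIRTY_PERCENT = 30       # zfs_vdev_async_write_active_min_dirty_percent
MAX_DIRTY_PERCENT = 60       # zfs_vdev_async_write_active_max_dirty_percent


def max_async_writes(dirty_percent: float) -> int:
    """Hypothetical helper: async write slots per vdev at a given dirty-data level.

    dirty_percent is dirty data as a percentage of zfs_dirty_data_max.
    This is a simplified model of the linear ramp, not the kernel code.
    """
    if dirty_percent <= MIN_DIRTY_PERCENT:
        return ASYNC_WRITE_MIN_ACTIVE
    if dirty_percent >= MAX_DIRTY_PERCENT:
        return ASYNC_WRITE_MAX_ACTIVE
    span = ASYNC_WRITE_MAX_ACTIVE - ASYNC_WRITE_MIN_ACTIVE
    window = MAX_DIRTY_PERCENT - MIN_DIRTY_PERCENT
    # Truncate toward zero, as integer arithmetic would.
    extra = int((dirty_percent - MIN_DIRTY_PERCENT) * span // window)
    return ASYNC_WRITE_MIN_ACTIVE + extra


for pct in (10, 30, 40, 45, 60, 90):
    print(f"{pct:>3}% dirty -> {max_async_writes(pct)} async writes")
```

Under this model, a pool holding 45% of `zfs_dirty_data_max` as dirty data would allow 6 async writes per vdev, while anything at or below 30% stays at 2.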

Output of iotop while the client=96 benchmark was running with a warm cache

Total DISK READ :      62.07 M/s | Total DISK WRITE :      85.52 M/s
Actual DISK READ:      15.53 M/s | Actual DISK WRITE:     182.64 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
14228 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [txg_sync]
 6364 be/4 postgres  550.97 K/s    3.48 M/s  0.00 % 20.16 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44864) COMMIT
 6371 be/4 postgres  504.08 K/s    2.48 M/s  0.00 % 18.59 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44880) COMMIT
 6320 be/4 postgres  523.62 K/s    2.89 M/s  0.00 % 17.96 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44776) COMMIT
 6293 be/4 postgres  597.86 K/s 1938.16 K/s  0.00 % 16.63 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44720) COMMIT
 6372 be/4 postgres  519.71 K/s 1789.67 K/s  0.00 % 16.18 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44882) COMMIT
 6329 be/4 postgres  543.15 K/s  922.19 K/s  0.00 % 15.45 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44794) COMMIT
 6346 be/4 postgres  527.52 K/s    2.60 M/s  0.00 % 15.32 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44828) COMMIT
 6314 be/4 postgres  523.62 K/s 2008.49 K/s  0.00 % 15.27 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44764) COMMIT
 6381 be/4 postgres  500.17 K/s 1141.01 K/s  0.00 % 15.01 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44896) COMMIT
 6328 be/4 postgres  574.41 K/s 1156.64 K/s  0.00 % 14.19 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44792) COMMIT
 6289 be/4 postgres  523.62 K/s 1070.68 K/s  0.00 % 13.99 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44712) COMMIT
 6365 be/4 postgres  535.34 K/s  422.02 K/s  0.00 % 13.89 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44866) COMMIT
 6337 be/4 postgres  523.62 K/s 1211.35 K/s  0.00 % 13.77 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44810) COMMIT
 6351 be/4 postgres  578.32 K/s  437.65 K/s  0.00 % 13.75 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44838) COMMIT
 6378 be/4 postgres  578.32 K/s 1242.61 K/s  0.00 % 13.61 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44894) COMMIT
 6401 be/4 postgres  515.80 K/s 1000.34 K/s  0.00 % 13.60 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44898) COMMIT
 6368 be/4 postgres  535.34 K/s 1406.73 K/s  0.00 % 13.49 % postgres: 11/zfs: benchmarking benchmarking 127.0.0.1(44870) COMMIT
 [...]
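iotop's header separates "Total" bandwidth (between processes/kernel threads and the kernel block-device layer) from "Actual" bandwidth (between the block layer and the underlying devices). Dividing one by the other gives a rough, whole-system ratio for this snapshot. The sketch below does that with the header figures above; it is a single sample, so treat the result as indicative only.

```python
# Header figures copied from the iotop output above (client=96, warm cache), in M/s.
TOTAL_READ, ACTUAL_READ = 62.07, 15.53
TOTAL_WRITE, ACTUAL_WRITE = 85.52, 182.64


def device_to_process_ratio(actual: float, total: float) -> float:
    """Device-level bandwidth divided by process-level bandwidth."""
    return actual / total


write_ratio = device_to_process_ratio(ACTUAL_WRITE, TOTAL_WRITE)
read_ratio = device_to_process_ratio(ACTUAL_READ, TOTAL_READ)

print(f"write: {write_ratio:.2f}x")
print(f"read:  {read_ratio:.2f}x")
```

With these numbers, roughly 2.1x more data was written to the devices than processes submitted, while only about a quarter of the requested reads reached the disks.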

Output of arc_summary.py while the client=24 benchmark was running (ZFS 0.7.12)

ZFS Subsystem Report                            Thu Jan 31 15:37:41 2019
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                18.02M
        Mutex Misses:                           282
        Evict Skips:                            4.62M

ARC Size:                               7.10%   2.22    GiB
        Target Size: (Adaptive)         7.82%   2.45    GiB
        Min Size (Hard Limit):          6.25%   1.96    GiB
        Max Size (High Water):          16:1    31.35   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       92.56%  1.92    GiB
        Frequently Used Cache Size:     7.44%   158.22  MiB

ARC Hash Breakdown:
        Elements Max:                           4.47M
        Elements Current:               10.30%  460.47k
        Collisions:                             20.45M
        Chain Max:                              6
        Chains:                                 5.64k

ARC Total accesses:                                     381.62M
        Cache Hit Ratio:                98.58%  376.22M
        Cache Miss Ratio:               1.42%   5.41M
        Actual Hit Ratio:               93.35%  356.24M

        Data Demand Efficiency:         92.60%  70.05M
        Data Prefetch Efficiency:       99.99%  19.98M

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             5.29%   19.90M
          Most Recently Used:           36.76%  138.30M
          Most Frequently Used:         57.93%  217.94M
          Most Recently Used Ghost:     0.02%   62.09k
          Most Frequently Used Ghost:   0.01%   21.54k
    
        CACHE HITS BY DATA TYPE:   
          Demand Data:                  17.24%  64.87M
          Prefetch Data:                5.31%   19.98M
          Demand Metadata:              77.42%  291.27M
          Prefetch Metadata:            0.03%   96.88k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  95.85%  5.18M
          Prefetch Data:                0.02%   1.11k
          Demand Metadata:              3.76%   203.31k
          Prefetch Metadata:            0.37%   19.99k

        
DMU Prefetch Efficiency:                                        296.26M
        Hit Ratio:                      20.19%  59.82M
        Miss Ratio:                     79.81%  236.44M

        
        
ZFS Tunables:
        dbuf_cache_hiwater_pct                            10
        dbuf_cache_lowater_pct                            10
        dbuf_cache_max_bytes                              104857600
        dbuf_cache_max_shift                              5
        dmu_object_alloc_chunk_shift                      7
        ignore_hole_birth                                 1
        l2arc_feed_again                                  1
        l2arc_feed_min_ms                                 200
        l2arc_feed_secs                                   1
        l2arc_headroom                                    2
        l2arc_headroom_boost                              200
        l2arc_noprefetch                                  1
        l2arc_norw                                        0
        l2arc_write_boost                                 8388608
        l2arc_write_max                                   8388608
        metaslab_aliquot                                  524288
        metaslab_bias_enabled                             1
        metaslab_debug_load                               0
        metaslab_debug_unload                             0
        metaslab_fragmentation_factor_enabled             1
        metaslab_lba_weighting_enabled                    1
        metaslab_preload_enabled                          1
        metaslabs_per_vdev                                200
        send_holes_without_birth_time                     1
        spa_asize_inflation                               24
        spa_config_path                                   /etc/zfs/zpool.cache
        spa_load_verify_data                              1
        spa_load_verify_maxinflight                       10000
        spa_load_verify_metadata                          1
        spa_slop_shift                                    5
        zfetch_array_rd_sz                                1048576
        zfetch_max_distance                               8388608
        zfetch_max_streams                                8
        zfetch_min_sec_reap                               2
        zfs_abd_scatter_enabled                           1
        zfs_abd_scatter_max_order                         10
        zfs_admin_snapshot                                1
        zfs_arc_average_blocksize                         8192
        zfs_arc_dnode_limit                               0
        zfs_arc_dnode_limit_percent                       10
        zfs_arc_dnode_reduce_percent                      10
        zfs_arc_grow_retry                                0
        zfs_arc_lotsfree_percent                          10
        zfs_arc_max                                       0
        zfs_arc_meta_adjust_restarts                      4096
        zfs_arc_meta_limit                                0
        zfs_arc_meta_limit_percent                        75
        zfs_arc_meta_min                                  0
        zfs_arc_meta_prune                                10000
        zfs_arc_meta_strategy                             1
        zfs_arc_min                                       0
        zfs_arc_min_prefetch_lifespan                     0
        zfs_arc_p_dampener_disable                        1
        zfs_arc_p_min_shift                               0
        zfs_arc_pc_percent                                0
        zfs_arc_shrink_shift                              0
        zfs_arc_sys_free                                  0
        zfs_autoimport_disable                            1
        zfs_checksums_per_second                          20
        zfs_compressed_arc_enabled                        1
        zfs_dbgmsg_enable                                 0
        zfs_dbgmsg_maxsize                                4194304
        zfs_dbuf_state_index                              0
        zfs_deadman_checktime_ms                          5000
        zfs_deadman_enabled                               1
        zfs_deadman_synctime_ms                           1000000
        zfs_dedup_prefetch                                0
        zfs_delay_min_dirty_percent                       60
        zfs_delay_scale                                   500000
        zfs_delays_per_second                             20
        zfs_delete_blocks                                 20480
        zfs_dirty_data_max                                4294967296
        zfs_dirty_data_max_max                            4294967296
        zfs_dirty_data_max_max_percent                    25
        zfs_dirty_data_max_percent                        10
        zfs_dirty_data_sync                               67108864
        zfs_dmu_offset_next_sync                          0
        zfs_expire_snapshot                               300
        zfs_flags                                         0
        zfs_free_bpobj_enabled                            1
        zfs_free_leak_on_eio                              0
        zfs_free_max_blocks                               100000
        zfs_free_min_time_ms                              1000
        zfs_immediate_write_sz                            32768
        zfs_max_recordsize                                1048576
        zfs_mdcomp_disable                                0
        zfs_metaslab_fragmentation_threshold              70
        zfs_metaslab_segment_weight_enabled               1
        zfs_metaslab_switch_threshold                     2
        zfs_mg_fragmentation_threshold                    85
        zfs_mg_noalloc_threshold                          0
        zfs_multihost_fail_intervals                      5
        zfs_multihost_history                             0
        zfs_multihost_import_intervals                    10
        zfs_multihost_interval                            1000
        zfs_multilist_num_sublists                        0
        zfs_no_scrub_io                                   0
        zfs_no_scrub_prefetch                             0
        zfs_nocacheflush                                  0
        zfs_nopwrite_enabled                              1
        zfs_object_mutex_size                             64
        zfs_pd_bytes_max                                  52428800
        zfs_per_txg_dirty_frees_percent                   30
        zfs_prefetch_disable                              0
        zfs_read_chunk_size                               1048576
        zfs_read_history                                  0
        zfs_read_history_hits                             0
        zfs_recover                                       0
        zfs_recv_queue_length                             16777216
        zfs_resilver_delay                                2
        zfs_resilver_min_time_ms                          3000
        zfs_scan_idle                                     50
        zfs_scan_ignore_errors                            0
        zfs_scan_min_time_ms                              1000
        zfs_scrub_delay                                   4
        zfs_send_corrupt_data                             0
        zfs_send_queue_length                             16777216
        zfs_sync_pass_deferred_free                       2
        zfs_sync_pass_dont_compress                       5
        zfs_sync_pass_rewrite                             2
        zfs_sync_taskq_batch_pct                          75
        zfs_top_maxinflight                               32
        zfs_txg_history                                   0
        zfs_txg_timeout                                   5
        zfs_vdev_aggregation_limit                        131072
        zfs_vdev_async_read_max_active                    3
        zfs_vdev_async_read_min_active                    1
        zfs_vdev_async_write_active_max_dirty_percent     60
        zfs_vdev_async_write_active_min_dirty_percent     30
        zfs_vdev_async_write_max_active                   10
        zfs_vdev_async_write_min_active                   2
        zfs_vdev_cache_bshift                             16
        zfs_vdev_cache_max                                16384
        zfs_vdev_cache_size                               0
        zfs_vdev_max_active                               1000
        zfs_vdev_mirror_non_rotating_inc                  0
        zfs_vdev_mirror_non_rotating_seek_inc             1
        zfs_vdev_mirror_rotating_inc                      0
        zfs_vdev_mirror_rotating_seek_inc                 5
        zfs_vdev_mirror_rotating_seek_offset              1048576
        zfs_vdev_queue_depth_pct                          1000
        zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3 avx2
        zfs_vdev_read_gap_limit                           32768
        zfs_vdev_scheduler                                noop
        zfs_vdev_scrub_max_active                         2
        zfs_vdev_scrub_min_active                         1
        zfs_vdev_sync_read_max_active                     10
        zfs_vdev_sync_read_min_active                     10
        zfs_vdev_sync_write_max_active                    10
        zfs_vdev_sync_write_min_active                    10
        zfs_vdev_write_gap_limit                          4096
        zfs_zevent_cols                                   80
        zfs_zevent_console                                0
        zfs_zevent_len_max                                128
        zil_replay_disable                                0
        zil_slog_bulk                                     786432
        zio_delay_max                                     30000
        zio_dva_throttle_enabled                          1
        zio_requeue_io_start_cut_in_line                  1
        zio_taskq_batch_pct                               75
        zvol_inhibit_dev                                  0
        zvol_major                                        230
        zvol_max_discard_blocks                           16384
        zvol_prefetch_bytes                               131072
        zvol_request_sync                                 0
        zvol_threads                                      32
        zvol_volmode                                      1
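The headline percentages in the `arc_summary.py` report above can be recomputed from the report's own counters, which shows what each one measures. "Actual Hit Ratio" counts only hits served from the MRU and MFU lists (138.30M + 217.94M = 356.24M) against all accesses, while "Data Demand Efficiency" is demand-data hits over demand-data hits plus misses (the 70.05M shown next to it is 64.87M + 5.18M). The sketch below uses the rounded figures (in millions) copied from the report, so the last digit can differ slightly from what `arc_summary.py` prints from its exact counters.

```python
# Counters copied from the 0.7.12 arc_summary.py report above, in millions (rounded).
TOTAL_ACCESSES = 381.62
HITS = 376.22
MRU_HITS = 138.30
MFU_HITS = 217.94
DEMAND_DATA_HITS = 64.87
DEMAND_DATA_MISSES = 5.18
PREFETCH_DATA_HITS = 19.98
PREFETCH_DATA_MISSES = 0.00111  # reported as 1.11k


def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


cache_hit_ratio = pct(HITS, TOTAL_ACCESSES)
# "Actual" hit ratio: only hits served from the MRU and MFU lists.
actual_hit_ratio = pct(MRU_HITS + MFU_HITS, TOTAL_ACCESSES)
# Efficiency = hits / (hits + misses) within each access class.
demand_data_eff = pct(DEMAND_DATA_HITS, DEMAND_DATA_HITS + DEMAND_DATA_MISSES)
prefetch_data_eff = pct(PREFETCH_DATA_HITS, PREFETCH_DATA_HITS + PREFETCH_DATA_MISSES)

print(f"Cache Hit Ratio:          {cache_hit_ratio:.2f}%")
print(f"Actual Hit Ratio:         {actual_hit_ratio:.2f}%")
print(f"Data Demand Efficiency:   {demand_data_eff:.2f}%")
print(f"Data Prefetch Efficiency: {prefetch_data_eff:.2f}%")
```

With these inputs each result lands within about 0.01 percentage points of the 98.58%, 93.35%, 92.60% and 99.99% printed in the report.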