Benchmark: fio write
Command: fio --name=seqwrite --rw=write --bs=128k --size=4g --end_fsync=1 --loops=4 # aggrb tput
Rationale: Measure the performance of a single threaded streaming write of a reasonably large file. The aim is to measure how well the file system and platform can sustain a write workload, which will depend on how well they can group and dispatch writes. It's difficult to benchmark buffered file system writes in both a short duration and in a repeatable way, as performance greatly depends on whether and when the pagecache begins to flush dirty data. As a workaround, fsync() is called at the end of the benchmark to ensure that flushing always occurs, and the workload is repeated four times (--loops=4). While this provides a much more reliable measurement, it is somewhat worst-case (applications don't always fsync), providing closer to a minimum rate – rather than a maximum rate – that you should expect.
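The sustained rate can be sanity-checked by hand from the total bytes written and the elapsed time. A minimal sketch using hypothetical numbers (the 4 GiB size and four loops come from the command above; the 128 s elapsed time is a made-up example value – fio's own aggrb figure is what you should actually report):

```shell
# Hypothetical arithmetic: sustained write rate = total MiB written / seconds.
size_mib=4096      # --size=4g
loops=4            # --loops=4
elapsed_s=128      # made-up wall-clock runtime for illustration
echo $(( size_mib * loops / elapsed_s ))   # MiB/s -> 128
```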
Benchmark: fio randread
Command: fio --name=randread --rw=randread --pre_read=1 --norandommap --bs=4k --size=256m --runtime=30 --loops=1000 # calc IOPS
Rationale: Measure the performance of a single threaded cached random read workload. Notes: The --loops option is necessary to keep fio running for --runtime=30, otherwise it aborts when it has performed --size=256m worth of random reads. Ensure the instance has at least 256 MB of DRAM it can use for the pagecache, so the working set is fully cached; if not, reduce --size to fit.
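Since the command comment says "# calc IOPS": with --bs=4k every I/O is 4 KiB, so IOPS is just the reported bandwidth divided by 4. A sketch with a made-up bandwidth figure:

```shell
# Hypothetical: convert a fio-reported bandwidth (KiB/s) to IOPS at --bs=4k.
bw_kib=524288                 # made-up example: 512 MiB/s expressed in KiB/s
bs_kib=4                      # --bs=4k
echo $(( bw_kib / bs_kib ))   # IOPS -> 131072
```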
Benchmark: fio randread
Command: fio --numjobs=#CPUs --name=randread --rw=randread --pre_read=1 --norandommap --bs=4k --size=256m/number_of_CPUs --runtime=30 --loops=1000 # calc IOPS
Rationale: Test the scalability of the file system compared to the single threaded version of this test. Notes: The --loops option is necessary to keep fio running for --runtime=30, otherwise each job aborts when it has performed its --size worth of random reads.
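The --size=256m/number_of_CPUs part of the command is a placeholder, not something fio evaluates: --size is per-job, so you divide the 256 MB total working set by the job count yourself. A sketch with a hypothetical 8-CPU instance (substitute $(nproc) on a real system):

```shell
# Hypothetical: compute the per-job --size so the aggregate stays 256 MiB.
total_mib=256
ncpus=8        # made-up example; use $(nproc) on your system
echo "--numjobs=${ncpus} --size=$(( total_mib / ncpus ))m"
# -> --numjobs=8 --size=32m
```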
Benchmark: fio randread
Command: fio --numjobs=#CPUs --name=partial --rw=randread --norandommap --random_distribution=pareto:0.9 --bs=4k --size=2xDRAM/#CPUs --runtime=60 --loops=1000 # calc IOPS
Rationale: Use a working set size that can only be partially cached, to test the performance of file system caching. This also uses a Pareto access distribution to resemble real world workloads. This test may reveal additional storage I/O cache available to the instance beyond its own page cache. Notes: This is deliberately tuned to not be a disk benchmark. The --size option is per-job (thread). The intended working set size is 2 x DRAM, so this amount should be divided by the number of CPUs for the --size. You may also need the latest version of fio for the distribution option: https://github.com/axboe/fio
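As with the previous test, --size=2xDRAM/#CPUs is a per-job placeholder: double the instance DRAM, then divide by the CPU count. A sketch with made-up values (16 GiB DRAM, 8 CPUs); on Linux you could instead read MemTotal from /proc/meminfo and the CPU count from nproc:

```shell
# Hypothetical: per-job --size for a 2 x DRAM working set.
dram_gib=16    # made-up instance memory; see MemTotal in /proc/meminfo
ncpus=8        # made-up CPU count; see nproc
echo "--size=$(( 2 * dram_gib / ncpus ))g"
# -> --size=4g
```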
Note: instead of --loops=1000, the --time_based option should achieve the same effect of keeping fio running for the full --runtime.