Sometimes choosing the right AWS resource makes all the difference. Here's a story of how I used iotop and iostat to build evidence for the need to choose an EBS-optimized disk to solve a problem.
A Redis box was acting up. Here's what I experienced:
- slow login
- failing redis backups (ERR Background save already in progress)
- general sluggishness
I suspected the disk, and iotop confirmed it. But I needed more evidence, so I recorded the numbers over time using iostat. This utility reports system-wide CPU and device I/O statistics in a tabular format, which is easier to log than iotop's interactive per-process view. To make the output more readable, I grepped for only the lines with iowait in them, plus the line of values that follows each one.
iostat 1 | grep iowait -A 1
The output came in waves:
avg-cpu: %user %nice %system %iowait %steal %idle
2.27 0.00 0.73 17.32 0.01 79.66
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 49.25 0.00 49.75
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.50 49.25 0.00 50.25
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.00 48.76 0.00 49.75
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.50 49.25 0.00 50.25
--
avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 1.00 49.00 0.00 49.00
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.50 49.25 0.00 50.25
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.00 48.76 0.00 49.75
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.50 49.50 0.00 50.00
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 68.50 0.00 30.50
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 98.50 0.50 0.00
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.00 99.00 0.00 0.00
Notice that the iowait column would spike to 99% and stay there for a while. With another window open, I tried typing during those spikes and the terminal was unresponsive. So I started another EC2 instance, this time with an EBS-optimized disk and higher IOPS (500). This solved the problem. Notice that in the output below, iowait stays at zero and idle stays near 98-99%.
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.50 0.00 0.00 98.00
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.51 0.00 0.00 99.49
--
avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 1.00 0.00 0.00 98.01
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.51 0.00 0.00 97.99
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.00 0.00 0.00 98.50
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 0.00 0.50 98.50
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.49 0.00 0.00 98.01
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.00 0.00 0.00 98.50
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.51 0.00 0.00 99.49
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 1.00 0.00 0.00 98.50
--
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 0.50 0.00 0.00 99.00
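To put numbers on a before/after comparison like this one, the captured output can be reduced to a few statistics. The following is a minimal Python sketch, not the tooling used in this post. The function names `iowait_samples` and `summarize` and the 90% threshold are assumptions chosen for illustration, and the parser expects text shaped like the output above (an avg-cpu header line followed by a line of values).

```python
from statistics import mean


def iowait_samples(iostat_text):
    """Return the %iowait value from each avg-cpu report in iostat output."""
    samples = []
    lines = iostat_text.splitlines()
    for i, line in enumerate(lines):
        if not line.lstrip().startswith("avg-cpu:"):
            continue
        if i + 1 >= len(lines):
            break  # header with no value line after it
        headers = line.split()[1:]  # drop the "avg-cpu:" label
        values = lines[i + 1].split()
        # Only accept rows whose value count matches the header count.
        if "%iowait" in headers and len(values) == len(headers):
            samples.append(float(values[headers.index("%iowait")]))
    return samples


def summarize(samples, threshold=90.0):
    """Summarize iowait samples; threshold is a hypothetical 'spike' cutoff."""
    if not samples:
        return {"count": 0, "mean": 0.0, "max": 0.0, "above_threshold": 0}
    return {
        "count": len(samples),
        "mean": round(mean(samples), 2),
        "max": max(samples),
        "above_threshold": sum(1 for s in samples if s >= threshold),
    }
```

Saving the grep output to a file (say, a hypothetical `iostat.log`) on each box and calling `summarize(iowait_samples(open("iostat.log").read()))` would give one small dictionary per box to compare side by side.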
New Relic told us the same story.
Figure: New Relic graph before EBS-optimized disks
Figure: New Relic graph after EBS-optimized disks
Sometimes the baseline disk that comes with an instance is just fine. Other times you need something more robust. Since Redis saves to disk so regularly, we needed a faster link between the instance and its disk. But rather than just switching to the new disk and calling it done, we measured our results.
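For reference, here is a rough sketch of how a replacement instance like the one described above might be requested with boto3. This is not the procedure used in this post, and several details are assumptions: the `io1` volume type, the 100 GiB size, the `/dev/sda1` device name (which depends on the AMI), the region, and the helper names `build_launch_params` and `launch`. The image ID and instance type are left as arguments because the post does not state them.

```python
def build_launch_params(image_id, instance_type, volume_gib=100, iops=500):
    """Build keyword arguments for the EC2 RunInstances call.

    Assumptions: io1 volume type, 100 GiB size, and /dev/sda1 as the
    root device name. Only the 500 IOPS figure comes from the post.
    """
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # Dedicated throughput between the instance and EBS.
        "EbsOptimized": True,
        "BlockDeviceMappings": [
            {
                "DeviceName": "/dev/sda1",  # assumption: varies by AMI
                "Ebs": {
                    "VolumeType": "io1",  # assumption: provisioned IOPS volume
                    "Iops": iops,
                    "VolumeSize": volume_gib,
                },
            }
        ],
    }


def launch(params, region="us-east-1"):
    """Send the request; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so building params works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.run_instances(**params)
```

Keeping the parameters in a plain dictionary makes it easy to review the request before anything is launched, and not every instance type supports EBS optimization, so the chosen type is worth checking first.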