@tomislacker
Last active May 4, 2018 21:10
Benchmarking ElastiCache (Redis) Bandwidth Performance
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ElastiCache Bandwidth Mapping\n",
"## About\n",
"On several occasions we've discovered that our\n",
"[ElastiCache](https://aws.amazon.com/elasticache/) Redis instances\n",
"had become a bottleneck because of their instance type's bandwidth\n",
"limits. In response, we decided we needed our own benchmarking\n",
"tooling, flexible enough to analyze not just bandwidth but also CPU\n",
"load. With this data, we would have reasonable thresholds against\n",
"which alarms could be created for situations where we're at or near\n",
"the point of resource exhaustion.\n",
"\n",
"### Objectives\n",
"* Construct a framework for benchmarking ElastiCache Redis performance\n",
"* Tune as needed to produce reasonably consistent results\n",
"* Output the data in a way that a\n",
"[CloudFormation Mapping](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/mappings-section-structure.html)\n",
"may be constructed & published\n",
"* Adapt templates that use ElastiCache Redis to import the published mapping,\n",
"match the intended instance type to a value, and set a reasonable alarm threshold\n",
"\n",
"## Results\n",
"### Benchmarking Framework\n",
"_See [widdix/ec2-network-benchmark#2](https://github.com/widdix/ec2-network-benchmark/pull/2)_\n",
"\n",
"### Data\n",
"Once we were able to benchmark our instances, we ran the following query in\n",
"[Amazon Athena](https://aws.amazon.com/athena/)\n",
"and downloaded the results as a CSV.\n",
"\n",
"```sql\n",
"SELECT\n",
" instancetype,\n",
" dataSize,\n",
" -- bytes/minute -> bytes/second (/60) -> MiB/s (/1024/1024) -> Mibit/s (*8)\n",
" (avg(networkbytesout.p90)/60/1024/1024*8) AS mbps_p90,\n",
" (avg(networkbytesout.p70)/60/1024/1024*8) AS mbps_p70,\n",
" count(distinct benchmarkId) as test_passes,\n",
" avg(cpuutilization.p90) AS cpuutilization_90,\n",
" avg(enginecpuutilization.p90) AS enginecpuutilization_90,\n",
" avg(cpuutilization.p50) AS cpuutilization_50,\n",
" avg(enginecpuutilization.p50) AS enginecpuutilization_50,\n",
" avg(networkbytesout.p90) AS BytePerMinP90\n",
"FROM cachenetworkbenchmark\n",
"WHERE d >= from_iso8601_date('2018-05-01')\n",
"GROUP BY region, instancetype, dataSize\n",
"ORDER BY mbps_p90 DESC, region, instancetype, dataSize\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import pandas as pd\n",
"\n",
"CSV_PATH = '64bbbb18-12de-4e93-9c15-e5407c499f74.csv'"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"results = pd.read_csv(CSV_PATH)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>instancetype</th>\n",
" <th>dataSize</th>\n",
" <th>mbps_p90</th>\n",
" <th>mbps_p70</th>\n",
" <th>test_passes</th>\n",
" <th>cpuutilization_90</th>\n",
" <th>enginecpuutilization_90</th>\n",
" <th>cpuutilization_50</th>\n",
" <th>enginecpuutilization_50</th>\n",
" <th>BytePerMinP90</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>cache.r4.16xlarge</td>\n",
" <td>100000</td>\n",
" <td>10570.404</td>\n",
" <td>10180.308</td>\n",
" <td>1</td>\n",
" <td>3.293243</td>\n",
" <td>62.481094</td>\n",
" <td>3.156840</td>\n",
" <td>60.163340</td>\n",
" <td>8.312904e+10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>cache.r4.4xlarge</td>\n",
" <td>100000</td>\n",
" <td>9178.739</td>\n",
" <td>9078.103</td>\n",
" <td>5</td>\n",
" <td>8.583490</td>\n",
" <td>48.732200</td>\n",
" <td>6.758875</td>\n",
" <td>42.735767</td>\n",
" <td>7.218454e+10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>cache.r4.8xlarge</td>\n",
" <td>100000</td>\n",
" <td>9171.347</td>\n",
" <td>9033.280</td>\n",
" <td>5</td>\n",
" <td>5.165210</td>\n",
" <td>49.556260</td>\n",
" <td>4.878917</td>\n",
" <td>43.237050</td>\n",
" <td>7.212641e+10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>cache.m4.10xlarge</td>\n",
" <td>100000</td>\n",
" <td>8522.664</td>\n",
" <td>8439.150</td>\n",
" <td>5</td>\n",
" <td>2.197376</td>\n",
" <td>35.648620</td>\n",
" <td>2.033342</td>\n",
" <td>29.952028</td>\n",
" <td>6.702496e+10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" instancetype dataSize mbps_p90 mbps_p70 test_passes \\\n",
"0 cache.r4.16xlarge 100000 10570.404 10180.308 1 \n",
"1 cache.r4.4xlarge 100000 9178.739 9078.103 5 \n",
"2 cache.r4.8xlarge 100000 9171.347 9033.280 5 \n",
"3 cache.m4.10xlarge 100000 8522.664 8439.150 5 \n",
"\n",
" cpuutilization_90 enginecpuutilization_90 cpuutilization_50 \\\n",
"0 3.293243 62.481094 3.156840 \n",
"1 8.583490 48.732200 6.758875 \n",
"2 5.165210 49.556260 4.878917 \n",
"3 2.197376 35.648620 2.033342 \n",
"\n",
" enginecpuutilization_50 BytePerMinP90 \n",
"0 60.163340 8.312904e+10 \n",
"1 42.735767 7.218454e+10 \n",
"2 43.237050 7.212641e+10 \n",
"3 29.952028 6.702496e+10 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"results.head(4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### \"Data size\" Evaluations\n",
"The `redis-benchmark` application provides an argument (`-d`) to specify\n",
"the size, in bytes, of the values used during a run. Early on, we discovered\n",
"that tuning this value was the single most influential factor in squeezing\n",
"more bandwidth out of an instance.\n",
"\n",
"Not only did a larger data size typically result in more bandwidth, it often\n",
"came with a reduction in CPU load as well, though with diminishing returns.\n",
"\n",
"The tables below show one such example, for `cache.m4.xlarge`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>instancetype</th>\n",
" <th>dataSize</th>\n",
" <th>mbps_p90</th>\n",
" <th>mbps_p70</th>\n",
" <th>test_passes</th>\n",
" <th>cpuutilization_90</th>\n",
" <th>enginecpuutilization_90</th>\n",
" <th>cpuutilization_50</th>\n",
" <th>enginecpuutilization_50</th>\n",
" <th>BytePerMinP90</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>cache.m4.xlarge</td>\n",
" <td>100000</td>\n",
" <td>734.86810</td>\n",
" <td>713.35380</td>\n",
" <td>5</td>\n",
" <td>16.646667</td>\n",
" <td>28.711346</td>\n",
" <td>12.355090</td>\n",
" <td>4.539111</td>\n",
" <td>5.779238e+09</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>cache.m4.xlarge</td>\n",
" <td>10000</td>\n",
" <td>714.60900</td>\n",
" <td>701.69025</td>\n",
" <td>1</td>\n",
" <td>16.310000</td>\n",
" <td>18.576769</td>\n",
" <td>16.310000</td>\n",
" <td>17.449402</td>\n",
" <td>5.619914e+09</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>cache.m4.xlarge</td>\n",
" <td>1024</td>\n",
" <td>626.63776</td>\n",
" <td>622.25073</td>\n",
" <td>3</td>\n",
" <td>25.761267</td>\n",
" <td>56.875660</td>\n",
" <td>21.817337</td>\n",
" <td>46.935776</td>\n",
" <td>4.928080e+09</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" instancetype dataSize mbps_p90 mbps_p70 test_passes \\\n",
"30 cache.m4.xlarge 100000 734.86810 713.35380 5 \n",
"31 cache.m4.xlarge 10000 714.60900 701.69025 1 \n",
"33 cache.m4.xlarge 1024 626.63776 622.25073 3 \n",
"\n",
" cpuutilization_90 enginecpuutilization_90 cpuutilization_50 \\\n",
"30 16.646667 28.711346 12.355090 \n",
"31 16.310000 18.576769 16.310000 \n",
"33 25.761267 56.875660 21.817337 \n",
"\n",
" enginecpuutilization_50 BytePerMinP90 \n",
"30 4.539111 5.779238e+09 \n",
"31 17.449402 5.619914e+09 \n",
"33 46.935776 4.928080e+09 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"results[results[\"instancetype\"] == 'cache.m4.xlarge'].groupby('dataSize').head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>instancetype</th>\n",
" <th>dataSize</th>\n",
" <th>mbps_p90</th>\n",
" <th>cpuutilization_90</th>\n",
" <th>enginecpuutilization_90</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>cache.m4.xlarge</td>\n",
" <td>100000</td>\n",
" <td>734.86810</td>\n",
" <td>16.646667</td>\n",
" <td>28.711346</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>cache.m4.xlarge</td>\n",
" <td>10000</td>\n",
" <td>714.60900</td>\n",
" <td>16.310000</td>\n",
" <td>18.576769</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>cache.m4.xlarge</td>\n",
" <td>1024</td>\n",
" <td>626.63776</td>\n",
" <td>25.761267</td>\n",
" <td>56.875660</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" instancetype dataSize mbps_p90 cpuutilization_90 \\\n",
"30 cache.m4.xlarge 100000 734.86810 16.646667 \n",
"31 cache.m4.xlarge 10000 714.60900 16.310000 \n",
"33 cache.m4.xlarge 1024 626.63776 25.761267 \n",
"\n",
" enginecpuutilization_90 \n",
"30 28.711346 \n",
"31 18.576769 \n",
"33 56.875660 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"results[results[\"instancetype\"] == 'cache.m4.xlarge'] \\\n",
"[['instancetype','dataSize', 'mbps_p90', 'cpuutilization_90', 'enginecpuutilization_90']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Creating the Mapping\n",
"Once we had gathered enough data, we were ready to construct our\n",
"mapping transform. Two main considerations went into it:\n",
"\n",
"1. Network traffic is typically measured in _bits_ but ElastiCache\n",
"reports _bytes_, so the Mbit/s figures from the query are converted\n",
"back to bytes per second\n",
"1. The 90th percentile data from CloudWatch is used rather than\n",
"the 100th percentile in an attempt to get a more stable, consistent,\n",
"and conservative parameter\n",
"\n",
"We'll treat the 90th percentile as the maximum bandwidth we should\n",
"ever **expect** an instance type to be able to communicate at. From\n",
"there, we'll break down percentages of that value for consumption in\n",
"the mapping. In our case, 100%, 90%, and 80%, all based on the\n",
"90th percentile of bandwidth observed during the tests."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"mapping = {}\n",
"for instance_type in results.instancetype.unique():\n",
"    instance_results = results[results[\"instancetype\"] == instance_type]\n",
"    # Best observed p90 across data sizes, converted from Mibit/s back to bytes/s\n",
"    max_bandwidth = instance_results.mbps_p90.max() / 8 * 1024 * 1024\n",
"    # Thresholds at 100%, 90%, and 80% of that ceiling\n",
"    mapping[instance_type] = {\n",
"        percent: int(max_bandwidth * percent / 100)\n",
"        for percent in range(100, 70, -10)\n",
"    }"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1454"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(json.dumps(mapping, sort_keys=True))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{\n",
" \"cache.m3.2xlarge\": {\n",
" \"80\": 49835910,\n",
" \"90\": 56065399,\n",
" \"100\": 62294888\n",
" },\n",
" \"cache.m3.large\": {\n",
" \"80\": 48378954,\n",
" \"90\": 54426323,\n",
" \"100\": 60473692\n",
" },\n",
" \"cache.m3.medium\": {\n",
" \"80\": 29315378,\n",
" \"90\": 32979801,\n",
" \"100\": 36644223\n",
" },\n",
" \"cache.m3.xlarge\": {\n",
" \"80\": 54896319,\n",
" \"90\": 61758359,\n",
" \"100\": 68620399\n",
" },\n",
" \"cache.m4.10xlarge\": {\n",
" \"80\": 893666092,\n",
" \"90\": 1005374354,\n",
" \"100\": 1117082615\n",
" },\n",
" \"cache.m4.2xlarge\": {\n",
" \"80\": 102033530,\n",
" \"90\": 114787721,\n",
" \"100\": 127541912\n",
" },\n",
" \"cache.m4.4xlarge\": {\n",
" \"80\": 204153531,\n",
" \"90\": 229672723,\n",
" \"100\": 255191914\n",
" },\n",
" \"cache.m4.large\": {\n",
" \"80\": 47538989,\n",
" \"90\": 53481362,\n",
" \"100\": 59423736\n",
" },\n",
" \"cache.m4.xlarge\": {\n",
" \"80\": 77056505,\n",
" \"90\": 86688568,\n",
" \"100\": 96320631\n",
" },\n",
" \"cache.r3.2xlarge\": {\n",
" \"80\": 98557728,\n",
" \"90\": 110877444,\n",
" \"100\": 123197160\n",
" },\n",
" \"cache.r3.4xlarge\": {\n",
" \"80\": 105002848,\n",
" \"90\": 118128204,\n",
" \"100\": 131253560\n",
" },\n",
" \"cache.r3.8xlarge\": {\n",
" \"80\": 104620621,\n",
" \"90\": 117698199,\n",
" \"100\": 130775777\n",
" },\n",
" \"cache.r3.large\": {\n",
" \"80\": 49536345,\n",
" \"90\": 55728388,\n",
" \"100\": 61920431\n",
" },\n",
" \"cache.r3.xlarge\": {\n",
" \"80\": 68982989,\n",
" \"90\": 77605862,\n",
" \"100\": 86228736\n",
" },\n",
" \"cache.r4.16xlarge\": {\n",
" \"80\": 1108387194,\n",
" \"90\": 1246935593,\n",
" \"100\": 1385483993\n",
" },\n",
" \"cache.r4.2xlarge\": {\n",
" \"80\": 820773115,\n",
" \"90\": 923369755,\n",
" \"100\": 1025966394\n",
" },\n",
" \"cache.r4.4xlarge\": {\n",
" \"80\": 962460542,\n",
" \"90\": 1082768110,\n",
" \"100\": 1203075678\n",
" },\n",
" \"cache.r4.8xlarge\": {\n",
" \"80\": 961685435,\n",
" \"90\": 1081896114,\n",
" \"100\": 1202106793\n",
" },\n",
" \"cache.r4.large\": {\n",
" \"80\": 537689598,\n",
" \"90\": 604900797,\n",
" \"100\": 672111997\n",
" },\n",
" \"cache.r4.xlarge\": {\n",
" \"80\": 654315513,\n",
" \"90\": 736104952,\n",
" \"100\": 817894391\n",
" }\n",
"}\n"
]
}
],
"source": [
"print(json.dumps(mapping, indent=2, sort_keys=True))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
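
The unit conversion behind the mapping cell in the notebook above can be checked without pandas. The following is a minimal sketch, not the notebook's own code: the helper names `mbps_to_bytes_per_second` and `build_mapping` are hypothetical, and the sample rows are the three `cache.m4.xlarge` results shown in the notebook output. It assumes, as the `/1024/1024*8` step in the Athena query suggests, that the `mbps_p90` column is in Mibit/s. The last step, scaling a bytes-per-second value to a 60-second sum, is also an assumption about how an alarm might consume the mapping; it follows the query's `/60` step and the `BytePerMinP90` column rather than anything the notebook states about alarms.

```python
import json

MIB = 1024 * 1024  # the query divides by 1024 twice, so "mbps" is treated as Mibit/s


def mbps_to_bytes_per_second(mbps):
    """Undo the query's `/1024/1024*8` step: Mibit/s -> bytes/s."""
    return mbps / 8 * MIB


def build_mapping(rows, percents=(100, 90, 80)):
    """Build {instance_type: {percent: bytes_per_second}} from (type, mbps_p90) rows.

    The largest p90 seen for each instance type is treated as its ceiling.
    Keys are strings, since JSON object keys are always strings.
    """
    best = {}
    for instance_type, mbps_p90 in rows:
        best[instance_type] = max(best.get(instance_type, 0.0), mbps_p90)
    return {
        instance_type: {
            str(percent): int(mbps_to_bytes_per_second(mbps) * percent / 100)
            for percent in percents
        }
        for instance_type, mbps in best.items()
    }


# Sample rows copied from the cache.m4.xlarge results in the notebook output.
rows = [
    ("cache.m4.xlarge", 734.86810),
    ("cache.m4.xlarge", 714.60900),
    ("cache.m4.xlarge", 626.63776),
]
mapping = build_mapping(rows)
print(json.dumps(mapping, indent=2))

# Assumed follow-up: threshold for a 60-second sum of bytes out at the 90% level.
per_minute_threshold = mapping["cache.m4.xlarge"]["90"] * 60
print(per_minute_threshold)
```

For these rows the sketch reproduces the `cache.m4.xlarge` entry printed by the notebook (96320631, 86688568, and 77056505 bytes per second at 100%, 90%, and 80%).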