@matburt
Last active June 29, 2018 15:31
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### An analysis of allowed simultaneous capacity relative to system specs in Ansible Tower (and AWX)\n",
"\n",
"In this document we demonstrate the behavior of the capacity algorithms in Ansible Tower versions prior to 3.3 and in early versions of AWX (https://github.com/ansible/awx), prior to 1.0.3."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import psutil\n",
"plt.rcParams[\"figure.figsize\"] = (12, 9)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2 (AWX 1.0.2.0) and prior Capacity Algorithm\n",
"\n",
"This is the classic cluster capacity algorithm in AWX. Note that the number of simultaneous allowed jobs does not follow an even stair-step pattern as memory increases: each additional gigabyte adds 75 capacity units while a 5-fork job is counted as 50, so the steps alternate between one and two jobs. This unevenness is an inefficiency in the algorithm with no good justification."
]
},
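{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def get_old_capacity(mem=4096):\n",
"    # Systems with 2GB or less get the minimum capacity of 50.\n",
"    if mem <= 2048:\n",
"        return 50\n",
"    # Each whole gigabyte above 2 adds 75; floor division makes the rounding explicit.\n",
"    return 50 + ((mem // 1024) - 2) * 75"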
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"mem_range = range(2048, 32768)\n",
"grow_chart = [get_old_capacity(mem=x) for x in mem_range]\n",
"jobs_chart_5 = [max(1, int(x/50)) for x in grow_chart]"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0xa59b150>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0xab4be50>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Last value 44\n"
]
}
],
"source": [
"plt.plot(mem_range, grow_chart, 'g')\n",
"plt.xlabel(\"mbytes memory\")\n",
"plt.ylabel(\"shown capacity value\")\n",
"plt.show()\n",
"plt.plot(mem_range, jobs_chart_5, 'r')\n",
"plt.xlabel(\"mbytes memory\")\n",
"plt.ylabel(\"simultaneous jobs\")\n",
"plt.show()\n",
"jobs_count_at_32_from_old = jobs_chart_5[-1]\n",
"print(\"Last value {}\".format(jobs_count_at_32_from_old))"
]
},
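{
"cell_type": "markdown",
"metadata": {},
"source": [
"The uneven steps can be reproduced directly from the formula. The following is a minimal sketch rather than code from AWX: it restates `get_old_capacity` with explicit floor division so it behaves the same on Python 2 and 3, and the helper name `jobs_per_gb` and the 8GB upper limit are chosen here for illustration.\n",
"\n",
"```python\n",
"def get_old_capacity(mem=4096):\n",
"    # Same formula as above, with explicit floor division.\n",
"    if mem <= 2048:\n",
"        return 50\n",
"    return 50 + ((mem // 1024) - 2) * 75\n",
"\n",
"\n",
"def jobs_per_gb(max_gb=8, job_cost=50):\n",
"    # Simultaneous 5-fork jobs at each whole-gigabyte memory size.\n",
"    return [max(1, get_old_capacity(gb * 1024) // job_cost)\n",
"            for gb in range(2, max_gb + 1)]\n",
"\n",
"\n",
"jobs = jobs_per_gb()\n",
"# Difference between consecutive gigabyte sizes.\n",
"steps = [b - a for a, b in zip(jobs, jobs[1:])]\n",
"print(jobs)\n",
"print(steps)\n",
"```\n",
"\n",
"Each whole gigabyte adds 75 capacity units while a 5-fork job is counted as 50 in the cell above, so the job count alternates between rising by one and by two."
]
},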
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3 (AWX 1.0.3+) Experiment - Aligning more to forks values\n",
"\n",
"We can effectively divide the values of the previous capacity algorithm by 10 to bring them more in line with real fork counts. We change the constant from `75` to `7` (rather than `7.5`) so that we aren't dealing with fractional values.\n",
"\n",
"Because the constant `7` slows overall growth, it alleviates some, but not all, of the uneven growth in the number of simultaneous jobs.\n",
"\n",
"The goal is a capacity value that is meaningful to the end user and accurately represents how many simultaneous jobs of a particular complexity a single node can run.\n",
"\n",
"Our baseline is the number of simultaneous 5-fork jobs we can run."
]
},
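{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def get_capacity(mem=4096, cpu=2):\n",
"    # cpu is accepted but not used in this experiment.\n",
"    if mem <= 2048:\n",
"        return 5.0\n",
"    # Each whole gigabyte above 2 adds 7; floor division makes the rounding explicit.\n",
"    return 5 + ((mem // 1024) - 2) * 7"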
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"mem_range = range(2048, 32768)\n",
"grow_chart = [get_capacity(mem=x) for x in mem_range] # Capacity per-gigabytes over the memory range\n",
"jobs_chart_5 = [max(1,int((x/6))) for x in grow_chart] # number of simultaneous 5-fork jobs"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x9842b50>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0xa2eb310>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.plot(mem_range, grow_chart, 'g')\n",
"plt.xlabel('mbytes memory')\n",
"plt.ylabel('shown capacity')\n",
"plt.show()\n",
"plt.plot(mem_range, jobs_chart_5)\n",
"plt.xlabel(\"mbytes memory\")\n",
"plt.ylabel(\"simultaneous jobs\")\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyzing ansible-playbook run behavior and memory usage\n",
"\n",
"Looking at typical Ansible runs, each fork uses roughly 25MB to 40MB of memory. The parent process of the forks typically uses about the same amount."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### valgrind - massif (parent process)\n",
"\n",
"```\n",
"--------------------------------------------------------------------------------\n",
"Command: /usr/bin/ansible -i localhost,foo,bar,bing all -c local -m shell -a sleep 5\n",
"Massif arguments: (none)\n",
"ms_print arguments: massif.out.662\n",
"--------------------------------------------------------------------------------\n",
"\n",
"\n",
" MB\n",
"27.31^ # \n",
" | @#:::::@:\n",
" | @:@#:::::@:\n",
" | : @:@#:::::@:\n",
" | @@::::@:@#:::::@:\n",
" | @@@ :@@:: :@:@#:::::@:\n",
" | :::::@@ :::::@@:: :@:@#:::::@:\n",
" | ::::::: ::@@ ::: : :@@:: :@:@#:::::@:\n",
" | :@@@: :: :: ::@@ ::: : :@@:: :@:@#:::::@:\n",
" | ::@@:@ @: :: :: ::@@ @@::: : :@@:: :@:@#:::::@:\n",
" | @::@:: @@:@ @: :: :: ::@@ @ ::: : :@@:: :@:@#:::::@:\n",
" | ::::@: @:: @@:@ @: :: :: ::@@ ::@ ::: : :@@:: :@:@#:::::@:\n",
" | @@::::: @: @:: @@:@ @: :: :: ::@@ :::@ ::: : :@@:: :@:@#:::::@:\n",
" | :::::@ ::::: @: @:: @@:@ @: :: :: ::@@ :::@ ::: : :@@:: :@:@#:::::@:\n",
" | :: ::@ ::::: @: @:: @@:@ @: :: :: ::@@ :::@ ::: : :@@:: :@:@#:::::@:\n",
" | @:: ::@ ::::: @: @:: @@:@ @: :: :: ::@@ :::@ ::: : :@@:: :@:@#:::::@:\n",
" | @:: ::@ ::::: @: @:: @@:@ @: :: :: ::@@ :::@ ::: : :@@:: :@:@#:::::@:\n",
" | :@:: ::@ ::::: @: @:: @@:@ @: :: :: ::@@ :::@ ::: : :@@:: :@:@#:::::@:\n",
" | ::@:: ::@ ::::: @: @:: @@:@ @: :: :: ::@@ :::@ ::: : :@@:: :@:@#:::::@:\n",
" | ::@:: ::@ ::::: @: @:: @@:@ @: :: :: ::@@ :::@ ::: : :@@:: :@:@#:::::@:\n",
" 0 +----------------------------------------------------------------------->Gi\n",
" 0 3.480\n",
"\n",
"Number of snapshots: 66\n",
" Detailed snapshots: [3, 8, 14, 16, 19, 20, 22, 23, 31, 32, 36, 42, 43, 47, 49, 50, 51, 52 (peak), 62]\n",
"\n",
"--------------------------------------------------------------------------------\n",
" n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)\n",
"--------------------------------------------------------------------------------\n",
" 0 0 0 0 0 0\n",
" 1 67,644,245 3,475,664 3,161,038 314,626 0\n",
" 2 123,861,379 5,395,040 4,962,463 432,577 0\n",
" 3 193,072,538 7,691,664 6,987,949 703,715 0\n",
"...\n",
"--------------------------------------------------------------------------------\n",
" n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)\n",
"--------------------------------------------------------------------------------\n",
" 61 3,609,926,526 28,600,328 26,141,617 2,458,711 0\n",
" 62 3,639,714,345 28,602,792 26,143,633 2,459,159 0\n",
" 63 3,669,502,245 28,605,256 26,145,649 2,459,607 0\n",
" 64 3,699,301,974 28,603,888 26,144,577 2,459,311 0\n",
" 65 3,736,563,442 28,600,544 26,143,774 2,456,770 0\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### valgrind - massif (fork-1)\n",
"\n",
"```\n",
"--------------------------------------------------------------------------------\n",
"Command: /usr/bin/ansible -i localhost,foo,bar,bing all -c local -m shell -a sleep 5\n",
"Massif arguments: (none)\n",
"ms_print arguments: massif.out.670\n",
"--------------------------------------------------------------------------------\n",
"\n",
"\n",
" MB\n",
"34.52^ # \n",
" | @ #:\n",
" | @ ::::@#:\n",
" | @:::::@#:\n",
" | @@:::::@#:\n",
" | @@@:::::@#:\n",
" | @:@@@:::::@#:\n",
" | : :@:@@@:::::@#:\n",
" | @@ @@::::@:@@@:::::@#:\n",
" | ::::::@@ ::::@@:: :@:@@@:::::@#:\n",
" | @::::: ::: @@ ::::: :@@:: :@:@@@:::::@#:\n",
" | @@:@@@: ::: ::: @@ @:: :: :@@:: :@:@@@:::::@#:\n",
" | :@:::@@:@ @: ::: ::: @@ @:: :: :@@:: :@:@@@:::::@#:\n",
" | :::::@@:@:: @@:@ @: ::: ::: @@ :::@:: :: :@@:: :@:@@@:::::@#:\n",
" | ::@@:: :::@ :@:: @@:@ @: ::: ::: @@::: @:: :: :@@:: :@:@@@:::::@#:\n",
" | ::::@ :: :::@ :@:: @@:@ @: ::: ::: @@::: @:: :: :@@:: :@:@@@:::::@#:\n",
" | @::::@ :: :::@ :@:: @@:@ @: ::: ::: @@::: @:: :: :@@:: :@:@@@:::::@#:\n",
" | @::::@ :: :::@ :@:: @@:@ @: ::: ::: @@::: @:: :: :@@:: :@:@@@:::::@#:\n",
" | :@::::@ :: :::@ :@:: @@:@ @: ::: ::: @@::: @:: :: :@@:: :@:@@@:::::@#:\n",
" | ::@::::@ :: :::@ :@:: @@:@ @: ::: ::: @@::: @:: :: :@@:: :@:@@@:::::@#:\n",
" 0 +----------------------------------------------------------------------->Gi\n",
" 0 3.603\n",
"\n",
"Number of snapshots: 75\n",
" Detailed snapshots: [3, 8, 14, 16, 19, 20, 22, 23, 31, 32, 36, 42, 43, 47, 49, 50, 51, 53, 54, 56, 58, 59, 69, 71 (peak)]\n",
"\n",
"--------------------------------------------------------------------------------\n",
" n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)\n",
"--------------------------------------------------------------------------------\n",
" 0 0 0 0 0 0\n",
" 1 67,644,245 3,475,664 3,161,038 314,626 0\n",
" 2 123,861,379 5,395,040 4,962,463 432,577 0\n",
" 3 193,072,538 7,691,664 6,987,949 703,715 0\n",
"...\n",
"--------------------------------------------------------------------------------\n",
" n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)\n",
"--------------------------------------------------------------------------------\n",
" 70 3,765,509,072 34,403,160 31,516,692 2,886,468 0\n",
" 71 3,778,388,676 36,193,752 32,844,014 3,349,738 0\n",
" 72 3,808,176,489 35,776,744 32,788,856 2,987,888 0\n",
" 73 3,837,965,928 35,776,352 32,788,466 2,987,886 0\n",
" 74 3,868,367,424 29,669,456 26,997,450 2,672,006 0\n",
"```"
]
},
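{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check on the figures above, the `total(B)` column can be converted to the MB unit used on the graph axis. This is a small sketch: the helper name `to_mb` is introduced here for illustration, and the byte counts are copied from the snapshot tables.\n",
"\n",
"```python\n",
"def to_mb(num_bytes):\n",
"    # Convert a byte count to units of 2**20 bytes.\n",
"    return num_bytes / float(1024 * 1024)\n",
"\n",
"\n",
"parent_last = 28600544  # parent process, snapshot 65\n",
"fork_peak = 36193752    # fork-1, snapshot 71 (peak)\n",
"\n",
"print('parent: {:.2f} MB'.format(to_mb(parent_last)))\n",
"print('fork-1: {:.2f} MB'.format(to_mb(fork_peak)))\n",
"```\n",
"\n",
"The fork-1 peak converts to about 34.52, which agrees with the label at the top of its graph, so both processes sit inside the 25MB to 40MB range."
]
},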
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Working with a fixed, memory-based fork capacity"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"mem_mbytes = 4096\n",
"# 100MB per fork\n",
"per_fork_mbytes = 100"
]
},
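{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"def new_capacity(mem_mbytes=mem_mbytes, per_fork_mbytes=per_fork_mbytes):\n",
"    # The first 2048MB is reserved; systems at or below that get the minimum of 1.\n",
"    if mem_mbytes <= 2048:\n",
"        return 1\n",
"    # Floor division gives whole forks on both Python 2 and 3.\n",
"    return max(1, (mem_mbytes - 2048) // per_fork_mbytes)"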
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fork Capacity of a 4G system"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"20"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# fork impact == 10\n",
"get_old_capacity(4096) / 10"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"20"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# if per_fork cost is as above\n",
"new_capacity(4096, 100)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# a 2G system falls entirely within the 2G reserved for other things, so capacity floors at 1\n",
"new_capacity(2048, 100)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...8G system"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"50"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_old_capacity(8192) / 10"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"61"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_capacity(8192, 100)"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_capacity(2900, 100)"
]
},
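{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare the two approaches side by side, the sketch below tabulates both for a few memory sizes. It restates both functions with explicit floor division so it runs unchanged on Python 2 and 3; the `compare` helper and the chosen sizes are illustrative assumptions, not part of AWX.\n",
"\n",
"```python\n",
"def get_old_capacity(mem=4096):\n",
"    if mem <= 2048:\n",
"        return 50\n",
"    return 50 + ((mem // 1024) - 2) * 75\n",
"\n",
"\n",
"def new_capacity(mem_mbytes=4096, per_fork_mbytes=100):\n",
"    # Subtract 2048MB, then budget the remainder per fork.\n",
"    if mem_mbytes <= 2048:\n",
"        return 1\n",
"    return max(1, (mem_mbytes - 2048) // per_fork_mbytes)\n",
"\n",
"\n",
"def compare(sizes=(2048, 4096, 8192, 16384)):\n",
"    # Rows of (memory in MB, old capacity / 10, new capacity).\n",
"    return [(m, get_old_capacity(m) // 10, new_capacity(m)) for m in sizes]\n",
"\n",
"\n",
"for mem, old, new in compare():\n",
"    print('{:>6} MB  old={:<4} new={}'.format(mem, old, new))\n",
"```\n",
"\n",
"The two agree at 4G, and with a 100MB per-fork cost the memory-based value grows faster beyond that point."
]
},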
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Algorithm adjustment: scaling between the CPU-based and memory-based capacity, as lower and upper bounds, by a percentage (represented as a number between 0 and 1)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"half: 13.0\n"
]
}
],
"source": [
"cpu_forks = 8\n",
"mem_forks = 18\n",
"# 50 percent\n",
"half_scale = (mem_forks - cpu_forks) * 0.5\n",
"print(\"half: {}\".format(cpu_forks + half_scale))"
]
},
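{
"cell_type": "markdown",
"metadata": {},
"source": [
"The calculation above generalizes to any percentage. This is a minimal sketch, assuming the CPU-based value is the lower bound and the memory-based value the upper bound as in the cell above; the function name `scaled_capacity` and the clamping of `adjustment` to the 0 to 1 range are choices made here for illustration, not taken from AWX.\n",
"\n",
"```python\n",
"def scaled_capacity(cpu_forks, mem_forks, adjustment):\n",
"    # Clamp the adjustment to the 0..1 range (an assumption of this sketch).\n",
"    adjustment = min(1.0, max(0.0, adjustment))\n",
"    # Linear interpolation from the CPU bound to the memory bound.\n",
"    return cpu_forks + (mem_forks - cpu_forks) * adjustment\n",
"\n",
"\n",
"for pct in (0.0, 0.5, 1.0):\n",
"    print('{}: {}'.format(pct, scaled_capacity(8, 18, pct)))\n",
"```\n",
"\n",
"With `cpu_forks = 8` and `mem_forks = 18`, an adjustment of 0.5 reproduces the value of 13.0 printed above."
]
},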
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Django Shell-Plus",
"language": "python",
"name": "django_extensions"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}