@bbannier
Created February 5, 2020 20:24
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Some statistical analysis of flaky tests\n",
"\n",
"We currently seem to be cursed by a perpetually red CI state. Hardly any test run passes without a failure. Since our CI does not provide a clean signal, it cannot give us much confidence that the changes we introduce are safe.\n",
"\n",
"The following presents a statistical analysis of the flaky-test data we have gathered from our internal CI. As a reminder, our CI runs the `stout`, `libprocess` and `mesos` test suites in varying configurations on a number of platforms.\n",
"\n",
"\n",
"## Setting\n",
"\n",
"Currently our CI executes 38,482 tests from 13 setups, and in any CI run we might see a handful of failures. These numbers already suggest that individual tests typically do not fail often. To get a feeling for our flake rates we can make some back-of-the-envelope estimates.\n",
"\n",
"We can think of running a test as an experiment with a binary outcome: the test either passes or fails. Let's say that test $i$ fails with a probability $p_i$. We run the test in $n$ setups and assume that the outcome in any one setup has no effect on the outcomes in the other setups, i.e., the test runs are statistically independent experiments.\n",
"\n",
"In this approximation, each test run is a [Bernoulli trial](https://en.wikipedia.org/wiki/Bernoulli_trial), and the probability to see $k$ failures (failing test runs) from test $i$ when running it $n$ times is described by a [binomial distribution](https://en.wikipedia.org/wiki/Binomial_distribution),\n",
"\n",
"$$f(k; n, p_i) = {n\\choose k} p_i^k (1-p_i)^{n-k}$$\n",
"\n",
"We take $n$ as the number of setups. For the case where the test does not fail in any setup of a single CI run we are interested in $k=0$, which leaves us with the success probability\n",
"\n",
"$$f(p_i) = (1-p_i)^{n}$$"
]
},
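{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a short worked step (a sketch that only uses the definitions above, with $p_i$ the per-run failure probability), setting $k=0$ in the binomial formula removes the combinatorial factor and the power of $p_i$:\n",
"\n",
"$$f(0; n, p_i) = {n\\choose 0}\\, p_i^0\\, (1-p_i)^{n-0} = (1-p_i)^{n}$$\n",
"\n",
"Solving for $p_i$ gives the largest failure probability that is still compatible with a desired success probability $f$:\n",
"\n",
"$$p_i = 1 - f^{1/n}$$"
]
},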
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"plt.style.use('seaborn-darkgrid')\n",
"\n",
"p = np.linspace(0, 1, 100)\n",
"f = (1 - p)**13\n",
"plt.plot(p, f, label='$n=13$')\n",
"plt.xlabel('$p_i$')\n",
"plt.ylabel('probability of success $f(p_i)$')\n",
"plt.legend();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plot above shows the probability for the test to pass in a single CI run on all 13 setups. The probability drops rapidly as the probability for the test to fail in isolation increases, e.g., if the test fails in only 5% of the cases, the probability of not seeing it fail in any of the 13 setups is only about 50%.\n",
"\n",
"As an added challenge, we not only run the test in parallel in different setups, but also repeatedly for each new source code revision. Again, experiments from different revisions are independent (assuming for simplicity that for some stretch of time no change affecting the test in question was made), and we can redo the above calculation with $n$ set not to the number of setups, but to $n=n_\\text{setups} \\times n_\\text{runs}$, e.g., for 10 consecutive runs, and see that the probability drops even faster,"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"f = (1 - p)**130\n",
"plt.plot(p, f, label='$n=130$')\n",
"plt.xlabel('$p_i$')\n",
"plt.ylabel('probability of success $f(p_i)$')\n",
"plt.legend();"
]
},
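{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put numbers on the two curves above (plain arithmetic on $f(p_i) = (1-p_i)^n$, under the same independence assumption as before), take a test that fails in 5% of its runs:\n",
"\n",
"$$(1-0.05)^{13} \\approx 0.51, \\qquad (1-0.05)^{130} \\approx 0.0013$$\n",
"\n",
"Such a test passes a single CI run on all 13 setups about half of the time, but it passes 10 consecutive CI runs only roughly once in 800 attempts."
]
},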
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To answer the question of how likely we are to see $m$ consecutive failure-free runs of a test when executing it on 13 platforms, we can look at the cumulative distribution evaluated at zero failures, which gives the probability to see $m$ failure-free runs in a row. We see that even if a test only fails 0.5% of the time, the probability to see only failure-free runs drops below 50% after just 11 runs."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import scipy.stats\n",
"\n",
"k = np.arange(0, 31)\n",
"n = 13\n",
"\n",
"plt.plot(k, scipy.stats.binom.cdf(0, k*n, 0.005), label='$p_i=0.005$')\n",
"plt.plot(k, scipy.stats.binom.cdf(0, k*n, 0.01), label='$p_i=0.01$')\n",
"plt.plot(k, scipy.stats.binom.cdf(0, k*n, 0.05), label='$p_i=0.05$')\n",
"plt.xlabel('number of runs on 13 setups $m$')\n",
"plt.ylabel('probability of success')\n",
"plt.legend();"
]
},
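{
"cell_type": "markdown",
"metadata": {},
"source": [
"The points where the curves above cross 50% can also be computed directly (a sketch under the same independence assumption). The probability of $m$ failure-free runs on $n=13$ setups is $(1-p_i)^{nm}$, which falls below 50% once\n",
"\n",
"$$m > \\frac{\\ln 2}{-n \\ln(1-p_i)}$$\n",
"\n",
"For $p_i = 0.005$ this gives $m > 10.6$, for $p_i = 0.01$ it gives $m > 5.3$, and for $p_i = 0.05$ it gives $m > 1.04$."
]
},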
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we execute not just a single test but many thousands of tests, a very small average failure probability is enough to almost always observe at least one failure (a red CI status)."
]
},
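{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration (an assumption made for this estimate only: all 38,482 test executions of a CI run are independent and share the same failure probability $\\bar p$), the probability of a fully green CI run is\n",
"\n",
"$$P(\\text{green}) = (1-\\bar p)^{38482}$$\n",
"\n",
"This already drops to 50% for\n",
"\n",
"$$\\bar p = 1 - 0.5^{1/38482} \\approx 1.8 \\times 10^{-5},$$\n",
"\n",
"i.e., when a test execution fails on average only about once in 55,000 attempts."
]
},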
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data\n",
"\n",
"Since we have only limited Jenkins history we use CI results posted to Slack as a store. This is problematic since we only learn about a test when its first failure is reported, which makes it hard to determine the point at which a test became flaky. Additionally, the message format we use does not denote on which platform a test failed; this is problematic when a test suite aborts and reports no test results. In this situation we cannot tell whether a test failed on the setup on which we observed the abort.\n",
"\n",
"I use a Slack history imported by the tool already used by Benno."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Filter messages from the CI bot out of the Slack history and store the report data in a list.\n",
"\n",
"import json\n",
"\n",
"import numpy as np\n",
"\n",
"# Divisor used to estimate the number of setups, presumably the approximate number of tests per setup.\n",
"TESTS_PER_SETUP = 2650\n",
"\n",
"with open('messages_core-ci.json') as fh:\n",
"    j = json.load(fh)\n",
"j = [J for J in j if 'bot_id' in J and J['bot_id'] == 'B2HPH9RGB']\n",
"j = [J for J in j if 'attachments' in J and J['attachments']]\n",
"j = [J for J in j if 'fields' in J['attachments'][0] and J['attachments'][0]['fields']]\n",
"j = [J for J in j if 'value' in J['attachments'][0]['fields'][0]]\n",
"\n",
"failures = []\n",
"\n",
"for J in reversed(j):\n",
"    lines = J['attachments'][0]['fields'][0]['value'].split('\\n')\n",
"\n",
"    # The fifth whitespace-separated token of the first line identifies the build.\n",
"    branch = lines[0].split()[4]\n",
"\n",
"    # Skip messages which do not contain a list of failed tests.\n",
"    if len(lines) < 4 or 'Failed Tests' not in lines[3]:\n",
"        continue\n",
"\n",
"    # Test names are listed one per line until the first empty line.\n",
"    failed_tests = []\n",
"    for line in lines[4:]:\n",
"        if not line:\n",
"            break\n",
"        failed_tests.append(line.split()[0])\n",
"\n",
"    # Estimate the number of test setups from the number of tests and the number of aborts.\n",
"    counts = [x.split(',')[0] for x in lines[2].split(': ')]\n",
"    num_passed, num_failed = int(counts[1]), int(counts[2])\n",
"    num_empty = failed_tests.count('[empty]')\n",
"    num_setups = int(np.round((num_passed + num_failed) / TESTS_PER_SETUP + num_empty))\n",
"\n",
"    # Record builds without any failed test with a `None` test entry.\n",
"    if not failed_tests:\n",
"        failed_tests.append(None)\n",
"\n",
"    for test in failed_tests:\n",
"        failures.append({'branch': branch, 'test': test, 'setups': num_setups,\n",
"                         'passed': num_passed, 'failed': num_failed, 'empty': num_empty})"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>branch</th>\n",
" <th>empty</th>\n",
" <th>failed</th>\n",
" <th>passed</th>\n",
" <th>setups</th>\n",
" <th>test</th>\n",
" <th>build</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>#4921-asf/master-0e709d31</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>38409</td>\n",
" <td>14</td>\n",
" <td>ROOT_CGROUPS_Stat</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>#4913-asf/master-e65ab60b</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>38658</td>\n",
" <td>16</td>\n",
" <td>SimpleEviction</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>#4913-asf/master-e65ab60b</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>38658</td>\n",
" <td>16</td>\n",
" <td>LocalStoreTestWithTar</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>#4913-asf/master-e65ab60b</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>38658</td>\n",
" <td>16</td>\n",
" <td>ROOT_CGROUPS_CFS_EnableCfs</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>#4913-asf/master-e65ab60b</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>38658</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>#4912-asf/master-9bba87b9</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>38660</td>\n",
" <td>16</td>\n",
" <td>GroupPathWithRestrictivePerms</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>#4912-asf/master-9bba87b9</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>38660</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44</th>\n",
" <td>#4909-asf/master-bf4e8b39</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>38660</td>\n",
" <td>16</td>\n",
" <td>HttpCachedConcurrent</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>#4909-asf/master-bf4e8b39</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>38660</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>#4905-asf/master-066e0f81</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>38660</td>\n",
" <td>16</td>\n",
" <td>NonRetryableFrrors</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55</th>\n",
" <td>#4905-asf/master-066e0f81</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>38660</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>#4904-asf/master-81b179fe</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>38661</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>76</th>\n",
" <td>#4901-asf/master-f24736d4</td>\n",
" <td>2</td>\n",
" <td>4</td>\n",
" <td>36062</td>\n",
" <td>16</td>\n",
" <td>PartitionAwareTaskCompletedOnPartitionedAgent</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>77</th>\n",
" <td>#4901-asf/master-f24736d4</td>\n",
" <td>2</td>\n",
" <td>4</td>\n",
" <td>36062</td>\n",
" <td>16</td>\n",
" <td>ExecutorMessageToRecoveredHttpFramework</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>78</th>\n",
" <td>#4901-asf/master-f24736d4</td>\n",
" <td>2</td>\n",
" <td>4</td>\n",
" <td>36062</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>79</th>\n",
" <td>#4901-asf/master-f24736d4</td>\n",
" <td>2</td>\n",
" <td>4</td>\n",
" <td>36062</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>158</th>\n",
" <td>#4888-asf/master-84a6eef1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>38648</td>\n",
" <td>16</td>\n",
" <td>Auth</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>159</th>\n",
" <td>#4888-asf/master-84a6eef1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>38648</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>174</th>\n",
" <td>#4885-asf/master-68cf4372</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>38647</td>\n",
" <td>16</td>\n",
" <td>ROOT_CGROUPS_CFS_EnableCfs</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>175</th>\n",
" <td>#4885-asf/master-68cf4372</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>38647</td>\n",
" <td>16</td>\n",
" <td>ROOT_CGROUPS_Stat</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>176</th>\n",
" <td>#4885-asf/master-68cf4372</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>38647</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>180</th>\n",
" <td>#4883-asf/master-9b889a10</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>38649</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>181</th>\n",
" <td>#4882-asf/master-8c6e0e57</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>38480</td>\n",
" <td>16</td>\n",
" <td>ROOT_CGROUPS_Stat</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>182</th>\n",
" <td>#4882-asf/master-8c6e0e57</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>38480</td>\n",
" <td>16</td>\n",
" <td>[empty]</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>185</th>\n",
" <td>#4881-asf/master-0f3455cc</td>\n",
" <td>0</td>\n",
" <td>65</td>\n",
" <td>38585</td>\n",
" <td>15</td>\n",
" <td>ROOT_INTERNET_CURL_HealthyTaskViaHTTPWithConta...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>186</th>\n",
" <td>#4881-asf/master-0f3455cc</td>\n",
" <td>0</td>\n",
" <td>65</td>\n",
" <td>38585</td>\n",
" <td>15</td>\n",
" <td>ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithCont...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>187</th>\n",
" <td>#4881-asf/master-0f3455cc</td>\n",
" <td>0</td>\n",
" <td>65</td>\n",
" <td>38585</td>\n",
" <td>15</td>\n",
" <td>ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithCont...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>188</th>\n",
" <td>#4881-asf/master-0f3455cc</td>\n",
" <td>0</td>\n",
" <td>65</td>\n",
" <td>38585</td>\n",
" <td>15</td>\n",
" <td>ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithCont...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>189</th>\n",
" <td>#4881-asf/master-0f3455cc</td>\n",
" <td>0</td>\n",
" <td>65</td>\n",
" <td>38585</td>\n",
" <td>15</td>\n",
" <td>ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithCont...</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>190</th>\n",
" <td>#4881-asf/master-0f3455cc</td>\n",
" <td>0</td>\n",
" <td>65</td>\n",
" <td>38585</td>\n",
" <td>15</td>\n",
" <td>ROOT_DOCKER_DockerHealthyTask</td>\n",
" <td>11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10753</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>INTERNET_CURL_InvokeFetchByName</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10754</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>INTERNET_CURL_FetchManifest</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10755</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>INTERNET_CURL_FetchBlob</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10756</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>INTERNET_CURL_FetchImage</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10757</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>INTERNET_CURL_InvokeFetchByName</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10758</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>INTERNET_CURL_FetchManifest</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10759</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>INTERNET_CURL_FetchBlob</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10760</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>INTERNET_CURL_FetchImage</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10761</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>INTERNET_CURL_InvokeFetchByName</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10762</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_ScratchImage</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10763</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_ImageDigest</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10764</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_CommandTaskUser</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10765</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_ScratchImage</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10766</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_ImageDigest</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10767</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_CommandTaskUser</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10768</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_ScratchImage</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10769</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_ImageDigest</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10770</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_CommandTaskUser</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10771</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_LaunchCommandTask</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10772</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL_LaunchContainerInHostNetwork</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10773</th>\n",
" <td>#3401-asf/master-81d4b20d</td>\n",
" <td>0</td>\n",
" <td>93</td>\n",
" <td>33476</td>\n",
" <td>13</td>\n",
" <td>ROOT_INTERNET_CURL…</td>\n",
" <td>488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10873</th>\n",
" <td>#3392-asf/master-40b40d9b</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>12085</td>\n",
" <td>5</td>\n",
" <td>None</td>\n",
" <td>489</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10874</th>\n",
" <td>#3391-asf/master-eb9ab808</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>15141</td>\n",
" <td>6</td>\n",
" <td>None</td>\n",
" <td>490</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10875</th>\n",
" <td>#3389-asf/master-8c25ef96</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>15141</td>\n",
" <td>6</td>\n",
" <td>None</td>\n",
" <td>491</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10899</th>\n",
" <td>#3373-asf/master-729ec620</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>45751</td>\n",
" <td>17</td>\n",
" <td>ResourceStatistics</td>\n",
" <td>492</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10900</th>\n",
" <td>#3373-asf/master-729ec620</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>45751</td>\n",
" <td>17</td>\n",
" <td>EndpointCreateThenOfferRemove</td>\n",
" <td>492</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10901</th>\n",
" <td>#3373-asf/master-729ec620</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>45751</td>\n",
" <td>17</td>\n",
" <td>AttachInputToNestedContainerSession/1</td>\n",
" <td>492</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10903</th>\n",
" <td>#3369-asf/master-90ce7976</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>42989</td>\n",
" <td>17</td>\n",
" <td>ROOT_INTERNET_CURL_HealthyTaskViaHTTPWithConta...</td>\n",
" <td>493</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10904</th>\n",
" <td>#3369-asf/master-90ce7976</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>42989</td>\n",
" <td>17</td>\n",
" <td>[empty]</td>\n",
" <td>493</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10906</th>\n",
" <td>#3371-asf/master-20c3c34b</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>45753</td>\n",
" <td>17</td>\n",
" <td>RecoverNestedContainer/15</td>\n",
" <td>494</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>2800 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" branch empty failed passed setups \\\n",
"2 #4921-asf/master-0e709d31 0 1 38409 14 \n",
"32 #4913-asf/master-e65ab60b 1 4 38658 16 \n",
"33 #4913-asf/master-e65ab60b 1 4 38658 16 \n",
"34 #4913-asf/master-e65ab60b 1 4 38658 16 \n",
"35 #4913-asf/master-e65ab60b 1 4 38658 16 \n",
"36 #4912-asf/master-9bba87b9 1 2 38660 16 \n",
"37 #4912-asf/master-9bba87b9 1 2 38660 16 \n",
"44 #4909-asf/master-bf4e8b39 1 2 38660 16 \n",
"45 #4909-asf/master-bf4e8b39 1 2 38660 16 \n",
"54 #4905-asf/master-066e0f81 1 2 38660 16 \n",
"55 #4905-asf/master-066e0f81 1 2 38660 16 \n",
"56 #4904-asf/master-81b179fe 1 1 38661 16 \n",
"76 #4901-asf/master-f24736d4 2 4 36062 16 \n",
"77 #4901-asf/master-f24736d4 2 4 36062 16 \n",
"78 #4901-asf/master-f24736d4 2 4 36062 16 \n",
"79 #4901-asf/master-f24736d4 2 4 36062 16 \n",
"158 #4888-asf/master-84a6eef1 1 2 38648 16 \n",
"159 #4888-asf/master-84a6eef1 1 2 38648 16 \n",
"174 #4885-asf/master-68cf4372 1 3 38647 16 \n",
"175 #4885-asf/master-68cf4372 1 3 38647 16 \n",
"176 #4885-asf/master-68cf4372 1 3 38647 16 \n",
"180 #4883-asf/master-9b889a10 1 1 38649 16 \n",
"181 #4882-asf/master-8c6e0e57 1 2 38480 16 \n",
"182 #4882-asf/master-8c6e0e57 1 2 38480 16 \n",
"185 #4881-asf/master-0f3455cc 0 65 38585 15 \n",
"186 #4881-asf/master-0f3455cc 0 65 38585 15 \n",
"187 #4881-asf/master-0f3455cc 0 65 38585 15 \n",
"188 #4881-asf/master-0f3455cc 0 65 38585 15 \n",
"189 #4881-asf/master-0f3455cc 0 65 38585 15 \n",
"190 #4881-asf/master-0f3455cc 0 65 38585 15 \n",
"... ... ... ... ... ... \n",
"10753 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10754 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10755 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10756 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10757 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10758 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10759 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10760 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10761 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10762 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10763 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10764 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10765 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10766 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10767 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10768 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10769 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10770 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10771 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10772 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10773 #3401-asf/master-81d4b20d 0 93 33476 13 \n",
"10873 #3392-asf/master-40b40d9b 0 0 12085 5 \n",
"10874 #3391-asf/master-eb9ab808 0 0 15141 6 \n",
"10875 #3389-asf/master-8c25ef96 0 0 15141 6 \n",
"10899 #3373-asf/master-729ec620 0 3 45751 17 \n",
"10900 #3373-asf/master-729ec620 0 3 45751 17 \n",
"10901 #3373-asf/master-729ec620 0 3 45751 17 \n",
"10903 #3369-asf/master-90ce7976 1 2 42989 17 \n",
"10904 #3369-asf/master-90ce7976 1 2 42989 17 \n",
"10906 #3371-asf/master-20c3c34b 0 1 45753 17 \n",
"\n",
" test build \n",
"2 ROOT_CGROUPS_Stat 0 \n",
"32 SimpleEviction 1 \n",
"33 LocalStoreTestWithTar 1 \n",
"34 ROOT_CGROUPS_CFS_EnableCfs 1 \n",
"35 [empty] 1 \n",
"36 GroupPathWithRestrictivePerms 2 \n",
"37 [empty] 2 \n",
"44 HttpCachedConcurrent 3 \n",
"45 [empty] 3 \n",
"54 NonRetryableFrrors 4 \n",
"55 [empty] 4 \n",
"56 [empty] 5 \n",
"76 PartitionAwareTaskCompletedOnPartitionedAgent 6 \n",
"77 ExecutorMessageToRecoveredHttpFramework 6 \n",
"78 [empty] 6 \n",
"79 [empty] 6 \n",
"158 Auth 7 \n",
"159 [empty] 7 \n",
"174 ROOT_CGROUPS_CFS_EnableCfs 8 \n",
"175 ROOT_CGROUPS_Stat 8 \n",
"176 [empty] 8 \n",
"180 [empty] 9 \n",
"181 ROOT_CGROUPS_Stat 10 \n",
"182 [empty] 10 \n",
"185 ROOT_INTERNET_CURL_HealthyTaskViaHTTPWithConta... 11 \n",
"186 ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithCont... 11 \n",
"187 ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithCont... 11 \n",
"188 ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithCont... 11 \n",
"189 ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithCont... 11 \n",
"190 ROOT_DOCKER_DockerHealthyTask 11 \n",
"... ... ... \n",
"10753 INTERNET_CURL_InvokeFetchByName 488 \n",
"10754 INTERNET_CURL_FetchManifest 488 \n",
"10755 INTERNET_CURL_FetchBlob 488 \n",
"10756 INTERNET_CURL_FetchImage 488 \n",
"10757 INTERNET_CURL_InvokeFetchByName 488 \n",
"10758 INTERNET_CURL_FetchManifest 488 \n",
"10759 INTERNET_CURL_FetchBlob 488 \n",
"10760 INTERNET_CURL_FetchImage 488 \n",
"10761 INTERNET_CURL_InvokeFetchByName 488 \n",
"10762 ROOT_INTERNET_CURL_ScratchImage 488 \n",
"10763 ROOT_INTERNET_CURL_ImageDigest 488 \n",
"10764 ROOT_INTERNET_CURL_CommandTaskUser 488 \n",
"10765 ROOT_INTERNET_CURL_ScratchImage 488 \n",
"10766 ROOT_INTERNET_CURL_ImageDigest 488 \n",
"10767 ROOT_INTERNET_CURL_CommandTaskUser 488 \n",
"10768 ROOT_INTERNET_CURL_ScratchImage 488 \n",
"10769 ROOT_INTERNET_CURL_ImageDigest 488 \n",
"10770 ROOT_INTERNET_CURL_CommandTaskUser 488 \n",
"10771 ROOT_INTERNET_CURL_LaunchCommandTask 488 \n",
"10772 ROOT_INTERNET_CURL_LaunchContainerInHostNetwork 488 \n",
"10773 ROOT_INTERNET_CURL… 488 \n",
"10873 None 489 \n",
"10874 None 490 \n",
"10875 None 491 \n",
"10899 ResourceStatistics 492 \n",
"10900 EndpointCreateThenOfferRemove 492 \n",
"10901 AttachInputToNestedContainerSession/1 492 \n",
"10903 ROOT_INTERNET_CURL_HealthyTaskViaHTTPWithConta... 493 \n",
"10904 [empty] 493 \n",
"10906 RecoverNestedContainer/15 494 \n",
"\n",
"[2800 rows x 7 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create a dataframe containing the history of failures of `asf/master` builds.\n",
"\n",
"import pymc3 as pm\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame.from_dict(failures)\n",
"\n",
"# Keep only builds of the `asf/master` branch.\n",
"master = df.branch.str.contains('asf/master-')\n",
"df = df[master].copy()  # copy so that adding a column below does not modify a view\n",
"\n",
"# Add a `build` column which indexes builds of `asf/master`.\n",
"df['build'] = df.branch.map({b: i for i, b in enumerate(df.groupby('branch', sort=False).groups.keys())})\n",
"\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Auto-assigning NUTS sampler...\n",
"Initializing NUTS using jitter+adapt_diag...\n",
"Multiprocess sampling (2 chains in 10 jobs)\n",
"NUTS: [rate]\n",
"Sampling 2 chains: 100%|██████████| 3000/3000 [00:15<00:00, 197.46draws/s]\n"
]
}
],
"source": [
"NSETUPS = 18\n",
"\n",
"def analyze(test, data):\n",
"    # Cover all build indices from this test's lowest index up to the highest index in `df`.\n",
"    build = np.arange(data.build.min(), df.build.max() + 1)\n",
"    fails = np.zeros(len(build))\n",
"    setups = np.zeros(len(build))\n",
"    for b, xs in data.groupby('build'):\n",
"        idx = np.where(build == b)[0][0]\n",
"        fails[idx] = len(xs)\n",
"        setups[idx] = xs.setups.iloc[0]\n",
"\n",
"    cutoff = 500\n",
"    build = build[-cutoff:]\n",
"    fails = fails[-cutoff:]\n",
"    setups = setups[-cutoff:]\n",
"\n",
"    mask = fails <= NSETUPS # FIXME: set at max\n",
"    # mask = fails <= NSETUPS*1000\n",
"    unmask = np.logical_not(mask)\n",
"\n",
"    build = build.astype(int)\n",
"    fails = fails.astype(int)\n",
"    setups = setups.astype(int)\n",
"\n",
"    ########\n",
"\n",
"    with pm.Model() as model:\n",
"        # switchpoint = pm.DiscreteUniform('switchpoint', lower=build.min(), upper=build.max(), testval=build.mean())\n",
"\n",
"        # early_rate = pm.Uniform('early_rate', lower=0., upper=1., testval=.5)\n",
"        # late_rate = pm.Uniform('late_rate', lower=0., upper=1., testval=1e-12)\n",
"        # rate = pm.math.switch(switchpoint >= build[mask], early_rate, late_rate)\n",
"        BGaussianRandomWalk = pm.Bound(pm.GaussianRandomWalk, lower=0., upper=1.)\n",
"        # rate = BGaussianRandomWalk('rate', sd=1, shape=len(build[mask]))\n",
"        rate = BGaussianRandomWalk('rate', tau=10, shape=len(build[mask]), testval=1e-12)\n",
"\n",
"        #failures = pm.Binomial('failures', p=rate, n=setups[mask], observed=fails[mask])\n",
"        failures = pm.Binomial('failures', p=rate, n=NSETUPS, observed=fails[mask])\n",
"\n",
"        #trace = pm.sample(5000, chains=2, cores=10, nuts_kwargs=dict(target_accept=.95))\n",
"        # 500 draws and 1000 tuning steps per chain, i.e., 3000 samples in total for 2 chains.\n",
"        trace = pm.sample(draws=500, tune=1000, chains=2, cores=10, nuts_kwargs=dict(target_accept=.95))\n",
"\n",
"    # Disabled: only applies to the commented-out switchpoint model above.\n",
"    if 0:\n",
"        switch = trace['switchpoint']\n",
"        switch_hpd = pm.stats.hpd(switch)\n",
"\n",
"        early = trace['early_rate']\n",
"        early_hpd = pm.stats.hpd(early)\n",
"\n",
"        late = trace['late_rate']\n",
"        late_hpd = pm.stats.hpd(late)\n",
"\n",
"    rate = trace['rate']\n",
"    rate_hpd = pm.stats.hpd(rate)\n",
"\n",
"    return {\n",
"        'test': test,\n",
"        'build_min': build.min(),\n",
"        'build_max': build.max(),\n",
"        'all': [build, fails],\n",
"        'accepted': [build[mask], fails[mask]],\n",
"        'filtered': [build[unmask], fails[unmask]],\n",
"        'trace': trace,\n",
"        'model': model,\n",
"        'switch': 0, #switch.mean(),\n",
"        'switch_min': 0,#switch_hpd[0],\n",
"        'switch_max': 0,#switch_hpd[1],\n",
"        'early': rate.mean(), #early.mean(),\n",
"        'early_min': rate.mean(), #early_hpd[0],\n",
"        'early_max': rate.mean(), #early_hpd[1],\n",
"        'late': rate.mean(), #late.mean(),\n",
"        'late_min': rate_hpd[0], #late_hpd[0],\n",
"        'late_max': rate_hpd[1], #late_hpd[1],\n",
"    }\n",
"\n",
"# summarize(analyze('', df.groupby('test').get_group('[empty]')))\n",
"# summarize(analyze('', df.groupby('test').get_group('LOGROTATE_CustomRotateOptions')))\n",
"t = analyze('', df.groupby('test').get_group('AgentRegi'))\n",
"# summarize(t)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/bbannier/src/cistat/_venv/lib/python3.7/site-packages/matplotlib/axes/_base.py:3604: MatplotlibDeprecationWarning: \n",
"The `ymin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `bottom` instead.\n",
" alternative='`bottom`', obj_type='argument')\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x144 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"r = t['trace']['rate'].T\n",
"rm = np.mean(r, axis=1)\n",
"rmin, rmax = pm.stats.hpd(r.T, alpha=0.05).T\n",
"\n",
"plt.plot(rm, label='rate')\n",
"plt.plot(np.median(r, axis=1), label='median rate')\n",
"plt.plot(rmin, label='rate-')\n",
"plt.plot(rmax, label='rate+')\n",
"\n",
"plt.legend()\n",
"# pm.traceplot(t['trace'])\n",
"plt.figure()\n",
"summarize(t)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"TEST = 'LOGROTATE_CustomRotateOptions' # Good example for switch\n",
"# TEST = 'Used'\n",
"\n",
"# TEST = 'PartitionAwareTaskCompletedOnPartitionedAgent' # Good example for no switch"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"results = {}\n",
"test = df.groupby('test').get_group(TEST)\n",
"for test, data in df.groupby('test', sort=False):\n",
" print(\"Analyzing {}\".format(test))\n",
" result = analyze(test, data)\n",
" # summarize(result)\n",
" results[test] = result\n",
"\n",
"# Store results.\n",
"import pickle\n",
"with open('results_.pickle', 'wb') as f:\n",
" pickle.dump(results, f, protocol=pickle.HIGHEST_PROTOCOL)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# results = pickle.load(open('results_.pickle', 'rb'))\n",
"\n",
"# Create a dataframe containing summarized information\n",
"rs = {}\n",
"\n",
"for test, result in results.items():\n",
" r = dict(result)\n",
" for col in ['all', 'accepted', 'filtered', 'trace', 'model']:\n",
" r.pop(col)\n",
" rs[test] = r\n",
"\n",
"rs = pd.DataFrame.from_dict(rs, orient='index')\n",
"\n",
"# Add a `better` column measuring wheter a test's failure rate improved. The\n",
"# value is the range [0, 1] with higher values denoting tests failing less.\n",
"phi = np.arcsin(rs.early/np.sqrt(rs.early**2 + rs.late**2))\n",
"rs['late_d'] = (rs.late_max - rs.late_min)\n",
"rs[rs.late_d<0.5].sort_values(by='late_d', ascending=True)[['switch', 'switch_min', 'switch_max', 'late_d', 'early', 'early_min', 'early_max', 'late', 'late_min', 'late_max']]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def summarize(result):\n",
" plt.plot(*result['accepted'], '.', alpha=0.2)\n",
" plt.plot(*result['filtered'], '.', alpha=0.2)\n",
" plt.xlabel('build')\n",
" plt.ylabel('number of failures')\n",
"\n",
" fails_max = result['all'][1].max()\n",
" \n",
" plt.vlines(result['switch'], 0, fails_max, alpha=0.5, color='orange')\n",
" plt.fill_betweenx([0, fails_max], result['switch_min'], result['switch_max'], alpha=0.2, color='orange')\n",
"\n",
" plt.plot([result['build_min'], result['switch']], [result['early']*NSETUPS, result['early']*NSETUPS], alpha=0.5, color='gray')\n",
" plt.fill_between([result['build_min'], result['switch']], result['early_min']*NSETUPS, result['early_max']*NSETUPS, alpha=0.2, color='gray')\n",
"\n",
" plt.plot([result['switch'], result['build_max']], [result['late']*NSETUPS, result['late']*NSETUPS], alpha=0.5, color='gray')\n",
" plt.fill_between([result['switch'], result['build_max']], result['late_min']*NSETUPS, result['late_max']*NSETUPS, alpha=0.2, color='gray');\n",
"\n",
" pm.traceplot(result['trace'])\n",
"\n",
" from IPython.display import display, HTML\n",
" # display(HTML(pm.summary(result['trace']).round(3).to_html()))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"summarize(results['ROOT_Metrics'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sns.scatterplot(x='late', y='late_z', hue='early', data=rs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# ppc = pm.sample_ppc(results['ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage']['trace'], samples=100, model=results['ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage']['model'], vars='failures')\n",
"# ppc.keys()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t = results['ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage']\n",
"s = pm.sample_ppc(t['trace'], model=t['model'], samples=1)\n",
"b = result['all'][0]\n",
"# result\n",
"# len(s['failures'][0]), len(b)\n",
"summarize(t)\n",
"pm.plot_posterior(t['trace']);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"summarize(results['ResourceStatistics'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Are tests getting more flaky?\n",
"\n",
"Since we only know about a test's existance after it has failed once, we cannot examine if initially non-flaky tests become more flaky. We can however check if a test started to fail more after some time."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"\n",
"rs['switch_'] = rs.switch - rs.build_min\n",
"rs['dbetter'] = rs.better_sd/rs.better\n",
"sns.scatterplot(x='switch_', y='dbetter', hue='late', size=1/rs.dbetter, data=rs)\n",
"plt.semilogy()\n",
"plt.plot([0, rs.build_max.max()], [1, 1], alpha=0.5);\n",
"\n",
"plt.figure()\n",
"sns.scatterplot(x='switch_', y='better', hue='late', size=np.log(rs.better), data=rs);\n",
"plt.semilogy()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.groupby('test').get_group('Used')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pm.GaussianRandomWalk?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s = df[['build', 'setups']].drop_duplicates()\n",
"plt.plot(s.build, s.setups)\n",
"np.median(s.setups)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAD0CAYAAAC7KMweAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAADgNJREFUeJzt23+M34Vdx/Hn9a60qT12090KRLO6BN+gApsuGw6BM7AwnMOYmEhKNIM5JY0icy4LndnEKdkPWNRFfnchRnRoMoKCBLYsODpGXIoZjo13AakSobWru3K1v2h7/vH9lpzH5/u9L/Dt99M3PB9Jk7vP58t9X3xzffZzn7sbm5+fR5JUx7K2B0iSXh7DLUnFGG5JKsZwS1IxhluSijHcklTMxCieZMeOuZH9zOHq1SvYvXv/qJ5uaKruBre3xe3tGOX26enJsabjr7kr7omJ8bYnvCJVd4Pb2+L2dhwL219z4Zak1zrDLUnFGG5JKsZwS1IxhluSijHcklSM4ZakYgy3JBVjuCWpGMMtScUYbkkqxnBLUjGGW5KKMdySVIzhlqRiDLckFWO4JakYwy1JxRhuSSrGcEtSMYZbkoox3JJUjOGWpGIMtyQVMzHIgyLiEeD57rtPZ+alC859GLi4++4/ZebVw50oSVpoyXBHxEpgLDNnGs69FbgEeBdwGNgUEXdm5qPDHipJ6hjkivsMYFVE3N99/IbMfLh77hngvZl5CCAilgP7jspSSRIwWLj3ANcCtwInA/dGRGTmwcx8Afh+RIwBnwP+NTO3HL25kqRBwr0FeDIz54EtEbETOJHO1faRWylfBOaA9U0fYPXqFUxMjA9n8RLGx5cxNbVqJM81TFV3g9vb4vZ2HAvbBwn3ZcBpwPqIOAk4HngOoHulfRfwtcz8TK8PsHv3/iFMHczU1CpmZ/eM7PmGpepucHtb3N6OUW6fnp5sPD5IuDcCt0XEJmCeTsiviIgngXHgXGBFRFzYffxVmfnNVz9ZktRkyXBn5gFg3aLDDy14e+VQF0mS+vIXcCSpGMMtScUYbkkqxnBLUjGGW5KKMdySVIzhlqRiDLckFWO4JakYwy1JxRhuSSrGcEtSMYZbkoox3JJUjOGWpGIMtyQVY7glqRjDLUnFGG5JKsZwS1IxhluSijHcklSM4ZakYgy3JBVjuCWpGMMtScUYbkkqxnBLUjGGW5KKMdySVIzhlqRiDLckFTMxyIMi4hHg+e67T2fmpYvOTwPfAE7PzH3DnShJWmjJcEfESmAsM2d6nL8A+DRwwnCnSZKaDHLFfQawKiLu7z5+Q2Y+vOD8YeB8YPNR2CdJWmRsfn6+7wMi4jTgTOBW4GTgXiAy8+Cix20FTmm6VbJ374H5iYnxIU3ub3x8GYcOHR7Jcw1T1d3g9ra4vR2j3L58+fhY0/FBrri3AE9m5jywJSJ2AicCzwz65Lt37x/0oa/a1NQqZmf3jOz5hqXqbnB7W9zejlFun56ebDw+yE+VXAZcBxARJwHHA88NbZkk6WUZJNwbgamI2ATcQSfkV0TERUd1mSSp0ZK3SjLzALBu0eGHGh63dkibJEl9+As4klSM4ZakYgy3JBVjuCWpGMMtScUYbkkqxnBLUjGGW5KKMdySVIzhlqRiDLckFWO4JakYwy1JxRhuSSrGcEtSMYZbkoox3JJUjOGWpGIMtyQVY7glqRjDLUnFGG5JKsZwS1IxhluSijHcklSM4ZakYgy3JBVjuCWpGMMtScUYbkkqxnBLUjETgzwoIh4Bnu+++3RmXrrg3IeA3wYOAn+SmXcPfaUk6UVLhjsiVgJjmTnTcO4E4ArgHcBKYFNEfCUz9w97qDRM935vO9c/uJXtc/tZM7mC9Wev5cJT17Q9SxrIIFfcZwCrIuL+7uM3ZObD3XPvBL7RDfX+iHgSOB341lFZKw3Bvd/bzjX3P8G+g4cB2Da3n2vufwLAeKuEQe
5x7wGuBS4ALgduj4gjwT8e2LXgsXPAG4a6UBqy6x/c+mK0j9h38DDXP7i1nUHSyzTIFfcW4MnMnAe2RMRO4ETgGTr3vScXPHYSmF38AVavXsHExPgQ5i5tfHwZU1OrRvJcw1R1N9Tbvn2u+U7e9rn9pf4/qr3uC7n91Rkk3JcBpwHrI+IkOlfZz3XP/Qvwp9374CuAU4HvLP4Au3eP7pb31NQqZmf3jOz5hqXqbqi3fc3kCrY1xHvN5IpS/x/VXveF3D6Y6enJxuOD3CrZCExFxCbgDjohvyIiLsrMbcBfAA8CXwM+npn7hjNZOjrWn72WlRP//1N/5cQy1p+9tp1B0su05BV3Zh4A1i06/NCC87cAtwx5l3TUHPkGpD9VoqoG+jlu6bXmwlPXcOGpa0p/ya7XL39zUpKKMdySVIzhlqRiDLckFWO4JakYwy1JxRhuSSrGcEtSMYZbkoox3JJUjOGWpGIMtyQVY7glqRjDLUnFGG5JKsZwS1IxhluSijHcklSM4ZakYgy3JBVjuCWpGMMtScUYbkkqxnBLUjGGW5KKMdySVIzhlqRiDLckFWO4JakYwy1JxRhuSSpmYpAHRcSbgc3AezLz8QXHfx34KLALuC0zNx6VlZKkFy15xR0Ry4GbgL2Ljr8J+BQwA5wLXBIRa4c/UZK00CC3Sq4FbgSeXXT8rcC3M/N/MvMw8C3gzCHvkyQt0vdWSUR8ANiRmfdFxFWLTj8B/FRErAHmgPOALU0fZ/XqFUxMjA9h7tLGx5cxNbVqJM81TFV3g9vb4vZ2HAvbx+bn53uejIivA/PdP2+jE+aLMnNb9/z7gY8BO4HtwD2Zedfij7Njx1zvJxmyqalVzM7uGdXTDU3V3eD2tri9HaPcPj09OdZ0vO8Vd2aec+TtiHgAuHxBtCeAnwHOBo4DvgJsGNJeSVIPA/1UyUIRsQ5YnZk3RwTAI8A+4LrM/P6Q90mSFhk43Jk5033z8QXHrgauHvImSVIf/gKOJBVjuCWpGMMtScUYbkkqxnBLUjGGW5KKMdySVIzhlqRiDLckFWO4JakYwy1JxRhuSSrGcEtSMYZbkoox3JJUjOGWpGIMtyQVY7glqRjDLUnFGG5JKsZwS1IxhluSijHcklSM4ZakYgy3JBVjuCWpGMMtScUYbkkqxnBLUjGGW5KKMdySVMzEIA+KiDcDm4H3ZObjC45fAnwEOAR8MTNvOCorJUkvWvKKOyKWAzcBextOXwucD5wFfCQi3jjceZKkxQa5VXItcCPwbMO5R4E3ACuBMWB+eNMkSU36hjsiPgDsyMz7ejzkO3RuoTwG3J2Zs8OdJ0labGx+vvdFckR8nc5V9DzwNmALcFFmbouI04G/A94F7Ab+GvhyZv794o+zd++B+YmJ8aMw/6XGx5dx6NDhkTzXMFXdDW5vi9vbMcrty5ePjzUd7/vNycw858jbEfEAcHlmbuse2kXnvvfezDwUEf8NNN7j3r17/yvZ/IpMTa1idnbPyJ5vWKruBre3xe3tGOX26enJxuMD/VTJQhGxDlidmTdHxE3Apog4ADwF3PZqRkqSljZwuDNzpvvm4wuO3UjnG5eSpBHxF3AkqRjDLUnFGG5JKsZwS1IxhluSijHcklSM4ZakYgy3JBVjuCWpGMMtScUYbkkqxnBLUjGGW5KKMdySVIzhlqRiDLckFWO4JakYwy1JxRhuSSrGcEtSMYZbkoox3JJUjOGWpGIMtyQVY7glqZix+fn5tjdIkl4Gr7glqRjDLUnFGG5JKmai7QGvRkRcBVwEHAdcD2wGvgAcAvYDv5GZ29tb2FvD9m8CNwNjwBPAb2bmwfYW9rZ4e2Zu7B5fB/xuZv5cm/v6aXjdHwHupvOaA9yQmXe0NK+vhu3/CNwCvBEYp/P5/lR7C3tr2P4e4ITu6bXAw5l5cTvreuvRmBuBg8AWOn9PD496V9kr7oiYAd4NnAWcC/wY8Od0wjEDfBn4WFv7+umx/RpgQ2ae1X
3Y+9tZ11+P7UTE24EP0vmH55jUY/vPAp/PzJnun2M12jO8dPtngdsz8xzgD4FTWhvYR9P2zLy4+/f0V4BZ4MOtDeyhx2v+SeCPM/PngRXA+9rYVvmK+wLg34A7geOBjwI3ZeZz3fMTwL6Wti2lafunMvNQRBxH50pkV4v7+nnJ9oj4ETr/8FxJ5wrwWNX0un8QiIj4ZTpX3Vdm5lx7E3tq2v63wKMR8VVgK/B7ra3rr2n7EVcDX1jw9/ZY0rT7MPDDETEGTAIvtDGscrjfBLwF+CXgx4F/oHvFERHvBn4HOKe1df01bo+ItwBfpRPtb7c3r6/F2+8Gvgv8PrC3xV2DaHrdPw3cmpmbI+LjdK6o/qC9iT01bV8L/CAzz4+IT9D5CvMTrS3s7SXbI+IUYBo4j2Pwarur6TX/I+Av6XyFswt4oI1hZW+VADuB+zLzQGYmnavr6Yj4NTr3oN6XmTtaXdhb4/bM/I/MPJnO/s+3urC3xdt/FPgJ4AbgS8BPRsSftTmwj6bX/Z7M3Nw9fyfw9tbW9de0fZxOTKBzv/sdbY1bQuPnO/CrwN9k5qFW1/XWtPt24OzMPAX4K+C6NoZVDvcm4L0RMRYRJwE/BFxI50p7JjP/vdV1/TVt3xgRJ3fPz9H5kuxYtHj7fwE/3b1feTHw3cy8ss2BfTS97vdExDu758+j882nY1HT9ruAX+yePwd4rK1xS2javhM4H7i31WX9Ne1+Cni+e/5ZOt8YHrnSvzkZEZ8FfoHOP0Ab6Nzz+0863+wA+OfM/GRL8/pq2D4HfA44AOyh893qY/G+30u2Z+Z93eNrgS9l5pktzuur4XXfQecnkV4AtgG/lZnP9/4I7WnY/jhwK52g7ALWZeYP2lvYW9PnTEQ8BpyVmbP9/+v2NLzm/wt8hs5PlRwAPpSZW0e9q3S4Jen1qPKtEkl6XTLcklSM4ZakYgy3JBVjuCWpGMMtScUYbkkqxnBLUjH/BxNUB4ykev6KAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"d = df.groupby('test').get_group('')\n",
"print(len(d.build))\n",
"plt.plot(d.build, d.failed, 'o--');"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}