Circonus for Data Science (GitHub gist HeinrichHartmann/9bbc59ab0e0905fb9b3ab9d0bf72e5af)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Circonus for Data Science\n",
"\n",
"A request we frequently hear from customers is:\n",
"\n",
"> How can I analyze my data with Python?\n",
"\n",
"The Python data science toolchain (Jupyter/NumPy/pandas) offers a wide spectrum of advanced data analytics capabilities.\n",
"Seamless integration with this environment is therefore important for customers who want to use these tools.\n",
"\n",
"Circonus has long provided [Python bindings](https://github.com/circonus-labs/python-circonusapi) for its API.\n",
"With these bindings you can configure your account, create graphs and dashboards, and more.\n",
"However, fetching data and getting it into the right format involved multiple steps and was hard to get right.\n",
"\n",
"We are pleased to announce that this has now changed.\n",
"We have just added new capabilities to our Python bindings that allow you to fetch and analyze data more effectively.\n",
"Here is how to use them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Quick Tour"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"### SKIP THIS CELL ###\n",
"\n",
"%matplotlib inline\n",
"\n",
"import json\n",
"import os\n",
"from datetime import datetime\n",
"import pandas as pd\n",
"\n",
"with open(os.path.expanduser(\"~/work/.circonusrc.json\"), \"r\") as fh:\n",
"    config = json.load(fh)\n",
"api_token = config['demo']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connecting to the API"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You need an API token to connect to the API.\n",
"You can create one using the UI under Integrations > API Tokens.\n",
"In the following, we assume that the variable `api_token` holds a valid API token for your account."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import circonusdata\n",
"circ = circonusdata.CirconusData(api_token)"
]
},
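{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of reading the token from a file in a setup cell, as done above, one option is to look it up in an environment variable first.\n",
"The sketch below assumes a variable named `CIRCONUS_API_TOKEN` and otherwise falls back to a JSON file that maps profile names to tokens, like the `.circonusrc.json` file read above.\n",
"The helper `load_api_token`, the variable name and the default path are hypothetical; they are not part of the Circonus bindings.\n",
"\n",
"```python\n",
"import json\n",
"import os\n",
"\n",
"\n",
"def load_api_token(env_var='CIRCONUS_API_TOKEN',\n",
"                   config_path='~/.circonusrc.json',\n",
"                   profile='demo'):\n",
"    # Hypothetical helper: a non-empty environment variable takes precedence.\n",
"    token = os.environ.get(env_var)\n",
"    if token:\n",
"        return token\n",
"    # Otherwise read a JSON file of the form {'demo': '<token>', ...}.\n",
"    with open(os.path.expanduser(config_path), 'r') as fh:\n",
"        return json.load(fh)[profile]\n",
"\n",
"\n",
"# Usage (requires the circonusdata package and a valid token):\n",
"# circ = circonusdata.CirconusData(load_api_token())\n",
"```"
]
},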
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Searching for Metrics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first thing we can do is search for some metrics:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"M = circ.search('(metric:duration)', limit=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The returned object is a subclass of `list` and can be manipulated like any other list:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CirconusMetric{check_id=195902,name=duration}\n",
"10\n"
]
}
],
"source": [
"print(M[0])\n",
"print(len(M))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Printing the list gives a table representation of the matching metrics:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"check_id type metric_name\n",
"--------------------------------------------------\n",
"195902 numeric duration \n",
"218003 numeric duration \n",
"154743 numeric duration \n",
"217833 numeric duration \n",
"217834 numeric duration \n",
"218002 numeric duration \n",
"222857 numeric duration \n",
"222854 numeric duration \n",
"222862 numeric duration \n",
"222860 numeric duration \n"
]
}
],
"source": [
"print(M)"
]
},
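{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the result subclasses `list`, indexing, `len()` and iteration come for free; only the printed form needs custom code.\n",
"The following sketch shows that pattern with hypothetical `Metric` and `MetricList` classes.\n",
"It imitates the table printed above but is not the circonusdata implementation.\n",
"\n",
"```python\n",
"class Metric:\n",
"    # Hypothetical stand-in for a metric record.\n",
"    def __init__(self, check_id, kind, name):\n",
"        self.check_id = check_id\n",
"        self.kind = kind\n",
"        self.name = name\n",
"\n",
"    def __repr__(self):\n",
"        return 'Metric{check_id=%d,name=%s}' % (self.check_id, self.name)\n",
"\n",
"\n",
"class MetricList(list):\n",
"    # Inherits indexing, slicing, len() and iteration from list;\n",
"    # only __str__ is overridden to render a table.\n",
"    def __str__(self):\n",
"        rows = ['%-10s %-10s %s' % ('check_id', 'type', 'metric_name'), '-' * 50]\n",
"        for m in self:\n",
"            rows.append('%-10d %-10s %s' % (m.check_id, m.kind, m.name))\n",
"        return '\\n'.join(rows)\n",
"\n",
"\n",
"M_demo = MetricList([Metric(195902, 'numeric', 'duration'),\n",
"                     Metric(218003, 'numeric', 'duration')])\n",
"print(len(M_demo), M_demo[0])  # 2 Metric{check_id=195902,name=duration}\n",
"print(M_demo)\n",
"```"
]
},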
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Metric lists provide a `.fetch()` method that can be used to fetch data.\n",
"Fetches are performed serially, one metric at a time, so retrieval can take some time.\n",
"We will see later how to fetch many metrics at once with CAQL."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"R = M.fetch(\n",
" start=datetime(2018,1,1), # start at Midnight UTC 2018-01-01\n",
" period=60, # return 60 second (=1min) aggregates\n",
" count=180, # return 180 samples\n",
" kind=\"value\" # return (mean-)value aggregate\n",
")"
]
},
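{
"cell_type": "markdown",
"metadata": {},
"source": [
"Judging from the comments in the call above, `start`, `period` and `count` describe a window of `count` aggregates, each `period` seconds long, beginning at `start`; here that is 180 one-minute values, i.e. three hours.\n",
"The sketch below (the `sample_times` helper is hypothetical) computes the corresponding epoch timestamps.\n",
"It attaches `timezone.utc` explicitly, because `datetime.timestamp()` interprets a naive `datetime(2018,1,1)` in the local time zone of the machine; the comment above suggests the bindings treat it as UTC, but the sketch does not rely on that.\n",
"\n",
"```python\n",
"from datetime import datetime, timedelta, timezone\n",
"\n",
"\n",
"def sample_times(start, period, count):\n",
"    # Epoch-second timestamps of `count` samples spaced `period` seconds apart.\n",
"    t0 = int(start.timestamp())\n",
"    return [t0 + i * period for i in range(count)]\n",
"\n",
"\n",
"start = datetime(2018, 1, 1, tzinfo=timezone.utc)  # explicit UTC midnight\n",
"ts = sample_times(start, period=60, count=180)\n",
"print(ts[0], ts[-1])                           # 1514764800 1514775540\n",
"print(timedelta(seconds=ts[-1] + 60 - ts[0]))  # 3:00:00 covered in total\n",
"```"
]
},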
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is a dict that maps metric names (of the form `check_id/metric_name`) to the fetched values, plus a `time` key holding the sample timestamps in epoch seconds.\n",
"It is designed so that it can be passed directly to the pandas `DataFrame` constructor."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>154743/duration</th>\n",
" <th>195902/duration</th>\n",
" <th>217833/duration</th>\n",
" <th>217834/duration</th>\n",
" <th>218002/duration</th>\n",
" <th>218003/duration</th>\n",
" <th>222854/duration</th>\n",
" <th>222857/duration</th>\n",
" <th>222860/duration</th>\n",
" <th>222862/duration</th>\n",
" </tr>\n",
" <tr>\n",
" <th>time</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2018-01-01 00:00:00</th>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>12</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-01-01 00:01:00</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-01-01 00:02:00</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-01-01 00:03:00</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>12</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-01-01 00:04:00</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>12</td>\n",
" <td>11</td>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 154743/duration 195902/duration 217833/duration \\\n",
"time \n",
"2018-01-01 00:00:00 1 4 1 \n",
"2018-01-01 00:01:00 1 2 1 \n",
"2018-01-01 00:02:00 1 2 1 \n",
"2018-01-01 00:03:00 1 2 1 \n",
"2018-01-01 00:04:00 1 2 1 \n",
"\n",
" 217834/duration 218002/duration 218003/duration \\\n",
"time \n",
"2018-01-01 00:00:00 1 1 1 \n",
"2018-01-01 00:01:00 1 2 1 \n",
"2018-01-01 00:02:00 1 1 1 \n",
"2018-01-01 00:03:00 1 1 1 \n",
"2018-01-01 00:04:00 1 1 1 \n",
"\n",
" 222854/duration 222857/duration 222860/duration \\\n",
"time \n",
"2018-01-01 00:00:00 12 11 12 \n",
"2018-01-01 00:01:00 11 12 12 \n",
"2018-01-01 00:02:00 12 12 11 \n",
"2018-01-01 00:03:00 12 11 12 \n",
"2018-01-01 00:04:00 12 11 11 \n",
"\n",
" 222862/duration \n",
"time \n",
"2018-01-01 00:00:00 1 \n",
"2018-01-01 00:01:00 1 \n",
"2018-01-01 00:02:00 1 \n",
"2018-01-01 00:03:00 1 \n",
"2018-01-01 00:04:00 1 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.DataFrame(R)\n",
"\n",
"# [OPTIONAL] Make the DataFrame aware of the time column\n",
"df['time']=pd.to_datetime(df['time'],unit='s')\n",
"df.set_index('time', inplace=True)\n",
"\n",
"df.head()"
]
},
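{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the structure of the result explicit, here is a small dict of the same shape built by hand from the first three values of two columns shown above (`R_toy` and `df_toy` are illustrative names).\n",
"Converting the `time` column with `unit='s'` and using it as the index gives a `DatetimeIndex`, which is what enables time-based plotting and resampling in pandas.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Same shape as the dict returned by fetch(): one list per metric, keyed by\n",
"# 'check_id/metric_name', plus a 'time' list in epoch seconds.\n",
"R_toy = {\n",
"    'time': [1514764800, 1514764860, 1514764920],\n",
"    '195902/duration': [4, 2, 2],\n",
"    '222857/duration': [11, 12, 12],\n",
"}\n",
"\n",
"df_toy = pd.DataFrame(R_toy)\n",
"df_toy['time'] = pd.to_datetime(df_toy['time'], unit='s')\n",
"df_toy = df_toy.set_index('time')\n",
"\n",
"print(df_toy.index[0])  # 2018-01-01 00:00:00\n",
"print(df_toy.shape)     # (3, 2)\n",
"```"
]
},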
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Analysis with pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"pandas makes common data analysis tasks easy.\n",
"We start by computing some summary statistics:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>154743/duration</th>\n",
" <th>195902/duration</th>\n",
" <th>217833/duration</th>\n",
" <th>217834/duration</th>\n",
" <th>218002/duration</th>\n",
" <th>218003/duration</th>\n",
" <th>222854/duration</th>\n",
" <th>222857/duration</th>\n",
" <th>222860/duration</th>\n",
" <th>222862/duration</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>180.000000</td>\n",
" <td>180.000000</td>\n",
" <td>180.0</td>\n",
" <td>180.000000</td>\n",
" <td>180.000000</td>\n",
" <td>180.000000</td>\n",
" <td>180.000000</td>\n",
" <td>180.000000</td>\n",
" <td>180.00000</td>\n",
" <td>180.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>1.316667</td>\n",
" <td>2.150000</td>\n",
" <td>1.0</td>\n",
" <td>1.150000</td>\n",
" <td>1.044444</td>\n",
" <td>1.177778</td>\n",
" <td>11.677778</td>\n",
" <td>11.783333</td>\n",
" <td>11.80000</td>\n",
" <td>1.022222</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>1.642573</td>\n",
" <td>0.583526</td>\n",
" <td>0.0</td>\n",
" <td>1.130951</td>\n",
" <td>0.232120</td>\n",
" <td>0.897890</td>\n",
" <td>0.535401</td>\n",
" <td>0.799965</td>\n",
" <td>0.89941</td>\n",
" <td>0.181722</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.0</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.00000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>1.0</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.000000</td>\n",
" <td>11.00000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>1.0</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>12.000000</td>\n",
" <td>12.000000</td>\n",
" <td>12.00000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>1.0</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>12.000000</td>\n",
" <td>12.000000</td>\n",
" <td>12.00000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>15.000000</td>\n",
" <td>4.000000</td>\n",
" <td>1.0</td>\n",
" <td>12.000000</td>\n",
" <td>3.000000</td>\n",
" <td>9.000000</td>\n",
" <td>13.000000</td>\n",
" <td>17.000000</td>\n",
" <td>16.00000</td>\n",
" <td>3.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 154743/duration 195902/duration 217833/duration 217834/duration \\\n",
"count 180.000000 180.000000 180.0 180.000000 \n",
"mean 1.316667 2.150000 1.0 1.150000 \n",
"std 1.642573 0.583526 0.0 1.130951 \n",
"min 1.000000 1.000000 1.0 1.000000 \n",
"25% 1.000000 2.000000 1.0 1.000000 \n",
"50% 1.000000 2.000000 1.0 1.000000 \n",
"75% 1.000000 2.000000 1.0 1.000000 \n",
"max 15.000000 4.000000 1.0 12.000000 \n",
"\n",
" 218002/duration 218003/duration 222854/duration 222857/duration \\\n",
"count 180.000000 180.000000 180.000000 180.000000 \n",
"mean 1.044444 1.177778 11.677778 11.783333 \n",
"std 0.232120 0.897890 0.535401 0.799965 \n",
"min 1.000000 1.000000 11.000000 11.000000 \n",
"25% 1.000000 1.000000 11.000000 11.000000 \n",
"50% 1.000000 1.000000 12.000000 12.000000 \n",
"75% 1.000000 1.000000 12.000000 12.000000 \n",
"max 3.000000 9.000000 13.000000 17.000000 \n",
"\n",
" 222860/duration 222862/duration \n",
"count 180.00000 180.000000 \n",
"mean 11.80000 1.022222 \n",
"std 0.89941 0.181722 \n",
"min 11.00000 1.000000 \n",
"25% 11.00000 1.000000 \n",
"50% 12.00000 1.000000 \n",
"75% 12.00000 1.000000 \n",
"max 16.00000 3.000000 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a plot of the dataset over time:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7f600687e668>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from matplotlib import pyplot as plt\n",
"ax = df.plot(style=\".\",figsize=(20,5),legend=False, ylim=(0,20), linewidth=0.2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also summarize the individual distributions as box plots:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7f6006382588>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"ax = df.plot(figsize=(20,5),legend=False, ylim=(0,20), kind=\"box\")\n",
"ax.figure.autofmt_xdate(rotation=-20,ha=\"left\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Working with Histogram Data\n",
"\n",
"Histogram data can be fetched by passing `kind=\"histogram\"` to `.fetch()`.\n",
"Numeric metrics are converted to histograms.\n",
"Histograms are represented as libcircllhist objects, which provide efficient implementations of the most common histogram operations (mean, quantiles)."
]
},
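{
"cell_type": "markdown",
"metadata": {},
"source": [
"The bin labels that appear in the histogram data further below, such as `+29e-002`, suggest a log-linear layout: each bin keeps two significant decimal digits, so `+29e-002` covers values from 0.29 up to 0.30.\n",
"The following sketch computes such a bin for a positive value.\n",
"It reflects our reading of those labels, not the libcircllhist API, and `llbin` is a hypothetical name.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def llbin(x):\n",
"    # Log-linear bin of a positive value x: keep two significant decimal digits.\n",
"    # Returns (lower_edge, width, label), with labels formatted like '+29e-002'.\n",
"    if x <= 0:\n",
"        raise ValueError('this sketch only handles positive values')\n",
"    exp = math.floor(math.log10(x)) - 1\n",
"    width = 10.0 ** exp\n",
"    # round() absorbs float noise such as 0.29 / 0.01 == 28.999999999999996\n",
"    mantissa = math.floor(round(x / width, 9))\n",
"    if mantissa >= 100:  # log10 rounded just below a power of ten\n",
"        exp += 1\n",
"        width = 10.0 ** exp\n",
"        mantissa = math.floor(round(x / width, 9))\n",
"    label = '%+de%+04d' % (mantissa, exp)\n",
"    return mantissa * width, width, label\n",
"\n",
"\n",
"print(llbin(0.295)[2])  # +29e-002\n",
"print(llbin(123)[2])    # +12e+001\n",
"```"
]
},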
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"check_id type metric_name\n",
"--------------------------------------------------\n",
"160764 histogram api`GET`/getState \n"
]
}
],
"source": [
"MH = circ.search(\"api`GET`/getState\", limit=1)\n",
"print(MH)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's fetch hourly latency distributions for this API endpoint over one day (24 one-hour periods)."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"RH = MH.fetch(datetime(2018,1,1), 60*60, 24, kind=\"histogram\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can plot the resulting histograms with a little helper function."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"### SKIP THIS CELL ###\n",
"\n",
"def circllhist_plot(H, **kwargs):\n",
"    x = []  # bin midpoints\n",
"    h = []  # bar heights (count per unit of bin width)\n",
"    w = []  # bin widths\n",
"    for b, c in H:\n",
"        x.append(b.midpoint())\n",
"        h.append(c / b.width())\n",
"        w.append(b.width())\n",
"    return plt.bar(x, h, w, **kwargs)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0, 100)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7f600623a0b8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig = plt.figure(figsize=(20, 5))\n",
"for H in RH['160764/api`GET`/getState']:\n",
"    circllhist_plot(H, alpha=0.2)\n",
"ax = fig.get_axes()\n",
"ax[0].set_xlim(0,100)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, we can load the data directly into a pandas DataFrame and compute statistics on it:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"dfh = pd.DataFrame(RH)\n",
"\n",
"# [OPTIONAL] Make the DataFrame aware of the time column\n",
"dfh['time']=pd.to_datetime(dfh['time'],unit='s')\n",
"dfh.set_index('time', inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>160764/api`GET`/getState</th>\n",
" <th>p99</th>\n",
" <th>p90</th>\n",
" <th>p95</th>\n",
" <th>p50</th>\n",
" <th>mean</th>\n",
" </tr>\n",
" <tr>\n",
" <th>time</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2018-01-01 00:00:00</th>\n",
" <td>{\"+29e-002\": 2, \"+40e-002\": 6, \"+50e-002\": 8, ...</td>\n",
" <td>112.835714</td>\n",
" <td>112.835714</td>\n",
" <td>112.835714</td>\n",
" <td>11.992790</td>\n",
" <td>15.387013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-01-01 01:00:00</th>\n",
" <td>{\"+40e-002\": 2, \"+50e-002\": 2, \"+59e-002\": 5, ...</td>\n",
" <td>114.961628</td>\n",
" <td>114.961628</td>\n",
" <td>114.961628</td>\n",
" <td>16.567822</td>\n",
" <td>19.542284</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-01-01 02:00:00</th>\n",
" <td>{\"+40e-002\": 3, \"+50e-002\": 12, \"+59e-002\": 4,...</td>\n",
" <td>118.124324</td>\n",
" <td>118.124324</td>\n",
" <td>118.124324</td>\n",
" <td>20.556859</td>\n",
" <td>24.012226</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-01-01 03:00:00</th>\n",
" <td>{\"+29e-002\": 1, \"+40e-002\": 7, \"+50e-002\": 21,...</td>\n",
" <td>427.122222</td>\n",
" <td>427.122222</td>\n",
" <td>427.122222</td>\n",
" <td>20.827982</td>\n",
" <td>37.040173</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-01-01 04:00:00</th>\n",
" <td>{\"+40e-002\": 6, \"+50e-002\": 26, \"+59e-002\": 15...</td>\n",
" <td>496.077778</td>\n",
" <td>496.077778</td>\n",
" <td>496.077778</td>\n",
" <td>23.247373</td>\n",
" <td>40.965517</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 160764/api`GET`/getState \\\n",
"time \n",
"2018-01-01 00:00:00 {\"+29e-002\": 2, \"+40e-002\": 6, \"+50e-002\": 8, ... \n",
"2018-01-01 01:00:00 {\"+40e-002\": 2, \"+50e-002\": 2, \"+59e-002\": 5, ... \n",
"2018-01-01 02:00:00 {\"+40e-002\": 3, \"+50e-002\": 12, \"+59e-002\": 4,... \n",
"2018-01-01 03:00:00 {\"+29e-002\": 1, \"+40e-002\": 7, \"+50e-002\": 21,... \n",
"2018-01-01 04:00:00 {\"+40e-002\": 6, \"+50e-002\": 26, \"+59e-002\": 15... \n",
"\n",
" p99 p90 p95 p50 mean \n",
"time \n",
"2018-01-01 00:00:00 112.835714 112.835714 112.835714 11.992790 15.387013 \n",
"2018-01-01 01:00:00 114.961628 114.961628 114.961628 16.567822 19.542284 \n",
"2018-01-01 02:00:00 118.124324 118.124324 118.124324 20.556859 24.012226 \n",
"2018-01-01 03:00:00 427.122222 427.122222 427.122222 20.827982 37.040173 \n",
"2018-01-01 04:00:00 496.077778 496.077778 496.077778 23.247373 40.965517 "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dfh['p99'] = dfh.iloc[:,0].map(lambda h: h.quantile(0.99))\n",
"dfh['p90'] = dfh.iloc[:,0].map(lambda h: h.quantile(0.90))\n",
"dfh['p95'] = dfh.iloc[:,0].map(lambda h: h.quantile(0.95))\n",
"dfh['p50'] = dfh.iloc[:,0].map(lambda h: h.quantile(0.5))\n",
"dfh['mean'] = dfh.iloc[:,0].map(lambda h: h.mean())\n",
"dfh.head()"
]
},
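{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition about what `mean()` and `quantile()` compute, here is a sketch that works on plain `(lower_edge, width, count)` tuples.\n",
"The mean treats each sample as sitting at its bin midpoint; the quantile walks the cumulative counts and interpolates linearly inside the bin that contains the target rank.\n",
"libcircllhist may use a different interpolation rule, so results can differ slightly; `hist_mean` and `hist_quantile` are hypothetical helpers.\n",
"\n",
"```python\n",
"def hist_mean(bins):\n",
"    # bins: list of (lower_edge, width, count); each sample counts at its bin midpoint.\n",
"    total = sum(c for _, _, c in bins)\n",
"    if total == 0:\n",
"        raise ValueError('empty histogram')\n",
"    return sum((lo + w / 2) * c for lo, w, c in bins) / total\n",
"\n",
"\n",
"def hist_quantile(bins, q):\n",
"    # Find the bin holding the q-th fraction of all samples and interpolate\n",
"    # linearly inside it, assuming samples are spread evenly across the bin.\n",
"    bins = sorted(bins)\n",
"    total = sum(c for _, _, c in bins)\n",
"    if total == 0:\n",
"        raise ValueError('empty histogram')\n",
"    target = q * total\n",
"    seen = 0\n",
"    for lo, w, c in bins:\n",
"        if c and seen + c >= target:\n",
"            return lo + w * (target - seen) / c\n",
"        seen += c\n",
"    lo, w, _ = bins[-1]\n",
"    return lo + w  # only reached for q > 1: clamp to the upper edge\n",
"\n",
"\n",
"bins = [(1.0, 0.1, 2), (2.0, 1.0, 6), (10.0, 1.0, 2)]\n",
"print(hist_quantile(bins, 0.5))  # 2.5\n",
"print(hist_quantile(bins, 0.9))  # 10.5\n",
"```"
]
},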
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The CAQL API\n",
"\n",
"Circonus comes with a wide range of data analysis capabilities that are integrated into the Circonus Analytics Query Language [CAQL](https://login.circonus.com/resources/docs/user/caql_reference.html).\n",
"\n",
"CAQL provides highly efficient data fetching operations that allow you to process multiple metrics at the same time.\n",
"Also, by performing computations close to the data, you can save time and bandwidth."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get started we search for `duration` metrics, like we did before, using CAQL:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>output[0]</th>\n",
" <th>output[10]</th>\n",
" <th>output[11]</th>\n",
" <th>output[12]</th>\n",
" <th>output[13]</th>\n",
" <th>output[14]</th>\n",
" <th>output[15]</th>\n",
" <th>output[16]</th>\n",
" <th>output[17]</th>\n",
" <th>output[18]</th>\n",
" <th>...</th>\n",
" <th>output[21]</th>\n",
" <th>output[2]</th>\n",
" <th>output[3]</th>\n",
" <th>output[4]</th>\n",
" <th>output[5]</th>\n",
" <th>output[6]</th>\n",
" <th>output[7]</th>\n",
" <th>output[8]</th>\n",
" <th>output[9]</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>4</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" <td>1514764800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>12</td>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1514764860</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2</td>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>12</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" <td>1514764920</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" <td>1514764980</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>11</td>\n",
" <td>12</td>\n",
" <td>1</td>\n",
" <td>1514765040</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 23 columns</p>\n",
"</div>"
],
"text/plain": [
" output[0] output[10] output[11] output[12] output[13] output[14] \\\n",
"0 4 12 1 1 2 1 \n",
"1 2 12 1 1 1 1 \n",
"2 2 11 1 1 2 1 \n",
"3 2 12 1 1 2 1 \n",
"4 2 11 1 1 2 1 \n",
"\n",
" output[15] output[16] output[17] output[18] ... output[21] \\\n",
"0 1 1 11 1 ... 1 \n",
"1 1 1 11 1 ... 1 \n",
"2 1 1 12 1 ... 1 \n",
"3 1 1 12 1 ... 1 \n",
"4 1 1 11 1 ... 1 \n",
"\n",
" output[2] output[3] output[4] output[5] output[6] output[7] \\\n",
"0 1 1 1 1 1 11 \n",
"1 1 1 1 1 2 12 \n",
"2 1 1 1 1 1 12 \n",
"3 1 1 1 1 1 11 \n",
"4 1 1 1 1 1 11 \n",
"\n",
" output[8] output[9] time \n",
"0 12 1 1514764800 \n",
"1 11 1 1514764860 \n",
"2 12 1 1514764920 \n",
"3 12 1 1514764980 \n",
"4 12 1 1514765040 \n",
"\n",
"[5 rows x 23 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"A = circ.caql('search:metric(\"duration\")', datetime(2018,1,1), 60, 5000)\n",
"dfc = pd.DataFrame(A)\n",
"dfc.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This API call fetched 5000 one-minute samples for each of 22 metrics, and completed in just over one second.\n",
"The equivalent `circ.search().fetch()` statement would have taken around one minute to complete.\n",
"\n",
"One drawback of fetching with CAQL is that the output does not include the metric names: columns are simply labeled `output[0]`, `output[1]`, and so on.\n",
"We are working on resolving this shortcoming."
]
},
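{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note also that the columns come back in string order, so `output[10]` sorts before `output[2]` in the table above.\n",
"A small plain-Python helper (hypothetical, not part of the bindings) can restore numeric order, for example with `dfc = dfc[sort_output_columns(dfc.columns)]`.\n",
"\n",
"```python\n",
"def output_index(col):\n",
"    # 'output[12]' -> 12; any other label (such as 'time') -> None\n",
"    if col.startswith('output[') and col.endswith(']'):\n",
"        return int(col[len('output['):-1])\n",
"    return None\n",
"\n",
"\n",
"def sort_output_columns(columns):\n",
"    # Non-output columns first, then output[N] columns in numeric order of N.\n",
"    def key(col):\n",
"        idx = output_index(col)\n",
"        return (idx is not None, idx if idx is not None else 0)\n",
"    return sorted(columns, key=key)\n",
"\n",
"\n",
"print(sort_output_columns(['output[0]', 'output[10]', 'output[2]', 'time']))\n",
"# ['time', 'output[0]', 'output[2]', 'output[10]']\n",
"```"
]
},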
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To showcase some of the analytics features, we use CAQL to compute a rolling mean over the second-largest `duration` value at each point in time,\n",
"and plot the transformed data using pandas."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f6003f16128>"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7f600620ef28>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"B = circ.caql(\"\"\"\n",
"\n",
"search:metric(\"duration\") | stats:trim(1) | stats:max() | rolling:mean(10M)\n",
"\n",
"\"\"\", datetime(2018,1,1), 60, 1000)\n",
"df = pd.DataFrame(B)\n",
"df['time']=pd.to_datetime(df['time'],unit='s')\n",
"df.set_index('time', inplace=True)\n",
"df.plot(figsize=(20,5), lw=.5,ylim=(0,50))"
]
},
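{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the pipeline concrete, here is a rough client-side equivalent in plain Python.\n",
"It assumes that `stats:trim(1)` drops the single smallest and single largest value at each time step, so that `stats:max()` then yields the second-largest value, and that `rolling:mean(10M)` is a trailing 10-minute mean, i.e. 10 samples at a 60-second period.\n",
"`trim_max` and `rolling_mean` are hypothetical helpers, and the rows below are toy values.\n",
"\n",
"```python\n",
"from collections import deque\n",
"\n",
"\n",
"def trim_max(values, k=1):\n",
"    # Drop the k smallest and k largest values, then return the largest one left.\n",
"    kept = sorted(values)[k:len(values) - k]\n",
"    return kept[-1] if kept else None\n",
"\n",
"\n",
"def rolling_mean(series, window):\n",
"    # Trailing mean over the last `window` samples; None entries are skipped.\n",
"    buf = deque(maxlen=window)\n",
"    out = []\n",
"    for v in series:\n",
"        buf.append(v)\n",
"        vals = [x for x in buf if x is not None]\n",
"        out.append(sum(vals) / len(vals) if vals else None)\n",
"    return out\n",
"\n",
"\n",
"rows = [[4, 12, 1, 11], [2, 12, 1, 12], [2, 11, 1, 12]]  # toy values per time step\n",
"second_largest = [trim_max(r) for r in rows]\n",
"print(second_largest)                   # [11, 12, 11]\n",
"print(rolling_mean(second_largest, 2))  # [11.0, 11.5, 11.5]\n",
"```"
]
},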
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also fetch histogram data with `circ.caql()`:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>output[0]</th>\n",
" <th>time</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>{\"+29e-002\": 2, \"+40e-002\": 6, \"+50e-002\": 8, ...</td>\n",
" <td>1514764800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>{\"+40e-002\": 2, \"+50e-002\": 2, \"+59e-002\": 5, ...</td>\n",
" <td>1514768400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>{\"+40e-002\": 3, \"+50e-002\": 12, \"+59e-002\": 4,...</td>\n",
" <td>1514772000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>{\"+29e-002\": 1, \"+40e-002\": 7, \"+50e-002\": 21,...</td>\n",
" <td>1514775600</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>{\"+40e-002\": 6, \"+50e-002\": 26, \"+59e-002\": 15...</td>\n",
" <td>1514779200</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" output[0] time\n",
"0 {\"+29e-002\": 2, \"+40e-002\": 6, \"+50e-002\": 8, ... 1514764800\n",
"1 {\"+40e-002\": 2, \"+50e-002\": 2, \"+59e-002\": 5, ... 1514768400\n",
"2 {\"+40e-002\": 3, \"+50e-002\": 12, \"+59e-002\": 4,... 1514772000\n",
"3 {\"+29e-002\": 1, \"+40e-002\": 7, \"+50e-002\": 21,... 1514775600\n",
"4 {\"+40e-002\": 6, \"+50e-002\": 26, \"+59e-002\": 15... 1514779200"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"AH = circ.caql('search:metric:histogram(\"api`GET`/getState\")', datetime(2018,1,1), 60*60, 24)\n",
"dfch = pd.DataFrame(AH)\n",
"dfch.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can perform a wide variety of data transformations directly inside Circonus using CAQL expressions.\n",
"This speeds up the computation even further.\n",
"Another advantage is that the same CAQL queries can be used for live graphing and alerting in the Circonus UI.\n",
"\n",
"In this example, we compute how many requests were serviced above certain latency thresholds."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f60035c3b70>"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x7f60035804a8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"B = circ.caql('''\n",
"\n",
"search:metric:histogram(\"api`GET`/getState\") | histogram:count_above(0,10,50,100,500,1000)\n",
"\n",
"''', datetime(2018,1,1), 60*5, 24*20)\n",
"dfc2 = pd.DataFrame(B)\n",
"dfc2['time']=pd.to_datetime(dfc2['time'],unit='s')\n",
"dfc2.set_index('time', inplace=True)\n",
"dfc2.plot(figsize=(20,5), colormap=\"gist_heat\",legend=False, lw=.5)"
]
},
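{
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea behind `histogram:count_above()` can be sketched client-side on `(lower_edge, width, count)` tuples.\n",
"Bins lying entirely at or above a threshold count in full; for a bin that straddles the threshold, this sketch assumes samples are spread uniformly and counts the corresponding fraction.\n",
"How CAQL itself treats straddling bins is an open assumption here, and `count_above` is a hypothetical helper.\n",
"\n",
"```python\n",
"def count_above(bins, thresholds):\n",
"    # bins: list of (lower_edge, width, count); returns one count per threshold.\n",
"    result = []\n",
"    for t in thresholds:\n",
"        n = 0.0\n",
"        for lo, w, c in bins:\n",
"            if lo >= t:\n",
"                n += c  # whole bin lies at or above the threshold\n",
"            elif lo + w > t:\n",
"                n += c * (lo + w - t) / w  # partial bin, uniform assumption\n",
"        result.append(n)\n",
"    return result\n",
"\n",
"\n",
"bins = [(5.0, 1.0, 10), (12.0, 1.0, 4), (120.0, 10.0, 2)]\n",
"print(count_above(bins, [0, 10, 50, 100, 500, 1000]))\n",
"# [16.0, 6.0, 2.0, 2.0, 0.0, 0.0]\n",
"```"
]
},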
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"Getting Circonus data into Python has never been easier.\n",
"We hope this blog post helps you get started with the new data fetching capabilities.\n",
"If you run into any problems or have suggestions, feel free to open an issue on [GitHub](https://github.com/circonus-labs/python-circonusapi),\n",
"or get in touch on our [Slack channel](http://slack.s.circonus.com/).\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}