{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pandas basics\n",
"\n",
"* Series and Dataframe\n",
"* Indexing and data selection\n",
"* Loading and writing data\n",
"* Basic visualization \n",
"* Basic stats"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [],
"source": [
"# jupyter magics\n",
"%matplotlib inline\n",
"%config IPCompleter.greedy=True\n",
"\n",
"# multiple output\n",
"from IPython.core.interactiveshell import InteractiveShell\n",
"InteractiveShell.ast_node_interactivity = \"all\""
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from pandas import Series, DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pandas Series \n",
"A Series is a mutable **1D** labelled array in Pandas that can hold any numpy datatype. Every Series carries an explicit index, stored as a Pandas `Index` object. If no index is given, a default integer index from 0 to len(data) - 1 is created automatically:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1.76405235 0.40015721 0.97873798 2.2408932 1.86755799]\n",
"RangeIndex(start=0, stop=5, step=1)\n",
"1.8675579901499675\n"
]
}
],
"source": [
"np.random.seed(0)\n",
"mySeries0 = Series(np.random.normal(size=(5,)))\n",
"print(mySeries0.values)\n",
"print(mySeries0.index)\n",
"print(mySeries0[mySeries0.last_valid_index()]) # mySeries0[-1] raises a KeyError here: [] looks up labels, so use .iloc[-1] for the last position"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or the index can be manually set:"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"code\n",
"a 1.624345\n",
"b -0.611756\n",
"c -0.528172\n",
"d -1.072969\n",
"e 0.865408\n",
"Name: random_norm, dtype: float64\n",
"\n",
"Values: [ 1.62434536 -0.61175641 -0.52817175 -1.07296862 0.86540763]\n",
"Index: Index(['a', 'b', 'c', 'd', 'e'], dtype='object', name='code')\n",
"Element at index [\"a\"]: 1.6243453636632417\n"
]
}
],
"source": [
"np.random.seed(1)\n",
"mySeries1 = Series(np.random.normal(size=(5,)), index=['a', 'b', 'c', 'd', 'e'])\n",
"mySeries1.name = 'random_norm'\n",
"mySeries1.index.name = 'code'\n",
"print(mySeries1)\n",
"print()\n",
"print('Values: ', mySeries1.values)\n",
"print('Index: ', mySeries1.index)\n",
"print('Element at index [\"a\"]: ' + str(mySeries1['a']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy element-wise functions can be applied to a Series directly, and the result is again a Series that keeps the index and name:"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"code\n",
"a 1.0\n",
"b -1.0\n",
"c -1.0\n",
"d -1.0\n",
"e 1.0\n",
"Name: random_norm, dtype: float64"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.sign(mySeries1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Series are basically **ordered key-value pairs** and can be created directly from a Python dictionary."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"s0 15\n",
"s1 22\n",
"s2 19\n",
"s3 32\n",
"Name: subjects, dtype: int64\n"
]
}
],
"source": [
"myDict = {'s0': 15, 's1': 22, 's2': 19, 's3': 32}\n",
"mySeries2 = Series(myDict, name='subjects')\n",
"print(mySeries2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pandas DataFrame\n",
"A DataFrame is a mutable **2D** data type (=table). It is characterized by labels (one per column) and indices (one per row). Different columns of a DataFrame can hold different data types.\n",
"\n",
"To display the first rows of the data (5 by default) use the **.head()** method."
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>codes</th>\n",
" <th>age</th>\n",
" <th>handedness</th>\n",
" <th>experiment</th>\n",
" <th>performance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>s0</th>\n",
" <td>sm</td>\n",
" <td>15</td>\n",
" <td>l</td>\n",
" <td>AB</td>\n",
" <td>89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s1</th>\n",
" <td>at</td>\n",
" <td>22</td>\n",
" <td>l</td>\n",
" <td>BA</td>\n",
" <td>78</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s2</th>\n",
" <td>ap</td>\n",
" <td>19</td>\n",
" <td>l</td>\n",
" <td>BA</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s3</th>\n",
" <td>gs</td>\n",
" <td>32</td>\n",
" <td>r</td>\n",
" <td>AB</td>\n",
" <td>89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s4</th>\n",
" <td>rd</td>\n",
" <td>28</td>\n",
" <td>r</td>\n",
" <td>AB</td>\n",
" <td>79</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" codes age handedness experiment performance\n",
"s0 sm 15 l AB 89\n",
"s1 at 22 l BA 78\n",
"s2 ap 19 l BA 95\n",
"s3 gs 32 r AB 89\n",
"s4 rd 28 r AB 79"
]
},
"execution_count": 136,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>codes</th>\n",
" <th>age</th>\n",
" <th>handedness</th>\n",
" <th>experiment</th>\n",
" <th>performance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>s0</th>\n",
" <td>sm</td>\n",
" <td>15</td>\n",
" <td>l</td>\n",
" <td>AB</td>\n",
" <td>89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s1</th>\n",
" <td>at</td>\n",
" <td>22</td>\n",
" <td>l</td>\n",
" <td>BA</td>\n",
" <td>78</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s2</th>\n",
" <td>ap</td>\n",
" <td>19</td>\n",
" <td>l</td>\n",
" <td>BA</td>\n",
" <td>95</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" codes age handedness experiment performance\n",
"s0 sm 15 l AB 89\n",
"s1 at 22 l BA 78\n",
"s2 ap 19 l BA 95"
]
},
"execution_count": 136,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"codes = ['sm', 'at', 'ap', 'gs', 'rd']\n",
"age = [15, 22, 19, 32, 28]\n",
"handedness = ['l', 'l', 'l', 'r', 'r']\n",
"experiment = ['AB', 'BA', 'BA', 'AB', 'AB']\n",
"performance = [89, 78, 95, 89, 79]\n",
"df = DataFrame({'codes': codes, 'age': age, 'handedness': handedness, 'experiment':experiment, 'performance':performance}, \n",
" index=['s0', 's1', 's2', 's3', 's4'])\n",
"df\n",
"df.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dataframe columns and index:"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Columns: Index(['codes', 'age', 'handedness', 'experiment', 'performance'], dtype='object')\n",
"Rows: Index(['s0', 's1', 's2', 's3', 's4'], dtype='object')\n"
]
}
],
"source": [
"print('Columns:', df.columns)\n",
"print('Rows:', df.index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Indexing and data selection\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dataframe data can be indexed by row or by column. \n",
"In both cases the indexer can be **label-based** (`.loc`) or **position-based** (`.iloc`).\n",
"\n",
"If you index the entire row/column, the operation **returns a 1D Pandas Series**.\n",
"\n",
"Entire row by label:"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"codes sm\n",
"age 15\n",
"handedness l\n",
"experiment AB\n",
"performance 89\n",
"Name: s0, dtype: object"
]
},
"execution_count": 135,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# row by label\n",
"df.loc['s0']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Entire row by position:"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"codes sm\n",
"age 15\n",
"handedness l\n",
"experiment AB\n",
"performance 89\n",
"Name: s0, dtype: object"
]
},
"execution_count": 134,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# row by position\n",
"df.iloc[0]\n",
"# df.iloc[[0]] (a list as indexer) returns a one-row DataFrame instead of a Series\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Entire column by label:"
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"s0 15\n",
"s1 22\n",
"s2 19\n",
"s3 32\n",
"s4 28\n",
"Name: age, dtype: int64"
]
},
"execution_count": 133,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# column by label\n",
"df.loc[:, 'age']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Entire column by position:"
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"s0 15\n",
"s1 22\n",
"s2 19\n",
"s3 32\n",
"s4 28\n",
"Name: age, dtype: int64"
]
},
"execution_count": 132,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# column by position\n",
"df.iloc[:, 1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: row data used to be accessible through `.ix[index]`, but `.ix` was deprecated and then removed in pandas 1.0 in favor of `.loc` and `.iloc`.\n",
"\n",
"Column data can also be accessed with dictionary-like (`df['age']`) or attribute-like (`df.age`) syntax:"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Entire column [\"age\"]:\n"
]
},
{
"data": {
"text/plain": [
"s0 15\n",
"s1 22\n",
"s2 19\n",
"s3 32\n",
"s4 28\n",
"Name: age, dtype: int64"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"s0 15\n",
"s1 22\n",
"s2 19\n",
"s3 32\n",
"s4 28\n",
"Name: age, dtype: int64"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('Entire column [\"age\"]:')\n",
"df['age']\n",
"df.age"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Accessing single values is done by specifying both indices in `.loc` or `.iloc`:"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"15\n",
"15\n"
]
}
],
"source": [
"print(df.loc['s0', 'age'])  # row label, column label\n",
"print(df.iloc[0, 1])  # row position, column position"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Often it is useful to select data based on a condition (**boolean indexing**):"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>codes</th>\n",
" <th>age</th>\n",
" <th>handedness</th>\n",
" <th>experiment</th>\n",
" <th>performance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>s1</th>\n",
" <td>at</td>\n",
" <td>22</td>\n",
" <td>l</td>\n",
" <td>BA</td>\n",
" <td>78</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s2</th>\n",
" <td>ap</td>\n",
" <td>19</td>\n",
" <td>l</td>\n",
" <td>BA</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s3</th>\n",
" <td>gs</td>\n",
" <td>32</td>\n",
" <td>r</td>\n",
" <td>AB</td>\n",
" <td>89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s4</th>\n",
" <td>rd</td>\n",
" <td>28</td>\n",
" <td>r</td>\n",
" <td>AB</td>\n",
" <td>79</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" codes age handedness experiment performance\n",
"s1 at 22 l BA 78\n",
"s2 ap 19 l BA 95\n",
"s3 gs 32 r AB 89\n",
"s4 rd 28 r AB 79"
]
},
"execution_count": 145,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>codes</th>\n",
" <th>age</th>\n",
" <th>handedness</th>\n",
" <th>experiment</th>\n",
" <th>performance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>s2</th>\n",
" <td>ap</td>\n",
" <td>19</td>\n",
" <td>l</td>\n",
" <td>BA</td>\n",
" <td>95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>s3</th>\n",
" <td>gs</td>\n",
" <td>32</td>\n",
" <td>r</td>\n",
" <td>AB</td>\n",
" <td>89</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" codes age handedness experiment performance\n",
"s2 ap 19 l BA 95\n",
"s3 gs 32 r AB 89"
]
},
"execution_count": 145,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>codes</th>\n",
" <th>age</th>\n",
" <th>handedness</th>\n",
" <th>experiment</th>\n",
" <th>performance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>s0</th>\n",
" <td>sm</td>\n",
" <td>15</td>\n",
" <td>l</td>\n",
" <td>AB</td>\n",
" <td>89</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" codes age handedness experiment performance\n",
"s0 sm 15 l AB 89"
]
},
"execution_count": 145,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['age']>18] # dataframe\n",
"\n",
"df[(df['age']>18) & (df['performance']>80)] # dataframe\n",
"\n",
"df[((df['age']>30) & (df['performance']<80)) |((df['age']<18) & (df['performance']>80))] # dataframe"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading and writing data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read a csv file into a dataframe with `pd.read_csv()`.\n",
"\n",
"Full signature in the pandas version used for this notebook (some of these arguments have been removed in newer releases):\n",
"\n",
"`pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='\"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)`\n",
"\n",
"`pd.read_csv()` can load a local `.csv` file or one located at a given URL.\n",
"\n",
"**Some available csv datasets:**\n",
"\n",
"* Seaborn has a number of available sample datasets: [here](https://github.com/mwaskom/seaborn-data)\n",
"* Some sample datasets are available from R package: [here](https://vincentarelbundock.github.io/Rdatasets/datasets.html)\n",
"* Kaggle is a great source of datasets for solving machine learning problems: [here](https://www.kaggle.com/datasets)"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>subject</th>\n",
" <th>timepoint</th>\n",
" <th>event</th>\n",
" <th>region</th>\n",
" <th>signal</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>s13</td>\n",
" <td>18</td>\n",
" <td>stim</td>\n",
" <td>parietal</td>\n",
" <td>-0.017552</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>s5</td>\n",
" <td>14</td>\n",
" <td>stim</td>\n",
" <td>parietal</td>\n",
" <td>-0.080883</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>s12</td>\n",
" <td>18</td>\n",
" <td>stim</td>\n",
" <td>parietal</td>\n",
" <td>-0.081033</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>s11</td>\n",
" <td>18</td>\n",
" <td>stim</td>\n",
" <td>parietal</td>\n",
" <td>-0.046134</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>s10</td>\n",
" <td>18</td>\n",
" <td>stim</td>\n",
" <td>parietal</td>\n",
" <td>-0.037970</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" subject timepoint event region signal\n",
"0 s13 18 stim parietal -0.017552\n",
"1 s5 14 stim parietal -0.080883\n",
"2 s12 18 stim parietal -0.081033\n",
"3 s11 18 stim parietal -0.046134\n",
"4 s10 18 stim parietal -0.037970"
]
},
"execution_count": 188,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/fmri.csv')\n",
"csv_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['subject', 'timepoint', 'event', 'region', 'signal'], dtype='object')"
]
},
"execution_count": 156,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_df.columns"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {},
"outputs": [],
"source": [
"csv_df_subset10 = csv_df[:10]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A dataframe can be written to `.csv`, `.txt`, `.xls` or pickle files with the dataframe methods `.to_csv()`, `.to_excel()` and `.to_pickle()`.\n",
"\n",
"Full signature of `.to_csv()` in the pandas version used for this notebook:\n",
"\n",
"`DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='\"', line_terminator=None, chunksize=None, tupleize_cols=None, date_format=None, doublequote=True, escapechar=None, decimal='.')`"
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {},
"outputs": [],
"source": [
"csv_df_subset10.to_csv('./temp_fmri_subset10.csv')\n",
"# csv_df_subset10.to_csv('./temp_fmri_subset10.txt')\n",
"# csv_df_subset10.to_excel('./temp_fmri_subset10.xls')\n",
"# csv_df_subset10.to_pickle('./temp_fmri_subset10')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic preprocessing and analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Show the data type of each column in the dataframe:"
]
},
{
"cell_type": "code",
"execution_count": 166,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"subject object\n",
"timepoint int64\n",
"event object\n",
"region object\n",
"signal float64\n",
"dtype: object"
]
},
"execution_count": 166,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Display the columns in the data:"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['subject', 'timepoint', 'event', 'region', 'signal'], dtype='object')"
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_df.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Find **unique values** in `['subject']`, `['event']` and `['region']`:"
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['s13', 's5', 's12', 's11', 's10', 's9', 's8', 's7', 's6', 's4',\n",
" 's3', 's2', 's1', 's0'], dtype=object)"
]
},
"execution_count": 165,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"array(['stim', 'cue'], dtype=object)"
]
},
"execution_count": 165,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"array(['parietal', 'frontal'], dtype=object)"
]
},
"execution_count": 165,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_df['subject'].unique()\n",
"csv_df['event'].unique()\n",
"csv_df['region'].unique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Access data based on conditions in other columns:"
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"13 stim\n",
"83 stim\n",
"138 stim\n",
"210 stim\n",
"284 stim\n",
"350 stim\n",
"420 stim\n",
"489 stim\n",
"574 cue\n",
"649 cue\n",
"720 cue\n",
"782 cue\n",
"853 cue\n",
"923 cue\n",
"979 cue\n",
"1058 cue\n",
"Name: event, dtype: object"
]
},
"execution_count": 181,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_df['event'][csv_df['subject']=='s1'][::5] # subsample output every 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Categorical values can easily be mapped to numeric dummy codes using `.map()`. \n",
"#### You can also use `.apply()` with a custom function to define a transformation of a dataframe column."
]
},
{
"cell_type": "code",
"execution_count": 189,
"metadata": {},
"outputs": [],
"source": [
"csv_df['event'] = csv_df['event'].map({'cue': 0, 'stim': 1})"
]
},
{
"cell_type": "code",
"execution_count": 193,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1\n",
"75 1\n",
"150 1\n",
"225 1\n",
"300 1\n",
"375 1\n",
"450 1\n",
"525 1\n",
"600 0\n",
"675 0\n",
"750 0\n",
"825 0\n",
"900 0\n",
"975 0\n",
"1050 0\n",
"Name: event, dtype: int64"
]
},
"execution_count": 193,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_df['event'][::75] # subsample output every 75"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Subselect data and sort\n",
"\n",
"* Below we select all fMRI `signal` data from `subject == s1` during `event == 1` (stim) in `region == frontal`. \n",
"* We output the signal values together with the corresponding `timepoint` values.\n",
"* Finally, we sort the data by the `timepoint` value."
]
},
{
"cell_type": "code",
"execution_count": 207,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>timepoint</th>\n",
" <th>signal</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>284</th>\n",
" <td>0</td>\n",
" <td>-0.046049</td>\n",
" </tr>\n",
" <tr>\n",
" <th>281</th>\n",
" <td>1</td>\n",
" <td>-0.060273</td>\n",
" </tr>\n",
" <tr>\n",
" <th>295</th>\n",
" <td>2</td>\n",
" <td>-0.037520</td>\n",
" </tr>\n",
" <tr>\n",
" <th>309</th>\n",
" <td>3</td>\n",
" <td>0.057598</td>\n",
" </tr>\n",
" <tr>\n",
" <th>323</th>\n",
" <td>4</td>\n",
" <td>0.202123</td>\n",
" </tr>\n",
" <tr>\n",
" <th>336</th>\n",
" <td>5</td>\n",
" <td>0.315860</td>\n",
" </tr>\n",
" <tr>\n",
" <th>350</th>\n",
" <td>6</td>\n",
" <td>0.321335</td>\n",
" </tr>\n",
" <tr>\n",
" <th>364</th>\n",
" <td>7</td>\n",
" <td>0.204943</td>\n",
" </tr>\n",
" <tr>\n",
" <th>378</th>\n",
" <td>8</td>\n",
" <td>0.036685</td>\n",
" </tr>\n",
" <tr>\n",
" <th>392</th>\n",
" <td>9</td>\n",
" <td>-0.092768</td>\n",
" </tr>\n",
" <tr>\n",
" <th>406</th>\n",
" <td>10</td>\n",
" <td>-0.143302</td>\n",
" </tr>\n",
" <tr>\n",
" <th>420</th>\n",
" <td>11</td>\n",
" <td>-0.133647</td>\n",
" </tr>\n",
" <tr>\n",
" <th>434</th>\n",
" <td>12</td>\n",
" <td>-0.093411</td>\n",
" </tr>\n",
" <tr>\n",
" <th>449</th>\n",
" <td>13</td>\n",
" <td>-0.052714</td>\n",
" </tr>\n",
" <tr>\n",
" <th>463</th>\n",
" <td>14</td>\n",
" <td>-0.021003</td>\n",
" </tr>\n",
" <tr>\n",
" <th>476</th>\n",
" <td>15</td>\n",
" <td>-0.005134</td>\n",
" </tr>\n",
" <tr>\n",
" <th>489</th>\n",
" <td>16</td>\n",
" <td>-0.002719</td>\n",
" </tr>\n",
" <tr>\n",
" <th>520</th>\n",
" <td>17</td>\n",
" <td>-0.015686</td>\n",
" </tr>\n",
" <tr>\n",
" <th>516</th>\n",
" <td>18</td>\n",
" <td>-0.035852</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" timepoint signal\n",
"284 0 -0.046049\n",
"281 1 -0.060273\n",
"295 2 -0.037520\n",
"309 3 0.057598\n",
"323 4 0.202123\n",
"336 5 0.315860\n",
"350 6 0.321335\n",
"364 7 0.204943\n",
"378 8 0.036685\n",
"392 9 -0.092768\n",
"406 10 -0.143302\n",
"420 11 -0.133647\n",
"434 12 -0.093411\n",
"449 13 -0.052714\n",
"463 14 -0.021003\n",
"476 15 -0.005134\n",
"489 16 -0.002719\n",
"520 17 -0.015686\n",
"516 18 -0.035852"
]
},
"execution_count": 207,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_df[['timepoint', 'signal']][(csv_df['subject']=='s1') & (csv_df['event']==1) & (csv_df['region']=='frontal')].sort_values(by='timepoint')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The `signal` column of this output can be extracted as a numpy array (`np.ndarray`) and processed further separately:"
]
},
{
"cell_type": "code",
"execution_count": 213,
"metadata": {},
"outputs": [],
"source": [
"data0 = csv_df[['timepoint', 'signal']][(csv_df['subject']=='s1') & (csv_df['event']==1) & (csv_df['region']=='frontal')].sort_values(by='timepoint').values[:, 1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Basic descriptive statistics of the dataframe can be accessed by calling `df.describe()`. By default, only numeric columns are summarized."
]
},
{
"cell_type": "code",
"execution_count": 223,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>timepoint</th>\n",
" <th>event</th>\n",
" <th>signal</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>1064.000000</td>\n",
" <td>1064.000000</td>\n",
" <td>1064.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>9.000000</td>\n",
" <td>0.500000</td>\n",
" <td>0.003540</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>5.479801</td>\n",
" <td>0.500235</td>\n",
" <td>0.093930</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-0.255486</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>4.000000</td>\n",
" <td>0.000000</td>\n",
" <td>-0.046070</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>9.000000</td>\n",
" <td>0.500000</td>\n",
" <td>-0.013653</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>14.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.024293</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>18.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.564985</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" timepoint event signal\n",
"count 1064.000000 1064.000000 1064.000000\n",
"mean 9.000000 0.500000 0.003540\n",
"std 5.479801 0.500235 0.093930\n",
"min 0.000000 0.000000 -0.255486\n",
"25% 4.000000 0.000000 -0.046070\n",
"50% 9.000000 0.500000 -0.013653\n",
"75% 14.000000 1.000000 0.024293\n",
"max 18.000000 1.000000 0.564985"
]
},
"execution_count": 223,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Basic operations can be run on column data as well as the entire dataframe:"
]
},
{
"cell_type": "code",
"execution_count": 245,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"subject s13s5s12s11s10s9s8s7s6s5s4s3s2s1s0s13s12s7s10s...\n",
"timepoint 9576\n",
"event 532\n",
"region parietalparietalparietalparietalparietalpariet...\n",
"signal 3.76631\n",
"dtype: object"
]
},
"execution_count": 245,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"0.0035397685640969675"
]
},
"execution_count": 245,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"3.7663137521991734"
]
},
"execution_count": 245,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>signal</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>signal</th>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" signal\n",
"signal 1.0"
]
},
"execution_count": 245,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csv_df.sum()\n",
"\n",
"csv_df['signal'].mean()\n",
"csv_df['signal'].sum()\n",
"\n",
"csv_df[['signal']][(csv_df['timepoint']==10) & (csv_df['event']==1) & (csv_df['region']=='frontal')].corr()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic visualization\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### A histogram of the data can be plotted using `.hist()`:"
]
},
{
"cell_type": "code",
"execution_count": 247,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAD8CAYAAAB0IB+mAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEOZJREFUeJzt3X+s3fVdx/Hne3RsSBkFOu5q23gx65ItNGPbDSMhcbdjMwwM7R8wMejapfEminGGalZ/JMYfiUWD6JJl2ozFsjgLQ5EGmBMLJ3PGIq0wOsDZC9Zx14Zm/KhecNPq2z/O5+LN5bTne2/PL/t5PpKb8/1+vp9zvu/75vC6337v93xvZCaSpDPfm4ZdgCRpMAx8SaqEgS9JlTDwJakSBr4kVcLAl6RKGPiSVAkDX5IqYeBLUiWWDbsAgJUrV+b4+Piwy3jdq6++yrnnnjvsMkaefWrGPjVjn5qb69WBAwe+m5lvb/q8kQj88fFx9u/fP+wyXtdqtZicnBx2GSPPPjVjn5qxT83N9Soi/nUxz2t0SiciDkfEwYh4IiL2l7ELI+KhiDhUHi8o4xERn4mI6Yh4MiLev/hvR5LUa4s5h78hMy/LzImyvh3Ym5nrgL1lHeBjwLryNQV8rlfFSpKW7nR+absR2FWWdwGb5o3fmW37gBURseo09iNJ6oGmgZ/AX0fEgYiYKmNjmXkUoDxeXMZXA8/Pe+5MGZMkDVHTX9pemZlHIuJi4KGI+KdTzI0OY2+46X75wTEFMDY2RqvValhK/83Ozo5UPaPKPjVjn5qxT80ttVeNAj8zj5THYxFxL3A58EJErMrMo+WUzbEyfQZYO+/pa4AjHV5zJ7ATYGJiIkfpt/NeLdCMfWrGPjVjn5pbaq+6ntKJiHMj4ry5ZeBHgW8Ce4DNZdpm4L6yvAf4RLla5wrg+NypH0nS8DQ5wh8D7o2Iuflfysy/iojHgLsjYivwbeCGMv9B4BpgGngN+GTPq5YkLVrXwM/M54D3dhh/Ebiqw3gCN/ekOklSz4zEJ221OOPbHxjavg/vuHZo+5Z0erx5miRVwsCXpEoY+JJUCQNfkiph4EtSJQx8SaqEgS9JlTDwJakSBr4kVcLAl6RKGPiSVAkDX5IqYeBLUiUMfEmqhIEvSZUw8CWpEga+JFXCwJekShj4klQJA1+SKmHgS1IlDHxJqoSBL0mVMPAlqRIGviRVwsCXpEoY+JJUCQNfkiph4EtSJQx8SaqEgS9JlTDwJakSBr4kVaJx4EfEWRHxeETcX9YviYhHI+JQRNwVEWeX8beU9emyfbw/pUuSFmMxR/ifAp6Zt34rcHtmrgNeBraW8a3Ay5n5TuD2Mk+SNGSNAj8i1gDXAp8v6wF8GLinTNkFbCrLG8s6ZftVZb4kaYgiM7tPirgH+B3gPOAXgS3AvnIUT0SsBb6SmZdGxDeBqzNzpmx7FvhgZn53wWtOAVMAY2NjH9i9e3fPvqnTNTs7y/Lly4ddxkkd/M7xoe17/erzX18e9T6NCvvUjH1qbq5XGzZsOJCZE02ft6zbhIj4MeBYZh6IiMm54Q5Ts8G2/xvI3AnsBJiYmMjJycmFU4am1WoxSvUstGX7A0Pb9+GbJl9fHvU+jQr71Ix9am6pveoa+MCVwHURcQ3wVuBtwB8AKyJiWWaeANYAR8r8GWAtMBMRy4DzgZcWXZkkqae6nsPPzF/OzDWZOQ7cCDycmTcBjwDXl2mbgfvK8p6yTtn+cDY5byRJ6qvTuQ7/08AtETENXATcUcbvAC4q47cA20+vRElSLzQ5pfO6zGwBrbL8HHB5hznfA27oQW2SpB7yk7aSVAkDX5IqYeBLUiUMfEmqhIEvSZUw8CWpEga+JFXCwJekShj4klQJA1+SKmHgS1IlDHxJqoSBL0mVMPAlqRIGviRVYlH3w5fG5/093W3rTwzs7+se3nHtQPYjnck8wpekShj4kl
QJA1+SKmHgS1IlDHxJqoSBL0mVMPAlqRIGviRVwsCXpEoY+JJUCQNfkiph4EtSJQx8SaqEgS9JlTDwJakSBr4kVcLAl6RKdA38iHhrRPxDRHwjIp6KiN8o45dExKMRcSgi7oqIs8v4W8r6dNk+3t9vQZLURJMj/O8DH87M9wKXAVdHxBXArcDtmbkOeBnYWuZvBV7OzHcCt5d5kqQh6xr42TZbVt9cvhL4MHBPGd8FbCrLG8s6ZftVERE9q1iStCSNzuFHxFkR8QRwDHgIeBZ4JTNPlCkzwOqyvBp4HqBsPw5c1MuiJUmLt6zJpMz8b+CyiFgB3Au8u9O08tjpaD4XDkTEFDAFMDY2RqvValLKQMzOzo5UPQttW3+i+6QBGDtncLWM8n+Pbkb9/TQq7FNzS+1Vo8Cfk5mvREQLuAJYERHLylH8GuBImTYDrAVmImIZcD7wUofX2gnsBJiYmMjJyclFF98vrVaLUapnoS3bHxh2CUA77G87uKi30JIdvmlyIPvph1F/P40K+9TcUnvV5Cqdt5cjeyLiHOAjwDPAI8D1Zdpm4L6yvKesU7Y/nJlvOMKXJA1Wk8OzVcCuiDiL9g+IuzPz/oh4GtgdEb8NPA7cUebfAXwxIqZpH9nf2Ie6JUmL1DXwM/NJ4H0dxp8DLu8w/j3ghp5UJ0nqGT9pK0mVMPAlqRIGviRVwsCXpEoY+JJUCQNfkiph4EtSJQx8SaqEgS9JlTDwJakSBr4kVcLAl6RKGPiSVAkDX5IqYeBLUiUMfEmqhIEvSZUw8CWpEga+JFXCwJekShj4klQJA1+SKmHgS1IlDHxJqoSBL0mVMPAlqRIGviRVwsCXpEoY+JJUCQNfkiph4EtSJQx8SaqEgS9Jlega+BGxNiIeiYhnIuKpiPhUGb8wIh6KiEPl8YIyHhHxmYiYjognI+L9/f4mJEndNTnCPwFsy8x3A1cAN0fEe4DtwN7MXAfsLesAHwPWla8p4HM9r1qStGhdAz8zj2bmP5blfweeAVYDG4FdZdouYFNZ3gjcmW37gBURsarnlUuSFmVR5/AjYhx4H/AoMJaZR6H9QwG4uExbDTw/72kzZUySNETLmk6MiOXAnwO/kJn/FhEnndphLDu83hTtUz6MjY3RarWaltJ3s7OzI1XPQtvWnxh2CQCMnTO4Wkb5v0c3o/5+GhX2qbml9qpR4EfEm2mH/Z9m5l+U4RciYlVmHi2nbI6V8Rlg7bynrwGOLHzNzNwJ7ASYmJjIycnJRRffL61Wi1GqZ6Et2x8YdglAO+xvO9j4mOG0HL5pciD76YdRfz+NCvvU3FJ71eQqnQDuAJ7JzN+ft2kPsLksbwbumzf+iXK1zhXA8blTP5Kk4WlyeHYl8FPAwYh4ooz9CrADuDsitgLfBm4o2x4ErgGmgdeAT/a0YknSknQN/Mz8Op3PywNc1WF+AjefZl2SpB7zk7aSVAkDX5IqYeBLUiUMfEmqhIEvSZUw8CWpEga+JFXCwJekShj4klQJA1+SKmHgS1IlDHxJqoSBL0mVMPAlqRIGviRVwsCXpEoY+JJUCQNfkiph4EtSJQx8SaqEgS9JlTDwJakSBr4kVcLAl6RKGPiSVAkDX5IqYeBLUiUMfEmqhIEvSZUw8CWpEga+JFXCwJekShj4klQJA1+SKtE18CPiCxFxLCK+OW/swoh4KCIOlccLynhExGciYjoinoyI9/ezeElSc02O8P8EuHrB2HZgb2auA/aWdYCPAevK1xTwud6UKUk6XV0DPzO/Bry0YHgjsKss7wI2zRu/M9v2ASsiYlWvipUkLV1kZvdJEePA/Zl5aVl/JTNXzNv+cmZeEBH3Azsy8+tlfC/w6czc3+E1p2j/K4CxsbEP7N69uwffTm/Mzs6yfPnyYZdxUge/c3zYJQAwdg688B+D2df61ecPZkd9MOrvp1Fhn5qb69WGDRsOZOZE0+ct63Ed0WGs40+UzNwJ7ASYmJjIycnJHpeydK1Wi1
GqZ6Et2x8YdgkAbFt/gtsO9vot1NnhmyYHsp9+GPX306iwT80ttVdLvUrnhblTNeXxWBmfAdbOm7cGOLLEfUiSemipgb8H2FyWNwP3zRv/RLla5wrgeGYePc0aJUk90PXf4xHxZ8AksDIiZoBfB3YAd0fEVuDbwA1l+oPANcA08BrwyT7ULElagq6Bn5k/cZJNV3WYm8DNp1uUJKn3/KStJFXCwJekShj4klQJA1+SKmHgS1IlBvMxyTPU+Ih84lWSmvAIX5IqYeBLUiUMfEmqhOfw9f/CsH5fcnjHtUPZr9QPHuFLUiUMfEmqhIEvSZUw8CWpEga+JFXCwJekShj4klQJA1+SKmHgS1IlDHxJqoS3VpBOoRe3dNi2/gRblvA63tZBveYRviRVwsCXpEoY+JJUCQNfkiph4EtSJQx8SaqEl2VKI8q/8qVe8whfkiph4EtSJQx8SaqEgS9JlTDwJakSfblKJyKuBv4QOAv4fGbu6Md+JPXesK4O2rb+BJND2XM9eh74EXEW8Fngo8AM8FhE7MnMp3u9L+jPm3OpdzeUpFHWjyP8y4HpzHwOICJ2AxuBvgS+pDOHnz3or34E/mrg+XnrM8AH+7AfSeqJYf2ggcH+sOlH4EeHsXzDpIgpYKqszkbEt/pQy5L8PKwEvjvsOkadfWrGPjVTa5/i1iU9ba5XP7SYJ/Uj8GeAtfPW1wBHFk7KzJ3Azj7s/7RFxP7MnBh2HaPOPjVjn5qxT80ttVf9uCzzMWBdRFwSEWcDNwJ7+rAfSdIi9PwIPzNPRMTPAV+lfVnmFzLzqV7vR5K0OH25Dj8zHwQe7MdrD8hInmoaQfapGfvUjH1qbkm9isw3/D5VknQG8tYKklQJAx+IiAsj4qGIOFQeL+gw57KI+PuIeCoinoyIHx9GrcMQEVdHxLciYjoitnfY/paIuKtsfzQixgdf5fA16NMtEfF0ef/sjYhFXVJ3pujWp3nzro+IjIgqr9xp0qeI+Hh5Tz0VEV/q+qKZWf0X8LvA9rK8Hbi1w5x3AevK8g8CR4EVw659AL05C3gW+GHgbOAbwHsWzPlZ4I/K8o3AXcOue0T7tAH4gbL8M/apc5/KvPOArwH7gIlh1z2KfQLWAY8DF5T1i7u9rkf4bRuBXWV5F7Bp4YTM/OfMPFSWjwDHgLcPrMLhef1WGZn5n8DcrTLmm9+/e4CrIqLTB/DOZF37lJmPZOZrZXUf7c+o1KbJ+wngt2gfiH1vkMWNkCZ9+mngs5n5MkBmHuv2ogZ+21hmHgUojxefanJEXE77p+6zA6ht2DrdKmP1yeZk5gngOHDRQKobHU36NN9W4Ct9rWg0de1TRLwPWJuZ9w+ysBHT5P30LuBdEfF3EbGv3KX4lKr5I+YR8TfAOzps+tVFvs4q4IvA5sz8n17UNuKa3Cqj0e00znCNexARPwlMAB/qa0Wj6ZR9iog3AbcDWwZV0Ihq8n5aRvu0ziTtfy3+bURcmpmvnOxFqwn8zPzIybZFxAsRsSozj5ZA7/hPo4h4G/AA8GuZua9PpY6aJrfKmJszExHLgPOBlwZT3shodEuRiPgI7YOMD2Xm9wdU2yjp1qfzgEuBVjkr+A5gT0Rcl5n7B1bl8DX9/25fZv4X8C/lfmTraN/toCNP6bTtATaX5c3AfQsnlNtE3AvcmZlfHmBtw9bkVhnz+3c98HCW3yJVpGufyqmKPwaua3K+9Qx1yj5l5vHMXJmZ45k5Tvt3HbWFPTT7/+4vaV8IQESspH2K57lTvaiB37YD+GhEHKL9h1t2AETERER8vsz5OPAjwJaIeKJ8XTaccgennJOfu1XGM8DdmflURPxmRFxXpt0BXBQR08AttK90qkrDPv0esBz4cnn/VHePqYZ9ql7DPn0VeDEingYeAX4pM1881ev6SVtJqoRH+JJUCQNfkiph4EtSJQx8SaqEgS9JlTDwJakSBr4kVcLAl6RK/C9E979YhEZUQwAAAABJRU5ErkJggg
==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"csv_df['signal'].hist();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Categorical data can be easily plotted using `.value_counts()` (it counts unique values). Below is a histogram over the subjects"
]
},
{
"cell_type": "code",
"execution_count": 173,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEECAYAAAA4Qc+SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEatJREFUeJzt3XuQZHV5xvHvIwsqosLCsG6JZNVsiKYMF6cQCysVQQxqIptEiEZxNVhrqpRomYpi4qUSjYWmEk2ZlLriZTUGQRSXCBLJCqJVig5XxRVXUQFBdlSIGCsq+uaPPshknaF7pk8zOz+/n6quc+33vDOn+5nTp/v0pKqQJK1891nuBiRJ/TDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY1YdW9u7IADDqh169bdm5uUpBXv8ssv/25VTQ1b714N9HXr1jEzM3NvblKSVrwk3xplPU+5SFIjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSI+7VK0UXsu608xe1/jdPf5r1e6q/knu3vvV/1evvyiN0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEUMDPckhSa6ac/tBkpcmWZ3koiQ7uuF+90bDkqT5DQ30qrquqg6rqsOAxwI/As4FTgO2VdV6YFs3LUlaJos95XIs8PWq+hZwArClm78F2NBnY5KkxVlsoD8TOLMbX1NVtwB0wwP7bEyStDgjB3qSvYCnAx9azAaSbEoyk2RmdnZ2sf1Jkka0mCP0pwBXVNWt3fStSdYCdMOd892pqjZX1XRVTU9NTY3XrSRpQYsJ9Gdx9+kWgPOAjd34RmBrX01JkhZvpEBPsjdwHPCRObNPB45LsqNbdnr/7UmSRjXS/xStqh8B++8y73sMPvUiSdoNeKWoJDXCQJekRhjoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1IhR/0n0vknOSfKVJNuTPD7J6iQXJdnRDfebdLOSpIWNeoT+z8CFVfWbwKHAduA0YFtVrQe2ddOSpGUyNNCTPAj4HeBdAFX1k6q6HTgB2NKttgXYMKkmJUnDjXKE/ghgFnhPkiuTnJHkAcCaqroFoBseON+dk2xKMpNkZnZ2trfGJUn/3yiBvgo4AnhbVR0O/A+LOL1SVZurarqqpqemppbYpiRpmFEC/Sbgpqq6rJs+h0HA35pkLUA33DmZFiVJoxga6FX1HeDGJId0s44FvgycB2zs5m0Etk6kQ0nSSFaNuN6pwAeS7AVcDzyfwR+Ds5OcAtwAnDiZFiVJoxgp0KvqKmB6nkXH9tuOJGmpvFJUkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEQa6JDXCQJekRhjoktQIA12SGmGgS1IjRvoXdEm+CdwB/Ay4s6qmk6wGzgLWAd8ETqqq2ybTpiRpmMUcoT+xqg6rqrv+t+hpwLaqWg9s66YlSctknFMuJwBbuvEtwIbx25EkLdWogV7AJ5JcnmRTN29NVd0C0A0PnESDkqTRjHQOHTi6qm5OciBwUZKvjLqB7g/AJoCDDz54CS1KkkYx0hF6Vd3cDXcC5wJHArcmWQvQDXcucN/NVTVdVdNTU1P9dC1J+iVDAz3JA5I88K5x4MnAl4DzgI3dahuBrZNqUpI03CinXNYA5ya5a/1/r6oLk3wBODvJKcANwImTa1OSNMzQQK+q64FD55n/PeDYSTQlSVo8rxSVpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0C
WpEQa6JDXCQJekRhjoktSIkQM9yR5JrkzysW764UkuS7IjyVlJ9ppcm5KkYRZzhP4SYPuc6TcCb66q9cBtwCl9NiZJWpyRAj3JQcDTgDO66QDHAOd0q2wBNkyiQUnSaEY9Qn8L8HLg5930/sDtVXVnN30T8NCee5MkLcLQQE/y+8DOqrp87ux5Vq0F7r8pyUySmdnZ2SW2KUkaZpQj9KOBpyf5JvBBBqda3gLsm2RVt85BwM3z3bmqNlfVdFVNT01N9dCyJGk+QwO9ql5ZVQdV1TrgmcAnq+rZwMXAM7rVNgJbJ9alJGmocT6H/grgZUm+xuCc+rv6aUmStBSrhq9yt6q6BLikG78eOLL/liRJS+GVopLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEQa6JDXCQJekRhjoktQIA12SGjE00JPcL8nnk1yd5Nokf9vNf3iSy5LsSHJWkr0m364kaSGjHKH/GDimqg4FDgOOT3IU8EbgzVW1HrgNOGVybUqShhka6DXww25yz+5WwDHAOd38LcCGiXQoSRrJSOfQk+yR5CpgJ3AR8HXg9qq6s1vlJuChk2lRkjSKkQK9qn5WVYcBBwFHAo+ab7X57ptkU5KZJDOzs7NL71SSdI8W9SmXqroduAQ4Ctg3yapu0UHAzQvcZ3NVTVfV9NTU1Di9SpLuwSifcplKsm83fn/gScB24GLgGd1qG4Gtk2pSkjTcquGrsBbYkmQPBn8Azq6qjyX5MvDBJK8HrgTeNcE+JUlDDA30qroGOHye+dczOJ8uSdoNeKWoJDXCQJekRhjoktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1IihgZ7kYUkuTrI9ybVJXtLNX53koiQ7uuF+k29XkrSQUY7Q7wT+sqoeBRwFvCjJo4HTgG1VtR7Y1k1LkpbJ0ECvqluq6opu/A5gO/BQ4ARgS7faFmDDpJqUJA23qHPoSdYBhwOXAWuq6hYYhD5w4AL32ZRkJsnM7OzseN1KkhY0cqAn2Qf4MPDSqvrBqPerqs1VNV1V01NTU0vpUZI0gpECPcmeDML8A1X1kW72rUnWdsvXAjsn06IkaRSjfMolwLuA7VX1T3MWnQds7MY3Alv7b0+SNKpVI6xzNHAy8MUkV3Xz/ho4HTg7ySnADcCJk2lRkjSKoYFeVZ8BssDiY/ttR5K0VF4pKkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaYaBLUiMMdElqhIEuSY0w0CWpEaP8k+h3J9mZ5Etz5q1OclGSHd1wv8m2KUkaZpQj9PcCx+8y7zRgW1WtB7Z105KkZTQ00KvqUuD7u8w+AdjSjW8BNvTclyRpkZZ6Dn1NVd0C0A0P7K8lSdJSTPxN0SSbkswkmZmdnZ305iTpV9ZSA/3WJGsBuuHOhVasqs1VNV1V01NTU0vcnCRpmKUG+nnAxm58I7C1n3YkSUs1yscWzwQ+CxyS5KYkpwCnA8cl2QEc101LkpbRqmErVNWzFlh0bM+9SJLG4JWiktQIA12SGmGgS1IjDHRJaoSBLkmNMNAlqREGuiQ1wkCXpEYY6JLUCANdkhphoEtSIwx0SWqEgS5JjTDQJakRBrokNcJAl6RGGOiS1AgDXZIaMVagJzk+yXVJvpbktL6akiQt3pIDPckewL8CTwEeDTwryaP7akyStDjjHKEfCXytqq6vqp8AHwRO6KctSdJijRPoDwVunDN9UzdPkrQMUlVLu2NyIvB7VfWCbvpk4MiqOnWX9TYBm7rJQ4DrFrGZA4DvLqlB6+/Ota1vfesvrv6vVdXUsJVWLb0fbgIeNmf6IODmXVeqqs3A5q
VsIMlMVU0vrT3r7661rW9960+m/jinXL4ArE/y8CR7Ac8EzuunLUnSYi35CL2q7kzyYuA/gT2Ad1fVtb11JklalHFOuVBVFwAX9NTLfJZ0qsb6u31t61vf+hOw5DdFJUm7Fy/9l6RGGOiS1AgDXZIaMdabotK4kuxTVT+cQN3VVfX9vutqeSRZw+BK9AJurqpbl7ml3dJucYSe5BFJ3p3k9Un2SfLOJF9K8qEk63qovyrJC5NcmOSaJFcn+XiSP0+yZw/19+jqvy7J0bsse1UP9e+T5M+SnN/1fnmSDyb53XFrd/Ufk+RzSW5MsjnJfnOWfb6PbdyDL49bIMnRSbYnuTbJ45JcBMx0P8/je6j/kSTPSbLPuLUWqP/iJAd047+e5NIktye5LMljJrHNOduezKctkq/2VOewJJ8DLgHeBPwD8Knu8XpED/V/e874nkleleS8JG9IsncP9fdO8vIkf5Xkfkme19V/0yQeT7vFp1ySXAqcCTwYeA7wHuBs4MnAs6vqmDHrnwncDmxhcIUrDK5s3Qisrqo/GbP+GcDewOeBk4FPVdXLumVXVNVYD7wk7wG+BfwX8AzgB8CngVcAW6vqrWPW/wzweuBzwAuA5wNPr6qvJ7myqg4fs/7LFloE/E1VrR6z/ueBU4B9gP8ANlTVZ7on/Fur6uh7LDC8/reBzwLHMNgHZwLnd19KN7Yk11bVb3Xj5wNnVNW53R/sv++h/4V+vwGurqqDxqx/B4Mj57tqwuD58COgqupBY9S+CnhhVV22y/yjgHdU1aFLrd3V+cXzM8k/AvszyJ8NwP5V9dwx65/N4Duv7s/gq0+2M8i2PwAeUlUnj1P/l1TVst+AK+eM37DQsjHqX3cPy77aQ/1r5oyvYvAZ048A9+2p/2t2mf5cN7wvsL2H+lftMv1EYAdwFHBFD/X/F3gd8Np5brf3/PjZvsuyPvq/shs+kMEf7AuAWQZP/Cf3UP+6OeNfuKd9v8T6PwOuB74x53bX9E96qP9W4H3AmjnzvjFu3a7OjntY9rWeHztXAXt24+npd3/VnHrf4e6D6F7q73rbXc6h/zzJbwD7Ansnma6qmSTrGVyFOq7bui8T+3BV/RwGpzGAE4Hbeqi/110jVXUnsCnJa4FPMjhqHNdPkzyyBkfMRwA/6bb14yR9vMRKkgdX1X93dS9O8sfAh4Gxjp47VwAfrarL59nwC3qoP/fU4St3WbYX4yuAqroDeD/w/u6o9yTgNOATY9Y/J8l7gb8Dzk3yUgYHBMcCN4xZGwbhfWxV/VKtJDfOs/6iVNWpSR4LnJnko8C/cPcR+7g+3r1qeR93f7vrw4DnAhf2UP/BSf6QwWPovlX1Uxi8rOjpucWcehdUl+Z915+7oWW/MXjgXsfg5cgTGATJDmAncEIP9dcBZ3X1vjqn9lnAw3uo/2/A8fPMPwX4aQ/1j2HwxP4qg6Oqx3Xzp4A39VD/T4Gj5pl/MPDOHuofAhywwLI1PdR/OrD3PPMfCby8h/qXjltjhG08D7iMwTfw3cHgvYU3AA/uofaLgEMXWHZqjz/DfYC/YHA68OYe6z4FeDuD02kf68af2lPt9+xyW9PNfwiwrYf6ZwD7zDP/kcBnen8c9V1wzB/+JOCB3fhruh14RM/1D2bw1ZWvAc7tuf6Jc/p/FYOjrF7qM3iJdhLwoEnU97bk/Xt4j/Xn7t9XT6D+3P5f3ffjp6v/IGBt9/xaMY/Pu3rfZd9OMhvOBaZ7/zmW+xe5yw99TTd8AnApg/+AdNkKrf/plVZ/gW1utv6y7N+V/vjsvf+G9u3Enru7xccW5/hZN3wa8Paq2ko/50CXo/7bVkr9JKsXuO0PPPVXvf4c9+b+XemPz176b3TfTqI+sPtdWPTtJO8AngS8Mcl96fez8taf3yyDj0Vmzrzqpg+0/i+s1P27kuu7bxdjUi+HlviyZG/gj4D13fRaevhYmPWH1t0BHLzAsht/1euv9P27kuu7bxd32y0uLN
LySvIiBu+4Xz3PslNr/AuXVnR9LR/37eIY6PqF7rP6F1bVHUleDRwOvL6qrrC+lpP7djS725uiWl6v7p4wT2DwtQtbgLdZX7sB9+0IDHTNtdLf6b9XPkmgZeG+HYGBrrnueif+JOCCCb7Tv1Lra/m4b0fgOXT9QgZfF3o88MWq2pFkLfCYqhr3u0qaqK/l474djYEuSY3wJYskNcJAl6RGGOiS1AgDXZIaYaBLUiP+D5y0uFwb+uGcAAAAAElFTkSuQmCC\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"csv_df['subject'].value_counts().plot(kind='bar');"
]
},
{
"cell_type": "code",
"execution_count": 241,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"csv_df[['signal']][(csv_df['timepoint']==10) & (csv_df['event']==1) & (csv_df['region']=='frontal')].plot();\n",
"csv_df[['signal']][(csv_df['timepoint']==9) & (csv_df['event']==1) & (csv_df['region']=='frontal')].plot();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 1. Find all data that belongs to `['subject'] == s1`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 2. Turn `['region']` to a dummy variable"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 3. Plot mean of signal of all subjects over 18 time points, during stimulation of parietal region "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 4. Find mean and std of signal over subjects for all timepoints during stimulation vs cue in frontal reigon"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 5. Plot bars for means of signal over all time points during stimulation in parietal and frontal regions for subejct 5"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}