braindecode datasets examples
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Load data using mne:__"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import mne\n",
"\n",
"# 5,6,7,10,13,14 are codes for executed and imagined hands/feet\n",
"subject_id = 22\n",
"event_codes = [5,6,9,10,13,14]\n",
"#event_codes = [3,4,5,6,7,8,9,10,11,12,13,14]\n",
"\n",
"# This will download the files if you don't have them yet,\n",
"# and then return the paths to the files.\n",
"physionet_paths = mne.datasets.eegbci.load_data(subject_id, event_codes)\n",
"\n",
"# Load each of the files\n",
"parts = [mne.io.read_raw_edf(path, preload=True,stim_channel='auto', verbose='WARNING')\n",
" for path in physionet_paths]"
]
},
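{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, we can inspect the first loaded recording; `info['sfreq']`, `ch_names` and `n_times` are standard mne.io.Raw attributes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The PhysioNet eegbci recordings have 64 EEG channels sampled at 160 Hz\n",
"print(parts[0].info['sfreq'], len(parts[0].ch_names), parts[0].n_times)"
]
},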
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Transform into braindecode format:__"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n",
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n",
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/gemeinl/anaconda3/envs/braindecode_v2/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.metrics.scorer module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.\n",
" warnings.warn(message, FutureWarning)\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n",
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n",
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n",
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"from braindecode.datasets.base import BaseDataset, BaseConcatDataset\n",
"from braindecode.datautil.windowers import create_windows_from_events\n",
"\n",
"base_datasets = [BaseDataset(raw, pd.Series({\"subject\": subject_id}))for raw in parts]\n",
"base_datasets = BaseConcatDataset(base_datasets)\n",
"windows_datasets = create_windows_from_events(\n",
" base_datasets,\n",
" trial_start_offset_samples=0, \n",
" trial_stop_offset_samples=0,\n",
" supercrop_size_samples=500,\n",
" supercrop_stride_samples=500,\n",
" drop_samples=False\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"__Two things we could probably improve here:__\n",
"- BaseDataset could accept a dict instead of a pandas.Series -> would remove the import for the user\n",
"- This dict or pandas.Series could be optional -> casting to BaseDataset is easier. downside, pandas.Series information cannot be used for splitting the dataset (would need to add a check. splitting would still be possible based on ids)\n",
"- Probably one of the steps could be removed (BaseDataset or BaseConcatDataset), but I would not do this now"
]
},
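{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the first idea (`make_base_dataset` is a hypothetical helper name, not an existing braindecode function): accept a plain dict and build the pandas.Series internally, so users would not need to import pandas themselves."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"from braindecode.datasets.base import BaseDataset\n",
"\n",
"\n",
"def make_base_dataset(raw, description, target_name=None):\n",
"    # hypothetical convenience wrapper: convert a plain dict into the\n",
"    # pandas.Series that BaseDataset currently expects\n",
"    return BaseDataset(raw, pd.Series(description), target_name=target_name)\n",
"\n",
"\n",
"# usage: no pandas import needed on the user side\n",
"base_ds_from_dict = make_base_dataset(parts[0], {\"subject\": subject_id})"
]
},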
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A BaseDataset inherits from pytorch.Dataset."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from braindecode.datasets.base import BaseDataset"
]
},
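{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can verify the claimed inheritance directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch.utils.data\n",
"\n",
"# BaseDataset is a torch.utils.data.Dataset, so it works with standard\n",
"# PyTorch tooling such as DataLoader\n",
"print(issubclass(BaseDataset, torch.utils.data.Dataset))"
]
},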
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Upon creation a BaseDataset takes a mne.Raw holding raw EEG data and a pandas.Series holding additional information, such as subject id, gender, age, whether it belongs to training or evaluation set, the target (only if it is given for the entire signal of the mne.Raw), etc... "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"list_of_base_ds = []\n",
"for raw_i, raw in enumerate(parts):\n",
" subset = np.random.choice([\"train\", \"eval\"], p=[.8, .2]) \n",
" target = event_codes[raw_i]\n",
" df = pd.Series({\"subset\": subset, \"subject\": subject_id, \"trial\": raw_i, \"target\": target})\n",
" base_ds = BaseDataset(raw, df, target_name=\"target\")\n",
" list_of_base_ds.append(base_ds)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"subset train\n",
"subject 22\n",
"trial 0\n",
"target 5\n",
"dtype: object"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list_of_base_ds[0].description"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A BaseDataset produces tuples of x and y, where x is a single time point of the EEG in mne.Raw and y is the target (if known, else None)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(64, 1) 14\n"
]
}
],
"source": [
"for x, y in base_ds:\n",
" break\n",
"print(x.shape, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A BaseConcatDataset inherits from pytorch.ConcatDataset."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"from braindecode.datasets.base import BaseConcatDataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Upon creation a BaseConcatDataset takes a list of BaseDatasets (or a list of WindowsDatasets)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"concat_base_ds = BaseConcatDataset(list_of_base_ds)"
]
},
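{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, the claimed inheritance is easy to check on the concrete instance:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torch.utils.data import ConcatDataset\n",
"\n",
"# the concatenation is a regular torch ConcatDataset; its length is the\n",
"# total number of samples (time points) over all contained recordings\n",
"print(isinstance(concat_base_ds, ConcatDataset), len(concat_base_ds))"
]
},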
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A BaseConcatDataset produces tuples of x and y, where x is a single time point of the EEGs in the mne.Raws and y is the target (if known, else None)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(64, 1) 5\n"
]
}
],
"source": [
"for x, y in concat_base_ds:\n",
" break\n",
"print(x.shape, y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The BaseConcatDataset concatenates all description pandas.Series of all BaseDatasets into a single pandas.DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>subset</th>\n",
" <th>subject</th>\n",
" <th>trial</th>\n",
" <th>target</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>train</td>\n",
" <td>22</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>train</td>\n",
" <td>22</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>eval</td>\n",
" <td>22</td>\n",
" <td>2</td>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>eval</td>\n",
" <td>22</td>\n",
" <td>3</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>train</td>\n",
" <td>22</td>\n",
" <td>4</td>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>eval</td>\n",
" <td>22</td>\n",
" <td>5</td>\n",
" <td>14</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" subset subject trial target\n",
"0 train 22 0 5\n",
"1 train 22 1 6\n",
"2 eval 22 2 9\n",
"3 eval 22 3 10\n",
"4 train 22 4 13\n",
"5 eval 22 5 14"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"concat_base_ds.description"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pandas.DataFrame can be used to split the data, e.g. into training and evaluation set, by subject, etc..."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'eval': <braindecode.datasets.base.BaseConcatDataset at 0x7ff448fc6110>,\n",
" 'train': <braindecode.datasets.base.BaseConcatDataset at 0x7ff448fc6b50>}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"concat_base_ds.split(\"subset\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, predefined ids can be used to split the data."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: <braindecode.datasets.base.BaseConcatDataset at 0x7ff448d79950>,\n",
" 1: <braindecode.datasets.base.BaseConcatDataset at 0x7ff448d79dd0>}"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"concat_base_ds.split(split_ids=[[0,2,3,4], [1, 5]])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"from braindecode.datautil.windowers import create_windows_from_events"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A windowing function takes a BaseConcatDataset of BaseDatasets and transforms it into a BaseConcatDataset of WindowsDatasets. (So, internally transform the mne.Raws into mne.Epochs wrt. the given parameters.) It also inherits the pandas.DataFrame holding additional description."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n",
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n",
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n",
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n",
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n",
"Used Annotations descriptions: ['T0', 'T1', 'T2']\n",
"60 matching events found\n",
"No baseline correction applied\n",
"Adding metadata with 4 columns\n",
"0 projection items activated\n",
"Loading data for 60 events and 500 original time points ...\n",
"0 bad epochs dropped\n"
]
}
],
"source": [
"concat_windows_ds = create_windows_from_events(\n",
" concat_base_ds,\n",
" trial_start_offset_samples=0, \n",
" trial_stop_offset_samples=0,\n",
" supercrop_size_samples=500,\n",
" supercrop_stride_samples=500,\n",
" drop_samples=False\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A WindowsDataset inherits from BaseDataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Upon creation a WindowsDataset takes a mne.Epochs holding supercrops of EEG data and a pandas.Series holding additional information, such as subject id, gender, age, whether it belongs to training or evaluation set, the target (only if it is given for the entire signal of the mne.Raw), etc... "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"subset train\n",
"subject 22\n",
"trial 0\n",
"target 5\n",
"dtype: object"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"concat_windows_ds.datasets[0].description"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A WindowsDataset produces tuples of x, y and ind, where x is a supercrop, y is the target of that supercrop, and ind is a three-tuple of id of the supercrop within a trial, the start sample (inclusive) of the supercrop, and the stop sample (exclusive) of the supercrop."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading data for 1 events and 500 original time points ...\n",
"(64, 500) 0 [0, 0, 500]\n"
]
}
],
"source": [
"for x, y, ind in concat_windows_ds.datasets[0]:\n",
" break\n",
"print(x.shape, y, ind)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A BaseConcatDataset of WindowsDatasets produces tuples of x, y and ind, where x is a supercrop, y is the target of that supercrop, and ind is a three-tuple of id of the supercrop within a trial, the start sample (inclusive) of the supercrop, and the stop sample (exclusive) of the supercrop."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading data for 1 events and 500 original time points ...\n",
"(64, 500) 0 [0, 0, 500]\n"
]
}
],
"source": [
"for x, y, ind in concat_windows_ds:\n",
" break\n",
"print(x.shape, y, ind)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pandas.DataFrame can be used to split the data, e.g. into training and evaluation set, by subject, etc..."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: <braindecode.datasets.base.BaseConcatDataset at 0x7ff44c759290>,\n",
" 1: <braindecode.datasets.base.BaseConcatDataset at 0x7ff448d19610>,\n",
" 2: <braindecode.datasets.base.BaseConcatDataset at 0x7ff448ce5c10>,\n",
" 3: <braindecode.datasets.base.BaseConcatDataset at 0x7ff448ce5890>,\n",
" 4: <braindecode.datasets.base.BaseConcatDataset at 0x7ff448ce5a90>,\n",
" 5: <braindecode.datasets.base.BaseConcatDataset at 0x7ff448ce5d10>}"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"concat_windows_ds.split(\"trial\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, predefined ids can be used to split the data."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: <braindecode.datasets.base.BaseConcatDataset at 0x7ff448ceb3d0>,\n",
" 1: <braindecode.datasets.base.BaseConcatDataset at 0x7ff448ce5f50>}"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"concat_windows_ds.split(split_ids=[[1,2,3], [0, 4, 5]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Skorch requires a pytorch.Dataset as input, whereas braindecode requires that it produces x, y, ind. BaseConcatDataset inherits from pytorch.ConcatDataset and WindowsDatasets return x, y, ind."
]
}
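,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what this enables, using a plain torch DataLoader (the same batching skorch performs internally); batch size 4 is an arbitrary choice:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torch.utils.data import DataLoader\n",
"\n",
"loader = DataLoader(concat_windows_ds, batch_size=4, shuffle=True)\n",
"# each batch holds 4 supercrops of shape (n_channels, n_samples),\n",
"# their targets, and the three supercrop indices as stacked tensors\n",
"for X, y, ind in loader:\n",
"    print(X.shape, y.shape, [i.shape for i in ind])\n",
"    break"
]
}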
],
"metadata": {
"kernelspec": {
"display_name": "braindecode_v2",
"language": "python",
"name": "braindecode_v2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}