@pkipsy
Created March 31, 2017 01:45
Quick Data Processing with Pandas
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quick Data Processing with Pandas\n",
"## December 2015\n",
"\n",
"Many experiments generate unwieldy data files that need to be processed prior to analysis. This simple demo script shows how tools from the [Pandas](http://pandas.pydata.org/) library can be used to quickly and efficiently process experimental data generated with [PsychoPy](http://www.psychopy.org/).\n",
"\n",
"In this example, PsychoPy has generated two sets of files for each subject: one logging information about the stimuli presented in the experiment, and another logging the subject's responses and reaction times. Our goal is to select the information we need for analysis from each file, and correctly concatenate the two.\n",
"\n",
"**Step 1**: Import the necessary tools."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# import the Pandas tools we need\n",
"from pandas import DataFrame, read_csv, concat"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Step 2**: Extract information about the experiment.\n",
"\n",
"In our case, we want to read in just the first 16 rows of each experiment file, importing the information from a subset of columns. We'll then combine this information into a single data frame and sort its rows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# specify the subject IDs (extend this list to cover every subject)\n",
"subject_IDs = [1, 2, 3]\n",
"\n",
"# read in the first 16 rows of each experiment file, keeping selected columns\n",
"all_exp = []\n",
"for i in subject_IDs:\n",
"    # build the file path for this subject\n",
"    location = r'[folder_name]' + str(i) + '.csv'\n",
"    exp_data = read_csv(location)\n",
"    sub_num = [i] * 16\n",
"    experiment_info = list(zip(sub_num, exp_data['condition'], exp_data['gender'],\n",
"                               exp_data['name'], exp_data['face']))\n",
"    experiment_info = experiment_info[:16]\n",
"    for tup in experiment_info:\n",
"        all_exp.append(tup)\n",
"\n",
"# concatenate the info into a data frame with named columns\n",
"exp_df = DataFrame(data=all_exp, columns=['Subject', 'Condition', 'Gender', 'Name', 'Face'])\n",
"# sort by subject number and face (DataFrame.sort is deprecated; use sort_values)\n",
"exp_df = exp_df.sort_values(['Subject', 'Face'], ascending=[True, True])\n",
"# reset the index to reflect the new sort order\n",
"exp_df = exp_df.reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Step 3**: Extract information about our subjects.\n",
"\n",
"Here again, we want to read in just the first 16 rows of each subject file, importing the information from a subset of columns. We'll then combine this information into a single data frame and sort its rows.\n",
"\n",
"**An important note**: We want to make sure that the sort order we use here is the *same* as the one used for the experiment data. That way, when we combine the subject data with the experiment data, the rows will line up correctly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# cycle through every subject file, keeping selected columns\n",
"all_sub = []\n",
"for i in subject_IDs:\n",
"    location = r'[folder_name]/subject-' + str(i) + '.csv'\n",
"    sub_data = read_csv(location)\n",
"    subject_num = [i] * 16\n",
"    # zip stops at the shortest input, so this keeps the first 16 rows\n",
"    subject_info = list(zip(subject_num, sub_data['face'], sub_data['response'],\n",
"                            sub_data['response_time']))\n",
"    for tup in subject_info:\n",
"        all_sub.append(tup)\n",
"\n",
"# concatenate into a data frame with named columns\n",
"sub_df = DataFrame(data=all_sub, columns=['Subject2', 'Face2', 'Response', 'RT'])\n",
"# sort by subject number and face, matching the sort used for the experiment data\n",
"sub_df = sub_df.sort_values(['Subject2', 'Face2'], ascending=[True, True])\n",
"# reset the index to reflect the new sort order\n",
"sub_df = sub_df.reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Step 4**: Combine subject & experiment information and export to a single .csv file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"### concatenate experiment + subject data frames ###\n",
"pieces = [exp_df, sub_df]\n",
"together = concat(pieces, axis=1, join='outer')\n",
"\n",
"# check that the rows lined up before dropping the duplicate key columns\n",
"assert (together['Subject'] == together['Subject2']).all()\n",
"assert (together['Face'] == together['Face2']).all()\n",
"del together['Subject2']\n",
"del together['Face2']\n",
"\n",
"# export to a new csv file\n",
"together.to_csv('New_Name.csv', index=False, header=True)"
]
},
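{
"cell_type": "markdown",
"metadata": {},
"source": [
"A side note on robustness: the positional concatenation above works only because both data frames were sorted identically. A merge on the shared key columns (a sketch, assuming the column names used above) aligns rows by value instead, so it does not depend on sort order:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# merge on the shared keys instead of relying on row order\n",
"together = exp_df.merge(sub_df, left_on=['Subject', 'Face'],\n",
"                        right_on=['Subject2', 'Face2'], how='inner')\n",
"together = together.drop(['Subject2', 'Face2'], axis=1)"
]
},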
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With both pieces of information combined into a single file, the data are ready for analysis. This is much faster than manually extracting the information and combining it in Excel."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 1
}