Notebook for Decoding and Encoding Lectures
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summary\n",
"\n",
"In this notebook, you will create a simple decoder and encoder using regression. <br>\n",
"\n",
"The majority of the notebook is already filled in. The empty cells are where you need to add some code. The text above those cells \n",
"\n",
"Before you start, you will need to download the following data file: https://www.dropbox.com/s/jcief15oql3tkll/example_data_m1.pickle?dl=0 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Import Packages\n",
"\n",
"Below, we import some standard packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Import standard packages\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"import pickle"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Load Data\n",
"\n",
"The dataset is a recording of neurons in motor cortex, while a monkey is reaching to targets. The x and y velocities of the movement is recorded over time. All data has been put into 50 ms time bins\n",
"\n",
"\"neural_data\" is a matrix of size \"number of time bins\" x \"number of neurons\", where each entry is the firing rate of a given neuron in a given time bin\n",
"\n",
"\"vels_binned\" is a matrix of size \"number of time bins\" x 2, where the columns are the x and y velocities."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, in the cell below, enter the folder your data is in. For example, if it is in your current directory: <br>\n",
"folder=''"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that you'll need to change what is commented/uncommented below if you are running python 2"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# folder='' #ENTER THE FOLDER THAT YOUR DATA IS IN\n",
"\n",
"with open(folder+'example_data_m1.pickle','rb') as f:\n",
" neural_data,vels_binned=pickle.load(f,encoding='latin1') #If using python 3\n",
"# neural_data,vels_binned=pickle.load(f) #If using python 2\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since we are doing decoding first in this notebook, I will call the neural_data \"X\" (the inputs) and the velocities \"Y\" (the outputs)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X=np.copy(neural_data)\n",
"y=np.copy(vels_binned)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Preprocess Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3A Split into training / testing sets\n",
"You will fit the model on the training set, and then check its performance on the testing set."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Set what part of data should be part of the training/testing/validation sets\n",
"training_range=[0, 0.7]\n",
"testing_range=[0.7, 1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Split Data"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"num_examples=X.shape[0]\n",
"\n",
"#Note that each range has a buffer of\"bins_before\" bins at the beginning, and \"bins_after\" bins at the end\n",
"#This makes it so that the different sets don't include overlapping neural data\n",
"training_set=np.arange(np.int(np.round(training_range[0]*num_examples)),np.int(np.round(training_range[1]*num_examples)))\n",
"testing_set=np.arange(np.int(np.round(testing_range[0]*num_examples)),np.int(np.round(testing_range[1]*num_examples)))\n",
"\n",
"#Get training data\n",
"X_train=X[training_set,:]\n",
"y_train=y[training_set,:]\n",
"\n",
"#Get testing data\n",
"X_test=X[testing_set,:]\n",
"y_test=y[testing_set,:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3B. Process Covariates\n",
"We normalize (z_score) the inputs.\n",
"Parameters for z-scoring (mean/std.) should be determined on the training set only, and then these z-scoring parameters are also used on the testing set."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Z-score \"X\" inputs. \n",
"X_train_mean=np.nanmean(X_train,axis=0)\n",
"X_train_std=np.nanstd(X_train,axis=0)\n",
"X_train=(X_train-X_train_mean)/X_train_std\n",
"X_test=(X_test-X_train_mean)/X_train_std"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Do decoding\n",
"\n",
"We will be decoding the velocities from the neural activity using linear regression."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Import regression function\n",
"from sklearn.linear_model import LinearRegression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start by fitting a regression model to predict the velocities from the neural activity in the same time bin. You can just use the X_train and y_train matrices we already have."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
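{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch for the exercise above (the variable name `lr_simple` is just an illustrative choice):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#A possible solution sketch: fit an ordinary least-squares model mapping the current\n",
"#bin of neural activity to the x and y velocities\n",
"lr_simple=LinearRegression() #illustrative variable name\n",
"lr_simple.fit(X_train,y_train)\n",
"\n",
"#Predictions on the training and testing data (used for plotting below)\n",
"y_train_pred=lr_simple.predict(X_train)\n",
"y_test_pred=lr_simple.predict(X_test)"
]
},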
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot an example of how well the model fits on the training and testing data, and get the R-squared value on the test data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
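{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to visualize the fit and compute R^2 on the test data (a sketch that assumes the `lr_simple` model and the predictions from the sketch above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.metrics import r2_score\n",
"\n",
"#Plot a short segment of actual vs. predicted x-velocity on the test set\n",
"plt.figure(figsize=(10,4))\n",
"plt.plot(y_test[:200,0],label='actual x-velocity')\n",
"plt.plot(y_test_pred[:200,0],label='predicted x-velocity')\n",
"plt.xlabel('time bin')\n",
"plt.ylabel('velocity')\n",
"plt.legend()\n",
"plt.show()\n",
"\n",
"#R^2 on the test data (one value per output column: x and y velocity)\n",
"print('Test R^2:', r2_score(y_test,y_test_pred,multioutput='raw_values'))"
]
},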
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will use additional time bins of neural activity in the decoder. We will use the below function, which creates a new X matrix for us."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"###$$ GET_SPIKES_WITH_HISTORY #####\n",
"def get_spikes_with_history(neural_data,bins_before,bins_after,bins_current=1):\n",
" \"\"\"\n",
" Function that creates the covariate matrix of neural activity\n",
"\n",
" Parameters\n",
" ----------\n",
" neural_data: a matrix of size \"number of time bins\" x \"number of neurons\"\n",
" the number of spikes in each time bin for each neuron\n",
" bins_before: integer\n",
" How many bins of neural data prior to the output are used for decoding\n",
" bins_after: integer\n",
" How many bins of neural data after the output are used for decoding\n",
" bins_current: 0 or 1, optional, default=1\n",
" Whether to use the concurrent time bin of neural data for decoding\n",
"\n",
" Returns\n",
" -------\n",
" X_flat: a matrix of size \"number of total time bins\" x \"number of surrounding time bins used for prediction x number of neurons\"\n",
" For every time bin, there are the firing rates of all neurons from the specified number of time bins before (and after)\n",
" \"\"\"\n",
"\n",
" num_examples=neural_data.shape[0] #Number of total time bins we have neural data for\n",
" num_neurons=neural_data.shape[1] #Number of neurons\n",
" surrounding_bins=bins_before+bins_after+bins_current #Number of surrounding time bins used for prediction\n",
" X=np.empty([num_examples,surrounding_bins,num_neurons]) #Initialize covariate matrix with NaNs\n",
" X[:] = np.NaN\n",
" #Loop through each time bin, and collect the spikes occurring in surrounding time bins\n",
" #Note that the first \"bins_before\" and last \"bins_after\" rows of X will remain filled with NaNs, since they don't get filled in below.\n",
" #This is because, for example, we cannot collect 10 time bins of spikes before time bin 8\n",
" start_idx=0\n",
" for i in range(num_examples-bins_before-bins_after): #The first bins_before and last bins_after bins don't get filled in\n",
" end_idx=start_idx+surrounding_bins; #The bins of neural data we will be including are between start_idx and end_idx (which will have length \"surrounding_bins\")\n",
" X[i+bins_before,:,:]=neural_data[start_idx:end_idx,:] #Put neural data from surrounding bins in X, starting at row \"bins_before\"\n",
" start_idx=start_idx+1\n",
" \n",
" X_flat=X.reshape(X.shape[0],(X.shape[1]*X.shape[2]))\n",
" \n",
" return X_flat"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use 10 bins of neural activity from each neuron (the 9 previous bins, and the current bin), to predict the current velocity."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X_train_new=get_spikes_with_history(X_train,9,0,1)\n",
"X_test_new=get_spikes_with_history(X_test,9,0,1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can't use the first 9 bins (of either the training or testing set), since there aren't enough bins of neural activity preceding them"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X_train_new2=X_train_new[9:,:]\n",
"y_train2=y_train[9:,:]\n",
"\n",
"X_test_new2=X_test_new[9:,:]\n",
"y_test2=y_test[9:,:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the decoder with the matrices above. How much do predictions improve?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
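{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible sketch, using the matrices defined just above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Fit the same type of linear model, now using 10 bins of neural activity per neuron\n",
"lr_hist=LinearRegression() #illustrative variable name\n",
"lr_hist.fit(X_train_new2,y_train2)\n",
"\n",
"#R^2 on the test set; compare with the single-bin decoder above\n",
"print('Test R^2 with history:', lr_hist.score(X_test_new2,y_test2))"
]
},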
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Do encoding"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"In this section we will take the opposite approach, we will fit each neuron's activity on the velocities."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Let's first define the independent and the dependent variable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X=np.copy(vels_binned)\n",
"y=np.copy(neural_data)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Remember that X now is a matrix of size \"number of time bins\" x 2 (x and y position) and y is a matrix \n",
"of size \"number of time bins\" x \"number of neurons\".\n",
"\n",
"Regress the firing rate of the first neuron using the preceding 500 ms (10 time bins) of velocities in x and y.\n",
"Consider each step as an observation when fitting the model. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import LinearRegression\n",
"\n"
]
},
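{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch: reuse `get_spikes_with_history` (it works for any time-binned matrix, not only spikes) to collect the 9 preceding bins plus the current bin of x and y velocity, drop the first 9 rows (which are NaN because they lack enough history), and fit the firing rate of neuron 1. Treating the 10 bins as the 9 preceding bins plus the current bin is an assumption; you could instead use `bins_before=10` and `bins_current=0`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Build a design matrix with the current bin and the 9 preceding bins of x/y velocity\n",
"#(10 bins x 2 velocity components = 20 covariates per observation)\n",
"X_hist=get_spikes_with_history(X,9,0,1) #illustrative variable name\n",
"\n",
"#Drop the first 9 time bins, which do not have enough preceding velocity history\n",
"X_hist=X_hist[9:,:]\n",
"y_valid=y[9:,:]\n",
"\n",
"#Fit the firing rate of the first neuron (column 0)\n",
"enc_model=LinearRegression()\n",
"enc_model.fit(X_hist,y_valid[:,0])"
]
},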
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the R^2 you have obatined for this particular neuron?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
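{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example (assuming `enc_model`, `X_hist`, and `y_valid` from the sketch above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#In-sample R^2 for neuron 1\n",
"print('R^2 for neuron 1:', enc_model.score(X_hist,y_valid[:,0]))"
]
},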
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Proceed in the same way for the rest of neurons"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
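{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible sketch that loops over every neuron (assuming `X_hist` and `y_valid` from the sketch above) and stores its R^2:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Fit one encoding model per neuron and collect the R^2 values\n",
"num_neurons=y_valid.shape[1]\n",
"r2_all=np.zeros(num_neurons)\n",
"for n in range(num_neurons):\n",
"    model_n=LinearRegression()\n",
"    model_n.fit(X_hist,y_valid[:,n])\n",
"    r2_all[n]=model_n.score(X_hist,y_valid[:,n])\n",
"\n",
"#Look at the distribution of R^2 values across neurons\n",
"plt.hist(r2_all,bins=20)\n",
"plt.xlabel('R^2')\n",
"plt.ylabel('number of neurons')\n",
"plt.show()"
]
},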
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An important question that we should ask ourselves when fitting a model is: was the fit significantly better than\n",
"chance?\n",
"\n",
"There are several ways to answer this question. Here we will use the permutation test on the R^2."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, fit the firing rate of neuron 1 using the preceeding 500ms (10 time bins) but shuffle the neuronal \n",
"activity across observations before fitting the model. Is the R^2 better or worse that without shuffling?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from np.random import permutation"
]
},
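{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible sketch (assuming `X_hist` and `y_valid` from the encoding sketches above): permute the order of neuron 1's firing rates so that any relation to the velocity covariates is destroyed, then refit."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Shuffle neuron 1's firing rates across observations and refit\n",
"y_shuffled=y_valid[permutation(y_valid.shape[0]),0]\n",
"shuf_model=LinearRegression()\n",
"shuf_model.fit(X_hist,y_shuffled)\n",
"print('Shuffled R^2:', shuf_model.score(X_hist,y_shuffled))"
]
},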
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Repeat the process 1000 times and plot the distribution of shuffled R^2 (null-hypothesis) and the original R^2."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
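{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to build the null distribution (a sketch, reusing the variables from the sketches above; 1000 refits may take a little while). The permutation p-value at the end is the fraction of shuffled R^2 values that are at least as large as the original one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Build a null distribution of R^2 values by refitting on shuffled firing rates\n",
"n_perm=1000\n",
"r2_null=np.zeros(n_perm)\n",
"for p in range(n_perm):\n",
"    y_shuf=y_valid[permutation(y_valid.shape[0]),0]\n",
"    perm_model=LinearRegression()\n",
"    perm_model.fit(X_hist,y_shuf)\n",
"    r2_null[p]=perm_model.score(X_hist,y_shuf)\n",
"\n",
"#Compare the null distribution with the R^2 from the unshuffled data\n",
"r2_neuron1=enc_model.score(X_hist,y_valid[:,0])\n",
"plt.hist(r2_null,bins=30,label='shuffled R^2 (null)')\n",
"plt.axvline(r2_neuron1,color='r',label='original R^2')\n",
"plt.xlabel('R^2')\n",
"plt.legend()\n",
"plt.show()\n",
"\n",
"print('Permutation p-value:', np.mean(r2_null>=r2_neuron1))"
]
},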
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Is this neuron significantly modulated by the velocity?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the fraction of neurons significantly modulated by the velocity?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
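{
"cell_type": "markdown",
"metadata": {},
"source": [
"A possible sketch for the fraction of significantly modulated neurons, reusing `X_hist`, `y_valid`, and `num_neurons` from the sketches above. Both the number of permutations per neuron (reduced here to keep the runtime manageable) and the 0.05 significance threshold are illustrative assumptions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Permutation test for every neuron; count how many are significant at alpha=0.05\n",
"n_perm=200 #fewer permutations per neuron to keep the runtime manageable\n",
"alpha=0.05\n",
"significant=np.zeros(num_neurons,dtype=bool)\n",
"for n in range(num_neurons):\n",
"    r2_orig=LinearRegression().fit(X_hist,y_valid[:,n]).score(X_hist,y_valid[:,n])\n",
"    r2_perm=np.zeros(n_perm)\n",
"    for p in range(n_perm):\n",
"        y_shuf=y_valid[permutation(y_valid.shape[0]),n]\n",
"        r2_perm[p]=LinearRegression().fit(X_hist,y_shuf).score(X_hist,y_shuf)\n",
"    significant[n]=np.mean(r2_perm>=r2_orig)<alpha\n",
"\n",
"print('Fraction of significantly modulated neurons:', np.mean(significant))"
]
},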
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Additional Notes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are a couple packages that include lots of nonlinear models for decoding and encoding. <br>\n",
"Decoding: http://github.com/kordinglab/Neural_decoding <br>\n",
"Encoding: https://github.com/KordingLab/spykesML <br>\n",
"\n",
"*Note that there may be other packages out there, but I'm aware of these since they're from my (Josh's) previous lab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Questions?\n",
"\n",
"If you have questions on this notebook, please contact either:<br>\n",
"Josh: jig2124@columbia.edu <br>\n",
"Ramon: rn2446@columbia.edu"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [py35]",
"language": "python",
"name": "Python [py35]"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}