AAND Notes

Notes for studying AAND

Overview

  • BCI first time
  • Brecht first time
  • Kempter firing-rate material first time
  • BCI revision
  • Kempter revision
  • Brecht revision

Study Organisational stuff


Brecht: A revision

  • Lect. 1

    • Anesthesia
      • Not safe: the lethal dose is only about 2x the desired dose
      • The heart doesn't stop because of brain damage, since the heart is myogenic (doesn't need brain input). Breathing does stop, however.
    • Measuring Signals:
      1. Neuronal electric signals: Intra or extra
      2. Biomedical signals: Messengers, Calcium, etc.
      3. Indirect signals: blood flow like in fMRI
  • Lect. 2: Extracellular

    • Staining
      1. Golgi staining to reveal the structure of the brain, just as Ramón y Cajal did in his famous experiments.
      2. Myelin stain (obviously stains the myelin sheath), so the axons can be seen.
    • ~10^9 synapses, ~10^5 neurons, 400 m of dendrites, 4 km of axons in one cubic millimeter
    • Extracellular recording is easier to do than the other methods
    • Recording must take into consideration the following:
      • Electrode size (too big for single neuron)
      • Small signal (amplification)
      • Impedance matching
  • Lect. 3 Action Potential

  • Lect. 4 Patch Clamp and sharp Microelectrode recording

Patch Clamp in vivo:

  • We insert the electrode and stay in search mode until we find a "pulsating" neuron

    • the test pulse is no longer a rectangular current
    • Press the suction button to activate
    • Giga-ohm seal mode (very high resistance)
  • Neurons are quiet

    • They only fire if they need to, in order to conserve energy (already ~10% of the body's energy is spent by the brain)
  • Channelrhodopsins

    • After optogenetic modification, light can open excitatory channels (brain stimulation)
    • The retina's own light-sensing channels are metabotropic
    • We prefer ionotropic channels
    • Like the lab experiment where we tickled the rat
  • Voltage sensitive dyes

    • Help us to notice the voltage change inside the neuron
    • In vitro we would do this by coating the membrane
  • Calcium indicators

    • Using fluorescent proteins that bind to calcium
    • Useful because the calcium concentration inside the cell is low at rest, and because calcium in the synapse plays a role in vesicle release, so we can also check what's happening there.

    Problem: Can't specifically target the

  • Fluorescence

    • A protein absorbs a shorter (higher-energy) wavelength and emits a longer wavelength because the excited electron jumps back down
    • Allows precise targeting (high signal-to-noise ratio)
  • Green Fluorescent Protein (GFP)

    • Advantages: many different colors, easy to excite with photons, high quantum yield
    • Genetically encoded in the cell
    • Comes from a jellyfish
    • Bleaching is a problem with fluorophores (they lose the ability to fluoresce)
      • GFPs, however, are less prone to bleaching because the fluorophore is packed inside a protein barrel, as if in a prison cell
  • Microscopy

    • Wide-field microscopy
      • ~100 µm imaging depth
    • Confocal
      • Uses a moving (scanning) laser and a detector pinhole
      • ~50 µm imaging depth
    • Two-photon microscopy
      • No detector pinhole needed because two photons must be absorbed together (each alone has insufficient energy)
      • Only at the focal point do the two photons coincide and excite the dye
      • Advantage: no excitation along the way
    • EM
  • To find the structure of the neurons we can use the following genetic tools

    • Rabies virus tracing
    • Staining
  • Advantages of in-vitro slicing

    • Mechanical stability
    • Controlled environment
    • Apply media that influence ion concentrations
  • Why the membrane is a good capacitor

    • Large surface
    • Not passing current
    • High Voltage difference across plates

Stains:

Extracellular Recording:

  • Unit recording

    • Assume that all spikes from one neuron have the same amplitude
    • In the refractory period, however, there are different voltage recordings
  • Tetrode Recording:

    • 4 electrodes to measure amplitudes
  • High density probes:

    • Like Neuropixels
  • Extracellular vs intracellular

    • Extracellular is smaller, noisier and inverted

EM

  • Serial block-face EM -> we can use thicker slices
    • The only technique for having a look at synapses

Some Anatomy

  • Cortex (Frontal, Parietal, Temporal, Occipital)
    • Gyrus and sulcus
  • Subcortical
    • Hippocampus -> Roughly memory
    • Basal Ganglia -> Roughly will to action
    • Thalamus and hypothalamus -> Gating to cortex
    • Colliculi (like the superior colliculus) -> Orienting
    • Cerebellum -> Motor

fMRI

  • The protons of the hydrogen atoms inside a water molecule have a certain spin

  • Normally these spins point in different directions and the net magnetization M is zero.

  • We apply a strong magnet generating a magnetic field B0 so that the protons align their spins parallel to B0.

  • We then apply RF pulses at the Larmor frequency (to cause resonance with the hydrogen nuclei); this tips the spins into alignment with B1.

  • We then switch the RF pulse off. As the protons relax back into alignment with B0, they emit energy that we measure, and that's how we visualize the brain. There are two things to measure:

    1. T1: slow recovery of longitudinal magnetization (relaxation)
    2. T2: transversal relaxation, dephasing spins
      • Fast dephasing of spins due to spin-spin interactions
      • After a 90° RF pulse, the protons start to spin out of phase
    3. T2*: fast dephasing of spins due to a combination of spin-spin interactions and magnetic field inhomogeneities
  • Oxygenated hemoglobin (less deoxy-hemoglobin) means a slower relaxation rate, more MR signal, a brighter image and more T2 signal.

    • This is because the less oxygenated the blood, the faster the dephasing of the transverse magnetization.
    • For an equal amount of blood -> less deoxy means more oxy.
  • BOLD: blood oxygen level dependent

    • Gets an image of the brain through the susceptibility differences between oxygenated and deoxygenated blood (darker and brighter spots)
    • Measures the dephasing of hydrogen proton spins around hemoglobin depending on how oxygenated it is (more oxygen -> slower dephasing and brighter)
  • Neurovascular coupling is the reason why more activity in the brain doesn't necessarily mean darker images (the naive hypothesis, if we assume that more activity means less oxygen)

    • ACTUALLY what happens is the opposite. With more activity and increased metabolism:
    1. Neurovascular coupling causes cerebral blood flow to increase.
    2. This leads to displacement of deoxygenated hemoglobin
    3. Less deoxy-Hb is present
    4. Brighter spots in areas of high metabolism.
  • Hemodynamic Response Function (HRF)

    • Baseline -> mix of oxy and deoxy
    • initial dip -> more deoxy (darker)
    • overshoot -> more oxy than deoxy (brighter)
    • undershoot -> a bit more deoxy at the end
    • typical length: 20-30 s (dip at 0-2 s, overshoot/peak at 4-6 s, undershoot at 10-20 s)
  • ONCE AGAIN, 1 mm^3 of cortex contains

    • 10^4 - 10^5 neurons
    • 10^8 - 10^9 synapses
    • 300 m of dendrites
    • 4000 m of axons
    • 0.4 m of capillaries
  • Spatial resolution: 3 mm x 3 mm x 3 mm

  • Spatial precision: sub-millimeter

  • Temporal resolution: images typically ~2 s apart

  • Temporal precision: in the same region & subject: relatively precise onset

  • Variability of HRF

    • HRF is variable between different subjects
  • BOLD responses are linear

    • Stimulus -> Neural pathway -> Neural response -> hemoglobin dynamics -> MRI scanner + noise -> MRI image
  • Local Field Potentials (LFP) 40-130 Hz

    • Reflects summation of post-synaptic potentials
  • Multi-Unit Activity (MUA) 300-1500 Hz

    • reflects action potentials/spiking
  • BOLD is slightly better predicted by LFP (input) of a region

  • Preprocessing

    1. Motion correction/realignment
      • Done automatically in an online way in modern scanners
      • Most voxel timeseries variance is due to subject motion (reduced sensitivity)
      • Reduced specificity
      • Rigid body transformation -> Topology is preserved
      • Is done by minimizing the squared difference between the volume in question and the reference volume
      • Volume means a snapshot of the brain (image)
    2. Reslicing
      • Interpolation to adjust the volume to the registered one
      • Different sorts of interpolation -> nearest neighbour, linear, b-spline
    3. Residual error
      • Body moves throughout the acquisition of one volume
      • Ghosts and other artifacts
      • Solution: Include motion params to eliminate later in statistical model
    4. Slice-time correction
      • The delay between consecutive slices being recorded is 60-100 ms, and slices are acquired SEQUENTIALLY
      • For a volume of 30 slices, that is up to ~3 s
      • the difference between the first and last slice is then ~2.9 s
      • Slice-time correction shifts the slices in time so that they all appear to have been captured at the time of some reference slice (important for statistical analysis)
    5. Normalization
      • To perform data-based meta analysis we need to bring all volumes to some common brain space
      • This is usually a standardized brain like MNI (I actually worked with it before using FSL and Freesurfer)
      • This is done to account for differences in macroscopic brain anatomy and be able to do statistics across different subjects.
      • 12 params: 3 translations, 3 rotations, 3 shears, 3 zooms
      • 2 steps:
        1. Linearly register the EPI images to the structural T1 volume
        2. Non-linearly adjust the T1 volume (and the EPI images with it) to the segmented T1 images
      • Problems:
        • Local minima in optimization -> reset to reference image origin
        • Too little information -> acquire more slices
        • Lesions -> Masking
    6. Smoothing
      • High-frequency noise is independent for every voxel
      • Smoothing enhances SNR
      • Spatial low pass filtering of each image
        • functional homology across subjects increases
        • assumptions of random field theory are better met
      • Done through convolution with a discrete Gaussian kernel
      • Matched-filter-Theorem:
        • If the filter is of the same form and size as the signal, noise is filtered maximally
      • Disadvantages:
        • brains get enlarged, a bit of resolution is lost
        • fine-grained information is lost, which can be bad for multivariate classification
  • Statistics of fMRI

    • Mass-univariate analysis: test for activity in one brain location at a time, then repeat

      • Basically finding a region that responds more strongly during stimulation than during rest
    • T-test:

      • Given two samples from two different regions/conditions
      • Get the standard deviation of both
      • Use the equation t = (mean_1 - mean_2) / (std_pooled * sqrt(2/n)); see the sketch below
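
A minimal numpy sketch of the t statistic as written above (my own toy data; equal sample sizes and a pooled standard deviation are assumed):

```python
import numpy as np

def t_statistic(a, b):
    """Two-sample t statistic for equal-sized samples, as in the note above."""
    n = len(a)                                                  # assumes len(a) == len(b)
    s_pooled = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))   # pooled std
    return (a.mean() - b.mean()) / (s_pooled * np.sqrt(2.0 / n))

rng = np.random.default_rng(0)
stim = rng.normal(1.0, 1.0, 50)   # e.g. voxel values during stimulation
rest = rng.normal(0.0, 1.0, 50)   # e.g. voxel values during rest
print(t_statistic(stim, rest))
```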
    • Accounting for hemodynamic lag

      • through shifting windows?
      • Convolution with kernels?
      • Problem: convolution with the HRF introduces a time delay and smoothing
    • Fitting a linear model y = beta * x + eta

      • where y is the data, eta is noise and x is the reference/expected function
      • beta is a linear weighting parameter chosen to minimize the sum of squared differences
    • General linear model (GLM)

      • Consists of multiple Xs (expected functions)
      • Can keep adding columns to the design matrix (to shape what kind of signal we want)
      • For every column (X component) we add, we add a beta to the beta vector
      • Ultimately the goal is to choose the betas so that we minimize the mean square of the noise (eta); see the sketch below
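
A minimal sketch of fitting GLM betas by least squares (the design-matrix columns here are illustrative toys, not the course's regressors):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200                                                  # number of volumes

# Toy design matrix: a mean column plus one boxcar task regressor
mean_col = np.ones(T)
task = np.zeros(T)
task[20:40] = 1
task[100:120] = 1                                        # stimulation blocks
X = np.column_stack([mean_col, task])                    # (T, 2) design matrix

# Simulated voxel time series: y = X @ beta + eta (noise)
y = X @ np.array([100.0, 2.5]) + rng.normal(0, 1, T)

# Least-squares betas minimize ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)                                              # ~ [100, 2.5]
```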
    • Useful regressors (reference functions/templates):

      • Mean: Percent signal change is the fluctuation around the mean
      • Discrete cosine set: Model slow drifts in the data
      • The motion parameters (from realignment)
    • How do we know we model signal and not just noise?

      • First level and second level statistics
    • Statistical Parametric Map (SPM)

      • A map showing color-coded t-values where the t-test is significant
    • T-statistics

      • The one you use for the betas (to check whether the sample mean is larger than the assumed population mean)
    • Questions:

      • WTF is first and second level statistics? Why do we need them?
      • Last couple of slides? WTF is happening there?
  • Phase encoding vs Frequency encoding?


BCI

Questions

  • How do we visualize weight vectors (the result of a spatial classifier)?

  • If we have a setup with 64 electrodes but want to narrow it down to 10, what do we do so as not to lose classification accuracy?

    • Spatio-temporal: we can do a temporal classification per channel and pick the 10 with the highest discriminability. However, sometimes we care about combinations of channels and not just the top 10, so we just try
    • In the context of CSP, what
  • Pseudo-inverse?

    • In general, to get the inverse of a non-square matrix
    • S+ = S.T (S S.T)^-1
    • Reasoning:
      • We want to get A_hat (for the forward model)
      • We get the filters (W) as part of the backward model
      • At this point, we have X (data), W (filters), and S (the sources output by the backward model)
      • We get A_hat = Cov_X * W * inv(Cov_S)
      • Now we can use this A_hat to understand things about our classifier; in short, to assess our filters (see the sketch below)
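
A minimal numpy sketch of that pattern-from-filter formula (toy data and shapes are mine; cf. Haufe et al. 2014 for the general result):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))         # data: samples x channels (toy)
W = rng.normal(size=(8, 2))            # backward-model filters: channels x sources

S = X @ W                              # extracted sources
cov_x = np.cov(X, rowvar=False)        # channel covariance
cov_s = np.cov(S, rowvar=False)        # source covariance

# Forward-model patterns recovered from the filters: A_hat = Cov_X W Cov_S^-1
A_hat = cov_x @ W @ np.linalg.inv(cov_s)
print(A_hat.shape)                     # (8, 2): one scalp pattern per source
```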
  • [Shrinkage] ν defined as the average eigenvalue trace(Σ̂)/d of Σ̂ ??

    • The trace (sum of the diagonal) equals the sum of the eigenvalues (for the covariance matrix); in short, ν is the mean of the eigenvalues.
  • [Shrinkage] What is

BCI: Spatial patterns and filters

  • The propagation pattern (a) depends on:

    1. The conductivity of the intermediate tissue
    2. The location and orientation of the source
    3. The impedance and location of the electrode
  • DOING PCA in the beginning and removing artifacts (components beyond roughly the first 100 eigenvalues)

    • Projection into the eigenvector space -> part of whitening
    • The projected data has uncorrelated components
    • Whitening is helpful because we assume the variance is not important and we want to remove it
    • Because we believe variance is not important in this situation
    • We also believe that the mean is more important
  • Patterns (A) are the vectors applied to the sources that describe how the signals propagate to the sensors (forward model)

  • Filters (W) are the vectors applied in sensor space that give us the sources (backward model)

  • Recovering a source s1 from a mixture (x = a1*s1 + a2*s2), in the non-trivial case where a1 and a2 are not orthogonal, needs to take the propagation vectors (a1 and a2) into consideration.

    1. This is because when we apply a filter w, it needs to be orthogonal to the propagation vector of the other source.
    2. (1) guarantees that we eliminate the effect of the other source from our calculation.
  • THEREFORE, to be able to calculate the filters, we need the scalp distributions (locations) of all the sources (not just the ones we want to reconstruct)

  • ALSO, an optimal filter which aims at a good signal-to-noise ratio (SNR) must be approx. orthogonal to the noise sources as well

  • Oddball paradigm: present one odd stimulus among several identical stimuli. P300: a positive deflection after the odd stimulus is presented

BCI: LDA and NCC

  • Linear Discriminant Analysis (LDA) is a powerful tool for classification and feature extraction

  • To be able to classify using LDA:

    1. The class samples are Gaussian distributed
    2. The Gaussian distributions of the two classes should have the same covariance matrix
    3. The true distributions are known (always violated in real examples)
  • For the second condition mentioned above, one can verify this by doing an eigenvalue decomposition of the covariance matrices of both classes, extracting the eigenvectors (PCs) corresponding to the largest eigenvalues, visualizing them as scalp topographies, and observing the similarities

  • A linear classifier is basically a linear separator (a line if the data is 2D).

    • Consists of w and b (bias)
    • w = inv(avg_cov) * (mean_2 - mean_1)
    • A linear classifier trained on spatial features (i.e., fixing a time point and taking all channels' data) gives us a spatial filter and hence a "BACKWARD MODEL"
    • AGAIN, an optimal filter requires intricate structure and might assign significant weight to channels/electrodes whose role is not obvious from the scalp map corresponding to the weights obtained under the assumption of a spherical covariance matrix.
    • Last point recap: we might add another channel that in itself is not discriminative but increases the classification accuracy.
  • Usually, many problems arise because we don't know the exact covariance matrices of the data for the two classes; we assume them to be the ones we get empirically (from the data).

    • However, for high-dimensional data with not so many data points/samples, our estimate is compromised.
    • That's why we do regularization through shrinkage
  • Bias in estimating the covariance matrix

    • A problem when the number of samples (n) is small compared to the number of dimensions (d)
    • Leads to a systematic bias: large eigenvalues are estimated too large and small eigenvalues too small
  • Shrinkage comes to the rescue

    • Doing the math, we find that the eigenvalue decomposition of the "shrunk" covariance matrix has the same eigenvectors as the empirical covariance matrix.
    • The only change is the scaling of the diagonal matrix towards the average eigenvalue ν
    • Regularization of LDA (see the sketch below)
    • Σ̃(γ) = (1 - γ) * Σ̂ + γ * ν * I
      • γ = 0 -> no shrinkage
      • γ = 1 -> spherical covariance (NCC)
      • γ = 0.05, for example, was a good choice for some dataset
    • There's an analytical way to calculate the optimal γ parameter
      • This is of course much more efficient than methods such as cross-validation
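
A minimal sketch of a shrinkage-regularized LDA (my own helper; gamma = 0 reproduces plain LDA, gamma = 1 the spherical covariance / NCC direction):

```python
import numpy as np

def shrinkage_lda(X1, X2, gamma):
    """Shrinkage-LDA weight vector and bias for two classes (samples x features)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled empirical covariance of both classes
    cov = 0.5 * (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False))
    d = cov.shape[0]
    nu = np.trace(cov) / d                           # average eigenvalue
    cov_shrunk = (1 - gamma) * cov + gamma * nu * np.eye(d)
    w = np.linalg.solve(cov_shrunk, mu2 - mu1)       # w = cov^-1 (mu2 - mu1)
    b = -w @ (mu1 + mu2) / 2                         # boundary halfway between the means
    return w, b                                      # classify by sign(w @ x + b)
```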
  • How do we know shrinkage occurs along the right axes?

    • Because the eigenvectors should remain the same as in the normal EVD of the covariance matrix, checking that they do is one way to verify.
  • Quick note on how this was verified:

    1. Generated 2D Gaussian-distributed data for two classes, representing two channels
    2. A disturbing signal was generated, simulating visual alpha at a third channel (Oz)
    3. (2) makes it much more difficult to separate the two classes linearly
    4. However, including the data from the third channel (Oz) makes it possible again.
  • Classification: Temporal vs Spatial

    • Temporal: classify the temporal data per channel and get an accuracy (one value per channel); this can then be visualized to give an idea of the spatial distribution of discriminative information.
    • Spatial: classifying features measured at one time point (or averaged over a certain time range) provides good results.
    • The shrinkage parameter reflects our trust in the reliability of the data:
      • If we can estimate the covariance well -> we should choose a small gamma
      • If the spatial structure of the noise cannot be reliably estimated -> we shrink and use a higher gamma to avoid overfitting to the noise
  • Where is the P300 measured?

    • parieto-central apparently

BCI: Classification Basics

  • r² value as a measure of separability (point-biserial correlation coefficient)

    • We calculate it for each channel/time-point pair
    • Given two classes of data, we gather samples from both classes.
    • Calculate using the numbers of samples of the two classes and the mean of each class (difference of means squared / variance); see the sketch below
    • Get the sign from the sign of the difference of the mean values
      • -1 => well separated in one direction
      • 0 => a mixture, no clear separation
      • +1 => well separated in the other direction
    • USAGE: we get a heatmap matrix of channels x time where dark colors indicate spots with high separability.
    • ALSO useful because it shows the propagation of components (i.e., where a certain peak originates from)
    • We use SHRINKAGE in spatio-temporal analyses like this because of the high dimensionality.
    • We take an average per time interval and use that. Therefore we should use intervals where the spatial pattern is more or less constant.
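
A minimal sketch of the signed r² for one feature (my own helper; applied per channel/time pair to build the heatmap):

```python
import numpy as np

def signed_r2(x1, x2):
    """Signed point-biserial r^2 between two classes of single-trial values."""
    n1, n2 = len(x1), len(x2)
    mu1, mu2 = x1.mean(), x2.mean()
    var_all = np.concatenate([x1, x2]).var()          # variance over all trials
    r2 = (n1 * n2) / (n1 + n2) ** 2 * (mu1 - mu2) ** 2 / var_all
    return np.sign(mu1 - mu2) * r2                    # sign encodes the direction
```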
  • Area under the Curve (AUC) as a measure of separation

    • We draw a simple staircase curve (an ROC) by sweeping a threshold over the sorted samples; see the sketch below
    • The curve steps either up or right, depending on the class of each sample
    • A perfect separation would mean going all the way right and then all the way up (or vice versa)
      • 0 or 1 => perfect separation
      • 0.5 => a mixture, bad separation
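
A minimal sketch computing the AUC via the equivalent Mann-Whitney rank statistic instead of drawing the staircase (my own helper; ties are ignored):

```python
import numpy as np

def auc(x1, x2):
    """Area under the ROC curve for two classes of single-trial values."""
    n1, n2 = len(x1), len(x2)
    z = np.concatenate([x1, x2])
    ranks = np.argsort(np.argsort(z)) + 1     # ranks 1..n1+n2 (no tie handling)
    r1 = ranks[:n1].sum()                     # rank sum of class 1
    # 0.5 = mixture; 0 or 1 = perfect separation
    return (r1 - n1 * (n1 + 1) / 2) / (n1 * n2)
```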
  • So a univariate feature means => the value across trials of the different classes at one specific channel and one specific time point (e.g., the P300)

  • From the univariate feature, measuring at different time points => temporal feature

  • From the univariate feature, measuring at the same time but at multiple channels => spatial feature

  • From the univariate feature, measuring at different times at multiple channels => spatio-temporal feature

  • Linear model vs ERP (Event-Related Potential)

    • Linear is basically x = a * s [forward] OR s = w * x [backward]
    • The ERP p(t) is the linear model superimposed with background brain activity (and noise) r(t): x(t) = p(t) + r(t)

BCI: CSP (Common Spatial Patterns)

  • Idle rhythms are desynchronized when something happens

    • mu rhythms in the motor cortex (like at C3 or C4)
    • Endogenous rhythms (from within the brain, without a stimulus)
    • Exogenous rhythms (happen, for example, in the auditory cortex when exposed to some rhythm from outside)
  • EOG and EMG artifacts (from gaze and muscles) are present

  • In the frequency domain, we look at the power of certain frequency bands (alpha, beta, gamma), e.g., from an FFT

  • SMR (sensorimotor rhythm)

  • Spatial smearing (blurring)

    • Volume conduction: the brain is a good conductor, so a signal anywhere in the brain propagates roughly equally everywhere.
  • Simple spatial filtering (which reference voltage are we subtracting?) [to counter volume conduction]; see the sketch below

    • CAR (common average reference): subtract the average of all channels
    • Laplace: subtract the average of the 4 neighbours
    • Bipolar: subtract just one other channel
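
A minimal sketch of these three references (my own helpers; X is samples x channels, channel indices are hypothetical):

```python
import numpy as np

def car(X):
    """Common average reference: subtract the mean over all channels per sample."""
    return X - X.mean(axis=1, keepdims=True)

def laplace(X, ch, neighbours):
    """Laplacian: subtract the average of the (typically 4) neighbouring channels."""
    return X[:, ch] - X[:, neighbours].mean(axis=1)

def bipolar(X, ch, ref_ch):
    """Bipolar: subtract just one other channel."""
    return X[:, ch] - X[:, ref_ch]
```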
  • Data driven Spatial Filtering

    • PCA, ICA and CSP are examples
  • CSP Revision

    • A method that finds spatial filters that make the two classes maximally distinguishable in terms of variance

    • In other words: the spatially filtered signal should have high variance for trials of one class and low variance for trials of the other class.

    • Before this, everything we dealt with was amplitudes

    • In CSP, the information sits in the variance of the frequency band that we kept

      • The means wouldn't be discriminative
    • We want a smaller feature space in which to discriminate between the classes.

    • It's a data-driven spatial filtering method

  • In CSP we get a filter matrix W that simultaneously diagonalizes the covariance matrices of both classes

    • So that W⊤ Σ1 W = D1 and W⊤ Σ2 W = D2
    • We also get the property that D1 + D2 = I
  • CSP steps (see the sketch below):

    • Gather the data from the two conditions (right vs left) into their own arrays (channels x epochs x time)
    • Calculate the covariance for each
    • Do a GEVD and choose the Ws corresponding to eigenvalues that fit some criterion
      • e.g., 3 from the top and 3 from the bottom (top has high variance in one condition and low in the other; bottom is the opposite)
    • Multiply W.T by the data (project into the eigenvector / CSP space)
    • Take the log-variance of the data in that space
    • Classify with LDA
      • shrinkage is not necessary here because the projected data is low-dimensional
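
A minimal sketch of that pipeline up to the log-variance features (my own toy shapes; the GEVD is solved as Σ1 w = λ (Σ1 + Σ2) w so that D1 + D2 = I holds):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(X1, X2, n_pairs=3):
    """CSP filters from band-passed data X1, X2 of shape (trials, channels, time)."""
    avg_cov = lambda X: np.mean([np.cov(trial) for trial in X], axis=0)
    S1, S2 = avg_cov(X1), avg_cov(X2)
    # GEVD: eigenvalues lie in (0, 1); large = high variance in class 1, low in class 2
    evals, W = eigh(S1, S1 + S2)                      # ascending eigenvalues
    keep = np.r_[np.arange(n_pairs),                  # n_pairs from the bottom
                 np.arange(len(evals) - n_pairs, len(evals))]  # and from the top
    return W[:, keep]

def log_var_features(X, W):
    """Project each trial into CSP space and take the log-variance per filter."""
    return np.array([np.log(np.var(W.T @ trial, axis=1)) for trial in X])
```

These features would then go into (shrinkage-)LDA as above.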
  • Preprocessing steps:

    1. Determine a suitable frequency band that shows good discrimination between the conditions. Done by looking at all channels.
      • ERD/ERS: Event-Related (De)Synchronization
      • We get the ERD through (see the sketch below):
        1. Band-pass filtering the desired frequency band
        2. Envelope / Hilbert transform
        3. Averaging across trials
        4. You've got your ERD
    2. Using the ERD, determine a suitable time interval that shows good discrimination between the conditions
    3. We have different options we can use here
      • Classify on the PSD
      • Classify directly on the ERD plots (spatio-temporal features from the ERD time series per channel, with shrinkage LDA)
      • Band-pass filter, apply log-variance and classify with shrinkage LDA
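
A minimal sketch of that ERD recipe for one channel (my own helper, using scipy's Butterworth filter and Hilbert transform):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def erd_curve(X, fs, band):
    """ERD/ERS time course from X of shape (trials, time) for one channel.

    fs: sampling rate in Hz; band: (low, high) edge frequencies in Hz.
    """
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, X, axis=1)           # 1. band-pass the desired band
    envelope = np.abs(hilbert(filtered, axis=1))   # 2. Hilbert envelope
    return envelope.mean(axis=0)                   # 3. average across trials
```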
  • Log band-power features

    • Plot the potentials (if clearly separable they will look like a cross)
    • Take the power
    • Take the log
    • BECAUSE THE DATA IS CORRELATED AND THE VARIANCES ARE SIMILAR IN BOTH DIRECTIONS, calculating band-power features on the raw channels like that would make the mixing of information irreversible for subsequent classification.
    • We need to do CSP (spatial filtering) beforehand
  • CROSS VALIDATION

  • WHY LOG??

    • More gaussian
  • It's an eigenvalue decomposition at the end of the day.

  • In order to obtain good band-power features, we need to apply some spatial filtering (e.g., CSP) before calculating the log band-power.

  • For classification, Common Spatial Patterns (CSP) analysis is a useful technique. The goal of CSP is to determine spatial filters that optimally contrast the modulations of brain rhythms in two conditions. This way we can ultimately get the

  • Questions

    • "Calculate the signed 𝑟2-values and add them as title of the subplots", the result of each trial has a length corresponding to the number of datapoint in that trial. How do I get one value to put as title for the subplot? Mean?
    • Just to make sure, is epoch synonym for trial?
    • In the last part, it says "Determine log band-power values for each trial and channel (without CSP filter), calculate the signed 𝑟2 -values for each channel and display the result as topography." What exactly should be displayed within a topography? The r2-values of the log band-power values? It's a bit unclear and I am not sure hwo to understand it.

Kempter

  • QUESTIONS

    • [Lect 2] Measures to quantify spike trains??
  • Basics

    • Spikes have ~100 mV amplitude and ~1 ms width
    • The extracellular signal is ~1000 times smaller
    • Neural code?
      • The relation between stimulus and neural response
      • What part of the spike train carries information about the stimulus
    • Encoding goes from stimulus to response
    • Decoding is getting the stimulus back from the neural response
    • Simplest neural encoding (from stimulus s to neural response p)
      • Tuning curves
    • The Dirac delta
      • the limit of a pulse of height 1/A over -A/2 to A/2, as A -> 0
      • infinite at tau = 0 and 0 otherwise
  • How do we measure firing rate?

    • Neural response function
    • There are three types of averaging to apply to the neural response function
      1. temporal average
      2. trial average
      3. neuron average
    • spike count rate:
      • simply the number of spikes averaged over time
    • Temporal averaging (see the sketch below)
      • Binning (histogram)
        • The rate has discrete values and depends on the placement of the bins
      • Filtering / kernel
        • Basically going through every spike, centering the kernel on it, and summing the kernel contributions.
        1. Rectangular filter (a running average), which is not super good because it's discontinuous
        2. Gaussian filter: good because it's continuous, but still not causal
        3. Alpha filter: causal and continuous, because it's not symmetric
        • The minimum width is T/n
    • Trial average
      • The one with the angle brackets
      • The trial-averaged spiking rate on its own is not useful
      • What is useful is trial average + temporal average
        • Basically we do the trial average of the neuronal spike/Dirac function first and then a temporal average using a rectangular window
      • More useful because it allows a much smaller kernel width
      • Special case when we integrate over the whole experiment:
        • Gives the average number of spikes (total spikes over all trials divided by the number of trials) divided by the whole time -> average firing rate
      • Drawing a graph of rate against time can give us good insight
        • The area under the curve (AUC) would be rate * time = an expected count, or a probability given that the area is small (<< 1)
    • Neuron average
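
A minimal sketch of the kernel approach with a Gaussian kernel (my own toy parameters; an alpha kernel w(tau) = alpha^2 * tau * exp(-alpha * tau) for tau > 0 would make it causal):

```python
import numpy as np

def firing_rate(spike_times, T, dt=0.001, sigma=0.020):
    """Gaussian-kernel estimate of the firing rate in Hz (continuous, not causal).

    spike_times: spike times in seconds (all < T); T: trial length; sigma: kernel width.
    """
    t = np.arange(0, T, dt)
    rho = np.zeros_like(t)
    rho[(np.asarray(spike_times) / dt).astype(int)] = 1.0 / dt   # sum of delta pulses
    kernel_t = np.arange(-4 * sigma, 4 * sigma, dt)
    kernel = np.exp(-0.5 * (kernel_t / sigma) ** 2)
    kernel /= kernel.sum() * dt                                  # normalize to unit area
    return t, np.convolve(rho, kernel, mode="same") * dt         # rate(t) in Hz
```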

  • Tuning Curves

    • A simple mapping between a stimulus (like the orientation angle of a bar in the visual field of a monkey) and the firing rate (as a frequency)
    • A Gaussian tuning curve would look like f(s) = r_max * exp(-1/2 * ((s - s_max) / sigma)^2), with sigma the tuning width
  • Spike-Triggered Average (STA)

    • The average value of the stimulus a time interval tau before a spike is fired
    • BASICALLY, for a spike happening at time t_i we take the value of the stimulus at t_i - tau and then average over all spikes (see the sketch below)
    • We ALSO average over all trials
    • OF COURSE this is not done for just one tau but for a continuous range of tau values, so that we get an idea of what the stimulus looks like over the whole range 0 < t < tau
    • The assumption needed is that the stimulus has zero mean (i.e., its integral over the time range is zero)
    • C(tau) is zero if:
      • tau is large, because of finite memory
      • tau is smaller than zero (causality and basic time concepts haha)
      • any tau, if the stimulus is not related to the spiking
    • Can be regarded as a correlation between the neural response function (or the firing rate) and the stimulus
      • We remove the summation and add an integral over the stimulus multiplied with the neural response function (which is a sum of Dirac deltas)
      • We also use the fact that the trial average of the neural response function is the firing rate r
    • C(tau) = Q_rs(-tau) * T / <n>
    • The STA is a REVERSE correlation
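
A minimal sketch of the STA for one trial (my own helper; for several trials one would additionally average the STAs across trials):

```python
import numpy as np

def spike_triggered_average(stimulus, spike_idx, n_lags):
    """Average stimulus window preceding each spike.

    stimulus: 1-D sampled stimulus (assumed zero mean);
    spike_idx: sample indices of spikes; n_lags: window length in samples.
    """
    valid = spike_idx[spike_idx >= n_lags]      # need a full window before the spike
    windows = np.stack([stimulus[i - n_lags:i] for i in valid])
    return windows.mean(axis=0)[::-1]           # index 0 = smallest tau before the spike
```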
  • Correlation vs Convolution

    • In convolution we have -t ± tau inside the integrand if t is the integration variable
    • In correlation we have +t ± tau inside the integrand if t is the integration variable
    • Correlation is a good way to see how similar two functions are at a particular offset tau
    • So the cross-correlation is Q_xy(tau) = 1/T * integral( x(t) * y(t + tau) ) dt
    • And the autocorrelation is Q_xx(tau) = 1/T * integral( x(t) * x(t + tau) ) dt
  • Interpretations of firing rates:

    • Gives us information about the spike train; the area under the curve gives us the probability of having a spike in some interval (only if the area is small, << 1).
  • What is a good stimulus?

    • Should produce spikes (right modality)
    • Should be close to a natural stimulus
    • Should cover the space of possible stimuli
    • A good compromise is white noise
  • White noise stimulus?

    • An uncorrelated stimulus: Q_ss(tau) = sigma^2 * delta(tau), where sigma^2 is the noise variability
    • The correlation function is proportional to the Dirac delta
    • Numerically:
      • the variance is the noise variability divided by delta_t (the sampling step)
      • so the VARIANCE of the sampled white noise is inversely proportional to the sampling step
  • Spike-Train Statistics

    • The probability of a specific spike train is practically impossible to know because there are so many possibilities
    • VERY STRONG assumption: spikes are statistically independent
  • Poisson process

    • The homogeneous Poisson process provides us with an estimate for p(t1, t2, ..., tn)
    • Basic assumptions
      • The probability of one spike in [t, t+dt] is r*dt
      • The probability of no spike is 1 - r*dt
    • The probability of having n arbitrary spikes in T is P(n) = ((rT)^n / n!) * exp(-rT) -> here rT is the mean of the distribution
    • An important graph is P(n) against rT
      • This gives us the probability that a hom. Poisson process produces a certain number of spikes
      • For example, the probability of a hom. Poisson process giving us:
        • 0 spikes for rT = 0 is 1
        • 1 spike for rT = 1 is ~0.4
        • 2 spikes for rT = 2 is ~0.3
      • This graph has decreasing peaks as n gets higher, but it always peaks around n ≈ rT
    • SO the probability of a specific spike train of n spikes is p(t1, t2, ..., tn) = P(n) * n! * (dt / T)^n, where
      - P(n): the Poisson probability for n arbitrary spikes to occur
      - n!: the number of permutations of the spike times
      - (dt/T)^n: the normalized n-dimensional volume element
    • From the distribution of the spike count we can see that the mean is rT
    • The variance would be <n^2> - <n>^2, which also equals rT
    • Thus the variance and the mean of the spike count are equal. Fano factor: the ratio of variance to mean; it takes the value ONE for a homogeneous Poisson process, independent of the time interval T. (See the sketch below.)
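
A minimal simulation checking that mean, variance and Fano factor behave as stated (my own toy parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
r, T, dt, n_trials = 20.0, 1.0, 0.001, 5000   # rate (Hz), duration (s), step, trials

# Homogeneous Poisson process: a spike in each bin with probability r*dt
spikes = rng.random((n_trials, int(T / dt))) < r * dt
counts = spikes.sum(axis=1)

print(counts.mean(), counts.var())            # both ~ r*T = 20
print(counts.var() / counts.mean())           # Fano factor ~ 1
```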
  • Homogeneous vs inhomogeneous Poisson process

    • Homogeneous has a constant firing rate r
    • Inhomogeneous has a time-varying firing rate r(t)
  • Interspike-Interval (ISI) Distribution

    • We use the Poisson process to derive the probability of the next spike happening
    • After calculating, we find P_isi(t) = r * exp(-r * t)
    • mean: 1 / r
    • variance: 1 / r^2
    • Coefficient of variation (std deviation / mean): 1
    • Poisson doesn't capture the refractory period, so WE ADD IT ARTIFICIALLY IN A STOCHASTIC WAY
    • The coefficient of variation (CV) measures how regular the interspike intervals are
      • High CV -> quite irregular
      • Zero CV -> perfectly regular
  • Spike-Train Autocorrelation

    • Important to CHARACTERIZE spike trains and to CLASSIFY neuron types
    • Generalizes the ISI distribution P_ISI, which only relates pairs of SUCCESSIVE SPIKES
    • Quantifies the relation between spikes at an arbitrary offset tau
    • Autocorrelation of a spike train: Q_pp(tau) = 1/T * integral( p(t) * p(t + tau) - r^2 ) dt
    • There's of course a numerical calculation method for this, which is intuitive
  • How do we estimate the firing rate?

    • Using the neural encoding ansatz
      • The ansatz is linear
      • No adaptation, no saturation, and possibly negative firing rates
  • Saturation: can't go past a maximum firing rate

  • Adaptation: tired of firing because the neuron gets used to the stimulus.

Entropy

  • Correlations in real neurons (vs. a pure Poisson process) cause real neurons to have less entropy (because we have more information)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment