AAND Notes

Notes for studying AAND

Overview

  • BCI first time
  • Brecht first time
  • Kempter firing-rate material first time
  • BCI revision
  • Kempter revision
  • Brecht revision

Study Organisational stuff


Brecht: A revision

  • Lect. 1

    • Anesthesia
      • Not safe: the lethal dose is only about 2x the desired dose
      • The heart doesn't stop because of brain damage, since the heart is myogenic (doesn't need brain input). Breathing does stop, however.
    • Measuring Signals:
      1. Neuronal electric signals: Intra or extra
      2. Biomedical signals: Messengers, Calcium, etc.
      3. Indirect signals: blood flow like in fMRI
  • Lect. 2: Extracellular

    • Staining
      1. Golgi staining to reveal the structure of the brain, just as Ramón y Cajal did in his famous experiments.
      2. Myelin stain (obviously stains the myelin sheath), so the axons can be seen.
    • ~10^9 synapses, ~10^5 neurons, 400 m of dendrites, 4 km of axons in one cubic millimeter
    • Extracellular recording is easier to do than the other methods
    • Recording must take into consideration the following:
      • Electrode size (too big for single neuron)
      • Small signal (amplification)
      • Impedance matching
  • Lect. 3 Action Potential

  • Lect. 4 Patch Clamp and sharp Microelectrode recording

Patch Clamp in vivo:

  • We insert the electrode and stay in search mode until we find a "pulsating" neuron

    • the test pulse is no longer a rectangular current
    • Press the suction button to activate
    • Giga-ohm seal mode (very high resistance)
  • Neurons are quiet

    • They only fire if they need to, in order to conserve energy (already ~10% of the body's energy is spent by the brain)
  • Channelrhodopsins

    • After optogenetic modification, light can open excitatory channels (brain stimulation)
    • The retina's own light-sensing channels are metabotropic
    • We prefer ionotropic channels
    • Like the lab experiment where we tickled the rat
  • Voltage sensitive dyes

    • Help us to notice the voltage change inside the neuron
    • In vitro we would do this by coating the membrane
  • Calcium indicators

    • Using fluorescent proteins that bind to calcium
    • Useful because the calcium concentration inside the cell is low at rest, and because calcium in the synapse plays a role in vesicle release, so we can also check what's happening there.

    Problem: Can't specifically target the

  • Fluorescence

    • A protein absorbs a shorter (higher-energy) wavelength and emits a longer wavelength because the excited electron jumps back down
    • Allows precise targeting (high signal-to-noise ratio)
  • Green Fluorescent Protein (GFP)

    • Advantages: many different colors, easy to excite with photons, high quantum yield
    • Genetically encoded in the cell
    • Comes from a jellyfish
    • Bleaching is a problem with fluorophores (they lose the ability to fluoresce)
      • GFPs, however, are less prone to bleaching because the fluorophore is packed inside a protein barrel, as if in a prison cell
  • Microscopy

    • Wide-field microscopy
      • ~100 µm imaging depth
    • Confocal
      • Uses a moving (scanning) laser and a detector pinhole
      • ~50 µm imaging depth
    • Two-photon microscopy
      • No detector pinhole needed because two photons must be absorbed together (each alone has insufficient energy)
      • Only at the focal point do the two photons coincide and excite the dye
      • Advantage: no excitation along the way
    • EM
  • To find the structure of the neurons we can use the following genetic tools

    • Rabies virus tracing
    • Staining
  • Advantages of in-vitro slicing

    • Mechanical stability
    • Controlled environment
    • Apply media that influence ion concentrations
  • Why the membrane is a good capacitor

    • Large surface
    • Not passing current
    • High Voltage difference across plates

Stains:

Extracellular Recording:

  • Unit recording

    • Assume that all spikes from one neuron have the same amplitude
    • In the refractory period, however, there are different voltage recordings
  • Tetrode Recording:

    • 4 electrodes to measure amplitudes
  • High density probes:

    • Like Neuropixels
  • Extracellular vs intracellular

    • Extracellular is smaller, noisier and inverted

EM

  • Serial block-face EM -> we can use thicker slices
    • The only technique for having a look at synapses

Some Anatomy

  • Cortex (Frontal, Parietal, Temporal, Occipital)
    • Gyrus and sulcus
  • Subcortical
    • Hippocampus -> Roughly memory
    • Basal Ganglia -> Roughly will to action
    • Thalamus and hypothalamus -> Gating to cortex
    • Colliculi (like the superior colliculus) -> Orienting
    • Cerebellum -> Motor

fMRI

  • The protons of the hydrogen atoms inside a water molecule have a certain spin

  • Normally these spins point in different directions and the net magnetization M is zero.

  • We apply a strong magnet generating a magnetic field B0 so that the protons align their spins parallel to B0.

  • We then apply RF pulses at the Larmor frequency (to cause resonance with the hydrogen nuclei); this tips the spins into alignment with B1.

  • We then switch the RF pulse off. As the protons relax back into alignment with B0, they emit energy that we measure, and that's how we visualize the brain. There are two things to measure:

    1. T1: slow recovery of longitudinal magnetization (relaxation)
    2. T2: transversal relaxation, dephasing spins
      • Fast dephasing of spins due to spin-spin interactions
      • After a 90° RF pulse, the protons start to spin out of phase
    3. T2*: fast dephasing of spins due to a combination of spin-spin interactions and magnetic field inhomogeneities
  • Oxygenated hemoglobin (less deoxy-hemoglobin) means a slower relaxation rate, more MR signal, a brighter image and more T2 signal.

    • This is because the less oxygenated the blood, the faster the dephasing of the transverse magnetization.
    • For an equal amount of blood -> less deoxy means more oxy.
  • BOLD: blood oxygen level dependent

    • Gets an image of the brain through the susceptibility differences between oxygenated and deoxygenated blood (darker and brighter spots)
    • Measures the dephasing of hydrogen proton spins around hemoglobin depending on how oxygenated it is (more oxygen -> slower dephasing and brighter)
  • Neurovascular coupling is the reason why more activity in the brain doesn't necessarily mean darker images (the naive hypothesis, if we assume that more activity means less oxygen)

    • ACTUALLY what happens is the opposite. With more activity and increased metabolism:
    1. Neurovascular coupling causes cerebral blood flow to increase.
    2. This leads to displacement of deoxygenated hemoglobin
    3. Less deoxy-Hb is present
    4. Brighter spots in areas of high metabolism.
  • Hemodynamic Response Function (HRF)

    • Baseline -> mix of oxy and deoxy
    • initial dip -> more deoxy (darker)
    • overshoot -> more oxy than deoxy (brighter)
    • undershoot -> a bit more deoxy at the end
    • typical length: 20-30 s (dip at 0-2 s, overshoot/peak at 4-6 s, undershoot at 10-20 s)
  • ONCE AGAIN, 1 mm^3 of cortex contains

    • 10^4 - 10^5 neurons
    • 10^8 - 10^9 synapses
    • 300 m of dendrites
    • 4000 m of axons
    • 0.4 m of capillaries
  • Spatial resolution: 3 mm x 3 mm x 3 mm

  • Spatial precision: sub-millimeter

  • Temporal resolution: images typically ~2 s apart

  • Temporal precision: in the same region & subject: relatively precise onset

  • Variability of HRF

    • HRF is variable between different subjects
  • BOLD responses are linear

    • Stimulus -> Neural pathway -> Neural response -> hemoglobin dynamics -> MRI scanner + noise -> MRI image
  • Local Field Potentials (LFP) 40-130 Hz

    • Reflects summation of post-synaptic potentials
  • Multi-Unit Activity (MUA) 300-1500 Hz

    • reflects action potentials/spiking
  • BOLD is slightly better predicted by LFP (input) of a region

  • Preprocessing

    1. Motion correction/realignment
      • Done automatically in an online way in modern scanners
      • Most voxel timeseries variance is due to subject motion (reduced sensitivity)
      • Reduced specificity
      • Rigid body transformation -> Topology is preserved
      • Is done by minimizing the squared difference between the volume in question and the reference volume
      • Volume means a snapshot of the brain (image)
    2. Reslicing
      • Interpolation to adjust the volume to the registered one
      • Different sorts of interpolation -> nearest neighbour, linear, b-spline
    3. Residual error
      • Body moves throughout the acquisition of one volume
      • Ghosts and other artifacts
      • Solution: Include motion params to eliminate later in statistical model
    4. Slice-time correction
      • The delay between consecutive slices being recorded is 60-100 ms, and slices are acquired SEQUENTIALLY
      • For a volume of 30 slices, that is up to ~3 s
      • the difference between the first and last slice is then ~2.9 s
      • Slice-time correction shifts the slices in time so that they all appear to have been captured at the time of some reference slice (important for statistical analysis)
    5. Normalization
      • To perform data-based meta analysis we need to bring all volumes to some common brain space
      • This is usually a standardized brain like MNI (I actually worked with it before using FSL and Freesurfer)
      • This is done to account for differences in macroscopic brain anatomy and be able to do statistics across different subjects.
      • 12 params: 3 translations, 3 rotations, 3 shears, 3 zooms
      • 2 steps:
        1. Linearly register the EPI images to the structural T1 volume
        2. Non-linearly adjust the T1 volume (and the EPI images with it) to the segmented T1 images
      • Problems:
        • Local minima in optimization -> reset to reference image origin
        • Too little information -> acquire more slices
        • Lesions -> Masking
    6. Smoothing
      • High-frequency noise is independent for every voxel
      • Smoothing enhances SNR
      • Spatial low pass filtering of each image
        • functional homology across subjects increases
        • assumptions of random field theory are better met
      • Done through convolution with a discrete Gaussian kernel
      • Matched-filter-Theorem:
        • If the filter is of the same form and size as the signal, noise is filtered maximally
      • Disadvantages:
        • brains get enlarged, a bit of resolution is lost
        • fine-grained information is lost, which can be bad for multivariate classification
  • Statistics of fMRI

    • Mass-univariate analysis: test for activity in one brain location at a time, then repeat

      • Basically finding a region that responds more strongly during stimulation than during rest
    • T-test:

      • Given two samples from two different regions/conditions
      • Get the standard deviation of both
      • Use the equation t = (mean_1 - mean_2) / (std_pooled * sqrt(2/n)); see the sketch below
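
A minimal numpy sketch of the t statistic as written above (my own toy data; equal sample sizes and a pooled standard deviation are assumed):

```python
import numpy as np

def t_statistic(a, b):
    """Two-sample t statistic for equal-sized samples, as in the note above."""
    n = len(a)                                                  # assumes len(a) == len(b)
    s_pooled = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))   # pooled std
    return (a.mean() - b.mean()) / (s_pooled * np.sqrt(2.0 / n))

rng = np.random.default_rng(0)
stim = rng.normal(1.0, 1.0, 50)   # e.g. voxel values during stimulation
rest = rng.normal(0.0, 1.0, 50)   # e.g. voxel values during rest
print(t_statistic(stim, rest))
```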
    • Accounting for hemodynamic lag

      • through shifting windows?
      • Convolution with kernels?
      • Problem: convolution with the HRF introduces a time delay and smoothing
    • Fitting a linear model y = beta * x + eta

      • where y is the data, eta is noise and x is the reference/expected function
      • beta is a linear weighting parameter chosen to minimize the sum of squared differences
    • General linear model (GLM)

      • Consists of multiple Xs (expected functions)
      • Can keep adding columns to the design matrix (to shape what kind of signal we want)
      • For every column (X component) we add, we add a beta to the beta vector
      • Ultimately the goal is to choose the betas so that we minimize the mean square of the noise (eta); see the sketch below
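
A minimal sketch of fitting GLM betas by least squares (the design-matrix columns here are illustrative toys, not the course's regressors):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200                                                  # number of volumes

# Toy design matrix: a mean column plus one boxcar task regressor
mean_col = np.ones(T)
task = np.zeros(T)
task[20:40] = 1
task[100:120] = 1                                        # stimulation blocks
X = np.column_stack([mean_col, task])                    # (T, 2) design matrix

# Simulated voxel time series: y = X @ beta + eta (noise)
y = X @ np.array([100.0, 2.5]) + rng.normal(0, 1, T)

# Least-squares betas minimize ||y - X @ beta||^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)                                              # ~ [100, 2.5]
```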
    • Useful regressors (reference functions/templates):

      • Mean: Percent signal change is the fluctuation around the mean
      • Discrete cosine set: Model slow drifts in the data
      • The motion parameters (from realignment)
    • How do we know we model signal and not just noise?

      • First level and second level statistics
    • Statistical Parametric Map (SPM)

      • A map showing color-coded t-values where the t-test is significant
    • T-statistics

      • The one you use for the betas (to check whether the sample mean is larger than the assumed population mean)
    • Questions:

      • WTF is first and second level statistics? Why do we need them?
      • Last couple of slides? WTF is happening there?
  • Phase encoding vs Frequency encoding?


BCI

Questions

  • How do we visualize weight vectors (the result of a spatial classifier)?

  • If we have a setup with 64 electrodes but want to narrow it down to 10, what do we do so as not to lose classification accuracy?

    • Spatio-temporal: we can do a temporal classification per channel and pick the 10 with the highest discriminability. However, sometimes we care about combinations of channels and not just the top 10, so we just try
    • In the context of CSP, what
  • Pseudo-inverse?

    • In general, to get the inverse of a non-square matrix
    • S+ = S.T (S S.T)^-1
    • Reasoning:
      • We want to get A_hat (for the forward model)
      • We get the filters (W) as part of the backward model
      • At this point, we have X (data), W (filters), and S (the sources output by the backward model)
      • We get A_hat = Cov_X * W * inv(Cov_S)
      • Now we can use this A_hat to understand things about our classifier; in short, to assess our filters (see the sketch below)
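
A minimal numpy sketch of that pattern-from-filter formula (toy data and shapes are mine; cf. Haufe et al. 2014 for the general result):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))         # data: samples x channels (toy)
W = rng.normal(size=(8, 2))            # backward-model filters: channels x sources

S = X @ W                              # extracted sources
cov_x = np.cov(X, rowvar=False)        # channel covariance
cov_s = np.cov(S, rowvar=False)        # source covariance

# Forward-model patterns recovered from the filters: A_hat = Cov_X W Cov_S^-1
A_hat = cov_x @ W @ np.linalg.inv(cov_s)
print(A_hat.shape)                     # (8, 2): one scalp pattern per source
```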
  • [Shrinkage] ν defined as the average eigenvalue trace(Σ̂)/d of Σ̂ ??

    • The trace (sum of the diagonal) equals the sum of the eigenvalues (for the covariance matrix); in short, ν is the mean of the eigenvalues.
  • [Shrinkage] What is

BCI: Spatial patterns and filters

  • The propagation pattern (a) depends on:

    1. The conductivity of the intermediate tissue
    2. The location and orientation of the source
    3. The impedance and location of the electrode
  • DOING PCA in the beginning and removing artifacts (components beyond roughly the first 100 eigenvalues)

    • Projection into the eigenvector space -> part of whitening
    • The projected data has uncorrelated components
    • Whitening is helpful because we assume the variance is not important and we want to remove it
    • Because we believe variance is not important in this situation
    • We also believe that the mean is more important
  • Patterns (A) are the vectors applied to the sources that describe how the signals propagate to the sensors (forward model)

  • Filters (W) are the vectors applied in sensor space that give us the sources (backward model)

  • Recovering a source s1 from a mixture (x = a1*s1 + a2*s2), in the non-trivial case where a1 and a2 are not orthogonal, needs to take the propagation vectors (a1 and a2) into consideration.

    1. This is because when we apply a filter w, it needs to be orthogonal to the propagation vector of the other source.
    2. (1) guarantees that we eliminate the effect of the other source from our calculation.
  • THEREFORE, to be able to calculate the filters, we need the scalp distributions (locations) of all the sources (not just the ones we want to reconstruct)

  • ALSO, an optimal filter which aims at a good signal-to-noise ratio (SNR) must be approx. orthogonal to the noise sources as well

  • Oddball paradigm: present one odd stimulus among several identical stimuli. P300: a positive deflection after the odd stimulus is presented

BCI: LDA and NCC

  • Linear Discriminant Analysis (LDA) is a powerful tool for classification and feature extraction

  • To be able to classify using LDA:

    1. The class samples are Gaussian distributed
    2. The Gaussian distributions of the two classes should have the same covariance matrix
    3. The true distributions are known (always violated in real examples)
  • For the second condition mentioned above, one can verify this by doing an eigenvalue decomposition of the covariance matrices of both classes, extracting the eigenvectors (PCs) corresponding to the largest eigenvalues, visualizing them as scalp topographies, and observing the similarities

  • A linear classifier is basically a linear separator (a line if the data is 2D).

    • Consists of w and b (bias)
    • w = inv(avg_cov) * (mean_2 - mean_1)
    • A linear classifier trained on spatial features (i.e., fixing a time point and taking all channels' data) gives us a spatial filter and hence a "BACKWARD MODEL"
    • AGAIN, an optimal filter requires intricate structure and might assign significant weight to channels/electrodes whose role is not obvious from the scalp map corresponding to the weights obtained under the assumption of a spherical covariance matrix.
    • Last point recap: we might add another channel that in itself is not discriminative but increases the classification accuracy.
  • Usually, many problems arise because we don't know the exact covariance matrices of the data for the two classes; we assume them to be the ones we get empirically (from the data).

    • However, for high-dimensional data with not so many data points/samples, our estimate is compromised.
    • That's why we do regularization through shrinkage
  • Bias in estimating the covariance matrix

    • A problem when the number of samples (n) is small compared to the number of dimensions (d)
    • Leads to a systematic bias: large eigenvalues are estimated too large and small eigenvalues too small
  • Shrinkage comes to the rescue

    • Doing the math, we find that the eigenvalue decomposition of the "shrunk" covariance matrix has the same eigenvectors as the empirical covariance matrix.
    • The only change is the scaling of the diagonal matrix towards the average eigenvalue ν
    • Regularization of LDA (see the sketch below)
    • Σ̃(γ) = (1 - γ) * Σ̂ + γ * ν * I
      • γ = 0 -> no shrinkage
      • γ = 1 -> spherical covariance (NCC)
      • γ = 0.05, for example, was a good choice for some dataset
    • There's an analytical way to calculate the optimal γ parameter
      • This is of course much more efficient than methods such as cross-validation
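
A minimal sketch of a shrinkage-regularized LDA (my own helper; gamma = 0 reproduces plain LDA, gamma = 1 the spherical covariance / NCC direction):

```python
import numpy as np

def shrinkage_lda(X1, X2, gamma):
    """Shrinkage-LDA weight vector and bias for two classes (samples x features)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled empirical covariance of both classes
    cov = 0.5 * (np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False))
    d = cov.shape[0]
    nu = np.trace(cov) / d                           # average eigenvalue
    cov_shrunk = (1 - gamma) * cov + gamma * nu * np.eye(d)
    w = np.linalg.solve(cov_shrunk, mu2 - mu1)       # w = cov^-1 (mu2 - mu1)
    b = -w @ (mu1 + mu2) / 2                         # boundary halfway between the means
    return w, b                                      # classify by sign(w @ x + b)
```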
  • How do we know shrinkage occurs along the right axes?

    • Because the eigenvectors should remain the same as in the normal EVD of the covariance matrix, checking that they do is one way to verify.
  • Quick note on how this was verified:

    1. Generated 2D Gaussian-distributed data for two classes, representing two channels
    2. A disturbing signal was generated, simulating visual alpha at a third channel (Oz)
    3. (2) makes it much more difficult to separate the two classes linearly
    4. However, including the data from the third channel (Oz) makes it possible again.
  • Classification: Temporal vs Spatial

    • Temporal: classify the temporal data per channel and get an accuracy (one value per channel); this can then be visualized to give an idea of the spatial distribution of discriminative information.
    • Spatial: classifying features measured at one time point (or averaged over a certain time range) provides good results.
    • The shrinkage parameter reflects our trust in the reliability of the data:
      • If we can estimate the covariance well -> we should choose a small gamma
      • If the spatial structure of the noise cannot be reliably estimated -> we shrink and use a higher gamma to avoid overfitting to the noise
  • Where is the P300 measured?

    • parieto-central apparently

BCI: Classification Basics

  • r² value as a measure of separability (point-biserial correlation coefficient)

    • We calculate it for each channel/time-point pair
    • Given two classes of data, we gather samples from both classes.
    • Calculate using the numbers of samples of the two classes and the mean of each class (difference of means squared / variance); see the sketch below
    • Get the sign from the sign of the difference of the mean values
      • -1 => well separated in one direction
      • 0 => a mixture, no clear separation
      • +1 => well separated in the other direction
    • USAGE: we get a heatmap matrix of channels x time where dark colors indicate spots with high separability.
    • ALSO useful because it shows the propagation of components (i.e., where a certain peak originates from)
    • We use SHRINKAGE in spatio-temporal analyses like this because of the high dimensionality.
    • We take an average per time interval and use that. Therefore we should use intervals where the spatial pattern is more or less constant.
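
A minimal sketch of the signed r² for one feature (my own helper; applied per channel/time pair to build the heatmap):

```python
import numpy as np

def signed_r2(x1, x2):
    """Signed point-biserial r^2 between two classes of single-trial values."""
    n1, n2 = len(x1), len(x2)
    mu1, mu2 = x1.mean(), x2.mean()
    var_all = np.concatenate([x1, x2]).var()          # variance over all trials
    r2 = (n1 * n2) / (n1 + n2) ** 2 * (mu1 - mu2) ** 2 / var_all
    return np.sign(mu1 - mu2) * r2                    # sign encodes the direction
```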
  • Area under the Curve (AUC) as a measure of separation

    • We draw a simple staircase curve (an ROC) by sweeping a threshold over the sorted samples; see the sketch below
    • The curve steps either up or right, depending on the class of each sample
    • A perfect separation would mean going all the way right and then all the way up (or vice versa)
      • 0 or 1 => perfect separation
      • 0.5 => a mixture, bad separation
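
A minimal sketch computing the AUC via the equivalent Mann-Whitney rank statistic instead of drawing the staircase (my own helper; ties are ignored):

```python
import numpy as np

def auc(x1, x2):
    """Area under the ROC curve for two classes of single-trial values."""
    n1, n2 = len(x1), len(x2)
    z = np.concatenate([x1, x2])
    ranks = np.argsort(np.argsort(z)) + 1     # ranks 1..n1+n2 (no tie handling)
    r1 = ranks[:n1].sum()                     # rank sum of class 1
    # 0.5 = mixture; 0 or 1 = perfect separation
    return (r1 - n1 * (n1 + 1) / 2) / (n1 * n2)
```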
  • So a univariate feature means => the value across trials of the different classes at one specific channel and one specific time point (e.g., the P300)

  • From the univariate feature, measuring at different time points => temporal feature

  • From the univariate feature, measuring at the same time but at multiple channels => spatial feature

  • From the univariate feature, measuring at different times at multiple channels => spatio-temporal feature

  • Linear model vs ERP (Event-Related Potential)

    • Linear is basically x = a * s [forward] OR s = w * x [backward]
    • The ERP p(t) is the linear model superimposed with background brain activity (and noise) r(t): x(t) = p(t) + r(t)

BCI: CSP (Common Spatial Patterns)

  • Idle rhythms are desynchronized when something happens

    • mu rhythms in the motor cortex (like at C3 or C4)
    • Endogenous rhythms (from within the brain, without a stimulus)
    • Exogenous rhythms (happen, for example, in the auditory cortex when exposed to some rhythm from outside)
  • EOG and EMG artifacts (from gaze and muscles) are present

  • In the frequency domain, we look at the power of certain frequency bands (alpha, beta, gamma), e.g., from an FFT

  • SMR (sensorimotor rhythm)

  • Spatial smearing (blurring)

    • Volume conduction: the brain is a good conductor, so a signal anywhere in the brain propagates roughly equally everywhere.
  • Simple spatial filtering (which reference voltage are we subtracting?) [to counter volume conduction]; see the sketch below

    • CAR (common average reference): subtract the average of all channels
    • Laplace: subtract the average of the 4 neighbours
    • Bipolar: subtract just one other channel
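
A minimal sketch of these three references (my own helpers; X is samples x channels, channel indices are hypothetical):

```python
import numpy as np

def car(X):
    """Common average reference: subtract the mean over all channels per sample."""
    return X - X.mean(axis=1, keepdims=True)

def laplace(X, ch, neighbours):
    """Laplacian: subtract the average of the (typically 4) neighbouring channels."""
    return X[:, ch] - X[:, neighbours].mean(axis=1)

def bipolar(X, ch, ref_ch):
    """Bipolar: subtract just one other channel."""
    return X[:, ch] - X[:, ref_ch]
```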
  • Data driven Spatial Filtering

    • PCA, ICA and CSP are examples
  • CSP Revision

    • A method that finds spatial filters that make the two classes maximally distinguishable in terms of variance

    • In other words: the spatially filtered signal should have high variance for trials of one class and low variance for trials of the other class.

    • Before this, everything we dealt with was amplitudes

    • In CSP, the information sits in the variance of the frequency band that we kept

      • The means wouldn't be discriminative
    • We want a smaller feature space in which to discriminate between the classes.

    • It's a data-driven spatial filtering method

  • In CSP we get a filter matrix W that simultaneously diagonalizes the covariance matrices of both classes

    • So that W⊤ Σ1 W = D1 and W⊤ Σ2 W = D2
    • We also get the property that D1 + D2 = I
  • CSP steps (see the sketch below):

    • Gather the data from the two conditions (right vs left) into their own arrays (channels x epochs x time)
    • Calculate the covariance for each
    • Do a GEVD and choose the Ws corresponding to eigenvalues that fit some criterion
      • e.g., 3 from the top and 3 from the bottom (top has high variance in one condition and low in the other; bottom is the opposite)
    • Multiply W.T by the data (project into the eigenvector / CSP space)
    • Take the log-variance of the data in that space
    • Classify with LDA
      • shrinkage is not necessary here because the projected data is low-dimensional
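
A minimal sketch of that pipeline up to the log-variance features (my own toy shapes; the GEVD is solved as Σ1 w = λ (Σ1 + Σ2) w so that D1 + D2 = I holds):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(X1, X2, n_pairs=3):
    """CSP filters from band-passed data X1, X2 of shape (trials, channels, time)."""
    avg_cov = lambda X: np.mean([np.cov(trial) for trial in X], axis=0)
    S1, S2 = avg_cov(X1), avg_cov(X2)
    # GEVD: eigenvalues lie in (0, 1); large = high variance in class 1, low in class 2
    evals, W = eigh(S1, S1 + S2)                      # ascending eigenvalues
    keep = np.r_[np.arange(n_pairs),                  # n_pairs from the bottom
                 np.arange(len(evals) - n_pairs, len(evals))]  # and from the top
    return W[:, keep]

def log_var_features(X, W):
    """Project each trial into CSP space and take the log-variance per filter."""
    return np.array([np.log(np.var(W.T @ trial, axis=1)) for trial in X])
```

These features would then go into (shrinkage-)LDA as above.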
  • Preprocessing steps:

    1. Determine a suitable frequency band that shows good discrimination between the conditions. Done by looking at all channels.
      • ERD/ERS: Event-Related (De)Synchronization
      • We get the ERD through (see the sketch below):
        1. Band-pass filtering the desired frequency band
        2. Envelope / Hilbert transform
        3. Averaging across trials
        4. You've got your ERD
    2. Using the ERD, determine a suitable time interval that shows good discrimination between the conditions
    3. We have different options we can use here
      • Classify on the PSD
      • Classify directly on the ERD plots (spatio-temporal features from the ERD time series per channel, with shrinkage LDA)
      • Band-pass filter, apply log-variance and classify with shrinkage LDA
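
A minimal sketch of that ERD recipe for one channel (my own helper, using scipy's Butterworth filter and Hilbert transform):

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def erd_curve(X, fs, band):
    """ERD/ERS time course from X of shape (trials, time) for one channel.

    fs: sampling rate in Hz; band: (low, high) edge frequencies in Hz.
    """
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, X, axis=1)           # 1. band-pass the desired band
    envelope = np.abs(hilbert(filtered, axis=1))   # 2. Hilbert envelope
    return envelope.mean(axis=0)                   # 3. average across trials
```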
  • Log band-power features

    • Plot the potentials (if clearly separable they will look like a cross)
    • Take the power
    • Take the log
    • BECAUSE THE DATA IS CORRELATED AND THE VARIANCES ARE SIMILAR IN BOTH DIRECTIONS, calculating band-power features on the raw channels like that would make the mixing of information irreversible for subsequent classification.
    • We need to do CSP (spatial filtering) beforehand
  • CROSS VALIDATION

  • WHY LOG??

    • More gaussian
  • It's an eigenvalue decomposition at the end of the day.

  • In order to obtain good band-power features, we need to apply some spatial filtering (e.g., CSP) before calculating the log band-power.

  • For classification, Common Spatial Patterns (CSP) analysis is a useful technique. The goal of CSP is to determine spatial filters that optimally contrast the modulations of brain rhythms in two conditions. This way we can ultimately get the

  • Questions

    • "Calculate the signed 𝑟2-values and add them as title of the subplots", the result of each trial has a length corresponding to the number of datapoint in that trial. How do I get one value to put as title for the subplot? Mean?
    • Just to make sure, is epoch synonym for trial?
    • In the last part, it says "Determine log band-power values for each trial and channel (without CSP filter), calculate the signed 𝑟2 -values for each channel and display the result as topography." What exactly should be displayed within a topography? The r2-values of the log band-power values? It's a bit unclear and I am not sure hwo to understand it.

Kempter

  • QUESTIONS

    • [Lect 2] Measures to quantify spike trains??
  • Basics

    • Spikes have ~100 mV amplitude and ~1 ms width
    • The extracellular signal is ~1000 times smaller
    • Neural code?
      • The relation between stimulus and neural response
      • What part of the spike train carries information about the stimulus
    • Encoding goes from stimulus to response
    • Decoding is getting the stimulus back from the neural response
    • Simplest neural encoding (from stimulus s to neural response p)
      • Tuning curves
    • The Dirac delta
      • the limit of a pulse of height 1/A over -A/2 to A/2, as A -> 0
      • infinite at tau = 0 and 0 otherwise
  • How do we measure firing rate?

    • Neural response function
    • There are three types of averaging to apply to the neural response function
      1. temporal average
      2. trial average
      3. neuron average
    • spike count rate:
      • simply the number of spikes averaged over time
    • Temporal averaging (see the sketch below)
      • Binning (histogram)
        • The rate has discrete values and depends on the placement of the bins
      • Filtering / kernel
        • Basically going through every spike, centering the kernel on it, and summing the kernel contributions.
        1. Rectangular filter (a running average), which is not super good because it's discontinuous
        2. Gaussian filter: good because it's continuous, but still not causal
        3. Alpha filter: causal and continuous, because it's not symmetric
        • The minimum width is T/n
    • Trial average
      • The one with the angle brackets
      • The trial-averaged spiking rate on its own is not useful
      • What is useful is trial average + temporal average
        • Basically we do the trial average of the neuronal spike/Dirac function first and then a temporal average using a rectangular window
      • More useful because it allows a much smaller kernel width
      • Special case when we integrate over the whole experiment:
        • Gives the average number of spikes (total spikes over all trials divided by the number of trials) divided by the whole time -> average firing rate
      • Drawing a graph of rate against time can give us good insight
        • The area under the curve (AUC) would be rate * time = an expected count, or a probability given that the area is small (<< 1)
    • Neuron average
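
A minimal sketch of the kernel approach with a Gaussian kernel (my own toy parameters; an alpha kernel w(tau) = alpha^2 * tau * exp(-alpha * tau) for tau > 0 would make it causal):

```python
import numpy as np

def firing_rate(spike_times, T, dt=0.001, sigma=0.020):
    """Gaussian-kernel estimate of the firing rate in Hz (continuous, not causal).

    spike_times: spike times in seconds (all < T); T: trial length; sigma: kernel width.
    """
    t = np.arange(0, T, dt)
    rho = np.zeros_like(t)
    rho[(np.asarray(spike_times) / dt).astype(int)] = 1.0 / dt   # sum of delta pulses
    kernel_t = np.arange(-4 * sigma, 4 * sigma, dt)
    kernel = np.exp(-0.5 * (kernel_t / sigma) ** 2)
    kernel /= kernel.sum() * dt                                  # normalize to unit area
    return t, np.convolve(rho, kernel, mode="same") * dt         # rate(t) in Hz
```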

  • Tuning Curves

    • A simple mapping between a stimulus (like the orientation angle of a bar in the visual field of a monkey) and the firing rate (as a frequency)
    • A Gaussian tuning curve would look like f(s) = r_max * exp(-1/2 * ((s - s_max) / sigma)^2), with sigma the tuning width
  • Spike-Triggered Average (STA)

    • The average value of the stimulus a time interval tau before a spike is fired
    • BASICALLY, for a spike happening at time t_i we take the value of the stimulus at t_i - tau and then average over all spikes (see the sketch below)
    • We ALSO average over all trials
    • OF COURSE this is not done for just one tau but for a continuous range of tau values, so that we get an idea of what the stimulus looks like over the whole range 0 < t < tau
    • The assumption needed is that the stimulus has zero mean (i.e., its integral over the time range is zero)
    • C(tau) is zero if:
      • tau is large, because of finite memory
      • tau is smaller than zero (causality and basic time concepts haha)
      • any tau, if the stimulus is not related to the spiking
    • Can be regarded as a correlation between the neural response function (or the firing rate) and the stimulus
      • We remove the summation and add an integral over the stimulus multiplied with the neural response function (which is a sum of Dirac deltas)
      • We also use the fact that the trial average of the neural response function is the firing rate r
    • C(tau) = Q_rs(-tau) * T / <n>
    • The STA is a REVERSE correlation
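
A minimal sketch of the STA for one trial (my own helper; for several trials one would additionally average the STAs across trials):

```python
import numpy as np

def spike_triggered_average(stimulus, spike_idx, n_lags):
    """Average stimulus window preceding each spike.

    stimulus: 1-D sampled stimulus (assumed zero mean);
    spike_idx: sample indices of spikes; n_lags: window length in samples.
    """
    valid = spike_idx[spike_idx >= n_lags]      # need a full window before the spike
    windows = np.stack([stimulus[i - n_lags:i] for i in valid])
    return windows.mean(axis=0)[::-1]           # index 0 = smallest tau before the spike
```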
  • Correlation vs Convolution

    • In convolution we have -t ± tau inside the integrand if t is the integration variable
    • In correlation we have +t ± tau inside the integrand if t is the integration variable
    • Correlation is a good way to see how similar two functions are at a particular offset tau
    • So the cross-correlation is Q_xy(tau) = 1/T * integral( x(t) * y(t + tau) ) dt
    • And the autocorrelation is Q_xx(tau) = 1/T * integral( x(t) * x(t + tau) ) dt
  • Interpretations of firing rates:

    • Gives us information about the spike train; the area under the curve gives us the probability of having a spike in some interval (only if the area is small, << 1).
  • What is a good stimulus?

    • Should produce spikes (right modality)
    • Should be close to a natural stimulus
    • Should cover the space of possible stimuli
    • A good compromise is white noise
  • White noise stimulus?

    • An uncorrelated stimulus: Q_ss(tau) = sigma^2 * delta(tau), where sigma^2 is the noise variability
    • The correlation function is proportional to the Dirac delta
    • Numerically:
      • the variance is the noise variability divided by delta_t (the sampling step)
      • so the VARIANCE of the sampled white noise is inversely proportional to the sampling step
  • Spike-Train Statistics

    • The probability of a specific spike train is practically impossible to know because there are so many possibilities
    • VERY STRONG assumption: spikes are statistically independent
  • Poisson process

    • The homogeneous Poisson process provides us with an estimate for p(t1, t2, ..., tn)
    • Basic assumptions
      • The probability of one spike in [t, t+dt] is r*dt
      • The probability of no spike is 1 - r*dt
    • The probability of having n arbitrary spikes in T is P(n) = ((rT)^n / n!) * exp(-rT) -> here rT is the mean of the distribution
    • An important graph is P(n) against rT
      • This gives us the probability that a hom. Poisson process produces a certain number of spikes
      • For example, the probability of a hom. Poisson process giving us:
        • 0 spikes for rT = 0 is 1
        • 1 spike for rT = 1 is ~0.4
        • 2 spikes for rT = 2 is ~0.3
      • This graph has decreasing peaks as n gets higher, but it always peaks around n ≈ rT
    • SO the probability of a specific spike train of n spikes is p(t1, t2, ..., tn) = P(n) * n! * (dt / T)^n, where
      - P(n): the Poisson probability for n arbitrary spikes to occur
      - n!: the number of permutations of the spike times
      - (dt/T)^n: the normalized n-dimensional volume element
    • From the distribution of the spike count we can see that the mean is rT
    • The variance would be <n^2> - <n>^2, which also equals rT
    • Thus the variance and the mean of the spike count are equal. Fano factor: the ratio of variance to mean; it takes the value ONE for a homogeneous Poisson process, independent of the time interval T. (See the sketch below.)
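
A minimal simulation checking that mean, variance and Fano factor behave as stated (my own toy parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
r, T, dt, n_trials = 20.0, 1.0, 0.001, 5000   # rate (Hz), duration (s), step, trials

# Homogeneous Poisson process: a spike in each bin with probability r*dt
spikes = rng.random((n_trials, int(T / dt))) < r * dt
counts = spikes.sum(axis=1)

print(counts.mean(), counts.var())            # both ~ r*T = 20
print(counts.var() / counts.mean())           # Fano factor ~ 1
```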
  • Homogeneous vs inhomogeneous Poisson process

    • Homogeneous has a constant firing rate r
    • Inhomogeneous has a time-varying firing rate r(t)
  • Interspike-Interval (ISI) Distribution

    • We use the Poisson process to derive the probability of the next spike happening
    • After calculating, we find P_isi(t) = r * exp(-r * t)
    • mean: 1 / r
    • variance: 1 / r^2
    • Coefficient of variation (std deviation / mean): 1
    • Poisson doesn't capture the refractory period, so WE ADD IT ARTIFICIALLY IN A STOCHASTIC WAY
    • The coefficient of variation (CV) measures how regular the interspike intervals are
      • High CV -> quite irregular
      • Zero CV -> perfectly regular
  • Spike-Train Autocorrelation

    • Important to CHARACTERIZE spike trains and to CLASSIFY neuron types
    • Generalizes the ISI distribution P_ISI, which only relates pairs of SUCCESSIVE SPIKES
    • Quantifies the relation between spikes at an arbitrary offset tau
    • Autocorrelation of a spike train: Q_pp(tau) = 1/T * integral( p(t) * p(t + tau) - r^2 ) dt
    • There's of course a numerical calculation method for this, which is intuitive
  • How do we estimate the firing rate?

    • Using the neural encoding ansatz
      • The ansatz is linear
      • No adaptation, no saturation, and possibly negative firing rates
  • Saturation: can't go past a maximum firing rate

  • Adaptation: tired of firing because the neuron gets used to the stimulus.

Entropy

  • Correlations in real neurons (vs. a pure Poisson process) cause real neurons to have less entropy (because we have more information)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment