Gist sooheon/bb1dac96827fefd4f47115e742f2723a, created August 11, 2016 09:37
* Before exam
Here is what you should do. The night before, do not eat anything spicy (for obvious reasons). If you think you won't be able to sleep, pop a sleep aid at around 8:30-9:00pm. Sleep is crucial for your ability to think clearly.

Set your alarm early enough before the exam to get in a workout. This is absolutely mandatory. You need to sweat out all your nervous energy. Give yourself a solid 30-60 minute cardio session; you can walk the entire time if you are not accustomed to exercising. Then eat a big-ass breakfast: bacon, eggs, potatoes, tomato, and avocado. Fuel up. It is a long test.

Once you are taking the test, if you do not know how to solve a problem after the initial read-through, mark it and move forward. Kill all the questions you are sure about, then revert back to the ones you were unsure about. There are 5 pilot questions. The magic number is historically 21 correct to pass. I took the test in March of 2016. Passed. I was at an earned level of 8.98 on Adapt and felt great taking P.

The wording comment is accurate. When you are unsure of the wording of a particular problem, take a deep breath, close your eyes, and ask yourself: what do I know, and what is the goal? Definitions are critical. Be confident in your abilities. Watch Cool Runnings, especially the scene where the two dudes are staring in the mirror in the bathroom of the bar: https://www.youtube.com/watch?v=I7_T3uUZSE4 Own your future.
* Poisson Process
Say you're a traffic engineer.
X = # of cars that pass in 1 hr
Assumptions:
1. Any hour is no different from any other hour.
2. No hour is correlated with another.
Let's say through observation, E(X) = \lambda.
Formulate as a binomial distribution where, in each minute of the hour, a car can either pass or not pass:
$\lambda \text{ cars/hr} = 60 \text{ min/hr} \cdot \frac{\lambda}{60} \text{ cars/min}$
But two cars could pass within the same minute, or second, or half second... So find the limit of this binomial distribution, making infinite observations at infinitesimally small intervals. This is the /Poisson distribution/.
With one trial per second (n = 3600):
$$P(X=k)=\binom{3600}{k} \left(\frac{\lambda}{3600}\right)^k \left(1 - \frac{\lambda}{3600}\right)^{3600-k}$$
Taking the limit:
$$P(X=k)=\lim_{n\to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}$$
$$= \frac{\lambda^k}{k!} e^{-\lambda}$$
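This convergence can be sanity-checked numerically. A minimal sketch, assuming an example rate of \lambda = 2 cars/hr and hypothetical helper names:

```python
import math

lam = 2.0  # assumed example rate: 2 cars per hour

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Binomial(n, p)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Poisson(lam)
    return lam**k / math.factorial(k) * math.exp(-lam)

# With one Bernoulli trial per second (n = 3600), the binomial pmf is
# already very close to the Poisson limit:
for k in range(5):
    assert abs(binom_pmf(k, 3600, lam / 3600) - poisson_pmf(k, lam)) < 1e-3
```

Increasing n further (more trials per hour) shrinks the gap, which is exactly the limit the derivation takes.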
** Limits
$$\lim_{x\to\infty} \left(1 + \frac{a}{x}\right)^x = e^a$$
Substituting:
$$\frac{a}{x} = \frac{1}{n}$$
$$x = an$$
$$\lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^{an} = \lim_{n\to\infty} \left(\left(1 + \frac{1}{n}\right)^n\right)^a$$
$$= \left(\lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n\right)^a = e^a$$
** Also,
$\frac{x!}{(x-k)!}=(x)(x-1)(x-2)\cdots(x-k+1)$
In other words, $\frac{7!}{5!} = 7\cdot 6$
* General Probability
** Set Theory
*** Law of Total Probability
n disjoint events A_1, A_2, ..., A_n whose union is the whole sample space (so $Pr(A_1) + Pr(A_2) + ... + Pr(A_n) = 1$) are called a /partition/ of the sample space. If event B is in this sample space, Pr(B) is the sum of the probabilities of the intersections between B and each A_i.
$Pr(B) = Pr(B \cap A_1) + Pr(B \cap A_2) + ... + Pr(B \cap A_n) = \sum_{i=1}^n Pr(B\cap A_i)$
In the special case when n = 2:
$Pr(B) = Pr(B \cap A) + Pr(B \cap A')$
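A small numeric illustration of the n = 3 case. The factories and defect rates below are made-up example numbers, not from the source:

```python
# Partition A1, A2, A3 = which factory made an item; B = "item is defective".
# Law of total probability: Pr(B) = sum over i of Pr(B|Ai) * Pr(Ai).
pr_A = {"A1": 0.5, "A2": 0.3, "A3": 0.2}             # partition: sums to 1
pr_B_given_A = {"A1": 0.01, "A2": 0.02, "A3": 0.05}  # assumed defect rates

assert abs(sum(pr_A.values()) - 1.0) < 1e-12         # valid partition
pr_B = sum(pr_B_given_A[a] * pr_A[a] for a in pr_A)  # total probability
```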
*** De Morgan's Laws
1. The complement of the /union/ of /n/ events is the /intersection/ of the complements of the events.
2. The complement of the /intersection/ of /n/ events is the /union/ of the complements of the events.
** Conditional Probability
$Pr(A|B) = \frac{Pr(A\cap B)}{Pr(B)}$ when $Pr(B) \neq 0$
An alternate form is:
$Pr(A\cap B) = Pr(A|B)\cdot Pr(B)$
Equivalently:
$Pr(B\cap A) = Pr(B|A)\cdot Pr(A)$
And because $B\cap A = A\cap B$:
$Pr(A|B)\cdot Pr(B) = Pr(B|A)\cdot Pr(A)$
** Bayes' Theorem
Rearranging the last identity above:
$Pr(A|B) = \frac{Pr(B|A)\cdot Pr(A)}{Pr(B)}$
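The identity $Pr(A|B)\cdot Pr(B) = Pr(B|A)\cdot Pr(A)$ is all Bayes' theorem needs: divide through by Pr(B). A sketch with made-up disease-test numbers (the prevalence and accuracies are assumptions for illustration):

```python
pr_disease = 0.01              # assumed prevalence
pr_pos_given_disease = 0.95    # assumed sensitivity
pr_pos_given_healthy = 0.05    # assumed false-positive rate

# Law of total probability gives the denominator Pr(positive):
pr_pos = (pr_pos_given_disease * pr_disease
          + pr_pos_given_healthy * (1 - pr_disease))
# Bayes: invert Pr(positive|disease) into Pr(disease|positive)
pr_disease_given_pos = pr_pos_given_disease * pr_disease / pr_pos
```

Despite the accurate test, Pr(disease|positive) comes out around 0.16, because the prior Pr(disease) is small.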
** Counting and Combinatorial Probability
* Univariate Probability Distributions
** Discrete Random Variables
Countable (or countably infinite) things: how many cars will there be in the parking lot? How many 35-yr-old males will die of cancer in 1 year?
*** Probability Mass Function
The /mass/ of probability at a given value of the random variable X = x. In other words, Pr(X = x), or $p_X(x)$.
Satisfies the following conditions:
$0 \le p_X(x) \le 1$
$\sum_{\text{all }x} p_X(x) = 1$
*** Cumulative Distribution Function
$F_X(x)$
Cumulative probability to the left of and including x.
$F_X(x) = Pr(X \le x) = \sum_{i\le x} Pr(X = i)$
Some properties:
1. It is non-decreasing.
2. Regardless of probability distribution, $F_X(-\infty) = 0$ and $F_X(\infty) = 1$
3. When discrete, $p_X(x_a) = F_X(x_a) - F_X(x_{a-1})$ for consecutive support points $x_{a-1} < x_a$
4. When continuous, $F_X(x) = \int_{-\infty}^x f_X(s) ds$, $f_X(x) = \frac{d}{dx} F_X(x)$
** Continuous Random Variables
A real number in a range [0,100], such as 11.23 or 56.64. Amount of rainfall in cm.
*** Probability Density Function
Denoted $f_X(x)$
The continuous equivalent of the PMF. Mnemonic: the PMF measures the probabilistic mass at one atomic value of the variable (what is the probability associated with the value 2? or 10?), while the PDF measures the /density/ of probability, integrated over a range of values (what is the integral of the PDF when 0 < X < 110?).
Pr(X = a) = 0, because finding the probability that X = a is finding the probability of one single value out of an infinite number of values, which is 0.
$0 \le f_X(x)$
The PDF has to be non-negative, but can be greater than 1 at individual points, as long as the integral over the whole domain is 1.
$\int_{-\infty}^{\infty} f_X(x) dx = 1$
** Moments
*** First Moment, Expected Value
**** Density Function Method
For discrete random variables:
$$E[g(X)] = \sum_{\text{all }x} g(x) \cdot Pr(X=x)$$
For continuous random variables:
$$E[g(X)] = \int_{-\infty}^\infty g(x)\cdot f_X(x) dx$$
**** Survival Function Method
When X is discrete and defined for non-negative integers (x = 0,1,2,...):
$$E[X]=\sum_{x=0}^\infty Pr(X>x)$$
When X is continuous and defined for non-negative values:
$$E[X] = \int_0^\infty Pr(X>x)dx = \int_0^\infty S_X(x) dx$$
Generalizing to g(x) where g(0) = 0, for both discrete and continuous random variables:
$$E[g(X)] = \int_0^\infty g'(x)\cdot S_X(x) dx$$
The above equations have x defined from 0 to \infty. What if $a\le X\le b$?
$$E[X]=\sum_{x=0}^\infty Pr(X>x)$$
$$= \sum_{x=0}^{a-1} Pr(X>x) + \sum_{x=a}^b Pr(X>x)$$
Each term in the first sum equals 1, since X is at least a:
$$= (1 + 1 + \cdots + 1) + \sum_{x=a}^b Pr(X>x) = a + \sum_{x=a}^b Pr(X>x)$$
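A quick check that the density-function and survival-function methods agree, sketched with a fair die (the example variable is an assumption):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # X = fair die roll

def survival(x):
    # S_X(x) = Pr(X > x)
    return sum(p for v, p in pmf.items() if v > x)

mean_density = sum(x * p for x, p in pmf.items())      # sum of x * Pr(X=x)
mean_survival = sum(survival(x) for x in range(0, 6))  # sum of Pr(X > x)
assert mean_density == mean_survival == Fraction(7, 2)
```

Here a = 1, so the truncated form gives $1 + \sum_{x=1}^{6} Pr(X>x)$, which is the same 7/2.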
**** Properties of Expected Value
1. $E(c) = c$
2. $E[c\cdot g(X)] = c\cdot E[g(X)]$
3. $E[g(X) + f(X) + h(X)] = E[g(X)] + E[f(X)] + E[h(X)]$
These are essentially the rules of sum notation, as E(X) is a sum at heart: for all x, sum $g(x)\cdot Pr(X=x)$.
**** Conditional Expectation
Expected value of X within some subset of its values. For a discrete random variable:
$$E[X|j\le X \le k] = \sum_{x=j}^k x\cdot Pr(X=x|j\le X \le k)$$
For continuous:
$$E[X|j\le X \le k] = \int_j^k x\cdot f_{X|j\le X\le k}(x) dx$$
Where:
$$f_{X|j\le X\le k}(x) = \frac{f_X(x)}{Pr(j\le X \le k)}$$
But remember that $Pr(j\le X \le k)$ is just a constant.
$$E[X|j\le X \le k] = \sum_{x=j}^k x\cdot \frac{Pr([X=x]\cap [j\le X\le k])}{Pr(j\le X\le k)}$$
$$=\frac{\sum_{x=j}^k x\cdot Pr(X=x)}{Pr(j\le X\le k)}$$
For continuous:
$$E[X|j\le X\le k]=\int_{j}^k x\cdot f_{X|j\le X\le k}(x)dx$$
$$=\frac{\int_j^k x\cdot f_X(x) dx}{Pr(j\le X\le k)}$$
A conditional expected value is the usual definition of an expected value, summed/integrated over the conditional range, divided by the probability of X being in that range. This also applies to the general case of g(X).
**** Moments of Mixed Distributions
Say Y is a mixed distribution of discrete X and continuous Z. The nth moment (n = 1 being the expected value) of the mixed distribution Y is:
$$E[Y^n] = c_1 E[X^n] + c_2 E[Z^n]$$
If A is the event that Y is discrete, and B is the event that Y is continuous, and Y is discrete at points a and b while continuous from a to b:
X = Y|A
Z = Y|B
Pr(A) = c_1
Pr(B) = c_2
The above equation is rewritten using the conditional expectation equations:
$$E[Y^n] = \Pr(A)\left(\frac{\sum_{y=a}^b y^n \cdot \Pr(Y=y)}{\Pr(A)}\right) + \Pr(B)\left(\frac{\int_a^b y^n \cdot f_Y(y) dy}{\Pr(B)}\right)$$
*** Variance
A /raw/ moment is derived from the value of X itself, while a /central/ moment is derived from the difference between X and its mean. The kth raw moment of X is $E[X^k]$.
The kth central moment of X is $E[(X - \mu)^k]$
$$Var(X) = E[(X - \mu)^2] = \sigma^2$$
$$= E[X^2] - (E[X])^2 = E[X^2] - \mu^2$$
In general, for g(X):
$$Var[g(X)] = E[(g(X) - E[g(X)])^2] = E[g(X)^2] - (E[g(X)])^2$$
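Both forms of the variance can be verified exactly on a small example (a fair die, an assumed example):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die
mu = sum(x * p for x, p in pmf.items())                        # E[X] = 7/2
var_central = sum((x - mu)**2 * p for x, p in pmf.items())     # E[(X-mu)^2]
var_shortcut = sum(x * x * p for x, p in pmf.items()) - mu**2  # E[X^2] - mu^2
assert var_central == var_shortcut == Fraction(35, 12)
```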
**** Important Properties of Variance
1. $Var[c] = 0$
2. $$Var[aX + b] = a^2 \cdot Var[X] + Var[b] = a^2 \cdot Var[X]$$
3. $$\sigma = SD[X] = \sqrt{Var[X]}$$
**** Variance of Mixed Distributions
The variance of a mixed distribution is not the weighted average of the component variances.
$$Var[Y] \ne c_1 Var[X_1] + c_2 Var[X_2]$$
Instead, find the first and second moments of the entire mixed distribution first, then find the variance:
$$Var[Y] = (c_1 E[X^2_1] + c_2E[X^2_2]) - (c_1 E[X_1] + c_2 E[X_2])^2$$
**** Coefficient of Variation
CV[X] is the ratio of the standard deviation to the mean.
$$CV[X] = \frac{\sigma}{\mu} = \frac{SD[X]}{E[X]}$$
It's a measure of variability relative to the mean.
** Moment Generating Functions
The MGF is a generalized function of a random variable that can generate its nth moments.
$$M_X(t) = E[e^{tX}]$$
For discrete random variables:
$$=\sum_{\text{all } x} e^{tx} \cdot Pr(X=x)$$
For continuous random variables:
$$= \int_{-\infty}^\infty e^{tx} \cdot f_X(x)dx$$
Joint Moment Generating Function:
$$M_{X,Y}(s,t)=E[e^{sX+tY}]$$
Setting s = t gives the moment generating function for X+Y:
$$M_{X,Y}(t,t)=E[e^{t(X+Y)}]=M_{X+Y}(t)$$
*** Important Properties of MGF
1. $M_X(0) = 1$
The 0th moment of any random variable, continuous or discrete, is 1.
2. $$M_{aX+b}(t) = e^{bt}\cdot M_X(at)$$
Scaling and shifting random variable X:
$$M_{aX+b}(t) = E[e^{t(aX+b)}]$$
$$= E[e^{(at)X}\cdot e^{bt}]$$
$$= e^{bt}\cdot E[e^{(at)X}]$$
$$= e^{bt}\cdot M_X(at)$$
3. $$M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$$, if X and Y are independent
The MGF of a sum of independent random variables is the product of the MGFs of each.
$$M_{X+Y}(t) = E[e^{t(X+Y)}]$$
$$= E[e^{tX+tY}]$$
$$= E[e^{tX}\cdot e^{tY}]$$
$$= E[e^{tX}]\cdot E[e^{tY}]$$
4. The nth derivative of the MGF of X at t=0 is the nth moment of X.
$$\left. \frac{d^n}{dt^n} M_X(t) \right\rvert_{t=0} = E[X^n]$$
$$M'_X(t) = \frac{d}{dt} M_X(t)$$
$$= \frac{d}{dt} E[e^{tX}]$$
$$= E\left[\frac{d}{dt} e^{tX}\right]$$
$$= E[X\cdot e^{tX}]$$
$$M'_X(0) = E[X]$$
$$M''_X(t) = \frac{d^2}{dt^2} M_X(t)$$
$$= E\left[\frac{d}{dt}(X\cdot e^{tX})\right]$$
$$= E[X^2\cdot e^{tX}]$$
$$M''_X(0) = E[X^2]$$
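Property 4 can be checked numerically with finite differences. A sketch using a fair die, whose MGF is a 6-term sum (the step size h is an arbitrary choice):

```python
import math

def mgf(t):
    # M_X(t) = E[e^{tX}] for X = fair die roll
    return sum(math.exp(t * x) / 6 for x in range(1, 7))

h = 1e-4
# central difference approximates M'(0) = E[X] = 3.5
m1 = (mgf(h) - mgf(-h)) / (2 * h)
# second difference approximates M''(0) = E[X^2] = 91/6
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2
assert abs(m1 - 3.5) < 1e-5
assert abs(m2 - 91 / 6) < 1e-3
```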
** Percentiles, Mode, Skewness, and Kurtosis
*** Percentile and Quartile
The 100p^{th} percentile ($\pi_p$), for $0\le p\le 1$, is the value where $F_X(\pi_p) = p$ (or greater, if the variable is discrete).
Interquartile range = $\pi_{0.75} - \pi_{0.25}$
*** Mode
The value(s) of the random variable at which the PDF or PMF is highest (i.e. the most likely values).
Solve for where the derivative of the PDF equals 0: $$\frac{d}{dx}f_X(x) = 0$$
To confirm a critical point is a mode, the second derivative at x should be negative, indicating a local maximum (the slope is decreasing).
*** Skewness and Kurtosis
Skewness: a measure of the asymmetry of a distribution. Perfect symmetry is 0 skew. Positive skew is skewed to the right (not that the mode is to the right, but that the tail extends longer to the right). Negative skew means the left tail is stretched.
$$\text{Skewness} = \frac{E[(X - \mu)^3]}{\sigma^3}$$
Skewness is the ratio of the third central moment to the third power of the standard deviation.
Kurtosis measures the "peakedness" of a distribution. Higher kurtosis means a sharper peak.
$$\text{Kurtosis} = \frac{E[(X-\mu)^4]}{\sigma^4}$$
Kurtosis is the ratio of the 4th central moment to the 4th power of the standard deviation.
** Chebyshev's Inequality
For any probability distribution, at most $\frac{1}{k^2}$ of the probability mass can lie more than k standard deviations away from the mean.
Given random variable X, with finite $E(X) = \mu$ and finite non-zero $\sigma^2$:
For any real k > 0, $$\Pr(|X-\mu|\geq k\sigma)\leq \frac{1}{k^2}$$
Expanded:
$$\Pr(X-\mu \geq k\sigma) + \Pr(X-\mu \leq -k\sigma) \leq \frac{1}{k^2}$$
$$\Pr(X \geq \mu + k\sigma) + \Pr(X\leq \mu - k\sigma) \leq \frac{1}{k^2}$$
Of course, when k=1 the inequality is self-evident: no probability is greater than 1. When 0<k<1 the bound is vacuous, because $\frac{1}{k^2}$ becomes greater than one.
Alternate form:
$$Pr(|X-\mu|\geq k)\leq \frac{\sigma^2}{k^2}$$
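An empirical sketch of the inequality, using a skewed Exponential(1) sample (\mu = \sigma = 1) to show the bound holds regardless of the distribution's shape; the sample size and seed are arbitrary:

```python
import random

random.seed(0)
n = 100_000
sample = [random.expovariate(1.0) for _ in range(n)]  # Exponential(1) draws
mu, sigma = 1.0, 1.0                                  # exact moments

for k in (1.5, 2, 3):
    # empirical Pr(|X - mu| >= k * sigma)
    frac = sum(abs(x - mu) >= k * sigma for x in sample) / n
    assert frac <= 1 / k**2   # Chebyshev bound
```

For this distribution the actual tail probabilities sit far below the 1/k^2 bound; Chebyshev is a worst-case guarantee, not an estimate.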
** Univariate Transformations
Transformation: using the distribution function of one random variable to determine the distribution function of another.
*** CDF Method
1. PDF of X -> integrate
2. CDF of X
3. Transformation Y = g(X) (algebra + substitution)
4. CDF of Y -> differentiate
5. PDF of Y
*** PDF Method ("shortcut")
Go directly from PDF to PDF, for monotonic g:
$$f_Y(y)=f_X[g^{-1}(y)]\cdot \left|\frac{d}{dy}g^{-1}(y)\right|$$
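A sketch of the two methods agreeing for X ~ Uniform(0,1) and Y = X^2 (an assumed example transformation): the CDF method gives $F_Y(y) = Pr(X^2 \le y) = \sqrt{y}$, while the shortcut gives $f_Y(y) = f_X(\sqrt{y})\cdot|\frac{d}{dy}\sqrt{y}| = \frac{1}{2\sqrt{y}}$.

```python
import math

def F_Y(y):
    # CDF method: F_Y(y) = Pr(X^2 <= y) = sqrt(y) for 0 <= y <= 1
    return math.sqrt(y)

def f_Y_shortcut(y):
    # PDF method: f_X(g^{-1}(y)) * |d/dy g^{-1}(y)| = 1 * 1/(2 sqrt(y))
    return 1.0 / (2.0 * math.sqrt(y))

# differentiating the CDF numerically should recover the shortcut density
h = 1e-6
for y in (0.1, 0.25, 0.5, 0.9):
    deriv = (F_Y(y + h) - F_Y(y - h)) / (2 * h)
    assert abs(deriv - f_Y_shortcut(y)) < 1e-6
```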
* Discrete Distributions
** Discrete Uniform Distribution
Picking marbles labeled 1 to N out of a jar.
Mass function: $$Pr[X = x] = \frac{1}{N}$$ for x = 1,2,...,N
Mean: $$\mu=E(X)=\frac{N+1}{2}$$
Variance: $$\sigma^{2}=Var(X)=\frac{N^2-1}{12}$$
Moment generating function: $$M_X(t)=\frac{e^t(1-e^{Nt})}{N(1-e^t)}$$
*** Deriving the MGF for Discrete Uniform
Remember the formula for the sum of a geometric sequence:
$$\sum_{x=a}^b e^{tx} = e^{at}+e^{(a+1)t}+...+e^{bt}$$
$$=\frac{\text{first term}-\text{first omitted term}}{1-\text{common ratio}}$$
$$=\frac{e^{at}-e^{(b+1)t}}{1-e^t}$$
In this case, with a = 1, b = N, and the $\frac{1}{N}$ probability factored in:
$$M_X(t) = \frac{1}{N}\cdot\frac{e^t - e^{(N+1)t}}{1-e^t}$$
** Bernoulli Distribution
- Two values: 0 or 1
Parameter: p, the probability of getting value 1 (with q = 1-p)
*** Formulas
$Pr[X=x] = p$ for x=1; $q$ for x=0
$E(X) = p$
$$E(X^2) = 0^2\cdot q + 1^2\cdot p = p$$
$$Var(X) = p-p^2 = p(1-p) = pq$$
$$M_X(t)=qe^{t(0)} + pe^{t(1)}= pe^t+q$$
*** Bernoulli shortcut
Given Y, a random variable with 2 discrete possible values a and b with probabilities p and q respectively, write Y = (a-b)X + b where X is Bernoulli(p):
$$Var[Y] = Var[(a-b)X + b] = (a-b)^2 pq$$
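An exact check of the shortcut; the values a = 10, b = 4, p = 3/10 are arbitrary examples:

```python
from fractions import Fraction

a, b = Fraction(10), Fraction(4)   # the two possible values of Y
p = Fraction(3, 10)                # Pr(Y = a)
q = 1 - p                          # Pr(Y = b)

mean = a * p + b * q
var_direct = (a - mean)**2 * p + (b - mean)**2 * q  # definition of variance
var_shortcut = (a - b)**2 * p * q                   # Bernoulli shortcut
assert var_direct == var_shortcut == Fraction(189, 25)
```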
** Binomial Distribution
Number of successes in n Bernoulli trials.
n = number of trials
p = probability of success on each trial
$$X\sim Bin(n,p)$$
*** Formulas
$$Pr[X=x] = {n\choose x} p^xq^{n-x}, x=0,1,2,...,n$$
$$E(X)=np$$
$$Var(X)=npq$$
$$M_X(t)=(pe^t+q)^n$$
** Hypergeometric Distribution
Like a binomial distribution, but sampling /without replacement/. (The multivariate hypergeometric generalizes this to more than 2 groups.)
*** Formulas
Probability of drawing x successes in a sample of n taken from N items, m of which are successes:
$$\Pr(X=x)=\frac{{m\choose x}{{N-m}\choose{n-x}}}{{N\choose n}}$$
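A sketch checking that this pmf is a valid distribution and that its mean is n*m/N (the parameter values are arbitrary examples):

```python
import math
from fractions import Fraction

N, m, n = 20, 7, 5   # 20 items, 7 successes among them, draw 5 w/o replacement

def hyper_pmf(x):
    # Pr(X = x) = C(m,x) * C(N-m, n-x) / C(N, n)
    return Fraction(math.comb(m, x) * math.comb(N - m, n - x), math.comb(N, n))

support = range(max(0, n - (N - m)), min(n, m) + 1)
assert sum(hyper_pmf(x) for x in support) == 1   # pmf sums to 1
mean = sum(x * hyper_pmf(x) for x in support)
assert mean == Fraction(n * m, N)                # E[X] = n*m/N
```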
** Geometric Distribution
Number of Bernoulli trials necessary to obtain the first success.
p = probability of success on each Bernoulli trial
*** Formulas
X models the number of trials needed to obtain the first success.
$$Pr[X=x]=q^{x-1}\cdot p$$ for x=1,2,3,...
$$\mu = E[X] = \frac{1}{p}$$
$$\sigma^2=Var[X]=\frac{q}{p^2}$$
$$M_X(t)=\frac{pe^t}{1-qe^t}$$
*** Special Properties
- Memoryless:
If k failures have already happened, the expected number of additional trials is still the same.
*** Alternative Forms
Y models the number of failures *prior* to the first success (Y=X-1)
$$Pr[Y=y]=q^y\cdot p$$ for y = 0,1,2,...
$$\mu=E[Y]=E[X-1]=E[X]-1 = \frac{1}{p}-1 = \frac{q}{p}$$
$$\sigma^2 = Var[Y]=Var[X-1]=Var[X]=\frac{q}{p^2}$$
** Negative Binomial Distribution
X = the number of Bernoulli trials until the rth success occurs.
p = probability of success on each trial
r = number of desired successes
*** Formulas
Probability of the rth success occurring on the xth trial:
$$Pr(X=x)={{x-1}\choose {r-1}}p^r\cdot (1-p)^{x-r}$$ for x >= r
Writing X as a sum of r iid geometric variables $N_i$:
$$E[X]=E[N_1+N_2+N_3+...+N_r]$$
$$=r\cdot E[N] = r\cdot \frac{1}{p}$$ for 0<p<1, r>0
$$Var[X] = r\cdot Var[N] = r\cdot \frac{1-p}{p^2}$$ for 0<p<1, r>0
$$M_X(t) = M_{N_1+N_2+N_3+...+N_r}(t) = E[e^{t(N_1+...+N_r)}]$$
$$= [M_N(t)]^{r} = \left (\frac{pe^t}{1-(1-p)e^t} \right )^r$$
** Poisson Distribution
X = the number of occurrences of an event in a fixed time interval.
$\lambda = E[X]$
*** Formulas
$$Pr(X=x)=\frac{e^{-\lambda}\cdot \lambda^x}{x!}$$
$$Var[X] = \lambda$$
$$M_X(t)=E[e^{tX}]=\sum_{x=0}^{\infty} e^{tx}\cdot \frac{e^{-\lambda}\cdot \lambda^x}{x!}$$
$$=e^{-\lambda} \cdot \sum_{x=0}^\infty \frac{(\lambda e^t)^x}{x!}$$
$$=e^{-\lambda}\cdot e^{\lambda e^t}$$
$$M_X(t)=e^{\lambda(e^t-1)}$$
*** Special Properties
1. The sum of independent Poisson random variables is Poisson:
if X and Y are Poisson with means $\lambda_1$ and $\lambda_2$, (X+Y) is Poisson with mean $\lambda_1 + \lambda_2$
* Continuous Distributions
** Continuous Uniform
A constant density function over some interval.
[a,b] = interval of distribution
*** Formulas
$$f_X(x) = \frac{1}{b-a}$$ for a<=x<=b
Length of the desired region / length of the domain:
$$F_X(x) = \frac{x-a}{b-a}$$, a<=x<=b
$$\mu=E[X] = \frac{a+b}{2}$$
$$\sigma^2 = Var[X] = \frac{(b-a)^2}{12}$$
$$M_X(t) = \frac{e^{tb}-e^{ta}}{t(b-a)}$$ for t \neq 0
Because t is in the denominator, the MGF at t = 0 must be evaluated with L'Hôpital's rule (giving $M_X(0) = 1$).
** Exponential Distribution
Measures the length of time until an event occurs.
\lambda = a constant hazard rate
*** Formulas
$$f_X(x) = \lambda e^{-\lambda x}$$ for x>=0
$$F_X(x) = 1-e^{-\lambda x}$$
$$S_X(x) = e^{-\lambda x}$$
$$\mu = E[X] = \frac{1}{\lambda}$$
$$\sigma^2 = \frac{1}{\lambda^2}$$
$$M_X(t) = \frac{\lambda}{\lambda - t}$$ for t < \lambda
*** Special Properties
It is the continuous analogue of the geometric distribution.
- Memoryless
Even if you've already waited time k, the remaining waiting time X-k is still exponential with the same mean. Waiting does not mean your remaining wait time is reduced.
$$E[X-k|X>k] = E[X] = \frac{1}{\lambda}$$
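Memorylessness can be checked by simulation. A sketch assuming \lambda = 0.5 (so the mean wait is 2) and an arbitrary already-elapsed time k = 3:

```python
import random

random.seed(1)
lam = 0.5                 # assumed hazard rate; E[X] = 1/lam = 2
n = 200_000
waits = [random.expovariate(lam) for _ in range(n)]

overall_mean = sum(waits) / n
k = 3.0
remaining = [x - k for x in waits if x > k]   # X - k given X > k
conditional_mean = sum(remaining) / len(remaining)

# Memorylessness: E[X - k | X > k] = E[X] = 1/lam
assert abs(overall_mean - 2.0) < 0.05
assert abs(conditional_mean - 2.0) < 0.1
```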
** Gamma Distribution
http://tinyurl.com/zhogckx
Defined for X>=0, with \lambda = the rate (hazard) of each iid exponential distribution and \alpha = the number of iid exponential distributions summed, both > 0.
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1}e^{-x}dx$$
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha}e^{-x}\frac{dx}{x}$$
Unless \alpha is a whole number, this integration will be difficult.
$$\Gamma(\alpha+1) = \alpha\Gamma(\alpha)$$ for \alpha>0
$$\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$$
*** Formulas
$$f_X(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\cdot x^{\alpha-1}e^{-\lambda x}$$ for x>=0
$$f_X(x) = \frac{1}{\Gamma(\alpha)\theta^\alpha}\cdot x^{\alpha-1}e^{-\frac{x}{\theta}}$$ for x>=0
$$\mu = E[X] = \frac{\alpha}{\lambda}$$
$$\sigma^2 = Var[X] = \frac{\alpha}{\lambda^{2}}$$
$$M_X(t) = \left (\frac{\lambda}{\lambda-t} \right)^{\alpha}$$ for t<\lambda
For whole-number \alpha, the CDF of X is 1 minus the sum of the first \alpha terms of the PMF of a Poisson with mean $\frac{x}{\theta}$:
$$Y\sim Poisson\left(\frac{x}{\theta}\right)$$
$$F_X(x) = \Pr(Y\geq \alpha) = 1-\Pr(Y<\alpha) = 1 - \Pr(Y\leq \alpha-1)$$
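A numeric sketch of this relation for an assumed whole-number shape (\alpha = 3, \lambda = 2, evaluated at x = 1.5; note $\frac{x}{\theta} = \lambda x$), comparing the Poisson shortcut against brute-force integration of the gamma PDF:

```python
import math

alpha, lam, x = 3, 2.0, 1.5   # assumed example parameters (whole-number alpha)

def gamma_pdf(s):
    # Gamma(alpha) = (alpha - 1)! for whole-number alpha
    return lam**alpha / math.factorial(alpha - 1) * s**(alpha - 1) * math.exp(-lam * s)

# F_X(x) by trapezoid-rule integration of the pdf from 0 to x
steps = 100_000
h = x / steps
cdf_numeric = (gamma_pdf(0) + gamma_pdf(x)) / 2 * h
cdf_numeric += sum(gamma_pdf(i * h) for i in range(1, steps)) * h

# F_X(x) via the Poisson shortcut: Y ~ Poisson(lam * x), F_X(x) = Pr(Y >= alpha)
mean = lam * x
cdf_poisson = 1 - sum(mean**k / math.factorial(k) * math.exp(-mean)
                      for k in range(alpha))
assert abs(cdf_numeric - cdf_poisson) < 1e-6
```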
*** Special Properties
When \alpha=1, the Gamma distribution equals the exponential distribution.
For other whole numbers, the Gamma distribution is the sum of \alpha iid exponential variables.
$$\Gamma(\alpha) = (\alpha-1)!$$ for whole-number \alpha
*** Alternate forms
$$\lambda = \frac{1}{\theta}$$
** Normal Distribution
Parameters: \mu and \sigma^2
*** Formulas
$$f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
\mu and \sigma^2 are given
$$M_X(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}$$
*** Special Properties
If X is normal, $$Z = \frac{X-\mu}{\sigma}$$ is also normal, with \mu = 0 and \sigma^2 = 1 (the standard normal).
X is symmetric around \mu, and Z is symmetric around 0.
The sum of independent normal random variables is also normal.
* Central Limit Theorem
Given $X_1, X_2, X_3, \ldots$ i.i.d. random variables, all with the same \mu and \sigma.
Define S_n as the sum of the first n random variables, and Z_n as its Z-score.
$$S_n = \sum_{i=1}^n X_i$$, and $$Z_n = \frac{S_n/n - \mu}{\sigma/\sqrt{n}}$$
$$= \frac{S_n - n\mu}{\sigma \sqrt{n}}$$
$\mu_{Z_n} = 0$, $\sigma_{Z_n} = 1$, and as $n\to\infty$ the distribution of $Z_n$ approaches the standard normal.
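A simulation sketch of the theorem using Uniform(0,1) summands (an arbitrary choice; n and the trial count are also arbitrary):

```python
import math
import random

random.seed(2)
mu, sigma = 0.5, math.sqrt(1 / 12)   # moments of Uniform(0,1)
n, trials = 50, 20_000

z_scores = []
for _ in range(trials):
    s = sum(random.random() for _ in range(n))              # S_n
    z_scores.append((s - n * mu) / (sigma * math.sqrt(n)))  # Z_n

z_mean = sum(z_scores) / trials
z_var = sum(z * z for z in z_scores) / trials - z_mean**2
frac_below_1 = sum(z <= 1 for z in z_scores) / trials

# Z_n should look standard normal: mean ~ 0, variance ~ 1, Pr(Z<=1) ~ 0.8413
assert abs(z_mean) < 0.03
assert abs(z_var - 1) < 0.05
assert abs(frac_below_1 - 0.8413) < 0.02
```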