1. Introduction
    1. the response is a qualitative variable
    2. estimate the probability that X belongs to each category C
    3. logistic regression -> binary response
    4. multiclass logistic regression / discriminant analysis -> multi-class response
2. Logistic regression
    1. p(X) = e^(beta0 + beta1 X) / (1 + e^(beta0 + beta1 X))
    2. transforms the linear model to the range [0, 1]
    3. log(p(X) / (1 - p(X))) = beta0 + beta1 X, the log-odds/logit transformation of p(X)
    4. parameters fitted by maximum likelihood (Fisher scoring); a sketch follows
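A minimal sketch of a maximum-likelihood fit via Fisher scoring (IRLS) on made-up 1-d data; the data, sample size, and iteration count are illustrative assumptions, not from the notes:

```python
import numpy as np

# Toy 1-d data (assumed for illustration): X and binary labels y.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
y = (rng.random(100) < 1 / (1 + np.exp(-(0.5 + 2.0 * x1)))).astype(float)

X = np.column_stack([np.ones_like(x1), x1])   # design matrix with intercept
beta = np.zeros(2)

for _ in range(25):                            # Fisher scoring / IRLS
    p = 1 / (1 + np.exp(-X @ beta))            # p(X) = e^eta / (1 + e^eta)
    W = p * (1 - p)                            # variance weights
    grad = X.T @ (y - p)                       # score (gradient of log-likelihood)
    H = X.T @ (X * W[:, None])                 # Fisher information
    beta += np.linalg.solve(H, grad)

print(beta)  # should land roughly near the true (0.5, 2.0)
```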
1. Simple linear regression
    1. assumes the dependence of the response Y on the predictors X1, …, Xp is linear
    2. simple models are a good starting point: easy to fit and to interpret
    3. residual: e_i = y_i - yhat_i
    4. residual sum of squares: RSS = e_1^2 + … + e_n^2
    5. optimisation problem of minimising the RSS; has a closed-form solution
    6. measures of precision -> how confidently we can say the estimated coefficient differs from 0 (no relationship); see the sketch after this list
        1. standard error of the slope
        2. SE(beta1)^2 = var(e) / sum_i (x_i - xbar)^2, i.e. noise variance over the spread of X around its mean
        3. SE of the intercept
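A minimal sketch of the closed-form solution and the slope's standard error on toy data (the data and noise level are assumptions for illustration):

```python
import numpy as np

# Toy data: y = 1 + 3x + noise.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 1.0 + 3.0 * x + rng.normal(scale=0.5, size=50)

xbar, ybar = x.mean(), y.mean()
beta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0 = ybar - beta1 * xbar                    # closed-form OLS solution

e = y - (beta0 + beta1 * x)                    # residuals e_i = y_i - yhat_i
rss = np.sum(e ** 2)                           # residual sum of squares
sigma2 = rss / (len(x) - 2)                    # unbiased estimate of var(e)
se_beta1 = np.sqrt(sigma2 / np.sum((x - xbar) ** 2))

print(beta0, beta1, se_beta1)                  # near (1.0, 3.0) with small SE
```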
1. Regression model
    1. target/response to predict: Y
    2. features/inputs/predictors: X = (X1, X2, X3)
    3. Y = f(X) + e, where e captures measurement error and other discrepancies
    4. good for
        1. making predictions
        2. understanding which components of X are important
        3. possibly understanding how each component affects Y (depending on the complexity of f)
    5. at a particular point, e.g. f(4) = E(Y | X = 4), where E denotes the expected value
    6. regression function: f(x) = E(Y | X = x), the conditional expectation (sketch below)
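A minimal sketch of estimating E(Y | X = x) by local averaging on toy data; the window width h and the data are illustrative assumptions:

```python
import numpy as np

# Approximate the regression function f(x) = E(Y | X = x) by averaging
# the y-values of training points whose x falls near the query point.
def local_average(x0, x, y, h=0.3):
    mask = np.abs(x - x0) < h              # neighbourhood of x0
    return y[mask].mean()                  # sample version of E(Y | X ~ x0)

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=500)
y = np.sin(x) + rng.normal(scale=0.2, size=500)
print(local_average(0.0, x, y))            # close to sin(0) = 0
```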
1. Look at the data first before jumping into analysis
2. Supervised learning problem
    1. tasks
        1. accurately predict unseen test cases
        2. understand which inputs affect the outcome, and how
        3. assess the quality of our predictions and inferences
    2. know when and how to use each method
    3. evaluate the model
3. Unsupervised learning
    1. the data is unlabeled
1. Parametric vs nonparametric methods
    1. parametric methods
        1. fit a fixed number of parameters
    2. nonparametric methods
        1. the number of parameters grows with the dataset size
        2. k-nearest neighbours
        3. Gaussian/uniform kernels (a sketch of both follows this list)
    3. comparison
        1. parametric
            1. limited complexity
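A minimal sketch comparing the two nonparametric regressors named above on toy data; k, the bandwidth h, and the data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-3, 3, size=300)
y = np.cos(x) + rng.normal(scale=0.2, size=300)

def knn_predict(x0, k=10):
    idx = np.argsort(np.abs(x - x0))[:k]     # k closest training points
    return y[idx].mean()                     # average their responses

def kernel_predict(x0, h=0.3):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)         # weighted average of y

# Both effectively use the whole training set at prediction time:
# the "parameters" are the data points themselves.
print(knn_predict(1.0), kernel_predict(1.0))  # both near cos(1) ~ 0.54
```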
1. Scaling variational inference & unbiased estimates
    a. scale to big datasets
        i. traditionally too slow for big data
        ii. and not very beneficial there
    b. mixture model (Bayesian + deep learning); a minibatch sketch follows
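A minimal sketch of the subsampling idea that makes variational inference scale: a rescaled minibatch sum is an unbiased estimate of the full-data sum that appears in the objective (the toy numbers are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000
loglik = rng.normal(size=N)                 # stand-in per-point log-likelihoods

full_sum = loglik.sum()                     # too expensive to recompute per step
m = 100                                     # minibatch size
# N/m * (minibatch sum) has expectation equal to the full-data sum.
estimates = [N / m * loglik[rng.choice(N, m)].sum() for _ in range(1000)]

print(full_sum, np.mean(estimates))         # estimator mean ~ full sum (unbiased)
```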
1. Monte Carlo estimation
    1. approximation by simulation
    2. easy to program, easy to parallelise, can be slow for some problems
    3. quick and dirty
    4. unbiased
    5. like an infinitely large ensemble of neural networks
    6. full Bayesian modelling
    7. approximates intractable expectations
    8. used in the M-step of the EM algorithm
2. Sampling from 1-d distributions (see the sketch below)
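A minimal sketch of both ideas: inverse-CDF sampling from a 1-d distribution (Exponential(1), chosen as an example) and an unbiased Monte Carlo estimate of an expectation:

```python
import numpy as np

rng = np.random.default_rng(5)

u = rng.random(100_000)             # uniforms on [0, 1)
x = -np.log(1 - u)                  # inverse CDF of Exponential(1)

# Unbiased Monte Carlo estimate of E[x^2]; the exact value is 2.
print(np.mean(x ** 2))
```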
1. Topic modelling
    1. decompose books into distributions over topics
    2. assign topics to texts
    3. compute similarity/distance between the topic vectors of texts (sketch after this list)
        1. Euclidean distance
        2. cosine similarity
2. Dirichlet distribution
    1. support: the unit simplex
    2. for K = 3, a distribution over a triangle
3. Latent Dirichlet Allocation
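A minimal sketch: topic-proportion vectors drawn from a Dirichlet live on the unit simplex, and documents can be compared with the two measures above (the concentration parameter alpha is an assumption):

```python
import numpy as np

rng = np.random.default_rng(6)
doc_a, doc_b = rng.dirichlet(alpha=[0.5, 0.5, 0.5], size=2)

print(doc_a.sum())                                    # 1.0: on the unit simplex
print(np.linalg.norm(doc_a - doc_b))                  # Euclidean distance
cos = doc_a @ doc_b / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b))
print(cos)                                            # cosine similarity
```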
1. Goal: compute an approximate posterior distribution
2. Steps:
    1. select a family of distributions Q as the variational family, e.g. a product of factors q_i(z_i)
    2. find the best approximation q(z) of p*(z) by minimising the KL divergence KL(q || p*)
3. Mean-field approximation
    1. coordinate descent to minimise the KL divergence (see the sketch after this list)
    2. Ising model
4. Variational EM
    1. use variational inference in the E-step: instead of computing the full posterior, find its best approximation within a family of distributions Q
    2. called variational EM
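A minimal sketch of mean-field coordinate updates on a 2-d Gaussian target, the classic textbook example (Bishop, PRML 10.1.2); mu and Sigma are assumptions chosen for illustration:

```python
import numpy as np

# Target p*(z) = N(mu, Sigma). Under the mean-field factorisation
# q(z) = q1(z1) q2(z2), each factor is Gaussian with variance 1/Lam[i, i];
# only the variational means m need iterating.
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)      # precision matrix of the target

m = np.zeros(2)                 # variational means, arbitrary init
for _ in range(50):             # coordinate descent on KL(q || p*)
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])

print(m)  # converges to mu; the factorised q underestimates the variances
```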
1. General form of EM
    1. concave functions
    2. satisfy Jensen's inequality: f(E[t]) >= E[f(t)]
    3. Kullback-Leibler divergence: measures the difference between two probability distributions (numeric sketch after this list)
        1. KL divergence: KL(q || p) = E_q[log(q(x) / p(x))]
        2. compare the two densities at each point x on a log scale, then take the expectation under q
        3. not symmetric: KL(q || p) != KL(p || q)
        4. equals 0 when a distribution is compared with itself
        5. always non-negative
    4. EM
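A minimal numeric sketch of the KL properties listed above, using two made-up discrete distributions:

```python
import numpy as np

def kl(q, p):
    return np.sum(q * np.log(q / p))   # E_q[log q(x)/p(x)]

q = np.array([0.5, 0.3, 0.2])
p = np.array([0.2, 0.4, 0.4])

print(kl(q, p), kl(p, q))   # different values: not symmetric
print(kl(q, q))             # 0.0 when comparing a distribution to itself
print(kl(q, p) >= 0)        # True: always non-negative
```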