ML learning materials
https://github.com/lazyprogrammer/data-science-blogs
https://github.com/lazyprogrammer/machine_learning_examples : for book Markov Models
https://github.com/lazyprogrammer/DeepLearningTutorials
https://github.com/lazyprogrammer/image-classification-dbn
https://github.com/lazyprogrammer/facial-expression-recognition
*Markov Models:*
1. Weather
Markov Property/Assumption: tomorrow's value depends only on today's value, not yesterday's -- only on the most recent value.
The next state depends only on the current state.
The distribution of the state at time t depends only on the state at time t-1:
P(s(t) | s(t-1))
Goal: model the joint probability, i.e. the probability of seeing an entire specific sequence
-- via the chain rule of probability, a direct consequence of Bayes' rule:
P(s(1), s(0)) = P(s(1) | s(0)) P(s(0))
Estimate transition probabilities by counting:
P(sunny(t) | rain(t-1)) = count(rain(t-1) --> sunny(t)) / count(rain(t-1))
Estimates become more accurate as the number of samples approaches infinity.
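The chain rule above can be sketched in a few lines: the probability of a whole sequence is the initial-state probability times the product of transition probabilities. The 2-state matrix and initial distribution below are made-up numbers for illustration.

```python
import numpy as np

# Hypothetical 2-state weather model: 0 = sunny, 1 = rainy.
# A[i, j] = P(state j tomorrow | state i today); each row sums to 1.
A = np.array([[0.8, 0.2],
              [0.4, 0.6]])
pi0 = np.array([0.7, 0.3])  # assumed initial state distribution

def sequence_prob(seq, pi0, A):
    """Chain rule: P(s0, s1, ..., sT) = pi0[s0] * prod_t A[s_{t-1}, s_t]."""
    p = pi0[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev, cur]
    return p

# P(sunny, sunny, rainy) = 0.7 * 0.8 * 0.2
print(sequence_prob([0, 0, 1], pi0, A))
```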
For sentence completion, you may need the 2-3 previous words,
so train on sequences of 3-4 words, where the current word depends on the 2-3 previous words.
If the current state depends on the 2 previous states -- 2nd-order Markov model; on the 3 previous states -- 3rd-order Markov model.
Weather prediction model:
3 states -- 1 weight per state pair: 3*3 = 9 weights
For M states, M*M weights
A: state transition matrix, or transition probabilities
A(i,j) -- probability of going from state i to state j
A(i,j) = P(s(t)=j | s(t-1)=i)
This matrix could also change over time rather than being constant, e.g. one rainy day follows another more often in winter than in summer.
To simplify, consider only a constant transition probability matrix.
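Given a constant transition matrix and initial distribution, the Markov property makes sampling trivial: each next state is drawn from the row of A for the current state. A minimal simulation sketch, with illustrative 3-state numbers:

```python
import numpy as np

# Assumed 3-state weather model: 0 = sunny, 1 = cloudy, 2 = rainy.
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.4, 0.4]])
pi0 = np.array([0.5, 0.3, 0.2])  # initial state distribution

rng = np.random.default_rng(0)

def simulate(pi0, A, T):
    """Sample a length-T state sequence from the Markov chain."""
    s = rng.choice(len(pi0), p=pi0)
    seq = [s]
    for _ in range(T - 1):
        # Markov property: next state depends only on the current state
        s = rng.choice(A.shape[0], p=A[s])
        seq.append(s)
    return seq

seq = simulate(pi0, A, 10)
```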
Where we start: the initial state distribution, an M-dimensional vector pi0
Maximum likelihood
e.g. 3 sentences of training data:
i like dogs
i like cats
i love kangaroos
states: [i, like, dogs, cats, love, kangaroos]
indices: [0, 1, 2, 3, 4, 5]
so pi0 = [1, 0, 0, 0, 0, 0]
pi0("i") = 1
P(like | i) = 2/3
P(love | i) = 1/3
P(dogs | like) = P(cats | like) = 1/2
P(kangaroos | love) = 1
All other state transition probabilities are 0.
English has a large vocabulary, ~1 million words.
Instead of raw maximum likelihood, use smoothed estimates:
add a small number (epsilon) to the numerator,
add V*epsilon to the denominator, where V = vocabulary size:
P(s(t)=j | s(t-1)=i) = [count(i --> j) + epsilon] / [count(i) + epsilon * V]
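The counting and smoothing above can be sketched on the three example sentences. The epsilon value here is an arbitrary illustrative choice; with a small epsilon the estimates stay close to the unsmoothed counts (P(like|i) = 2/3, etc.).

```python
import numpy as np

# Smoothed maximum-likelihood training on the three example sentences.
sentences = [["i", "like", "dogs"],
             ["i", "like", "cats"],
             ["i", "love", "kangaroos"]]

vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)
eps = 1e-3  # illustrative smoothing constant

counts = np.zeros((V, V))  # counts[i, j] = count(i --> j)
init = np.zeros(V)         # first-word counts
for s in sentences:
    init[idx[s[0]]] += 1
    for prev, cur in zip(s, s[1:]):
        counts[idx[prev], idx[cur]] += 1

pi0 = init / init.sum()
# P(j | i) = [count(i --> j) + eps] / [count(i) + eps * V]
A = (counts + eps) / (counts.sum(axis=1, keepdims=True) + eps * V)
```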
There are 2 objects we need to represent a Markov model:
1. A, the transition probability matrix
2. pi0, the initial state distribution
2. Markov Chains
A discrete-time stochastic process.
e.g. what is the probability of a sunny day 5 days from now?
pi1 = pi0 x A -- state dist at time t=1
pi2 = pi1 x A -- state dist at time t=2
pi3 = pi2 x A -- state dist at time t=3
pi4 = pi3 x A -- state dist at time t=4
pi5 = pi4 x A -- state dist at time t=5
State distribution of s(t) at time t: pi(t) = pi0 x A^t
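The recursion above collapses into a single matrix power, pi(t) = pi0 A^t. A small sketch answering the "sunny day 5 days from now" question, with an assumed 2-state matrix:

```python
import numpy as np
from numpy.linalg import matrix_power

# Illustrative 2-state weather model: 0 = sunny, 1 = rainy.
A = np.array([[0.8, 0.2],
              [0.4, 0.6]])
pi0 = np.array([1.0, 0.0])  # start on a sunny day

def state_dist(pi0, A, t):
    """pi(t) = pi0 @ A^t -- distribution over states t steps ahead."""
    return pi0 @ matrix_power(A, t)

pi5 = state_dist(pi0, A, 5)  # pi5[0] = P(sunny 5 days from now)
```

Note that repeatedly applying A drives pi(t) toward the stationary distribution discussed next.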
pi = pi x A --> stationary distribution: no matter how many times we transition, we still have the same state distribution.
How can we find a stationary distribution? This is an eigenvalue problem, where the eigenvalue is 1 (pi is a left eigenvector of A).
Eigenvectors are only defined up to scale, so after finding the eigenvector with eigenvalue 1, normalize it so that it sums to 1.
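The eigenvalue recipe above, sketched with NumPy: a left eigenvector of A is an eigenvector of A.T, so take the eigenvector of A.T whose eigenvalue is (numerically) 1 and normalize it to sum to 1. The matrix is an assumed example.

```python
import numpy as np

# Illustrative 2-state transition matrix.
A = np.array([[0.8, 0.2],
              [0.4, 0.6]])

# pi A = pi  <=>  A.T pi.T = pi.T, i.e. eigenvector of A.T with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(A.T)
k = np.argmin(np.abs(eigvals - 1.0))  # index of the eigenvalue closest to 1
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()  # normalize: eigenvectors are only defined up to scale

print(pi)
```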
What state do we expect to end up in? The final state distribution,
or the state distribution at time infinity, pi(inf):
pi(inf) = pi0 (A^infinity)
or
pi(inf) = pi(inf) A
--> the equilibrium/limiting distribution:
the state distribution you settle into after a very long time.
Reasonable for weather, not for the stock market.
Markov Chain Monte Carlo (MCMC)
3. Healthy or Sick
To model a certain situation or environment, take some data you've gathered and build a simple maximum likelihood model on it. Then you can study the properties that emerge from the model, make predictions from it, or generate the next most likely state.
4. Expected number of consecutive sick days
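In a healthy/sick model, a run of sick days ends each day with probability 1 - a, where a = P(sick | sick), so run lengths are geometric with mean 1/(1 - a). A sketch with an assumed value of a, checked against simulation:

```python
import numpy as np

# Hypothetical 2-state model (0 = healthy, 1 = sick).
# Once sick, you stay sick the next day with probability a.
a = 0.7  # illustrative P(sick tomorrow | sick today)
expected = 1.0 / (1.0 - a)  # mean run length of consecutive sick days

# Check by simulating many sick spells.
rng = np.random.default_rng(42)
runs = []
for _ in range(100_000):
    days = 1
    while rng.random() < a:  # stay sick another day with probability a
        days += 1
    runs.append(days)
print(expected, np.mean(runs))
```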
https://github.com/mkurian/machine_learning_examples