nicolewhite/markov.adoc

## markov.adoc

      
    Raw
  

              markov.adoc
            
          
    Markov Chains


A while back I wrote a blog post explaining Markov chains and demonstrating different ways of finding their steady-state distribution in R. Now, I want to play with Markov chains as a graph. I’m going to pull examples from around the internet and answer the same questions in Cypher as the authors do with matrices. This gives me the opportunity to explore more advanced Cypher queries while working with a topic I enjoy very much (stochastic processes and Markov chains). So this is officially just for funsies.


I found three Markov chains online that I’m going to showcase, and they involve the following topics:


Finance


Manufacturing


Actuarial Science


If you are unfamiliar with Markov chains, you might want to refer to my blog post linked above for a quick overview.


Finance


Description


The Wikipedia article on Markov chains has an example consisting of states and probabilities for a hypothetical stock market. The market can be in one of three states:


Bull


Bear


Stagnant


Matrix Representation


Graph Representation


Wikipedia conveniently provides a graphical representation of the states and probabilities:


SOURCE


//States
CREATE (Bull {name:'Bull'}), (Bear {name:'Bear'}), (Stagnant {name:'Stagnant'})

//Probabilities
CREATE (Bull)-[:PROBABILITY {p:0.9}]->(Bull),
     (Bull)-[:PROBABILITY {p:0.075}]->(Bear),
	   (Bull)-[:PROBABILITY {p:0.025}]->(Stagnant),
	   (Bear)-[:PROBABILITY {p:0.15}]->(Bull),
	   (Bear)-[:PROBABILITY {p:0.8}]->(Bear),
	   (Bear)-[:PROBABILITY {p:0.05}]->(Stagnant),
	   (Stagnant)-[:PROBABILITY {p:0.25}]->(Bull),
	   (Stagnant)-[:PROBABILITY {p:0.25}]->(Bear),
	   (Stagnant)-[:PROBABILITY {p:0.5}]->(Stagnant)


Question


The question posed in the article is: If we are currently in a Bear market, what is the probability distribution three time-steps from now? In other words, what is the probability we will be in either a Bull, Bear, or Stagnant market three days from now given we are in a Bear market today?


Solve with Matrices


Wikipedia solves this by taking the one-time-step transition probability matrix, \(P\), to the third power and extracting the relevant row:

\(P^3 = \begin{bmatrix} 0.7745 & 0.17875 & 0.04675 \\ 0.3575 & 0.56825 & 0.07425 \\ 0.4675 & 0.37125 & 0.16125 \end{bmatrix} \)

\(Answer = [0.3575, 0.56825, 0.07425]\)


Solve with Cypher


In Cypher, we can achieve the same answer with the following:


MATCH (w)-[r]->(x)
WHERE w.name= "Bear"
WITH x, r.p as p1
MATCH (x)-[r]->(y)
WITH y, p1, r.p as p2
MATCH (y)-[r]->(z)
WITH z, p1 * p2 * r.p AS product
RETURN z.name AS `State at time t = 3`, SUM(product) AS Probability


This tells us that three days from now, given we are currently in a Bear market, we will be in a Bull market with probability 0.3575, in a Bear market with probability 0.56825, or in a Stagnant market with probability 0.07425.


MATCH n-[r]-m
WITH n, r
DELETE n, r


Manufacturing


Another example I found during my Google search expedition is the following from slide 4 of this Beamer presentation. The language in the presentation is not very clear; I think the author switched the example from a car to a machine but did not change all instances of the word 'car' in the presentation. Anyway, here is a description:


Description


Machines at a factory are either working or broken. If a machine is currently working, it will break down by tomorrow with probability \(b\). If a machine is currently broken, it will be repaired by the maintenance crew by tomorrow with probability \(r\). This factory has a rule that if a machine is broken for at least three days, then the machine is put on a list of urgent repairs and will be repaired by tomorrow with probability \(1\).


Because the machine can be broken for up to but not exceeding three days, this Markov chain consists of four states:


Working


Broken for 1 day


Broken for 2 days


Broken for 3 days


Matrix Representation


Graph Representation


//States
CREATE (Working {name:'Working'}),
		(Broken1 {name:'Broken1'}),
		(Broken2 {name:'Broken2'}),
		(Broken3 {name:'Broken3'})

//Probabilities
CREATE (Working)-[:PROBABILITY {p:0.9}]->(Working),
		(Working)-[:PROBABILITY {p:0.1}]->(Broken1),

		(Broken1)-[:PROBABILITY {p:0.8}]->(Working),
		(Broken1)-[:PROBABILITY {p:0.2}]->(Broken2),

		(Broken2)-[:PROBABILITY {p:0.8}]->(Working),
		(Broken2)-[:PROBABILITY {p:0.2}]->(Broken3),

		(Broken3)-[:PROBABILITY {p:1}]->(Working)


Question


From the presentation (slide 7):


The factory buys 100 machines. 25 of the machines are bought used; of these 25, 20 have been broken for one day and 5 have been broken for two days. The remaining machines are new and are in working order. Four days from now, how many machines do we expect to have on our urgent repair list? In other words, how many machines will be in their third day of disrepair on day four?


The presentation considers the following values for \(r\) and \(b\):


\(r = 0.8\)


\(b = 0.1\)


Solve with Matrices


To solve using matrices / vectors, we first need the initial distribution vector. We know 20 machines have been broken for one day, 5 machines have been broken for two days, 0 have been broken for three days, and thus 100 - 25 = 75 machines are in working order:

\(\mu_0 = [\frac{75}{100}, \frac{20}{100}, \frac{5}{100}, 0] = [0.75, 0.20, 0.05, 0]\)

Next, we need to find \(P^4\) so we can have the four-time-step transition probability matrix:

\(P^4 = \begin{bmatrix} 0.8897 & 0.0889 & 0.0178 & 0.0036 \\ 0.8896 & 0.0896 & 0.0176 & 0.0032 \\ 0.8892 & 0.0892 & 0.0184 & 0.0032 \\ 0.8890 & 0.0890 & 0.0180 & 0.0040 \end{bmatrix} \)

The answer can now be found by multiplying the initial distribution vector \(\mu_0\) by the relevant column (the column corresponding to Broken3) of the four-time-step transition probability matrix \(P^4\):

\(= [0.75, 0.20, 0.05, 0] \times \begin{bmatrix} 0.0036 \\ 0.0032 \\ 0.0032 \\ 0.0040 \end{bmatrix}\)


\(= 0.0036(0.75) + 0.0032(0.20) + 0.0032(0.05) + 0.0040(0)\)


\(= 0.0035\)


Solve with Cypher


To solve this in Cypher, we get to use a CASE statement! How fun:


MATCH (v)-[r]->(w)
WITH v, w, r.p AS p1
MATCH (w)-[r]->(x)
WITH v, x, p1, r.p AS p2
MATCH (x)-[r]->(y)
WITH v, y, p1, p2, r.p AS p3
MATCH (y)-[r]->(z)
WITH v, z, p1 * p2 * p3 * r.p AS product
WHERE z.name = "Broken3"
WITH v,
	product * (
	CASE
		WHEN v.name="Working" THEN 0.75
		WHEN v.name="Broken1" THEN 0.20
		WHEN v.name="Broken2" THEN 0.05
		ELSE 0
	END) AS dist
RETURN SUM(dist) AS Probability


This tells us that we expect 0.0035 or 0.35% of machines to be in their third day of disrepair on day four. 0.35% of 100 is 0.35, so we expect 0.35 machines to be in their third day of disrepair on day four.


Moreover, we can find the distribution of all 100 machines (the presentation does not ask or go over this) with some tweaks in the query:


MATCH (v)-[r]->(w)
WITH v, w, r.p AS p1
MATCH (w)-[r]->(x)
WITH v, x, p1, r.p AS p2
MATCH (x)-[r]->(y)
WITH v, y, p1, p2, r.p AS p3
MATCH (y)-[r]->(z)
WITH v, z, p1 * p2 * p3 * r.p AS product
WITH v, z,
	product * (
	CASE
		WHEN v.name="Working" THEN 0.75
		WHEN v.name="Broken1" THEN 0.20
		WHEN v.name="Broken2" THEN 0.05
		ELSE 0
	END) AS dist
WITH z, SUM(dist) AS prob
ORDER BY prob DESC
RETURN z.name AS State, prob*100 AS `Expected Number of Machines in State on Day 4`


From this, we find that we expect there to be, on day four, roughly 88.97 machines in working order, 8.91 machines in their first day of disrepair, 1.78 machines in their second day of disrepair, and 0.35 machines (which we just found earlier) in their third day of disrepair.


MATCH n-[r]-m
WITH n, r
DELETE n, r


Actuarial Science


In actuarial science, insurance holders are often modeled in Markov chains where the state is typically some indication of their profitability and the probabilities of changing states is typically a function of the insurance holder’s attributes, such as their weight and whether or not they smoke (if we’re dealing with health insurance), or maybe attributes of their car, such as its maximum speed and color (if we’re dearling with automobile insurance).


Using #20 from this exam as inspiration, we have the following:


Description


Insurance holders can belong to one of three classes for automobile insurance:


Preferred


Standard


Substandard


At the end of each year, insurance holders can transition from one class to another according to the following one-time-step transition probability matrix:


Matrix Represenatation


Graph Representation


CREATE  (Preferred {name:'Preferred'}),
	(Standard {name:'Standard'}),
	(Substandard {name:'Substandard'})

CREATE  (Preferred)-[:PROBABILITY {p:0.5}]->(Preferred),
	(Preferred)-[:PROBABILITY {p:0.5}]->(Standard),

	(Standard)-[:PROBABILITY {p:0.2}]->(Preferred),
	(Standard)-[:PROBABILITY {p:0.6}]->(Standard),
	(Standard)-[:PROBABILITY {p:0.2}]->(Substandard),

	(Substandard)-[:PROBABILITY {p:0.3}]->(Standard),
	(Substandard)-[:PROBABILITY {p:0.7}]->(Substandard)


Question 1


If an insurance holder is currently in the Standard class, what is the probablity he will be in the Standard class in two years?


Solve with Matrices


As we know, we need to find \(P^2\) in order to answer this question. The answer will be the (Standard, Standard) entry of this matrix:

\(P^2 = \begin{bmatrix} 0.35 & 0.55 & 0.1 \\ 0.22 & 0.52 & 0.26 \\ 0.06 & 0.39 & 0.55 \end{bmatrix} \)


\(Answer = 0.52\)

There is a 52% chance the customer currently in the Standard class will be in the Standard class in two years.


Solve with Cypher


MATCH (v)-[r]->(w)
WHERE v.name = "Standard"
WITH w, r.p AS p1
MATCH (w)-[r]->(x)
WHERE x.name = "Standard"
RETURN SUM(p1 * r.p) AS Probability


Question 2


A question not asked by the exam but that is interesting: What is the probability an insurance holder currently in the Standard class will not be in the Substandard class in any of the next five years?


Solve with Matrices


In order to solve using matrices, we need to arbitrarily change the one-time-step transition probability matrix so that the Substandard class is an 'absorbing state'. With this, if the Markov chain ever enters the Substandard state, it can’t leave that state. We do this because we eventually find the 'complement' of the probabilities resulting from this arbitrary transition probability matrix in order to answer questions involving scenarios where the process can never enter that state. So, our new one-time-step transition probability matrix is:

\(A = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.2 & 0.6 & 0.2 \\ 0 & 0 & 1 \end{bmatrix} \)

Notice the third row (the row corresponding to the Substandard state): \([0, 0, 1]\). This means that there is a 0 probability of transitioning from the Substandard class to the Preferred class, a 0 probability of transitioning from the Substandard class to the Standard class, and a 100% probability of staying in the Substandard class. With this, we then find the five-time-step transitiion probability matrix of our new matrix \(A\):

\(A^5 = \begin{bmatrix} 0.21085 & 0.38905 & 0.4001 \\ 0.15562 & 0.28866 & 0.55572 \\ 0 & 0 & 1 \end{bmatrix} \)

Like I mentioned earlier, we are interested in the 'complement' of the probabilities shown here when answering questions involving the constraint that the insurance holder 'never' enters the Substandard class. Thus, our answer is the complement of the (Standard, Substandard) entry of the matrix shown above, since we want to know the probability that an insurance holder currently in the Standard class is never in the Substandard class during the next five years:


\(Answer = 1 - 0.55572 = 0.44428\)


Solve with Cypher


In Cypher, we do not need to make any arbitrary changes to the transition probabilities. We just tell the query to avoid the Substandard class:


MATCH (u)-[r]->(v)
WHERE u.name = "Standard" AND v.name <> "Substandard"
WITH v, r.p AS p1
MATCH (v)-[r]->(w)
WHERE w.name <> "Substandard"
WITH w, p1, r.p AS p2
MATCH (w)-[r]->(x)
WHERE x.name <> "Substandard"
WITH x, p1, p2, r.p AS p3
MATCH (x)-[r]->(y)
WHERE y.name <> "Substandard"
WITH y, p1, p2, p3, r.p AS p4
MATCH (y)-[r]->(z)
WHERE z.name <> "Substandard"
WITH z, p1 * p2 * p3 * p4 * r.p AS prob
RETURN SUM(prob) AS Probability