@nightscape · Last active December 4, 2015 12:47

Graphical exploration of Bayesian Networks in Spark Notebook

This Gist contains a Spark Notebook that displays graphs of Bayesian Networks, built with bayes-scala and rendered with a custom D3 visualization.

To check it out, install Spark Notebook using one of the methods described here (if you use the ZIP download, make the launcher executable first: chmod +x bin/spark-notebook && bin/spark-notebook). Then download Plot graph with D3.snb from this Gist and import it via the browser UI. You can then step through the code cells by clicking into the first one and pressing Shift+Enter, or by using the Run button. If all goes well, you should see two Bayesian Network graphs, one for the Monty Hall example and one for the Student example.
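
If you only want to try the bayes-scala part outside Spark Notebook, here is a minimal sketch of the Monty Hall inference the notebook performs. It assumes the same dependency the notebook declares (com.github.danielkorzekwa % bayes-scala_2.11 % 0.5) is on the classpath, e.g. in a plain sbt project; the final println is just there to show the posterior.

import dk.bayes.dsl.variable.Categorical
import dk.bayes.dsl.infer

// Uniform priors: the car and the guest's first pick are each behind door 1, 2 or 3
val carDoor   = Categorical(Vector(1d / 3, 1d / 3, 1d / 3))
val guestDoor = Categorical(Vector(1d / 3, 1d / 3, 1d / 3))

// Monty's door given (car, guest): rows enumerate the parent combinations,
// columns are the probabilities of Monty opening door 1, 2 or 3
val montyDoor = Categorical(carDoor, guestDoor, Vector(
  0, 0.5, 0.5,
  0, 0, 1,
  0, 1, 0,
  0, 0, 1,
  0.5, 0, 0.5,
  1, 0, 0,
  0, 1, 0,
  1, 0, 0,
  0.5, 0.5, 0))

// Evidence: the guest picks door 1 and Monty opens door 2
guestDoor.setValue(0)
montyDoor.setValue(1)

// Posterior over the car's position; switching to door 3 should come out ahead
// (roughly 1/3 for door 1 vs 2/3 for door 3)
println(infer(carDoor).cpd.toList)

The notebook wraps exactly this kind of model, converts the nodes and conditional probability tables to JSON, and feeds them to the D3 script, so the graphs update whenever you set evidence and re-run a cell.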

{
"metadata" : {
"name" : "Plot graph with D3",
"user_save_timestamp" : "2014-12-15T00:55:09.510Z",
"auto_save_timestamp" : "2014-12-15T00:50:41.883Z",
"language_info" : {
"name" : "scala",
"file_extension" : "scala",
"codemirror_mode" : "text/x-scala"
},
"trusted" : true,
"customLocalRepo" : null,
"customRepos" : null,
"customDeps" : [ "com.github.danielkorzekwa % bayes-scala_2.11 % 0.5" ],
"customImports" : null,
"customArgs" : null,
"customSparkConf" : null
},
"cells" : [ {
"metadata" : {
"trusted" : true,
"input_collapsed" : false,
"collapsed" : false
},
"cell_type" : "code",
"source" : "import notebook.front.third.d3._\nimport notebook._, front._, widgets._\nimport notebook.JsonCodec._\nimport play.api.libs.json._\nimport dk.bayes.dsl.variable.Categorical\nimport dk.bayes.dsl.infer\ntype CategoricalWithInfo = (String, Categorical, Seq[String])\nval loadedCode = {\n val source = scala.io.Source.fromURL(\"https://raw.githubusercontent.com/crealytics/d3-bayesian-network/master/d3-bayesian-network.js\")\n val res = source.mkString\n source.close()\n res\n}\n\nimport play.api.libs.json._\nimport play.api.libs.functional.syntax._\nimport dk.bayes.dsl.variable.Categorical\nimport dk.bayes.dsl.infer\n\ncase class ConditionalProbabilityTable(node: BnNode, parents: Seq[BnNode], probabilities: Seq[Double])\ncase class BnNode(name: String, states: Seq[String], currentState: Option[String] = None)\ncase class BnEdge(source: Int, target: Int)\n\ndef categoricalsToNetwork(marginalizer: Categorical => Seq[Double], cptExtractor: Categorical => Seq[Double] = _.cpd)(categoricalsWithNames: Seq[CategoricalWithInfo]): (Seq[(BnNode, ConditionalProbabilityTable, ConditionalProbabilityTable)], Seq[BnEdge]) = {\n import breeze.linalg._\n import breeze.numerics._\n val nodes = categoricalsWithNames.map { case(name, categorical, states) =>\n val currentState = categorical.getValue().map(states.apply)\n BnNode(name, states, currentState)\n }\n val categoricals = categoricalsWithNames.map(_._2)\n val nodeMap = categoricals.zip(nodes).toMap\n val cpts = categoricals.zip(nodes).map { case(cat, node) =>\n val parents = cat.parents.map(nodeMap)\n val numCols = node.states.size\n val cpd = cptExtractor(cat)\n val inferredCpd = infer(cat).cpd\n val numRows = cpd.size / numCols\n val cptArray = cpd.toArray\n val cpt = new DenseMatrix(numCols, numRows, cptArray).t\n (ConditionalProbabilityTable(node, parents, cpt.toArray), ConditionalProbabilityTable(node, Seq(), inferredCpd.toArray))\n }\n val edges = nodeMap.flatMap { case(cat, node) =>\n val parents = cat.parents.map(nodeMap)\n parents.map(p => BnEdge(nodes.indexOf(p), nodes.indexOf(node)))\n }.toSeq\n (nodes.zip(cpts).map { case(n, (c, m)) => (n, c, m)}, edges)\n}\n\nobject ConditionalProbabilityTable {\n implicit val conditionalProbabilityTableWrites: Writes[ConditionalProbabilityTable] = Json.writes[ConditionalProbabilityTable]\n}\nobject BnNode {\n implicit val nodeWrites: Writes[BnNode] = Json.writes[BnNode]\n}\nobject BnEdge {\n implicit val edgeWrites: Writes[BnEdge] = Json.writes[BnEdge]\n}\n\ndef networkToJson(nodesWithCpts: Seq[(BnNode, ConditionalProbabilityTable, ConditionalProbabilityTable)], edges: Seq[BnEdge]): JsObject = {\n val nodeJs = nodesWithCpts.map { case(node, cpt, marginalized) =>\n Json.toJson(node).asInstanceOf[JsObject] + (\"cpt\", Json.toJson(cpt)) + (\"marginalized\", Json.toJson(marginalized))\n }\n Json.obj(\"nodes\" -> Json.toJson(nodeJs), \"edges\" -> Json.toJson(edges))\n}\n\nval convertCategoricals: Seq[CategoricalWithInfo] => JsObject = (categoricalsToNetwork({cat: Categorical => infer(cat).cpd}) _).andThen((networkToJson _).tupled)\n\nimplicit val categoricalsCodec = new Codec[JsValue, Seq[CategoricalWithInfo]] {\n def encode(x:JsValue):Seq[CategoricalWithInfo] = Seq()\n def decode(x:Seq[CategoricalWithInfo]):JsValue = convertCategoricals(x)\n}\n\n/**\n * @param style Sets the style of the graph to either\n * \"cpt\" (conditional probability table) or\n * \"barchart\" (only inferred probabilities, default)\n */\ndef playgroundCode(style: String = \"barchart\") = s\"\"\"\nfunction(dataPipe, e) {\n 
$loadedCode\n var bnGraph = d3.bayesianNetwork(e, \"$style\")\n bnGraph(this.dataInit[0])\n dataPipe.subscribe(function(d) {\n bnGraph(d[0])\n })\n}\n\"\"\"\n()",
"outputs" : [ ]
}, {
"metadata" : {
"trusted" : true,
"input_collapsed" : false,
"collapsed" : false
},
"cell_type" : "code",
"source" : "val carDoor = Categorical(Vector(1d / 3, 1d / 3, 1d / 3))\nval guestDoor = Categorical(Vector(1d / 3, 1d / 3, 1d / 3))\n\nval montyDoor = Categorical(carDoor, guestDoor, Vector(\n 0, 0.5, 0.5,\n 0, 0, 1,\n 0, 1, 0,\n 0, 0, 1,\n 0.5, 0, 0.5,\n 1, 0, 0,\n 0, 1, 0,\n 1, 0, 0,\n 0.5, 0.5, 0))\nval montyHallExample = Seq(\n (\"Car\", carDoor, Seq(\"Door 1\", \"Door 2\", \"Door 3\")),\n (\"Guest\", guestDoor, Seq(\"Door 1\", \"Door 2\", \"Door 3\")),\n (\"Monty\", montyDoor, Seq(\"Door 1\", \"Door 2\", \"Door 3\"))\n)\nval montyHallGraph = new Playground(Seq(montyHallExample), List(Script(\"consoleDir\", JsObject(Nil))), List(playgroundCode()))",
"outputs" : [ ]
}, {
"metadata" : {
"trusted" : true,
"input_collapsed" : false,
"collapsed" : false
},
"cell_type" : "code",
"source" : "guestDoor.setValue(0) //Guest chooses door 1\nmontyDoor.setValue(1) //Monty opens door 2\nmontyHallGraph(Seq(montyHallExample))",
"outputs" : [ ]
}, {
"metadata" : {
"trusted" : true,
"input_collapsed" : false,
"collapsed" : false
},
"cell_type" : "code",
"source" : "val difficulty = Categorical(Vector(0.6, 0.4))\nval intelli = Categorical(Vector(0.7, 0.3))\nval grade = Categorical(intelli, difficulty, Vector(0.3, 0.4, 0.3, 0.05, 0.25, 0.7, 0.9, 0.08, 0.02, 0.5, 0.3, 0.2))\nval sat = Categorical(intelli, Vector(0.95, 0.05, 0.2, 0.8))\nval letter = Categorical(grade, Vector(0.1, 0.9, 0.4, 0.6, 0.99, 0.01))\nval studentExample = Seq(\n (\"Difficulty\", difficulty, Seq(\"Easy\", \"Hard\")),\n (\"Intelligence\", intelli, Seq(\"Derp\", \"Smart\")),\n (\"Grade\", grade, Seq(\"Excellent\", \"Average\", \"Bad\")),\n (\"SAT\", sat, Seq(\"Low\", \"High\")),\n (\"Letter\", letter, Seq(\"Weak\", \"Strong\"))\n)\nval studentGraph = new Playground(Seq(studentExample), List(Script(\"consoleDir\", JsObject(Nil))), List(playgroundCode(\"cpt\")))",
"outputs" : [ ]
}, {
"metadata" : {
"trusted" : true,
"input_collapsed" : false,
"collapsed" : false
},
"cell_type" : "code",
"source" : "intelli.setValue(1)\ndifficulty.setValue(0)\nstudentGraph(Seq(studentExample))\n",
"outputs" : [ ]
}, {
"metadata" : {
"trusted" : true,
"input_collapsed" : false,
"collapsed" : false
},
"cell_type" : "code",
"source" : "intelli.setValue(1)\nstudentGraph(Seq(studentExample))",
"outputs" : [ ]
}, {
"metadata" : {
"trusted" : true,
"input_collapsed" : false,
"collapsed" : true
},
"cell_type" : "code",
"source" : "",
"outputs" : [ ]
} ],
"nbformat" : 4
}
@francisdb

Will this be integrated in some way?

@nightscape (Author)

Hi Francis, sorry I didn't see your comment. I think GitHub didn't even notify me about it?
Were you referring to integrating this into spark-notebook or into bayes-scala?
I'm not sure where to hang this thing ;)

@javadba commented Dec 4, 2015

I am new to Spark Notebook. Upon uploading this .snb and attempting to run it, the following errors occur:

:12: error: not found: value dk
import dk.bayes.dsl.infer
^
:11: error: not found: value dk
import dk.bayes.dsl.variable.Categorical
^
:13: error: not found: type Categorical
type CategoricalWithInfo = (String, Categorical, Seq[String])
^

etc. How is Spark Notebook supposed to find the dk.* packages?
