sunxd3/GSoC_2022.ipynb Secret

## GSoC_2022.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ae802899-e560-4267-a17b-28dcbff2f368",
   "metadata": {},
   "source": [
    "# Compiler for BUGS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3db7c86-5b08-4677-9c48-17b6e2b21144",
   "metadata": {},
   "source": [
    "The report will introduce some aspects of the implementation of [SymbolicPPL](https://github.com/TuringLang/SymbolicPPL.jl)'s compiler program. All the work described in this report is done during GSoC 2022."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de4ec027-0939-4146-a605-22e899f1662e",
   "metadata": {
    "tags": []
   },
   "source": [
    "## The BUGS Language\n",
    "BUGS(Bayesian inference Using Gibbs Sampling) is one of the earliest probabilistic programming language. A legal program in the BUGS language describes a directed probabilistic graphical model. The static nature of these models enables developers to extract much information deterministically, resulting in efficient inference algorithm implementations. Syntactically, to guarantee the static property of the program, all the loop bounds and array indices must not depend on stochastic variables.\n",
    "\n",
    "Previously, a BUGS to Julia AST translation program has been implemented and well documented (Cf. [SymbolicPPL.jl]https://github.com/TuringLang/SymbolicPPL.jl), this report will focus on how to compiler the Julia ASTs into a graphical representation defined at [GraphPPL](https://github.com/TuringLang/AbstractPPL.jl/blob/main/src/graphinfo.jl)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef9dbd61-ccff-4da1-b0ed-d9f1ab7de0b6",
   "metadata": {
    "tags": []
   },
   "source": [
    "## The Need for Partial Evaluation\n",
    "As we discussed before, legal BUGS programs describe Directed Acyclic Graphical(DAG) models, and one can understand assignments in BUGS programs as describing edges in the DAGs. There are no chronological orders between all the assignments, which means, if we want to analyze if a certain variable is dependent on some other variables, the analysis has to consider all the assignments in the program.  \n",
    "\n",
    "Our approaches rely on [Symbolics.jl](https://symbolics.juliasymbolics.org/dev/). The intuition behind our solution is that the RHS of a logical assignment can be constructed into a symbolic term, while the LHS of that logical assignments are potentially a variable referred in other symbolic terms. By utilizing [Symbolic.subsitute](https://symbolics.juliasymbolics.org/dev/manual/expression_manipulation/#SymbolicUtils.substitute) function, we can do partial evaluation of any variable. And if the model is fully specified, we can evaluate the variable to a concrete number. To test if a model is correctly specified, we only need to check if all the loop bounds and array indices can be resolved to a concrete number by only considering logical assignments.  \n",
    "\n",
    "In the remaining of this chapter, we will detail some aspects of the implementation. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de3b42a7-abab-4c90-ac3f-6bdd9870faa6",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Logical Assignments\n",
    "\n",
    "Let's consider a simple logical assignment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1232c3fd-779d-4537-aa55-706d8aa920d2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quote\n",
       "    a = b + c * 3\n",
       "    b = 2\n",
       "    c = 0.5\n",
       "end"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "using SymbolicPPL\n",
    "using SymbolicPPL: CompilerState, addlogicalrules!, addstochasticrules!, unrollforloops!, tosymbolic, resolve\n",
    "\n",
    "ex = bugsmodel\"\"\"\n",
    "    a <- b + c * 3\n",
    "    b <- 2\n",
    "    c <- 0.5\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73d12e0e-9e19-490e-80f8-685cbc5e56fe",
   "metadata": {},
   "source": [
    "In the code block above, `bugsmodel` translates a BUGS program into Julia AST showed in the output. The function `addlogicalrules!` is the workhorse that process logical assignments. What `addlogicalrules!` does is, roughly, for every logical assignment, create a mapping between the LHS symbolic variable and RHS symbolic term, and this mapping is stored as a dictionary. Let's see the function at work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "50aeddd9-3e77-4681-b1a1-e4508b29dacd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dict{Symbolics.Num, Symbolics.Num} with 3 entries:\n",
       "  a => b + 3c\n",
       "  c => 0.5\n",
       "  b => 2"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "compiler_state = CompilerState() # keep information during te compilation process\n",
    "addlogicalrules!(ex, compiler_state)\n",
    "compiler_state.logicalrules"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c0f12ce-68cc-41f3-90e4-0d7459c9c2af",
   "metadata": {},
   "source": [
    "Then we can ask: what is the value of variable `a`, partially evaluated in the context of all logical assignments? The answer will be "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "9333148d-946f-4497-a2a0-84fa8cbd37c1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3.5"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resolve(tosymbolic(:a), compiler_state) # resolve(::Num, ::CompilerState) is a wrapper around Symbolics.substitut that does the partial evaluation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39263968-611d-47b3-9db9-d8cddafa986b",
   "metadata": {},
   "source": [
    "and it's exactly what we expect."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0308a783-c090-45c7-a762-7121c1f84376",
   "metadata": {},
   "source": [
    "## Stochastic Assignments\n",
    "\n",
    "Stochastic assignments are handled in a similar way. BUGS syntax requires the RHS of a stochastic assignment to be a probability distribution function. And to facilitate the downstream translation to `GraphPPL.Model`, the \"rule dictionary\" for stochastic assignments is a mapping from symbolic variables to distribution functions.  \n",
    "Enough talk, let's see how it works. The function that handle stochastic assignments is called `addstochasticrules!`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "28f21b5b-436b-4045-8135-beafb3fcf5c3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quote\n",
       "    $(Expr(:~, :y, :(dnorm(x, 1))))\n",
       "end"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ex = bugsmodel\"\"\"\n",
    "    y ~ dnorm(x, 1)\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "cdf6d459-e2af-4d7d-b126-cbe78f812f0a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dict{Symbolics.Num, Any} with 1 entry:\n",
       "  y => #54"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "addstochasticrules!(ex, compiler_state)\n",
    "compiler_state.stochasticrules"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "205bb6ca-b17b-4e43-9d28-0c94bf30b107",
   "metadata": {},
   "source": [
    "`#54` here is quite cryptic, with [@code_lowered](https://docs.julialang.org/en/v1/stdlib/InteractiveUtils/#InteractiveUtils.@code_lowered), we can see how the function is defined. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "83c6170f-dc57-4064-a7b6-04acaac38f51",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CodeInfo(\n",
       "\u001b[90m1 ─\u001b[39m %1 = SymbolicPPL.dnorm(x, 1)\n",
       "\u001b[90m└──\u001b[39m      return %1\n",
       ")"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "func = compiler_state.stochasticrules[tosymbolic(:y)] # `tosymbolic` convert a Symbol into a symbolic variable\n",
    "@code_lowered func(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8a63404-bee5-4fd2-bc75-31c4d56f143e",
   "metadata": {},
   "source": [
    "It's nothing more than a function call to the `dnorm` function. Internally, we define these functions using [Distributions.jl](https://juliastats.org/Distributions.jl/stable/), to see this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "69ea3a65-f34c-4f0f-8c79-c9ef8211df06",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Distributions.Normal{Float64}(μ=0.0, σ=1.0)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "func(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "608e00a5-93f8-44df-b53d-e95049a6f603",
   "metadata": {},
   "source": [
    "## Array \n",
    "The challenge of support BUGS' array interface is that every element of an array can be either logical or stochastic, so we need to treat every array element as a separate variable. We won't go deep into the inner mechanism, rather let's see how array indexing work with some demos."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "3f46d2d0-a14d-44cf-980e-af1ca98f7816",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ex = quote\n",
      "    g[1] = 2 * h[2, 3]\n",
      "end\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Dict{Symbolics.Num, Symbolics.Num} with 1 entry:\n",
       "  var\"g[1]\" => 2var\"h[2, 3]\""
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ex = bugsmodel\"\"\"\n",
    "    g[1] <- 2 * h[2, 3]\n",
    "\"\"\"\n",
    "@show ex\n",
    "\n",
    "compiler_state = CompilerState()\n",
    "addlogicalrules!(ex, compiler_state)\n",
    "compiler_state.logicalrules"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2be7acb9-2e04-4423-acd4-4387d2e8f60c",
   "metadata": {},
   "source": [
    "The array indexing is transferred into a variable with an easy-to-identify name (`h[2,3]` -> `var\"h[2, 3]\"`).  \n",
    "One thing worth noting is, the compiler will infer the size of the array, so that,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "1d0489de-21d6-49b6-be7a-8b7fa8f3a22e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dict{Symbolics.Num, Symbolics.Num} with 2 entries:\n",
       "  x         => (2//3)*var\"h[2, 1]\" + (2//3)*var\"h[2, 2]\" + (2//3)*var\"h[2, 3]\"\n",
       "  var\"g[1]\" => 2var\"h[2, 3]\""
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ex = bugsmodel\"\"\"\n",
    "    x <- 2 * mean(h[2, :]) # slicing h\n",
    "\"\"\"\n",
    "addlogicalrules!(ex, compiler_state)\n",
    "compiler_state.logicalrules"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b917cb2-7f9a-426c-bce0-b1d16c74adf1",
   "metadata": {},
   "source": [
    "The compiler will infer the size of `h` to be (2, 3), because that's the largest indices seen. We can also see `mean()` function is interpreted and simplified.  \n",
    "Indexing an array with other array indexing is also supported."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "5e8a3064-f3c7-4704-8724-6db650e99fbc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dict{Symbolics.Num, Symbolics.Num} with 2 entries:\n",
       "  var\"k[1]\" => 3\n",
       "  var\"g[3]\" => 2"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ex = bugsmodel\"\"\"\n",
    "    g[k[1]] <- 2\n",
    "    k[1] <- 3\n",
    "\"\"\"\n",
    "compiler_state = CompilerState()\n",
    "addlogicalrules!(ex, compiler_state) # The first one add `k[1]` \n",
    "addlogicalrules!(ex, compiler_state) # so the index can be evaluated now\n",
    "compiler_state.logicalrules"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8fadbce-6039-43e0-abfb-84813702a73a",
   "metadata": {},
   "source": [
    "## Unrolling\n",
    "Internally, we unroll all the loops. There are several reasons for this decision:\n",
    "1. If we can unroll all the loops, then all the expression are either logical or stochastic assignments, and we have established that we can handle them well.\n",
    "2. For a BUGS program to be legal, the loop bounds would need to be evaluated to a concrete number. Otherwise, either the model is under-specified, or the loop bound is dependent on stochastic variables.\n",
    "\n",
    "`unrollforloops!` handles the unrolling."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2c839bce-683c-4996-9b15-faa44c803602",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quote\n",
       "    var\"array.variable.0\"[1] = 1 + 1\n",
       "    var\"array.variable.2\"[1, 1, 1] = 2\n",
       "    var\"array.variable.2\"[1, 2, 1] = 2\n",
       "    var\"array.variable.2\"[1, 2, 2] = 2\n",
       "    var\"array.variable.0\"[2] = 2 + 1\n",
       "    var\"array.variable.2\"[2, 1, 1] = 2\n",
       "    var\"array.variable.2\"[2, 2, 1] = 2\n",
       "    var\"array.variable.2\"[2, 2, 2] = 2\n",
       "    var\"array.variable.0\"[3] = 3 + 1\n",
       "    var\"array.variable.2\"[3, 1, 1] = 2\n",
       "    var\"array.variable.2\"[3, 2, 1] = 2\n",
       "    var\"array.variable.2\"[3, 2, 2] = 2\n",
       "    N = 3\n",
       "end"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ex = bugsmodel\"\"\"      \n",
    "    # nested loop\n",
    "    for (i in 1:N) {\n",
    "        # assignment using loop variable\n",
    "        array.variable.0[i] <- i + 1\n",
    "        \n",
    "        # nested loops in another for loop\n",
    "        for (j in 1:2) {\n",
    "            # loop bound depend on loop variable\n",
    "            for (k in 1:j) {\n",
    "                array.variable.2[i, j, k] = 2\n",
    "            }\n",
    "        }\n",
    "    }\n",
    "\n",
    "    N <- 3\n",
    "\"\"\"\n",
    "compiler_state = CompilerState()\n",
    "addlogicalrules!(ex, compiler_state) # Add `N => 3` to the rules\n",
    "unrollforloops!(ex, compiler_state)\n",
    "ex"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25cd320b-f773-4b65-80d4-aa95d33c9979",
   "metadata": {},
   "source": [
    "## Constructing `GraphPPL.Model` \n",
    "Once the program is fully unrolled, we can use `addlogicalrules!` and `addstochasticrules!` to capture all the assignments into \"rules\", and then all that's left is using all the rules to construct a `GraphPPL.Model`.  \n",
    "Every variable in the BUGS program will correspond to a node in the graphical model. And for every node, four pieces of information is needed: name, default value, node function, and node type. \n",
    "* **Name**: the variable name; \n",
    "* **Default Value**: for variables that can be resolved to a concrete value, the value is used as the default value, otherwise 0 is used\n",
    "* **Node Function**: the node functions of a stochastic variable is the corresponding distribution function in `stochasticrules`, and the node function for a logical variable is generate by [Symbolics.build_function()](https://symbolics.juliasymbolics.org/dev/manual/build_function/#Symbolics.build_function) using the RHS symbolic term\n",
    "* **Node Type**: logical variables are always of type `Logical`, while for a stochastic variable, if the variable is defined by data, then it is an `Observation` node, otherwise, it is a `Stochastic` type "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f6e177f-b1b4-4569-97f0-c6ddb3afadde",
   "metadata": {},
   "source": [
    "## An Example\n",
    "We will use [Pumps](https://chjackson.github.io/openbugsdoc/Examples/Pumps.html) as an example."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12c45305-8147-4522-bdb9-091b989b152a",
   "metadata": {},
   "source": [
    "The model is defined as "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "21f0f1cc-c0b4-4206-b654-83f4979ea19d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quote\n",
       "    for i = 1:N\n",
       "        $(Expr(:~, :(theta[i]), :(dgamma(alpha, beta))))\n",
       "        lambda[i] = theta[i] * t[i]\n",
       "        $(Expr(:~, :(x[i]), :(dpois(lambda[i]))))\n",
       "    end\n",
       "    $(Expr(:~, :alpha, :(dexp(1))))\n",
       "    $(Expr(:~, :beta, :(dgamma(0.1, 1.0))))\n",
       "end"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model_def = bugsmodel\"\"\"\n",
    "   for (i in 1 : N) {\n",
    "      theta[i] ~ dgamma(alpha, beta)\n",
    "      lambda[i] <- theta[i] * t[i]\n",
    "      x[i] ~ dpois(lambda[i])\n",
    "   }\n",
    "   \n",
    "   alpha ~ dexp(1)\n",
    "   beta ~ dgamma(0.1, 1.0)\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0322fc78-5c03-46e9-b17d-7810802e7962",
   "metadata": {},
   "source": [
    "Then we introduce the data -- data are just logical assignments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "faec1ce4-721b-46da-ade0-ed34e51fe8d0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dict{Symbolics.Num, Symbolics.Num} with 21 entries:\n",
       "  var\"t[6]\"  => 31.4\n",
       "  var\"x[4]\"  => 14\n",
       "  var\"x[9]\"  => 4\n",
       "  var\"x[3]\"  => 5\n",
       "  var\"x[1]\"  => 5\n",
       "  var\"t[7]\"  => 1.05\n",
       "  var\"x[6]\"  => 19\n",
       "  N          => 10\n",
       "  var\"t[9]\"  => 2.1\n",
       "  var\"t[1]\"  => 94.3\n",
       "  var\"t[4]\"  => 126.0\n",
       "  var\"t[8]\"  => 1.05\n",
       "  var\"x[2]\"  => 1\n",
       "  var\"x[8]\"  => 1\n",
       "  var\"t[5]\"  => 5.24\n",
       "  var\"x[5]\"  => 3\n",
       "  var\"x[10]\" => 22\n",
       "  var\"t[2]\"  => 15.7\n",
       "  var\"t[10]\" => 10.5\n",
       "  var\"x[7]\"  => 1\n",
       "  var\"t[3]\"  => 62.9"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = (\n",
    "    t = [94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5],\n",
    "    x = [5, 1, 5, 14, 3, 19, 1, 1, 4, 22],\n",
    "    N = 10,\n",
    ")\n",
    "compiler_state = CompilerState()\n",
    "addlogicalrules!(data, compiler_state)\n",
    "compiler_state.logicalrules"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "595d0ea0-6c8a-47e3-960a-8cd93185b297",
   "metadata": {},
   "source": [
    "The next step is alternating the unrolling and processing of logical assignments. At the end of this stage, all expressions in the model definition should be either logical or stochastic assignments. Let's see"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "0be4b690-1129-4534-a00e-74884f373985",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "quote\n",
       "    $(Expr(:~, :(theta[1]), :(dgamma(alpha, beta))))\n",
       "    lambda[1] = theta[1] * t[1]\n",
       "    $(Expr(:~, :(x[1]), :(dpois(lambda[1]))))\n",
       "    $(Expr(:~, :(theta[2]), :(dgamma(alpha, beta))))\n",
       "    lambda[2] = theta[2] * t[2]\n",
       "    $(Expr(:~, :(x[2]), :(dpois(lambda[2]))))\n",
       "    $(Expr(:~, :(theta[3]), :(dgamma(alpha, beta))))\n",
       "    lambda[3] = theta[3] * t[3]\n",
       "    $(Expr(:~, :(x[3]), :(dpois(lambda[3]))))\n",
       "    $(Expr(:~, :(theta[4]), :(dgamma(alpha, beta))))\n",
       "    lambda[4] = theta[4] * t[4]\n",
       "    $(Expr(:~, :(x[4]), :(dpois(lambda[4]))))\n",
       "    $(Expr(:~, :(theta[5]), :(dgamma(alpha, beta))))\n",
       "    lambda[5] = theta[5] * t[5]\n",
       "    $(Expr(:~, :(x[5]), :(dpois(lambda[5]))))\n",
       "    $(Expr(:~, :(theta[6]), :(dgamma(alpha, beta))))\n",
       "    lambda[6] = theta[6] * t[6]\n",
       "    $(Expr(:~, :(x[6]), :(dpois(lambda[6]))))\n",
       "    $(Expr(:~, :(theta[7]), :(dgamma(alpha, beta))))\n",
       "    lambda[7] = theta[7] * t[7]\n",
       "    $(Expr(:~, :(x[7]), :(dpois(lambda[7]))))\n",
       "    $(Expr(:~, :(theta[8]), :(dgamma(alpha, beta))))\n",
       "    lambda[8] = theta[8] * t[8]\n",
       "    $(Expr(:~, :(x[8]), :(dpois(lambda[8]))))\n",
       "    $(Expr(:~, :(theta[9]), :(dgamma(alpha, beta))))\n",
       "    lambda[9] = theta[9] * t[9]\n",
       "    $(Expr(:~, :(x[9]), :(dpois(lambda[9]))))\n",
       "    $(Expr(:~, :(theta[10]), :(dgamma(alpha, beta))))\n",
       "    lambda[10] = theta[10] * t[10]\n",
       "    $(Expr(:~, :(x[10]), :(dpois(lambda[10]))))\n",
       "    $(Expr(:~, :alpha, :(dexp(1))))\n",
       "    $(Expr(:~, :beta, :(dgamma(0.1, 1.0))))\n",
       "end"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "while true\n",
    "    unrollforloops!(model_def, compiler_state) || addlogicalrules!(model_def, compiler_state) || break\n",
    "end\n",
    "model_def"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b7addd1-72d7-4de9-8fc4-9723b153dc5f",
   "metadata": {},
   "source": [
    "After that, process stochastic assignments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "b1bbdbd5-bb87-4bc8-b050-41443b72e81d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "CodeInfo(\n",
       "\u001b[90m1 ─\u001b[39m %1 = SymbolicPPL.dgamma(0.1, 1.0)\n",
       "\u001b[90m└──\u001b[39m      return %1\n",
       ")"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "addstochasticrules!(model_def, compiler_state)\n",
    "f = compiler_state.stochasticrules[tosymbolic(:beta)]\n",
    "@code_lowered f()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b52837b-3eee-4502-a11f-7d1d72b01cce",
   "metadata": {},
   "source": [
    "Finally, we can create a `GraphPPL.Model` using information in the `CompilerState` object (not shown).  \n",
    "\n",
    "We provide a function to carry out the whole compilation process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "7016e73f-30c6-4c0c-ab8f-d53574159c89",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Nodes: \n",
       "t[9] = (input = (), value = Base.RefValue{Float64}(2.1), eval = SymbolicPPL.var\"#166#167\"(), kind = :Logical)\n",
       "t[5] = (input = (), value = Base.RefValue{Float64}(5.24), eval = SymbolicPPL.var\"#186#187\"(), kind = :Logical)\n",
       "beta = (input = (), value = Base.RefValue{Float64}(10.0), eval = SymbolicPPL.var\"#142#143\"(), kind = :Stochastic)\n",
       "t[10] = (input = (), value = Base.RefValue{Float64}(10.5), eval = SymbolicPPL.var\"#198#199\"(), kind = :Logical)\n",
       "t[8] = (input = (), value = Base.RefValue{Float64}(1.05), eval = SymbolicPPL.var\"#178#179\"(), kind = :Logical)\n",
       "t[3] = (input = (), value = Base.RefValue{Float64}(62.9), eval = SymbolicPPL.var\"#204#205\"(), kind = :Logical)\n",
       "t[7] = (input = (), value = Base.RefValue{Float64}(1.05), eval = SymbolicPPL.var\"#160#161\"(), kind = :Logical)\n",
       "t[1] = (input = (), value = Base.RefValue{Float64}(94.3), eval = SymbolicPPL.var\"#172#173\"(), kind = :Logical)\n",
       "t[2] = (input = (), value = Base.RefValue{Float64}(15.7), eval = SymbolicPPL.var\"#196#197\"(), kind = :Logical)\n",
       "N = (input = (), value = Base.RefValue{Float64}(10.0), eval = SymbolicPPL.var\"#164#165\"(), kind = :Logical)\n",
       "t[4] = (input = (), value = Base.RefValue{Float64}(126.0), eval = SymbolicPPL.var\"#174#175\"(), kind = :Logical)\n",
       "t[6] = (input = (), value = Base.RefValue{Float64}(31.4), eval = SymbolicPPL.var\"#144#145\"(), kind = :Logical)\n",
       "alpha = (input = (), value = Base.RefValue{Float64}(10.0), eval = SymbolicPPL.var\"#140#141\"(), kind = :Stochastic)\n",
       "theta[7] = (input = (:alpha, :beta), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#124#125\"(), kind = :Stochastic)\n",
       "lambda[7] = (input = (Symbol(\"t[7]\"), Symbol(\"theta[7]\")), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#184#185\"(), kind = :Logical)\n",
       "x[7] = (input = (Symbol(\"lambda[7]\"),), value = Base.RefValue{Float64}(1.0), eval = SymbolicPPL.var\"#126#127\"(), kind = :Observations)\n",
       "theta[6] = (input = (:alpha, :beta), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#120#121\"(), kind = :Stochastic)\n",
       "lambda[6] = (input = (Symbol(\"t[6]\"), Symbol(\"theta[6]\")), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#148#149\"(), kind = :Logical)\n",
       "x[6] = (input = (Symbol(\"lambda[6]\"),), value = Base.RefValue{Float64}(19.0), eval = SymbolicPPL.var\"#122#123\"(), kind = :Observations)\n",
       "theta[4] = (input = (:alpha, :beta), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#112#113\"(), kind = :Stochastic)\n",
       "lambda[4] = (input = (Symbol(\"t[4]\"), Symbol(\"theta[4]\")), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#200#201\"(), kind = :Logical)\n",
       "x[4] = (input = (Symbol(\"lambda[4]\"),), value = Base.RefValue{Float64}(14.0), eval = SymbolicPPL.var\"#114#115\"(), kind = :Observations)\n",
       "theta[9] = (input = (:alpha, :beta), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#132#133\"(), kind = :Stochastic)\n",
       "lambda[9] = (input = (Symbol(\"t[9]\"), Symbol(\"theta[9]\")), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#170#171\"(), kind = :Logical)\n",
       "x[9] = (input = (Symbol(\"lambda[9]\"),), value = Base.RefValue{Float64}(4.0), eval = SymbolicPPL.var\"#134#135\"(), kind = :Observations)\n",
       "theta[1] = (input = (:alpha, :beta), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#100#101\"(), kind = :Stochastic)\n",
       "lambda[1] = (input = (Symbol(\"t[1]\"), Symbol(\"theta[1]\")), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#168#169\"(), kind = :Logical)\n",
       "x[1] = (input = (Symbol(\"lambda[1]\"),), value = Base.RefValue{Float64}(5.0), eval = SymbolicPPL.var\"#102#103\"(), kind = :Observations)\n",
       "theta[8] = (input = (:alpha, :beta), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#128#129\"(), kind = :Stochastic)\n",
       "lambda[8] = (input = (Symbol(\"t[8]\"), Symbol(\"theta[8]\")), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#190#191\"(), kind = :Logical)\n",
       "x[8] = (input = (Symbol(\"lambda[8]\"),), value = Base.RefValue{Float64}(1.0), eval = SymbolicPPL.var\"#130#131\"(), kind = :Observations)\n",
       "theta[5] = (input = (:alpha, :beta), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#116#117\"(), kind = :Stochastic)\n",
       "lambda[5] = (input = (Symbol(\"t[5]\"), Symbol(\"theta[5]\")), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#150#151\"(), kind = :Logical)\n",
       "x[5] = (input = (Symbol(\"lambda[5]\"),), value = Base.RefValue{Float64}(3.0), eval = SymbolicPPL.var\"#118#119\"(), kind = :Observations)\n",
       "theta[10] = (input = (:alpha, :beta), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#136#137\"(), kind = :Stochastic)\n",
       "lambda[10] = (input = (Symbol(\"t[10]\"), Symbol(\"theta[10]\")), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#176#177\"(), kind = :Logical)\n",
       "x[10] = (input = (Symbol(\"lambda[10]\"),), value = Base.RefValue{Float64}(22.0), eval = SymbolicPPL.var\"#138#139\"(), kind = :Observations)\n",
       "theta[2] = (input = (:alpha, :beta), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#104#105\"(), kind = :Stochastic)\n",
       "lambda[2] = (input = (Symbol(\"t[2]\"), Symbol(\"theta[2]\")), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#192#193\"(), kind = :Logical)\n",
       "x[2] = (input = (Symbol(\"lambda[2]\"),), value = Base.RefValue{Float64}(1.0), eval = SymbolicPPL.var\"#106#107\"(), kind = :Observations)\n",
       "theta[3] = (input = (:alpha, :beta), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#108#109\"(), kind = :Stochastic)\n",
       "lambda[3] = (input = (Symbol(\"t[3]\"), Symbol(\"theta[3]\")), value = Base.RefValue{Float64}(0.0), eval = SymbolicPPL.var\"#152#153\"(), kind = :Logical)\n",
       "x[3] = (input = (Symbol(\"lambda[3]\"),), value = Base.RefValue{Float64}(5.0), eval = SymbolicPPL.var\"#110#111\"(), kind = :Observations)\n"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = compile_graphppl(model_def = model_def, data = data, initials = (alpha = 10, beta = 10))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0de3e375-7e1f-458d-9993-31fea540be08",
   "metadata": {},
   "source": [
    "## Inference on Graph\n",
    "We implemented a simple Metropolis-within-Gibbs sampler, given that inference algorithm is not the focus of this report, we'll just demonstrate the result. Curious reader can refer to our [implementation](https://github.com/TuringLang/SymbolicPPL.jl/blob/complier/src/gibbs.jl), and even better, we'll appreciate contributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "af4c4065-e77b-45fa-a9d7-adffd6904da3",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[32mSampling: 100%|█████████████████████████████████████████| Time: 0:00:29\u001b[39m\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Summary Statistics\n",
       " \u001b[1m parameters \u001b[0m \u001b[1m    mean \u001b[0m \u001b[1m     std \u001b[0m \u001b[1m naive_se \u001b[0m \u001b[1m    mcse \u001b[0m \u001b[1m       ess \u001b[0m \u001b[1m    rhat \u001b[0m\n",
       " \u001b[90m     Symbol \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m  Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m   Float64 \u001b[0m \u001b[90m Float64 \u001b[0m\n",
       "\n",
       "       alpha    0.7013    0.2740     0.0026    0.0162    149.7799    1.0279\n",
       "        beta    0.9289    0.5847     0.0056    0.0457     79.2040    1.0474\n",
       "   theta[10]    2.0024    0.4391     0.0042    0.0152    745.3547    0.9999\n",
       "    theta[1]    0.0598    0.0247     0.0002    0.0007    932.8279    1.0004\n",
       "    theta[2]    0.1030    0.0814     0.0008    0.0020   1662.7283    1.0004\n",
       "    theta[3]    0.0886    0.0377     0.0004    0.0012   1046.4324    1.0021\n",
       "    theta[4]    0.1166    0.0298     0.0003    0.0011    706.8311    1.0028\n",
       "    theta[5]    0.6034    0.3148     0.0030    0.0055   3489.0945    1.0014\n",
       "    theta[6]    0.6093    0.1351     0.0013    0.0038   1327.1002    1.0000\n",
       "    theta[7]    0.9094    0.7428     0.0071    0.0128   3814.7638    1.0013\n",
       "    theta[8]    0.9005    0.7434     0.0071    0.0136   2595.2416    1.0038\n",
       "    theta[9]    1.6012    0.7786     0.0074    0.0225    931.0329    1.0045\n"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "using SymbolicPPL: SampleFromPrior\n",
    "using AbstractMCMC\n",
    "using MCMCChains\n",
    "using MCMCChains: summarize\n",
    "\n",
    "sampler = SampleFromPrior(model);\n",
    "samples = AbstractMCMC.sample(model, sampler, 11000, discard_initial = 1000);\n",
    "summarize(samples[[namesingroup(samples, :theta)..., :alpha, :beta]])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c284636c-9f9a-4acf-9723-71f52a0d12cf",
   "metadata": {},
   "source": [
    "# PRs Supporting GSoC Evaluation\n",
    "* [MiniTuring](https://github.com/TuringLang/TuringTutorials/pull/318) Turorial\n",
    "* [Compiler](https://github.com/TuringLang/SymbolicPPL.jl/pull/10) for BUGS programs (repo not published yet)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b209cb95-ca2b-4b53-a611-2e6d33357dab",
   "metadata": {},
   "source": [
    "# Future Work\n",
    "1. The compiler still need to support all the functions and distributions of BUGS\n",
    "2. Performance improvement\n",
    "3. Inference algorithms"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c8cccb8-cbfb-4dde-80f2-11a193c5ca71",
   "metadata": {},
   "source": [
    "# Acknowledgment\n",
    "Much appreciation goes to my mentors Ge Hong and Philipp Gabler. The work is impossible without your help and support. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.7.3",
   "language": "julia",
   "name": "julia-1.7"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}