
Introduction to Monte Carlo with Python

What is a Monte Carlo Method?

Monte Carlo methods are widely used heuristic techniques that can solve a variety of common problems, including optimization and numerical integration. These algorithms work by cleverly sampling from a distribution to simulate the workings of a system. Applications range from solving problems in computational physics to predicting trends in financial investments.

Our Problem

In this introduction, we will develop a Python implementation of Monte Carlo approximation to find a solution to the following integral:

$$I = \int_{0}^{\infty} \frac{e^{-x}}{1 + (x - 1)^2}\, dx$$

I'm borrowing this example and its solution from Sukanta Deb's Variational Monte Carlo Technique article. If you're looking to learn more about simulating quantum mechanical systems using Monte Carlo methods, definitely check out that article.

You'll notice that this integral cannot be solved analytically, and therefore we will need to approximate it numerically. Let's start with a simple approach to the problem: the Crude Monte Carlo.

The Crude Monte Carlo: Intuition

The Crude Monte Carlo is the easiest technique to understand, so we'll start here. But as the name suggests, it's also the least accurate. We'll fix this later though.

You may recall from high school calculus the following identity, which describes how the average value of a function over a specified range relates to its integral over that range (for a proof, check out this link):

$$\langle f \rangle = \frac{1}{b - a} \int_{a}^{b} f(x)\, dx$$

Just as we can find the average value of a function by integrating, we can also find the value of an integral by determining the average value of its integrand, f(x). The Monte Carlo technique is built upon this principle: instead of hunting for an antiderivative, which can sometimes be impossible, let's estimate the average of the integrand and use that to approximate the integral.

And that's exactly what we're going to do! So how do we do that?

Well, the easiest way to do it would be to randomly sample input values x from all possible input values. Let's say we have a simple linear function, like y = 2x, and we want to find the average value of y in the range [0,2].

[Insert picture of graph and randomly sampled points in table, side by side. Show addition].

To calculate the average, we'll just evaluate y at each of the randomly sampled x values and average the results.

[show addition]

This process is exactly the Crude Monte Carlo. Now let's see how to use this technique in Python to solve our problem from earlier.

The Crude Monte Carlo: Implementation

You can find all of the code for this tutorial on my GitHub here. Let's start by setting up our script with the imports and the like:

https://gist.github.com/3f43b1f6739614355d76c02a29c42185

We really don't need much. numpy will be used once to find the argmin of a list. We'll use math for defining our functions and random will be used for sampling. matplotlib will help visualize our results as we go.
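Since the embedded gist isn't rendered here, a minimal sketch of that setup might look like the following (the exact import style is my assumption):

```python
import math    # for defining our integrand
import random  # for uniform sampling

import numpy as np                    # used once, to find an argmin
from matplotlib import pyplot as plt  # for visualizing results as we go
```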

Some Helpful Functions

First let's define a function to generate a random number in a particular range.

https://gist.github.com/0d2d3b283b5e5ff8b53f11d78c8d300a
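As a sketch, assuming the helper is named get_rand_number (the name and signature are my guesses based on how it's used later):

```python
def get_rand_number(min_value, max_value):
    """Return a random number uniformly sampled from [min_value, max_value)."""
    range_width = max_value - min_value
    return min_value + range_width * random.random()
```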

Let's also define our integrand function, f(x):

https://gist.github.com/5c11adde677fa6f4cfbc9e159a40728b
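Assuming the integral reconstructed above, the integrand might be defined like this (the name f_of_x is my placeholder):

```python
def f_of_x(x):
    """Integrand: e^(-x) / (1 + (x - 1)^2)."""
    return math.exp(-x) / (1 + (x - 1) ** 2)
```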

Perform the Crude Monte Carlo

Implementing the Crude Monte Carlo should be fairly straightforward. Our algorithm looks like this:

  • For each sample in number of samples:
    • Get a random input value from the integration range
    • Evaluate the integrand
  • Determine the average of all these evaluations and multiply by the range, and return

https://gist.github.com/fe93e4fed08736175bd292529cc76efc
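A minimal sketch of that algorithm, using the helpers above and a finite cutoff of 5 in place of the infinite upper limit (the integrand is negligible beyond that point; the cutoff value is my assumption):

```python
def crude_monte_carlo(num_samples=10000):
    """Approximate the integral of f_of_x over [0, 5] by averaging
    the integrand at uniformly sampled points and scaling by the range."""
    lower_bound = 0
    upper_bound = 5  # stands in for the infinite upper limit

    running_total = 0.0
    for _ in range(num_samples):
        x = get_rand_number(lower_bound, upper_bound)
        running_total += f_of_x(x)

    return (upper_bound - lower_bound) * running_total / num_samples
```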

Performing this approximation with N=10000 samples gave me an estimate of 0.68971.... Not too far off from Wolfram's approximation (which we'll take to be the Holy Truth).

Determine the Variance of Our Estimation

But how confident are we in our answer? How do I know that 10,000 samples is enough to get a good approximation? We can quantify our accuracy by finding the variance of our estimations. The variance is the expected squared deviation of our estimates from their mean: a high variance means individual runs of the simulation scatter widely around the true value, while a low variance means they cluster tightly together. The formula for calculating the variance is:

$$\sigma^2 = \left\langle f^2 \right\rangle - \left\langle f \right\rangle^2 = \frac{1}{N} \sum_{i=1}^{N} f(x_i)^2 - \left( \frac{1}{N} \sum_{i=1}^{N} f(x_i) \right)^2$$

Let's see how to do this in Python.

https://gist.github.com/b2c028b1cd0336fb7501a144e84c01d5
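Here is a sketch of that computation under the same assumptions as before. Note that this measures the spread of the individual integrand samples; scaling conventions may differ from the author's gist:

```python
def crude_mc_variance(num_samples=10000, int_max=5):
    """Estimate sigma^2 = <f^2> - <f>^2 for uniform samples on [0, int_max]."""
    # average of f(x)^2
    total_of_squares = 0.0
    for _ in range(num_samples):
        x = get_rand_number(0, int_max)
        total_of_squares += f_of_x(x) ** 2
    average_of_squares = total_of_squares / num_samples

    # square of the average of f(x)
    running_total = 0.0
    for _ in range(num_samples):
        x = get_rand_number(0, int_max)
        running_total += f_of_x(x)
    square_of_average = (running_total / num_samples) ** 2

    return average_of_squares - square_of_average
```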

This implementation of the Crude Monte Carlo gave me a variance of 0.37580.... For a quick and dirty estimate, this isn't bad at all. But what if we need a lot more accuracy? We could always just increase the number of samples, but then our computation time would increase as well. What if, instead of sampling uniformly at random, we cleverly sampled from the right distribution of points? This is the idea behind importance sampling.

Importance Sampling: The Intuition

Importance sampling is a method for reducing the variance of a Monte Carlo simulation given a constant number of samples. The idea is that instead of randomly sampling across the whole range, let's just sample from a distribution of points shaped similarly to the function.

Let's look at a simple example. Let's say you have a piece-wise function that looks like this:

[Insert graph of step function]

Shown above, we have a step function that is active on the range [0,2] and inactive on [2,6]. Sampling it 10 times might yield estimates like this:

[Insert table showing 3 samples in the active range and 7 samples in the inactive range]

These samples, which correspond to the most likely distribution of samples, would yield an integral estimate of 1.8. But what if, instead, we estimated the ratio between our function f(x) and some special weight function g(x) whose value is almost always about half the value of f(x) for any given x? And what if we also biased our samples to appear in the most active ranges of our function (a bias we'll choose to minimize the error)? You'll see that the average of these ratios lands a lot closer to the real value of our integral, which is 2. A rough sketch of this comparison appears below.
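To make this concrete, here is a tiny sketch of the comparison, using my own idealized g(x) that is exactly half of f(x) on the active range (and normalized to integrate to 1 over the whole range):

```python
import random

def f(x):
    """Step function: 1 on [0, 2], 0 on (2, 6]."""
    return 1.0 if x <= 2 else 0.0

def g(x):
    """Weight function: half of f on the active range; integrates to 1 over [0, 6]."""
    return 0.5 if x <= 2 else 0.0

N = 10

# Crude estimate: uniform samples over the full range, scaled by its width.
crude = 6 * sum(f(random.uniform(0, 6)) for _ in range(N)) / N

# Importance-sampled estimate: draw x from g (uniform on [0, 2] here)
# and average the ratio f(x) / g(x).
weighted = sum(f(x) / g(x) for x in (random.uniform(0, 2) for _ in range(N))) / N

print(crude)     # bounces around 2 with high variance (e.g. 1.8)
print(weighted)  # exactly 2.0 for this idealized g
```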

The importance sampling method I will now outline seeks to determine this optimal function g(x).

The Math

I will provide a quick overview of how this works mathematically, but the main purpose of this post is the implementation, so if you desire more mathematical rigor, check out Professor Deb's article that I mentioned earlier.

Let's see if we can find a g(x) such that:

$$\frac{f(x)}{g(x)} \approx k \quad \text{for most } x \in [a, b]$$

where k is some constant.

We'll also need g(x) to satisfy a few criteria:

  1. g(x) is integrable
  2. g(x) is non-negative on [a,b]
  3. The indefinite integral of g(x), which we'll call G(x), has a real inverse
  4. The integral of g(x) over the range [a,b] must equal 1

In the ideal case, f(x) = k * g(x), where k is a constant. If that were the case, we could sample both functions, find the ratio between f(x) and g(x), multiply by k, and we'd have a great estimate of the average value of f(x). However, if f(x) = k * g(x), then f(x) would be integrable and we would have no need to perform a Monte Carlo simulation; we could just solve the problem analytically!

So, we'll settle for f(x) ~= k * g(x). We won't get a perfect estimate of course, but you'll find it performs better than our crude estimation from earlier.

We'll define G(x) as follows, and we'll also perform a change of variables to r.

$$G(x) = \int_{0}^{x} g(x')\, dx', \qquad r = G(x), \qquad x = G^{-1}(r)$$

r will be restricted to the range [0,1]. Since the integral of g(x) was defined to be 1, G(x) can never be greater than 1, and therefore r can never be greater than 1. This is important because later, we will randomly sample r from the range [0,1] when performing the simulation.

Using these definitions, we can produce the following estimation:

$$I = \int_{a}^{b} f(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f\left(G^{-1}(r_i)\right)}{g\left(G^{-1}(r_i)\right)}$$

Simple, right? Don't be intimidated if this doesn't make sense at first glance. I intentionally focused on the intuition and breezed through the math quite a bit. If you're confused, or if you're like me and need to see a proof of everything, check out the resource I mentioned earlier until you can believe that final equation.

Importance Sampling: Python Implementation

Ok, now that we understand the math behind importance sampling, let's go back to our example from before. Remember, we're trying to estimate the following integral as precisely as we can:

$$I = \int_{0}^{\infty} \frac{e^{-x}}{1 + (x - 1)^2}\, dx$$

Visualizing our Problem

Let's start by envisioning a template for our g(x) function. I'd like to visualize my function f(x), so we'll do that using matplotlib:

https://gist.github.com/0e97a7e71c7de64ff396b8476324484e
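A sketch of what that plotting code might look like, assuming the f_of_x helper from earlier (the plotting range is my guess):

```python
# Plot f(x) over [0, 6] to eyeball where it is most active
xs = [i / 100 for i in range(601)]
ys = [f_of_x(x) for x in xs]

plt.plot(xs, ys)
plt.title("f(x)")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.show()
```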

[Insert Graph produced by this code]

Ok, so we can see that our function is mostly active in the rough range of [0, 3-ish] and is mostly inactive on the range [3-ish, infinity). So let's see if we can find a function template that can be parametrized to replicate this quality. Deb proposes this function:

$$g(x) = A e^{-\lambda x}$$

Plotting this function with what we will eventually find to be the ideal parameters lambda and A produces the following graph (overlaid with f(x)):

[Insert graph of both g(x) and f(x)]

You can see that in many ways g(x) does not ideally replicate the shape of f(x). This is ok. A crude g(x) can still work marvels for decreasing your estimation's variance. Feel free to experiment with other weight functions g(x) to see if you can find even better solutions.

Parametrize g(x)

Before we can perform the simulation, we will need to find the optimal parameters lambda and A. We can find A(lambda) using a restriction on g(x):

$$\int_{0}^{\infty} A e^{-\lambda x}\, dx = 1 \quad \Longrightarrow \quad A = \lambda$$

Now, all we need to do is find the ideal lambda, and we'll have our ideal g(x).

To do this, let's calculate the variance for different lambdas on the range [0.05, 3.0] in increments of 0.05. Recall that the equation for calculating the variance looks like this:

$$\sigma^2 = \left\langle \left( \frac{f(x)}{g(x)} \right)^2 \right\rangle - \left\langle \frac{f(x)}{g(x)} \right\rangle^2$$

In code, it looks like this:

https://gist.github.com/aa068ace0a0121cc59884b83ffb59031
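As a sketch of that search, assuming g(x) = A * e^(-lambda * x) with A = lambda from the normalization above, and drawing x from g via the inverse of G (all helper names here are mine):

```python
def g_of_x(x, A, lamda):
    """Exponential weight function template: A * e^(-lamda * x)."""
    return A * math.exp(-lamda * x)

def inverse_G_of_r(r, lamda):
    """Inverse of G(x) = 1 - e^(-lamda * x): maps r in [0, 1) to x."""
    return -math.log(1.0 - r) / lamda

def importance_sampling_variance(lamda, num_samples=10000):
    """Estimate sigma^2 = <(f/g)^2> - <f/g>^2 with x drawn from g."""
    A = lamda  # normalization over [0, inf)
    running_total = 0.0
    total_of_squares = 0.0
    for _ in range(num_samples):
        x = inverse_G_of_r(get_rand_number(0, 1), lamda)
        ratio = f_of_x(x) / g_of_x(x, A, lamda)
        running_total += ratio
        total_of_squares += ratio ** 2
    average = running_total / num_samples
    return total_of_squares / num_samples - average ** 2

# Sweep lambda over [0.05, 3.0] and keep the value with the smallest variance.
lamdas = [0.05 * (i + 1) for i in range(60)]
variances = [importance_sampling_variance(lamda) for lamda in lamdas]
optimal_lamda = lamdas[int(np.argmin(variances))]
```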

Running this code in Jupyter Lab produces the following output (check out the GitHub repo to see the whole code):

[Insert screenshot of the Notebook]

You'll see that running this optimization code with 10,000 samples produces a lambda value of 1.35, an A value of 1.3697..., and a variance of 0.0513.... Wow! Using importance sampling allowed us to reduce our variance by almost a factor of 8 using the same number of samples.

Run the Simulation

Now, all we have to do is run the simulation with our optimized g(x) function, and we're good to go. Here's what it looks like in code:

https://gist.github.com/bbf7989c8643cef494d27d985ac58808
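Under the same assumptions as the parameter sweep above, a sketch of the final simulation:

```python
def importance_sampling_mc(lamda, num_samples=10000):
    """Estimate the integral by averaging f(x)/g(x) with x = G^{-1}(r),
    where r is sampled uniformly from [0, 1)."""
    A = lamda
    running_total = 0.0
    for _ in range(num_samples):
        x = inverse_G_of_r(get_rand_number(0, 1), lamda)
        running_total += f_of_x(x) / g_of_x(x, A, lamda)
    return running_total / num_samples

estimate = importance_sampling_mc(lamda=1.35)
print(estimate)  # should land near 0.69
```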

Running this code gave me an approximation of 0.6951..., which is much closer to the Wolfram-provided ground truth of 0.69....

Wrapping Things Up

In this tutorial, we learned how to perform a Monte Carlo simulation for estimating a definite integral. We used both the crude method and the importance sampling method, and found that importance sampling provided a significant decrease in variance. The next step to mastering the Monte Carlo is learning the time-tested Metropolis algorithm, which you can read about in the next part of this series.

Quick Trailer for the Metropolis Algorithm

The Metropolis algorithm was named one of the top 10 algorithms of the 20th century by the editors of Computing in Science & Engineering. It is extremely useful when you want to use importance sampling but can't find a g(x) function that both matches your integrand well enough and satisfies the restrictions above.

The Metropolis algorithm is also much simpler than what we just did. (So why didn't we skip straight there? Because learning importance sampling the hard way gives you great intuition into how to use the Metropolis algorithm most effectively.)

Check back soon for Part 2 of this tutorial, where we'll go over how to use the Metropolis algorithm in your Monte Carlo simulations!

Good luck, and thanks for sticking to the end!
