
@ruotianluo
Forked from awjuliani/rl-tutorial-1.ipynb
Created June 16, 2016 04:19
Reinforcement Learning Tutorial 1 (Two-armed bandit problem)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reinforcement Learning in Tensorflow Tutorial 1\n",
"## The two-armed bandit"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import tensorflow as tf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Bandits"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we define our bandits. For this example we are using a two-armed bandit. The pullBandit function generates a random number from a normal distribution with a mean of 0. The lower the bandit number, the more likely a positive reward will be returned. We want our agent to learn to always choose the bandit that will give that positive reward."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#List out our two bandits. Current bandit 0 is the optimal choice.\n",
"bandits = [-0.5,0.5]\n",
"def pullBandit(bandit):\n",
" #Get a random number.\n",
" result = np.random.randn(1)\n",
" if result > bandit:\n",
" #return a positive reward.\n",
" return 1\n",
" else:\n",
" #return a negative reward.\n",
" return -1"
]
},
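{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick optional check (not part of the original tutorial), we can pull each arm many times and estimate how often it pays out. This simply confirms that the arm with the lower value is the better one; the exact numbers will vary from run to run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Optional check: empirically estimate how often each arm returns +1.\n",
"for i, b in enumerate(bandits):\n",
"    rewards = np.array([pullBandit(b) for _ in xrange(10000)])\n",
"    print 'Bandit %d (value %.1f): fraction of +1 rewards = %.3f' % (i, b, np.mean(rewards == 1))"
]
},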
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The neural network agent\n",
"\n",
"Here we set up all the parameters that will be used for training our network."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# hyperparameters\n",
"learning_rate = 0.1\n",
"gamma = 0.99 # discount factor for reward"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we define our very simple neural network."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"tf.reset_default_graph()\n",
"#While there aren't any states to this task, we will still use an x input as a placeholder.\n",
"input_x = tf.placeholder(tf.float32, [None,1] , name=\"input_x\")\n",
"W = tf.Variable(tf.random_normal([1,1]),name='W') # Our single variable we are training\n",
"score = tf.matmul(input_x,W)\n",
"probability = tf.nn.sigmoid(score) # This is the liklihood of choosing bandit 1 over bandit 0\n",
"\n",
"#Below we compute and set the gradients to use for adjusting the network towards a succesful policy. \n",
"input_y = tf.placeholder(tf.float32,[None,1], name=\"input_y\")\n",
"rewardSig = tf.placeholder(tf.float32,name=\"reward_sig\")\n",
"\n",
"#The computation below is the key to processing the gradients properly. \n",
"theGrad = tf.gradients(probability,W,grad_ys= ((input_y*rewardSig)/probability) - rewardSig)\n",
"\n",
"adam = tf.train.AdamOptimizer(learning_rate=learning_rate)\n",
"updateGrads = adam.apply_gradients([(theGrad[0],W)])"
]
},
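{
"cell_type": "markdown",
"metadata": {},
"source": [
"A brief note on the gradient above (an informal reading, not part of the original tutorial): with p = sigmoid(x*W) the probability of choosing bandit 1, we have dp/dW = p*(1-p)*x. The `grad_ys` weighting ((input_y*rewardSig)/probability) - rewardSig equals reward*(y - p)/p, so the quantity handed to Adam is reward*(y - p)*(1-p)*x summed over the batch. The resulting update moves the policy in the same direction as a REINFORCE (log-likelihood-ratio) step, though with a different scaling: after the optimizer's descent step, a rewarded action becomes more probable and a punished one less probable. The short, optional cell below sketches a sanity check of that sign behaviour."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Optional sanity check (not in the original tutorial): evaluate the manual gradient\n",
"#for a single fake transition. With fake label y=1 (action 0 was taken) and reward +1,\n",
"#the gradient on W should be positive, so the descent step pushes W (and hence the\n",
"#probability of choosing bandit 1) down.\n",
"with tf.Session() as sess:\n",
"    sess.run(tf.initialize_all_variables())\n",
"    g = sess.run(theGrad, feed_dict={input_x: np.ones([1,1]),\n",
"                                     input_y: np.ones([1,1]),\n",
"                                     rewardSig: 1.0})\n",
"    print 'gradient on W:', g[0]"
]
},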
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running the agent and environment"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Total reward 4.000000. Action: 1.000000. Prob Before 0.143842. Prob After 0.131960.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.131960. Prob After 0.122341.\n",
"Total reward 0.000000. Action: 0.000000. Prob Before 0.122341. Prob After 0.115324.\n",
"Total reward 6.000000. Action: 0.000000. Prob Before 0.115324. Prob After 0.107501.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.107501. Prob After 0.099853.\n",
"Total reward 0.000000. Action: 0.000000. Prob Before 0.099853. Prob After 0.093116.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.093116. Prob After 0.086765.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.086765. Prob After 0.080318.\n",
"Total reward 0.000000. Action: 0.000000. Prob Before 0.080318. Prob After 0.074615.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.074615. Prob After 0.068990.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.068990. Prob After 0.063545.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.063545. Prob After 0.058583.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.058583. Prob After 0.053903.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.053903. Prob After 0.049375.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.049375. Prob After 0.045196.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.045196. Prob After 0.041266.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.041266. Prob After 0.037747.\n",
"Total reward 8.000000. Action: 0.000000. Prob Before 0.037747. Prob After 0.034351.\n",
"Total reward 0.000000. Action: 0.000000. Prob Before 0.034351. Prob After 0.031632.\n",
"Total reward 10.000000. Action: 1.000000. Prob Before 0.031632. Prob After 0.028917.\n",
"Total reward 0.000000. Action: 0.000000. Prob Before 0.028917. Prob After 0.026569.\n",
"Total reward -2.000000. Action: 0.000000. Prob Before 0.026569. Prob After 0.024750.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.024750. Prob After 0.022972.\n",
"Total reward 6.000000. Action: 0.000000. Prob Before 0.022972. Prob After 0.021186.\n",
"Total reward 6.000000. Action: 0.000000. Prob Before 0.021186. Prob After 0.019408.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.019408. Prob After 0.017758.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.017758. Prob After 0.016229.\n",
"Total reward 6.000000. Action: 0.000000. Prob Before 0.016229. Prob After 0.014770.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.014770. Prob After 0.013436.\n",
"Total reward 6.000000. Action: 0.000000. Prob Before 0.013436. Prob After 0.012180.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.012180. Prob After 0.011061.\n",
"Total reward 0.000000. Action: 0.000000. Prob Before 0.011061. Prob After 0.010130.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.010130. Prob After 0.009267.\n",
"Total reward -4.000000. Action: 0.000000. Prob Before 0.009267. Prob After 0.008642.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.008642. Prob After 0.008035.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.008035. Prob After 0.007482.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.007482. Prob After 0.006977.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.006977. Prob After 0.006498.\n",
"Total reward 6.000000. Action: 0.000000. Prob Before 0.006498. Prob After 0.006010.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.006010. Prob After 0.005570.\n",
"Total reward 8.000000. Action: 0.000000. Prob Before 0.005570. Prob After 0.005113.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.005113. Prob After 0.004696.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.004696. Prob After 0.004325.\n",
"Total reward 2.000000. Action: 0.000000. Prob Before 0.004325. Prob After 0.003994.\n",
"Total reward -4.000000. Action: 0.000000. Prob Before 0.003994. Prob After 0.003756.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.003756. Prob After 0.003518.\n",
"Total reward 4.000000. Action: 0.000000. Prob Before 0.003518. Prob After 0.003284.\n",
"Total reward 0.000000. Action: 0.000000. Prob Before 0.003284. Prob After 0.003084.\n",
"Total reward 0.000000. Action: 0.000000. Prob Before 0.003084. Prob After 0.002913.\n",
"Total reward 0.000000. Action: 0.000000. Prob Before 0.002913. Prob After 0.002766.\n"
]
}
],
"source": [
"xs,drs,ys = [],[],[]\n",
"total_episodes = 500\n",
"running_reward = None\n",
"reward_sum = 0\n",
"episode_number = 1\n",
"\n",
"init = tf.initialize_all_variables()\n",
"\n",
"# Launch the graph\n",
"with tf.Session() as sess:\n",
" sess.run(init)\n",
"\n",
" while episode_number <= total_episodes:\n",
" #Generate a placeholder state\n",
" x = np.ones([1,1])\n",
" xs.append(x) \n",
"\n",
" # forward the policy network and sample an action from the returned probability\n",
" prob = sess.run(probability,feed_dict={input_x:np.ones([1,1])})\n",
" action = 1 if np.random.uniform() < prob else 0\n",
"\n",
" y = 1 if action == 0 else 0 # a \"fake label\"\n",
" ys.append(y)\n",
" \n",
" # Take our action in the environment, and get a reward\n",
" reward = np.float64(pullBandit(bandits[action]))\n",
" reward_sum += reward\n",
" drs.append(reward) # record reward \n",
"\n",
" \n",
" if episode_number % 10 == 0: # Periodically update the network policy\n",
" epx = np.vstack(xs)\n",
" epy = np.vstack(ys)\n",
" epr = np.vstack(drs)\n",
" xs,drs,ys = [],[],[] # reset array memory\n",
"\n",
" # Update the network gradient towards choosing more ideal actions, given what it has observed.\n",
" probBefore = sess.run(probability, feed_dict={input_x: epx, input_y: epy, rewardSig: epr})\n",
" sess.run(updateGrads,feed_dict={input_x: epx, input_y: epy, rewardSig: epr})\n",
" probAfter = sess.run(probability, feed_dict={input_x: epx, input_y: epy, rewardSig: epr})\n",
"\n",
" # Keep a record of rewards, and give some feedback\n",
" running_reward = reward_sum if running_reward is None else running_reward * 0.99 + reward_sum * 0.01\n",
" print 'Total reward %f. Action: %f. Prob Before %f. Prob After %f.' % (reward_sum, action,probBefore[0],probAfter[0])\n",
" reward_sum = 0\n",
" prev_x = None\n",
" \n",
" episode_number += 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"You should see that the agent learns to almost always choose action 0, and the probability of choosing action 1 decreases to near zero. Feel free to play with the two bandit values, and see how the agent changes what it learns."
]
},
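{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you do change the bandit values, the small helper below (not part of the original tutorial) gives a reference point: since pullBandit(b) returns +1 with probability P(N(0,1) > b), the expected reward per pull of an arm with value b is 2*(1 - Phi(b)) - 1, where Phi is the standard normal CDF."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Closed-form expected reward per pull for each arm, as a reference when\n",
"#experimenting with different bandit values.\n",
"import math\n",
"\n",
"def expectedReward(bandit):\n",
"    phi = 0.5 * (1.0 + math.erf(bandit / math.sqrt(2.0))) # standard normal CDF\n",
"    return 2.0 * (1.0 - phi) - 1.0\n",
"\n",
"for i, b in enumerate(bandits):\n",
"    print 'Bandit %d (value %.1f): expected reward per pull = %.3f' % (i, b, expectedReward(b))"
]
},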
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}