Skip to content

Instantly share code, notes, and snippets.

@josh-hernandez-exe
Last active September 19, 2020 22:28
Show Gist options
  • Save josh-hernandez-exe/c331d6cb5cfb5127df6db126fc555c47 to your computer and use it in GitHub Desktop.
Save josh-hernandez-exe/c331d6cb5cfb5127df6db126fc555c47 to your computer and use it in GitHub Desktop.
625.603 Discussion Assignment 2
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"I want to explore question 3.3.2 through the lens of symbolic computation. I will use python language and scientific python stack with a focus on the sympy library.\n",
"\n",
"## Motivation\n",
"\n",
"The reason I want to do this is because I think it's very valuable for anyone who is doing data science or applied math work to be able to write code to assist you in your work. There are obvious reasons, like analyzing tons of data that wouldn't fit on an excel sheet. But there are others like brute force checks (for small problems), and symbolic computation. Having a computer double check your algebra can be quite valuable, and should probably be used in practice.\n",
"\n",
"## Software\n",
"\n",
"So not all programs you use for math/statistics are equal. Some of these are for numerical computations. Examples are MATLAB and RStudio. Within [scientific python ecosystem](https://www.scipy.org/about.html), you would be using scipy or pandas do similar tasks. Here is a [list of numerical computing software](https://en.wikipedia.org/wiki/List_of_numerical-analysis_software#Numerical-software-packages).\n",
"\n",
"A computer algebra system (CAS), which as the name implies, is basically a calculator for algebra. Another name that I will also refer to it as a symbolic calculator. Examples of this are Maple, Mathematica, and SageMath. Within the scientific python ecosystem, you would be using sympy. Here is a [list of computer algebra systems](https://en.wikipedia.org/wiki/List_of_computer_algebra_systems#Functionality) that compares\n",
"\n",
"There are many that try to do both. As a tip, through JHU you can access software for [MATLAB](https://johnshopkins.service-now.com/serviceportal?id=sc_cat_item&sys_id=9c250c946f92a900053d02dc5d3ee48b) and [Mathematica](https://johnshopkins.service-now.com/serviceportal?id=sc_cat_item&sys_id=bdda0a870f0bd100f9c8eee692050efc) through my.jhu.edu.\n",
"\n",
"To reiterate I use python, sympy and Jupyter (formally IPython notebook) to create this report. \n",
"\n",
"## Disclaimer\n",
"\n",
"I've used many of the above software before and some extensively as an undergrad. Over time I have gained the opinion that unless I need a specific feature from the specialized software, I'd rather use something in python. This is likely a bias as after my undergrad I ended up in the software development sector. \n",
"\n",
"## Caveats\n",
"\n",
"Since this is done using python, there are times when the language gets in the way of symbolic calculator. Many of my work projects have me use python, so it doesn't bother me very much. If you find that daunting then Mathematica will be great start for the purposes of symbolic computing.\n",
"\n",
"## Sales Pitch\n",
"\n",
"I am advocating learning python and using scientific python stack. As a truly general purpose programming language, it's a skill that can transfer to many areas.\n",
"\n",
"According to the 2020 stackoverflow survey (keep in mind that this is self reported) it's the [third most popular scripting language](https://insights.stackoverflow.com/survey/2020#technology-programming-scripting-and-markup-languages) and it's the [third morst loved language](https://insights.stackoverflow.com/survey/2020#technology-most-loved-dreaded-and-wanted-languages). So there is a very large community as a whole. As well as having relevance in the scientific community."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"The next cell is a bit of boilerplat to setup the environment.\n",
"\n",
"If you want to try sympy for yourself, there is an free online version you can use [here](https://live.sympy.org/).\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Some core python modules that will be useful\n",
"import itertools\n",
"from collections import Counter\n",
"\n",
"# the * unload everything in sympy into global scope\n",
"# mostly for convenience\n",
"from sympy import *"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next block is an odd block of code. Since we are using more traditional programming language, variables hold whatever. We want create a python variable that we will treat like a true mathematical variable. This is probably the most esoteric line of code in this report.\n",
"\n",
"Furthermore when declaring this symbolic variable, we can declare [assumptions](https://docs.sympy.org/latest/modules/core.html#module-sympy.core.assumptions) about it."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# A sympy declaration for a symbolic variable\n",
"# with the assumption that it is positive\n",
"b = symbols('b', positive=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Question 3.3.2\n",
"\n",
"Finally to the actual question at hand!🎉\n",
"\n",
"We have an urn that contains five balls numbered 1 to 5 and two balls are drawn with replacement. We want to calculate two probability density functions $p_{X}(k)$ and $p_{V}(k)$ where the random variable $X$ is the max of the two balls and $V$ is the sum of the two balls.\n",
"\n",
"### Slight Generalization\n",
"\n",
"To demonstrate the symbolic computation aspects, I'm going to generalize the problem a bit so that the numbered balls are from $1 \\cdot b$ to $5 \\cdot b$, where $b$ is an known positive parameter.\n",
"\n",
"I need to impose that it's positive, otherwise the random variable $X$ isn't well defined."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So lets define our python `list` of ball values as follows"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"balls = [b, 2*b, 3*b, 4*b, 5*b]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now lets display the contents."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[b, 2*b, 3*b, 4*b, 5*b]\n"
]
}
],
"source": [
"print(balls)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For convenience we create two python variables. The description of the variables is in the comments."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"num_balls = len(balls) # the number of items in the list of ball values\n",
"num_draw = 2 # the number of ball to be drawn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following block of code uses a builtin python function to take the [direct product](https://docs.python.org/3.8/library/itertools.html#itertools.product) of the list of balls values. During the iteration of the products, the `selection` variable is a python `tuple` of two items which is the ordered the selection (with replacement). Within the for-loop we track the number of occurrences of of the max and sum of the two items (using instances of [`Counter`](https://docs.python.org/3.8/library/collections.html#collections.Counter))."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"X_counter = Counter()\n",
"V_counter = Counter()\n",
"for selection in itertools.product(balls, repeat=num_draw): \n",
" X_counter[max(selection)] += 1\n",
" V_counter[sum(selection)] += 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now display the values"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Counter({5*b: 9, 4*b: 7, 3*b: 5, 2*b: 3, b: 1})\n"
]
}
],
"source": [
"print(X_counter)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Counter({6*b: 5, 5*b: 4, 7*b: 4, 4*b: 3, 8*b: 3, 3*b: 2, 9*b: 2, 2*b: 1, 10*b: 1})\n"
]
}
],
"source": [
"print(V_counter)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An odd thing is that I use `S.One` randomly. This is because two `int` values divided in python version 3 or greater will resulting in a flowing point value. Multiplying by the sympy `Integer` `S.One` will create a new sympy object that will do it's best to keep it's truest mathematical representation. We thus get `Rational` values as a result.\n",
"\n",
"The next block also has syntax for a [formated string literal](https://docs.python.org/3/reference/lexical_analysis.html#f-strings) and [formatting syntax](https://docs.python.org/3.8/library/string.html#format-string-syntax). Don't worry if you don't get what's going on in the `print` statement. It's just short hand for inserting variables into strings and formatting text."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"p_X( b) = 1/25\n",
"p_X(2*b) = 3/25\n",
"p_X(3*b) = 1/5\n",
"p_X(4*b) = 7/25\n",
"p_X(5*b) = 9/25\n",
"\n",
"p_V( 2*b) = 1/25\n",
"p_V( 3*b) = 2/25\n",
"p_V( 4*b) = 3/25\n",
"p_V( 5*b) = 4/25\n",
"p_V( 6*b) = 1/5\n",
"p_V( 7*b) = 4/25\n",
"p_V( 8*b) = 3/25\n",
"p_V( 9*b) = 2/25\n",
"p_V(10*b) = 1/25\n"
]
}
],
"source": [
"x_total = sum(X_counter.values())\n",
"p_x = dict()\n",
"for x,count in X_counter.items():\n",
" p_x[x] = S.One*count/x_total\n",
" print(f\"p_X({str(x):>3s}) = {p_x[x]}\")\n",
"\n",
"print()\n",
" \n",
"v_total = sum(V_counter.values())\n",
"p_v = dict()\n",
"for v,count in V_counter.items():\n",
" p_v[v] = S.One*count/v_total\n",
" print(f\"p_V({str(v):>4s}) = {p_v[v]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Summarizing the data in a table:\n",
"\n",
"| $X$ | count | $p_{X}(k)$\n",
"|----|------|-----|\n",
"|$b$ | 1 | $\\frac{1}{25}$ |\n",
"|$2b$| 3 | $\\frac{3}{25}$ |\n",
"|$3b$| 5 | $\\frac{1}{5}$ |\n",
"|$4b$| 7 | $\\frac{7}{25}$ |\n",
"|$5b$| 9 | $\\frac{9}{25}$ \n",
"\n",
"|$V$ | count | $p_{V}(k)$ |\n",
"|----|-------|---|\n",
"|$2b$| 1 | $\\frac{1}{25}$ |\n",
"|$3b$| 2 | $\\frac{2}{25}$ |\n",
"|$4b$| 3 | $\\frac{3}{25}$ |\n",
"|$5b$| 4 | $\\frac{4}{25}$ |\n",
"|$6b$| 5 | $\\frac{1}{5}$ |\n",
"|$7b$| 4 | $\\frac{4}{25}$ |\n",
"|$8b$| 3 | $\\frac{3}{25}$ |\n",
"|$9b$| 2 | $\\frac{2}{25}$ |\n",
"|$10b$| 1 | $\\frac{1}{25}$ |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bonus\n",
"\n",
"So we have fully described the probability density function of our random variables $X$ and $V$. Although you have have to substitute $b$ for $1$.\n",
"\n",
"To further demonstrate the abilities of symbolic computation, we compute the expected value and standard deviation of $X$ using Definition 3.5.1 and Definition 3.6.1 from the text book."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"expect_x = sum(x * p for x,p in p_x.items())\n",
"var_x = sum( (x - expect_x)**2 * p for x,p in p_x.items() )\n",
"std_dev_x = var_x**(S.One/2) # or sqrt(var_x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now lets display these values."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle \\frac{19 b}{5}$"
],
"text/plain": [
"19*b/5"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"expect_x"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle \\frac{\\sqrt{34} b}{5}$"
],
"text/plain": [
"sqrt(34)*b/5"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"std_dev_x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the algebra is done for you. Which can help reduce error during more complicated calculations. Though we should still show your work on the assignments you hand in 🤪.\n",
"\n",
"Just as easily we can plug in values. With sympy I demonstrate two ways. One with `subs` that maintains mathematical expression as much as possible."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle \\frac{19}{5}$"
],
"text/plain": [
"19/5"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"expect_x.subs(b, 1)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle \\frac{\\sqrt{34}}{5}$"
],
"text/plain": [
"sqrt(34)/5"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"std_dev_x.subs(b, 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The other method is with `evalf`, which evaluates the expression at a floating point value."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle 3.8$"
],
"text/plain": [
"3.80000000000000"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"expect_x.evalf(subs={b:1})"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle 1.16619037896906$"
],
"text/plain": [
"1.16619037896906"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"std_dev_x.evalf(subs={b:1})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With all this work to create symbolic representations for all this is that we can now `subs` in multiple values for `b` and see what it should be."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"Symbolic computation can assist tedious tasks which leaves more the user free to think about the result. This is not a replacement for understanding the algebraic manipulation. But this is the similar level of augmentation to ones work flow that excel (or other numerical computation software) provides to someone. \n",
"\n",
"## Appendix\n",
"\n",
"Another example of myself using the same software stack but for a [problem of personal interest](https://gist.github.com/josh-hernandez-exe/07abd3cf45090fd662b38be7ced647ea). There I find the number expected uses of a move in a video game that has limited uses, but with some random probability of refreshing uses. I take advantange of symbolic computation by adjusting some parameters several times.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
@sciencecasey
Copy link

This is awesome, Josh! It definitely gives me a starting place for python that feels relevant. Thank you for sharing it! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment