@hazardclassroom
Last active November 12, 2018 13:31
An introductory course on the Python programming language
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"# Introduction to scatter plots\n",
"\n",
"This lesson will introduce scatter plots and how to make them in several different ways.\n",
"\n",
"We will start by importing the ANSS earthquake catalog, which you can download [here](http://www.quake.geo.berkeley.edu/anss/catalog-search.html)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
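"# The catalog is whitespace-delimited. sep=r'\\s+' is the documented\n",
"# equivalent of delim_whitespace=True, which recent pandas versions deprecate.\n",
"df = pd.read_csv('data/anss.csv', sep=r'\\s+')"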
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
" Date Time Lat Lon Depth Mag Magt Nst Gap Clo \\\n",
"0 2002/01/01 10:39:06.82 -55.214 -129.000 10.0 6.0 Mw 78 1.07 NEI \n",
"1 2002/01/01 11:29:22.73 6.303 125.650 138.1 6.3 Mw 236 0.90 NEI \n",
"2 2002/01/01 19:53:06.95 -27.875 73.883 10.0 5.6 Mw 27 0.88 NEI \n",
"3 2002/01/01 21:25:20.02 -27.874 74.007 10.0 5.1 Mw 21 0.34 NEI \n",
"4 2002/01/01 23:13:45.20 -27.945 74.156 10.0 5.1 Mw 21 0.85 NEI \n",
"\n",
" RMS SRC Event ID \n",
"0 2.002010e+11 NaN NaN NaN \n",
"1 2.002010e+11 NaN NaN NaN \n",
"2 2.002010e+11 NaN NaN NaN \n",
"3 2.002010e+11 NaN NaN NaN \n",
"4 2.002010e+11 NaN NaN NaN "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"What are these columns?\n",
"\n",
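"Some are self-explanatory, but others are not:\n",
"\n",
"* Magt - the magnitude type, i.e. how the magnitude was calculated (for example, Mw is the moment magnitude)\n",
"* Nst - the number of stations that detected the earthquake\n",
"* RMS - the root mean square of the travel-time residuals of the earthquake location\n"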
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
" Lat Lon Depth Mag Nst \\\n",
"count 9998.000000 9998.000000 9998.000000 9998.00000 9998.000000 \n",
"mean 0.506756 45.405202 57.593392 5.37223 164.561112 \n",
"std 28.593770 119.249236 105.808366 0.41369 136.695802 \n",
"min -65.712000 -179.994000 -0.530000 5.00000 0.000000 \n",
"25% -18.294750 -69.652500 10.000000 5.10000 64.000000 \n",
"50% -2.424500 96.392500 30.000000 5.20000 121.000000 \n",
"75% 18.309500 142.318750 45.300000 5.50000 224.000000 \n",
"max 86.276000 179.998000 691.600000 9.00000 929.000000 \n",
"\n",
" RMS Event ID \n",
"count 9.947000e+03 1.800000e+01 0.0 \n",
"mean 1.999149e+11 1.652515e+07 NaN \n",
"std 1.062330e+10 1.697042e+07 NaN \n",
"min 4.000000e-02 1.129580e+05 NaN \n",
"25% 2.003112e+11 5.434046e+06 NaN \n",
"50% 2.005033e+11 5.793165e+06 NaN \n",
"75% 2.006091e+11 2.149626e+07 NaN \n",
"max 2.007103e+11 5.118347e+07 NaN "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Scatter plots show how two variables relate to each other. For example, does the magnitude of an earthquake correlate with the number of stations that detected it?"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f8b663a60b8>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f8b64786128>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.plot(kind='scatter', x='Mag', y='Nst')"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The default plot is hard to read, though: with almost 10,000 events, many points overlap. We can clean it up a lot. Let's start by working directly with the matplotlib machinery that creates it."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f8b6479fe10>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f8b643d2d30>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fig, ax = plt.subplots(1, figsize=(8,5))\n",
"\n",
"df.plot(ax=ax, kind='scatter', x='Mag', y='Nst', alpha=0.15,\n",
"        edgecolor='None', color='violet')\n",
"\n",
"ax.set_ylim(0, 1000)\n",
"ax.set_xlim(4.5, 10)\n",
"ax.grid(True)\n",
"ax.set_title('Does the number of stations change with magnitude?')\n",
"\n",
"ax.set_ylabel('Number of Stations')\n",
"ax.set_xlabel('Magnitude')"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"# Gutenberg-Richter plots"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"In seismology, the Gutenberg–Richter law describes how the number of earthquakes in a given region and time period decreases with magnitude. The number of events with magnitude at least $M$ follows\n",
"\n",
"$\\log_{10} N = a - bM$\n",
"\n",
"or, equivalently,\n",
"\n",
"$N = 10^{a - bM}$\n",
"\n",
"where:\n",
"\n",
"* $N$ is the number of events with magnitude $\\geq M$\n",
"* $a$ and $b$ are constants.\n",
"\n",
"Earthquake occurrence is assumed to be Poissonian, i.e. events occur independently in time.\n",
"\n",
"To build the Gutenberg–Richter distribution of the earthquake magnitudes in the ANSS catalog, we will use NumPy."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 6. , 6.3, 5.6, ..., 5.5, 5. , 5.5])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.Mag.values.round(1)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"hist, edges = np.histogram(a=df.Mag.values.round(1), bins=101, range=(0,10))\n",
"chist = np.cumsum(hist[::-1])[::-1]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0. , 0.0990099 , 0.1980198 , 0.2970297 ,\n 0.3960396 , 0.4950495 , 0.59405941, 0.69306931,\n 0.79207921, 0.89108911, 0.99009901, 1.08910891,\n 1.18811881, 1.28712871, 1.38613861, 1.48514851,\n 1.58415842, 1.68316832, 1.78217822, 1.88118812,\n 1.98019802, 2.07920792, 2.17821782, 2.27722772,\n 2.37623762, 2.47524752, 2.57425743, 2.67326733,\n 2.77227723, 2.87128713, 2.97029703, 3.06930693,\n 3.16831683, 3.26732673, 3.36633663, 3.46534653,\n 3.56435644, 3.66336634, 3.76237624, 3.86138614,\n 3.96039604, 4.05940594, 4.15841584, 4.25742574,\n 4.35643564, 4.45544554, 4.55445545, 4.65346535,\n 4.75247525, 4.85148515, 4.95049505, 5.04950495,\n 5.14851485, 5.24752475, 5.34653465, 5.44554455,\n 5.54455446, 5.64356436, 5.74257426, 5.84158416,\n 5.94059406, 6.03960396, 6.13861386, 6.23762376,\n 6.33663366, 6.43564356, 6.53465347, 6.63366337,\n 6.73267327, 6.83168317, 6.93069307, 7.02970297,\n 7.12871287, 7.22772277, 7.32673267, 7.42574257,\n 7.52475248, 7.62376238, 7.72277228, 7.82178218,\n 7.92079208, 8.01980198, 8.11881188, 8.21782178,\n 8.31683168, 8.41584158, 8.51485149, 8.61386139,\n 8.71287129, 8.81188119, 8.91089109, 9.00990099,\n 9.10891089, 9.20792079, 9.30693069, 9.40594059,\n 9.5049505 , 9.6039604 , 9.7029703 , 9.8019802 ,\n 9.9009901 , 10. ])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"edges"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0, 0, 0, 0, 0, 2045, 1611, 1407, 1052, 843,\n",
" 666, 508, 405, 295, 250, 203, 169, 121, 93, 72, 51,\n",
" 47, 40, 25, 16, 10, 12, 11, 7, 7, 6, 9,\n",
" 3, 2, 2, 2, 3, 0, 2, 0, 1, 1, 0,\n",
" 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,\n",
" 0, 0])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hist"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f8b64332a90>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f8b64472fd0>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fig, ax = plt.subplots()\n",
"\n",
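"# Hollow squares: events in each bin; red triangles: cumulative count N(>=M)\n",
"ax.plot(edges[:-1], hist, marker='s', color='black', markerfacecolor='none', linestyle='')\n",
"ax.plot(edges[:-1], chist, marker='^', color='red', linestyle='')\n",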
"ax.set_yscale('log')\n",
"ax.set_ylabel('N')\n",
"ax.set_xlabel('Magnitude')\n",
"# ax.set_xlim(4.5, 10)\n",
"ax.set_title('Gutenberg-Richter Distribution')"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"We often want to plot the Gutenberg–Richter fit given by the $a$ and $b$ values. To estimate them, we use the function below, which computes the maximum-likelihood estimate of $b$ (the Aki–Utsu estimator, with a half-bin correction for the magnitude binning `bin_width`) and returns:\n",
"\n",
"* a - the \"productivity\" of the distribution, i.e. the intercept of $\\log_{10} N$ at $M = 0$\n",
"* b - the slope of the $\\log_{10} N$ versus $M$ line, computed from the mean magnitude\n",
"* bstdev - the standard deviation (uncertainty) of b\n",
"* length - the number of earthquakes used to calculate the values"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"def fmd_values(magnitudes, bin_width=0.1):\n",
"    \"\"\"\n",
"    Maximum-likelihood Gutenberg-Richter parameters (Aki-Utsu estimator).\n",
"\n",
"    params magnitudes : numpy.array of magnitudes above the completeness level\n",
"    params bin_width : float, the magnitude binning interval\n",
"\n",
"    returns a, b, b_stddev, n\n",
"    \"\"\"\n",
"    length = magnitudes.shape[0]\n",
"    minimum = magnitudes.min()\n",
"    average = magnitudes.mean()\n",
"    # b = log10(e) / (mean - (Mmin - bin_width/2))\n",
"    b_value = np.log10(np.e) / (average - (minimum - bin_width / 2))\n",
"    # Square each deviation *before* summing; summing first gives ~0.\n",
"    mean_var = ((magnitudes - average) ** 2).sum() / (length * (length - 1))\n",
"    b_stddev = 2.3 * np.sqrt(mean_var) * b_value**2\n",
"    a_value = np.log10(length) + b_value * minimum\n",
"\n",
"    return a_value, b_value, b_stddev, length"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(9.1427739318422674, 1.0285721598851394, 1.5357096869688311e-15, 9998)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fmd_values(df.Mag.values)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"a, b, bstd, n = fmd_values(df.Mag.values)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"# Evaluate the fitted law N(>=M) = 10**(a - b*M) on a fine magnitude grid\n",
"x = np.linspace(0, 10, 1000)\n",
"y = 10**(a - b*x)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.text.Text at 0x7f6f22b97a90>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x7f6f22c9d5f8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig, ax = plt.subplots()\n",
"\n",
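"# Hollow squares: events in each bin; red triangles: cumulative count N(>=M)\n",
"ax.plot(edges[:-1], hist, marker='s', color='black', markerfacecolor='none', linestyle='')\n",
"ax.plot(edges[:-1], chist, marker='^', color='red', linestyle='')\n",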
"\n",
"ax.plot(x,y)\n",
"\n",
"ax.set_yscale('log')\n",
"ax.set_ylabel('N')\n",
"ax.set_xlabel('Magnitude')\n",
"ax.set_xlim(4.5, 10)\n",
"ax.set_ylim(1e0, 1e5)\n",
"ax.set_title('Gutenberg-Richter Distribution')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"autoscroll": false,
"collapsed": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
},
"name": "introduction to scatter plots and histograms.ipynb"
},
"nbformat": 4,
"nbformat_minor": 1
}
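The histogram cell in the notebook uses `bins=101` over `range=(0,10)`, so each bin is about 0.099 magnitude units wide (see the `edges` output), and the plotted x positions drift slightly away from the true magnitudes. The sketch below shows one alternative, assuming magnitudes reported to the nearest 0.1: centre one bin on each 0.1 step, with edges halfway between steps so no rounded value lands on an edge, then reverse the cumulative sum to get N, the number of events at or above each magnitude. The function name `gr_counts` and the sample magnitudes are hypothetical.

```python
import numpy as np


def gr_counts(magnitudes, bin_width=0.1, m_max=10.0):
    """Per-bin and cumulative counts for a Gutenberg-Richter plot.

    Each magnitude is assigned to the nearest multiple of bin_width
    between 0 and m_max. Returns (centers, counts, cumulative), where
    cumulative[i] is the number of events in bin i or any higher bin.
    """
    n_bins = int(round(m_max / bin_width)) + 1
    centers = np.arange(n_bins) * bin_width
    # Edges sit half a bin either side of each center, so a magnitude
    # reported to the nearest bin_width never falls on an edge.
    edges = np.append(centers - bin_width / 2, centers[-1] + bin_width / 2)
    counts, _ = np.histogram(magnitudes, bins=edges)
    # Reverse, accumulate, reverse back: N(>= M) for every bin
    cumulative = np.cumsum(counts[::-1])[::-1]
    return centers, counts, cumulative


centers, counts, cumulative = gr_counts(np.array([5.0, 5.0, 5.1, 5.3, 6.2]))
```

Plotting `counts` and `cumulative` against `centers` (rather than against the left bin edges) puts each point at its actual magnitude.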
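In the `fmd_values` cell, the standard deviation of b has to be built from the sum of the *squared* deviations from the mean magnitude; squaring the sum instead gives a value near zero (the 1.5e-15 in that cell's output). Written out, with $\Delta M$ the bin width, the estimates the function aims for are

$$\hat b = \frac{\log_{10} e}{\bar M - (M_{\min} - \Delta M/2)}, \qquad \sigma_{\hat b} = 2.3\,\hat b^{2}\sqrt{\frac{\sum_{i=1}^{n}(M_i - \bar M)^2}{n(n-1)}}, \qquad a = \log_{10} n + \hat b\,M_{\min}.$$

The sketch below is a self-contained version of those formulas plus a sanity check on synthetic data. It assumes that magnitudes above the completeness level follow the Gutenberg–Richter law exactly, which makes them exponentially distributed with rate $\beta = b \ln 10$; with b = 1 and 0.1 binning, the estimate should come back close to 1. The names `aki_utsu` and `synthetic_magnitudes` and all parameter values are hypothetical.

```python
import numpy as np


def aki_utsu(magnitudes, bin_width=0.1):
    """Maximum-likelihood a, b, std(b) and n for binned magnitudes."""
    n = magnitudes.size
    m_min = magnitudes.min()
    m_mean = magnitudes.mean()
    # Aki-Utsu estimator with a half-bin correction for binned magnitudes
    b = np.log10(np.e) / (m_mean - (m_min - bin_width / 2))
    # Square each deviation first, then sum
    mean_var = ((magnitudes - m_mean) ** 2).sum() / (n * (n - 1))
    b_std = 2.3 * b**2 * np.sqrt(mean_var)
    a = np.log10(n) + b * m_min
    return a, b, b_std, n


def synthetic_magnitudes(n, b=1.0, m_c=5.0, bin_width=0.1, seed=0):
    """Hypothetical helper: Gutenberg-Richter magnitudes binned to bin_width.

    Continuous magnitudes start half a bin below m_c so that, after
    rounding, the smallest bin is centered on m_c.
    """
    rng = np.random.default_rng(seed)
    beta = b * np.log(10)  # exponential rate corresponding to slope b
    continuous = (m_c - bin_width / 2) + rng.exponential(1 / beta, n)
    return np.round(continuous / bin_width) * bin_width


mags = synthetic_magnitudes(100_000)
a, b, b_std, n = aki_utsu(mags)
print(f"a={a:.2f}, b={b:.3f} +/- {b_std:.3f}, n={n}")
```

The half-bin correction leaves a small bias for 0.1 binning, so the recovered b lands slightly below 1 rather than exactly on it.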