Skip to content

Instantly share code, notes, and snippets.

@sabineri
Created February 17, 2020 16:36
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save sabineri/b47d9410d45d564c91b67748067aba28 to your computer and use it in GitHub Desktop.
Save sabineri/b47d9410d45d564c91b67748067aba28 to your computer and use it in GitHub Desktop.
Created on Cognitive Class Labs
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"<a href=\"https://www.bigdatauniversity.com\"><img src = \"https://ibm.box.com/shared/static/wbqvbi6o6ip0vz55ua5gp17g4f1k7ve9.png\" width = 300, align = \"center\"></a>\n",
"\n",
"<h1 align=center><font size = 5>BOX PLOTS</font></h1>\n",
"\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Contents\n",
"\n",
"\n",
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
"<li><p><a href=\"#ref0\">Introduction</a></p></li>\n",
"<li><p><a href=\"#ref1\">Box Plot</a></p></li>\n",
"<br>\n",
"<p></p>\n",
"Estimated Time Needed: <strong>15 min</strong>\n",
"</div>\n",
"\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref0\"></a>\n",
"## <center>Introduction</center>\n",
"\n",
"In this notebook, we are going to explore how to create box plots in R. Box plots are a convenient way to represent the degree of dispersion (spread) and skewness in the data, and show outliers without making any assumption of the underlying statistical distribution.Let us first install the package plotly."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Installing package into ‘/resources/common/R/Library’\n",
"(as ‘lib’ is unspecified)\n",
"also installing the dependencies ‘htmltools’, ‘jsonlite’, ‘yaml’, ‘viridisLite’, ‘htmlwidgets’, ‘hexbin’, ‘purrr’\n",
"\n",
"Loading required package: ggplot2\n",
"\n",
"Attaching package: ‘plotly’\n",
"\n",
"The following object is masked from ‘package:ggplot2’:\n",
"\n",
" last_plot\n",
"\n",
"The following object is masked from ‘package:stats’:\n",
"\n",
" filter\n",
"\n",
"The following object is masked from ‘package:graphics’:\n",
"\n",
" layout\n",
"\n",
"The following objects are masked from ‘package:SparkR’:\n",
"\n",
" arrange, distinct, filter, group_by, mutate, rename, schema, select\n",
"\n"
]
}
],
"source": [
"install.packages(\"plotly\")\n",
"library(plotly)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref1\"></a>\n",
"## <center>Box Plot</center>\n",
"\n",
"First, we generate data to represent with a box plot. We will use two normal distributions with slightly different parameters to generate our samples so you can see the effect it has on the plot. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" label value\n",
"1 A -1.414131\n",
"2 A 1.554858\n",
"3 A 3.168882\n",
"4 A -3.691395\n",
"5 A 1.858249\n",
"6 A 2.012112"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
" label value\n",
"395 B 0.52874502\n",
"396 B 0.78939440\n",
"397 B 0.45709951\n",
"398 B 0.53883312\n",
"399 B 0.01464312\n",
"400 B -0.91648914"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#making the results reproducible\n",
"set.seed(1234)\n",
"\n",
"set_a <- rnorm(200, mean=1, sd=2)\n",
"set_b <- rnorm(200, mean=0, sd=1)\n",
"\n",
"#create the data frame\n",
"df <- data.frame(label = factor(rep(c(\"A\",\"B\"), each=200)), value = c(set_a, set_b))\n",
"\n",
"#output both the first and last rows\n",
"head(df)\n",
"tail(df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, we have randomly generated two sets, labelling them respectively A and B.\n",
"\n",
"### geom_bloxplot()\n",
"\n",
"We create a plot with the ggplot function and specify the x and y-axis of the plot. Then we add the boxplot to our plot, which results in creating one boxplot for each value of x."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": []
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAeAAAAHgCAMAAABKCk6nAAACNFBMVEUAAAACAgIHBwcSEhITExMUFBQVFRUWFhYYGBgZGRkeHh4jIyMlJSUmJiYnJycoKCgpKSksLCwtLS0uLi4vLy8yMjIzMzM0NDQ1NTU7Ozs8PDw9PT0+Pj4/Pz9BQUFDQ0NFRUVISEhJSUlKSkpLS0tMTExNTU1OTk5PT09QUFBSUlJTU1NVVVVWVlZXV1dZWVlaWlpbW1tcXFxfX19gYGBhYWFjY2NkZGRlZWVnZ2doaGhpaWlqampra2tsbGxtbW1vb29wcHBxcXFycnJzc3N0dHR1dXV2dnZ3d3d4eHh5eXl6enp7e3t8fHx9fX1+fn5/f3+AgICBgYGDg4OEhISFhYWGhoaHh4eIiIiJiYmKioqLi4uMjIyNjY2Ojo6Pj4+QkJCRkZGSkpKTk5OUlJSVlZWWlpaXl5eYmJibm5ufn5+goKChoaGjo6OlpaWpqamrq6usrKytra2urq6vr6+xsbGysrKzs7O1tbW2tra4uLi5ubm6urq7u7u8vLy9vb2+vr6/v7/AwMDCwsLDw8PExMTFxcXGxsbHx8fJycnKysrLy8vMzMzNzc3Ozs7Pz8/Q0NDR0dHT09PU1NTV1dXW1tbX19fY2NjZ2dna2trb29vc3Nzd3d3e3t7g4ODh4eHi4uLj4+Pk5OTl5eXm5ubn5+fo6Ojp6enq6urr6+vs7Ozt7e3u7u7v7+/w8PDx8fHy8vLz8/P09PT19fX29vb39/f4+Pj6+vr7+/v8/Pz+/v7///+HwgFwAAAN9ElEQVR4nO3c+3sU5RXAcUvpzWprrawaRGqltRYEqq1KoIoNSty4gFVB2wgIBS9gvZS0iCJeWO8tlYoiUkWKBnODXFZw33+us0F89IEzkz0znPNm8/38sDvP/pAzeb/Mu5s8Yc4LaGnneZ8Azi0CtzgCtzgCt7gmAo+OOKjVPKYW7ITL0o00G7jvMwejYx5TC1bvdxlLYCsRBt5fLi+9o3FQm18u7yFwPhEGTjy1bTxwmSs4tzgDL+0fD7ywc90wgfOJMvChu8af6sOhZ3Py3FkqDYn/FhCj0fFHMfDjz54+GlqWPIwdO9bnYWzMZWyx6gMuY9MDL0l+jKoPhlo9VLtPvcIWrRXjFr3/3uRhpD3s7eiq9BI4nxgDnwWBtQgsI7Aega0QWEZgPQJbIbCMwHoEtkJgGYH1JkPgt1aufMthbMEILPmwlPiv/dyCEViyqxH4Ofu5BSOwZF8j8H/s5xaMwKInS6UnHMYWjMAyPkXrEdgKgWUE1iOwFQLLCKxHYCsElhFYj8BWCCwjsB6BrRBYRmA9AlshsIzAegS2QmAZgfUIbIXAMgLrEdgKgWUE1iOwFQLLCKxHYCsElhFYj8BWCCwjsF6zgfs9bubFjdD0mg3sclPrEydcxhbL517qQ80GZovWmiRbNIG1CCzavmTJPxzGFozAkvca/z/4Xfu5BSOwhP/hn8ckCHywEfig/dyCEVhU7fjDboexBSOwjE/RegS2QmAZgfUIbIXAMgLrEdgKgWWvVD2mFozAsvXdHlMLRmAZgfUIbIXAMgLrEdgKgWUE1iOwFQLLCKxHYCsElhFYj8BWCCwjsB6BrRBYRmA9AlshsIzAegS2QmAZgfUIbIXAMgLrEdgKgWUE1iOwFQLLCKyXFrg2v1zeM37Us3LFEQLnE2Pg8unL9o76+6sJnE+MgRd2rhtuHFS3hrCIwPlEGLg+HHo2Nw52PBnCzcnz0/fdN1BzsHGtx9SChc89ph5LC5wYWvb1K/ilLVsGPO7Ht3Gtx9SCRXgrw1o9VLtDfbDxHrx/DVt0PhFu0Xs7uiq9YaQ9+RRdqfApOqcIA58NgbUILCOwHoGtEFhGYD0CWyGwjMB6BLZCYBmB9QhshcAyAusR2AqBZQTWI7AVAssIrEdgKwSWEViPwFYILCOwHoGtEFhGYD0CWyGwjMB6BLZCYBmB9QhshcAyAusR2AqBZQTWI7AVAssIrEdgKwSWEViPwFYILCOwHoGtEFhGYD0CWyGwjMB6BLYySQKPjjnYuNZjasFCzWNq1q0MuYILM0muYAJrEVhGYD0CWyGwjMB6BLZCYBmB9QhshcAyAusR2AqBZQTWI7AVAssIrEdgKwSWEViPwFYILCOwHoGtEFhGYD0CWyGwjMB6BLZCYBmB9QhshcAyAusR2MjH733kMpfANt4olUrbPQYT2MadSeBFHoMJbGN5Enixx2AC29idBH7KYzCBjex7/m2XuQS2wo9JMgLrEdgKgWUE1iOwFQLLCKxHYCsElhFYLy3wB51dXb2Ng9r8cnkPgfOJMPBQLezeNB64zBWcW4SBE9VHxgMv7Fw3nDwf3revb9DBQw96TC1Y/ZjH1P70wCNLDzee6sOhZ3Py/MA11wx84WDTeo+pBQsuU0dSA5+ovPHVdr2MLTqfCLfo+gM7G4+DoVYP1W4C5xNh4Op1lcrWMNIe9nZ0VXoJnE+Egc+GwFoElhFYj8BWCCwjsB6Brfx8v8tYAltpe89lLIGtEFhGYD0CW2m9wO+231SQa68t6isteufcLWWG1gv84tU7o/PLneduKTO0YOB5E/+yVhYQOAOBtQhsgcBZCKxFYAsEzkJgLQJbIHAWAmsR2AKBsxBYi8AWCJyFwFoEtkDgLATWIrCFqR748wNZK0RgrRgC7zz/gvDq/NQVIrBWDIEv/XhWCD9OXSECa8UQeEZIAl+UukIE1oohcNuns8L2ttQVIrBWDIF3/+S7M6e/lrpCBNaKIXDo69n2WfoKEVgrisDZCKwVQ+Bp41JXiMBaMQQ+efLk6BOV1BUisFYMgU+tQeoKEVgrlsD/uyx1hQisFUPg6dOnf/sHT6WuUBN30Xs5xsAvnLObBmZp+8Bj6jdvZZi8MJyxQscnrhpj4Jea+AaK1XbQY+rA+Ld9KnDtS6krxBat5b9FT5vGj0nnkH/gCSGwFoEtTPHAA+U5s2fPTl0hAmvFEPja1Zf0LLg9dYUIrBVD4ItDWwi/SV0hAmvFELgUZg6Eq1JXiMBaMQRe0PfwhbO4gs+NGAIn3nzui9QVIrBWDIFHsleIwFoxBJ7+u5ezVojAWjEE7tsw44f3fJS6QgTWiiFwYu8N30pdIQJrRRH4ZM+vv3Nj6goRWCuGwLd9f/bjo+krRGCtGALfcyhzhQisFUPgCZhqgXtWFKVteVFfaUssgWcvjs7Pmg182+1bYnPf3FgCzyxF54qmAz/dzAZh4p/RBJ7zdHSuIXCGqfYeTGACWyNwCgJnIbA7AqcgcBYCuyNwCgJnIbA7AqcgcBYCuyNwCgJnIbA7AqcgcBYCuyNwCgJnmWqBl8z1/iOUM1z/CwKLmg7c7v03KGdxdWGBe1auOPLNgykX+NaK9x+hnGHdr4oK3HdH/f3V3ziYeoFb+j24ujWERd84IHAEigu848kQbv76wYbrrx88OWFvxhj4tYmf/7jlEQae18T5n7pz4USv4AOvv94/NGG7Ywz84sTPf1xHhIHnNnH+X7+V4RmSt979a0J98MuDU6+xRXsr8OfgnkrlSBhp//JgHIHd8YuOFATOQmB3BE5B4CwEdkfgFATOQmB3BE5B4CwEdkfgFATOQmB3BE5B4CwEdkfgFATOQmB38QSeeyw68wmcoYnAr1/q/felZ/EKgdM1Ebg467s9po67/fIrYzPzuma+AQKn+3BvUS5/uaivdLCZb4DAVlrvdsLFIbAega0QWEZgPQJbIbCMwHoEtkJgGYH1CGyFwDIC6xHYCoFlBNYjsBUCywisR2ArBJYRWI/AVggsI7Aega0QWEZgPQJbmSSBR8ccbFzrMbVgbR95TD3GFWxlklzBBNYisIzAegS2QmAZgfUIbIXAMgLrEdgKgWUE1iOwFQLLCKxHYCsElhFYj8BWCCwjsB6BrRBYRmA9AlshsIzAegS2QmAZgfUIbIXAMgLrEdgKgWUE1iOwFQLLCKxHYCsElhFYj8BWCCwjsB6BrRBYRmA9AlshsIzAegS2QmBZSwSed8BlLIGt1PtdxhLYSoSBP+js6uptHNTml8t7CJxPhIGHamH3pvHAZa7g3CIMnKg+Mh54Yee64eT57Z07+4872PCgx9SC1Yc9pg6kBx5ZerjxVB8OPZuT50cXLx484eAv6zymFiyc9Jg6LAbevuqxcKLyxlfb9TK26Hwi3KLrD+xsPA6GWj1UuwmcT4SBq9dVKlvDSHvY29FV6SVwHofKpWX7PAanvweficA6fy6VSh0egwlsY2kSuOQxmMA2tiZ9V3sMJrCNo1vLm454DCawlQg/RRO4SASWEViPwFYILCOwHoGtEFhGYD0CWyGwjMB6BLZCYBmB9QhshcAyAusR2AqBZQTWI7CNA0tKi970GExgG38slUq3eAwmsI3bksCLPQYT2Ma2JPBGj8EENvLspr8d9ZhLYCPVrTtd5hLYxjPJFv2ox2AC2+hIArd7DCawjRVJ4N97DCawjXduKpWqHoMJbKXXZyyBrfC7aBmB9QhshcAyAusR2AqBZQTWI7AVAssIrNds4GEPG9a6jC1WfcRj6mCzgQc8PNTtMrZY9SGPqX3NBmaL1pokWzSBtQgsI7Aega0QWEZgPQJbIbCMwHoEtkJgGYH1CGyFwDIC6xHYCoFlBNYjsBUCywisR2ArBJYRWI/AVggsI7Aega0QWEZgPQJbIbCMwHqTIvD+/R5TC0Zg2eiYx9SCEVhGYD0CWyGwjMB6BLZCYNGBR7YcdBhbMAJLDt9YKt34sf3cghFY8mIp8bz93IIRWPLvRuB/2c8tGIFFG0ulhxzGFozAssEhj6kFI7CMH5P0CGyFwDIC66UFrs0vl/eMH/WsXHGEwPnEGLh8+rK9o/7+agLnE2PghZ3rhhsH1a0hLCJwPhEGrg+Hns2Ngx1PhnBz8nz3lVcO1B0kpzL5+XwPo2Lg7aseSx6Hln39Cu4/fLjP4358tZrH1IJFeCvDWj1Uu0N9sPEevH8NW3Q+EW7Rezu6Kr1hpD35FF2p8Ck6pwgDnw2BtQgsI7Aega0QWEZgPQJbIbCMwHqTIvAnRzymFuzQUZexzQZ2sWm99xkUoO1Tv9kENkBg2Y7t3mdQgBVDfrNjD4ycCNziYg/8p/u9zyC3IwtXLf+r2/TIA9cqd415n0NeRyoh3Hrca3rkgV/Y/swu73PIKwl88paa1/TIA997/Pg93ueQV7JFtz/iNj3uwMduWLXqt8e8zyKn5Aqur9nrNT3uwNufC2HXM95nkVPjPXjDbq/pcQfuPBrCwJ3eZ5FTskVX7j/hNT3uwMiNwC2OwC2OwC2OwC1uigae9tVvls4TXm8VBBZebxVTN/DC0k+vHUgWYOWMi7aF8MoVMy7bReCWkYQ8GsLqcrIA68IH3+vtv/iTcOj8zwncKpKQD89su2ROsgBDIczp6fnurFmzLjhA4FYxrfbmhQOhZ/aXgf++bebp111P6xyYsoGfnRXCgkbgtckWffSz6TtCeJXALWNa7YsFVy1Y1gh89yU/Sj5kvXr5RRfMIzAmHQK3OAK3OAK3OAK3OAK3uP8DuxB1q6nd3hIAAAAASUVORK5CYII="
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": []
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ggplot(df, aes(x=label, y=value)) + geom_boxplot()\n",
"\n",
"ggplotly()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"### Now onto a brief explanation of what a box plot is, for those who don't know\n",
"\n",
"As was briefly metioned in the Introduction section, box plots are good to indicate dispertion. They do so by providing visual representations in terms of [quartiles](https://en.wikipedia.org/wiki/Quartile).\n",
"\n",
"- The thickest line in the middle of the rectangle represents the median value (second quartile). \n",
"- The bottom and top of the rectangle represent the first and third quartiles, respectively. \n",
"- The height of the rectangle equals the [IQR (interquatile range)](https://en.wikipedia.org/wiki/Interquartile_range), which is the difference between the first and third quartiles.\n",
"- The superior line reaches up to the largest value that is not larger than $1.5$*IQR; Values that are larger are considered outliers and represented as dots. (the inferior line is analogous)."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### About the Author: \n",
"Hi! It's [Hugo Sales Correa](https://www.linkedin.com/in/hugo-sales-correa-143a22109), the authors of this notebook. We hope you found R easy to learn! There's lots more to learn about R but you're well on your way. Feel free to connect with us if you have any questions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n",
"Copyright &copy; 2016 [Big Data University](https://bigdatauniversity.com/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment