@sabineri
Created February 13, 2020 15:55
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"http://cognitiveclass.ai/wp-content/uploads/2017/11/cc-logo-square.png\" width=\"150\">\n",
"\n",
"<h1 align=center>R BASICS</h1> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Welcome!\n",
"\n",
"By the end of this notebook, you will have learned the basics of R! "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table of Contents\n",
"\n",
"\n",
"<ul>\n",
"<li><a href=\"#ref0\">About the Dataset</a></li>\n",
"<li><a href=\"#ref1\">Simple Math in R</a></li>\n",
"<li><a href=\"#ref2\">Variables in R</a></li>\n",
"<li><a href=\"#ref3\">Vectors in R</a></li>\n",
"<li><a href=\"#ref4\">Strings in R</a></li>\n",
"</ul>\n",
"<p></p>\n",
"Estimated Time Needed: <strong>15 min</strong>\n",
"\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref0\"></a>\n",
"<h2 align=center>About the Dataset</h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Which movie should you watch next? \n",
"\n",
"Let's say each of your friends tells you their favorite movies. You do some research on the movies and put it all into a table. Now you can begin exploring the dataset and asking questions about the movies. For example, you can check whether movies from certain genres tend to get better ratings, or how production costs change across years, and much more. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Movies dataset**\n",
"\n",
"The table includes one row for each movie and one column for each movie characteristic:\n",
"\n",
"- **name** - Name of the movie\n",
"- **year** - Year the movie was released\n",
"- **length_min** - Length of the movie (minutes)\n",
"- **genre** - Genre of the movie\n",
"- **average_rating** - Average rating on [IMDB](http://www.imdb.com/)\n",
"- **cost_millions** - Movie's production cost (in millions of USD)\n",
"- **foreign** - Is the movie foreign (1) or domestic (0)?\n",
"- **age_restriction** - Age restriction for the movie\n",
"<br>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src = \"https://ibm.box.com/shared/static/6kr8sg0n6pc40zd1xn6hjhtvy3k7cmeq.png\" width = 90% align=\"left\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### We can use R to help us explore the dataset\n",
"But to begin, we'll need to start from the basics, so let's get started!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref1\"></a>\n",
"<h2 align=center> Simple Math in R </h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's say you want to watch *Fight Club* and *Star Wars: Episode IV (1977)*, back-to-back. Do you have enough time to **watch both movies in 4 hours?** Let's try using simple math in R. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What is the **total movie length** for Fight Club and Star Wars (1977)?\n",
"- **Fight Club**: 139 min\n",
"- **Star Wars: Episode IV**: 121 min"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-success alertsuccess\" style=\"margin-top: 20px\">\n",
"**Tip**: To run the grey code cell below, click on it, and press Shift + Enter.\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"260"
],
"text/latex": [
"260"
],
"text/markdown": [
"260"
],
"text/plain": [
"[1] 260"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"139 + 121 "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Great! You've determined that the total play time is **260 min**. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**What is 260 min in hours?**"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"4.33333333333333"
],
"text/latex": [
"4.33333333333333"
],
"text/markdown": [
"4.33333333333333"
],
"text/plain": [
"[1] 4.333333"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"260 / 60"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, it looks like it's **over 4 hours**, which means you can't watch *Fight Club* and *Star Wars (1977)* back-to-back if you only have 4 hours available!"
]
},
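{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same check can also be written as a comparison. The cell below is a small sketch, not part of the original lesson: it converts the 4 available hours to minutes and asks R whether the total play time fits. The operator **`<=`** returns **`TRUE`** or **`FALSE`**, and any text after a **#** is a note that R ignores."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"4 * 60                 # 4 hours expressed in minutes\n",
"(139 + 121) <= 4 * 60  # does the total play time fit into 4 hours?"
]
},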
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr></hr>\n",
"<div class=\"alert alert-success alertsuccess\" style=\"margin-top: 20px\">\n",
"<h4> [Tip] Simple math in R </h4>\n",
"<p></p>\n",
"You can do a variety of mathematical operations in R including: \n",
"<li> addition: **2 + 2** </li>\n",
"<li> subtraction: **5 - 2** </li>\n",
"<li> multiplication: **3 \\* 2** </li>\n",
"<li> division: **4 / 2** </li>\n",
"<li> exponentiation: **4 \\*\\* 2** or **4 ^ 2** </li>\n",
"</div>\n",
"<hr></hr>"
]
},
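{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of the operators listed in the tip, the cell below evaluates each one. The numbers are arbitrary examples. The last line is an addition to the tip: R follows the usual order of operations, so multiplication is evaluated before addition unless you add round brackets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"2 + 2      # addition\n",
"5 - 2      # subtraction\n",
"3 * 2      # multiplication\n",
"4 / 2      # division\n",
"4 ** 2     # exponentiation\n",
"4 ^ 2      # exponentiation, same result as 4 ** 2\n",
"2 + 3 * 4  # multiplication is evaluated before addition"
]
},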
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref2\"></a>\n",
"<h2 align=center> Variables in R </h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also **store** our output in **variables**, so we can use them later on. For example:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"x <- 139 + 121"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To display the value of **`x`**, we can simply run the variable name as a command:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"260"
],
"text/latex": [
"260"
],
"text/markdown": [
"260"
],
"text/plain": [
"[1] 260"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also perform operations on **`x`** and save the result to a **new variable**:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"4.33333333333333"
],
"text/latex": [
"4.33333333333333"
],
"text/markdown": [
"4.33333333333333"
],
"text/plain": [
"[1] 4.333333"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"y <- x / 60\n",
"y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we save something to an **existing variable**, it will **overwrite** the previous value:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"4.33333333333333"
],
"text/latex": [
"4.33333333333333"
],
"text/markdown": [
"4.33333333333333"
],
"text/plain": [
"[1] 4.333333"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"x <- x / 60\n",
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's good practice to use **meaningful variable names**, so you don't have to keep track of which variable is which:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"260"
],
"text/latex": [
"260"
],
"text/markdown": [
"260"
],
"text/plain": [
"[1] 260"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"total <- 139 + 121\n",
"total"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"4.33333333333333"
],
"text/latex": [
"4.33333333333333"
],
"text/markdown": [
"4.33333333333333"
],
"text/plain": [
"[1] 4.333333"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"total_hr <- total / 60\n",
"total_hr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can put this all into a single expression, but remember to use **round brackets** to add together the movie lengths first, before dividing by 60."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"4.33333333333333"
],
"text/latex": [
"4.33333333333333"
],
"text/markdown": [
"4.33333333333333"
],
"text/plain": [
"[1] 4.333333"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"total_hr <- (139 + 121) / 60\n",
"total_hr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr></hr>\n",
"<div class=\"alert alert-success alertsuccess\" style=\"margin-top: 0px\">\n",
"<h4> [Tip] Variables in R </h4>\n",
"<p></p>\n",
"As you just learned, you can use **variables** to store values for repeated use. Here are some more **characteristics of variables in R**:\n",
"<li>variables store the value that an expression returns, such as the result of **139 + 121** </li>\n",
"<li>variables are typically assigned using **<-**, but can also be assigned using **=**, as in **x <- 1** or **x = 1** </li>\n",
"<li>once created, variables can be removed from memory using **rm(my_variable)** </li>\n",
"<p></p>\n",
"</div>\n",
"<hr></hr>"
]
},
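{
"cell_type": "markdown",
"metadata": {},
"source": [
"The points in the tip can be sketched in a few lines. The name **`my_variable`** is only a placeholder, and **`exists()`** is a base R function used here to confirm that **`rm()`** really removed the variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_variable <- 139 + 121        # assign with <-\n",
"my_variable = my_variable / 60  # = also assigns, and overwrites the old value\n",
"my_variable\n",
"exists(\"my_variable\")           # TRUE while the variable is in memory\n",
"rm(my_variable)                 # remove the variable\n",
"exists(\"my_variable\")           # FALSE once it has been removed"
]
},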
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref3\"></a>\n",
"<h2 align=center>Vectors in R</h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What if we want to know the **movie length in _hours_**, not minutes, for _Toy Story_ and for _Akira_?\n",
"- **Toy Story (1995)**: 81 min\n",
"- **Akira (1998)**: 125 min"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<ol class=list-inline>\n",
"\t<li>1.35</li>\n",
"\t<li>2.08333333333333</li>\n",
"</ol>\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1.35\n",
"\\item 2.08333333333333\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1.35\n",
"2. 2.08333333333333\n",
"\n",
"\n"
],
"text/plain": [
"[1] 1.350000 2.083333"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"c(81, 125) / 60"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you see above, we've applied a single math operation to both of the items in **`c(81, 125)`**. You can even assign **`c(81, 125)`** to a variable before performing an operation."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<ol class=list-inline>\n",
"\t<li>1.35</li>\n",
"\t<li>2.08333333333333</li>\n",
"</ol>\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1.35\n",
"\\item 2.08333333333333\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1.35\n",
"2. 2.08333333333333\n",
"\n",
"\n"
],
"text/plain": [
"[1] 1.350000 2.083333"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"length_min <- c(81, 125)\n",
"length_min / 60"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What we just did was create vectors, using the combine function **`c()`**. The **`c()`** function takes multiple items, then combines them into a **vector**. \n",
"\n",
"Vectors are used everywhere in R. Even a single number such as **`260`** is a vector with one item, which is why R prints **`[1]`**, the position of the first item on that line, in front of it."
]
},
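{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a small sketch of what that means in practice, using the two movie lengths of 81 and 125 minutes. The name **`length_min`** is borrowed from the dataset's column name, the extra minutes in **`c(9, 5)`** are made-up numbers, and **`length()`** and **`sum()`** are base R functions. Text after a **#** is a note that R ignores."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"length_min <- c(81, 125)  # combine two numbers into one vector\n",
"length(length_min)        # number of items in the vector\n",
"length_min / 60           # the division is applied to every item\n",
"length_min + c(9, 5)      # vectors of equal length are added item by item\n",
"sum(length_min)           # total of all items, in minutes"
]
},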
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<ol class=list-inline>\n",
"\t<li>1</li>\n",
"\t<li>2</li>\n",
"\t<li>3</li>\n",
"\t<li>4</li>\n",
"\t<li>5</li>\n",
"\t<li>6</li>\n",
"\t<li>7</li>\n",
"\t<li>8</li>\n",
"\t<li>9</li>\n",
"\t<li>10</li>\n",
"</ol>\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1\n",
"\\item 2\n",
"\\item 3\n",
"\\item 4\n",
"\\item 5\n",
"\\item 6\n",
"\\item 7\n",
"\\item 8\n",
"\\item 9\n",
"\\item 10\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1\n",
"2. 2\n",
"3. 3\n",
"4. 4\n",
"5. 5\n",
"6. 6\n",
"7. 7\n",
"8. 8\n",
"9. 9\n",
"10. 10\n",
"\n",
"\n"
],
"text/plain": [
" [1] 1 2 3 4 5 6 7 8 9 10"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<ol class=list-inline>\n",
"\t<li>1</li>\n",
"\t<li>2</li>\n",
"\t<li>3</li>\n",
"\t<li>4</li>\n",
"\t<li>5</li>\n",
"\t<li>6</li>\n",
"\t<li>7</li>\n",
"\t<li>8</li>\n",
"\t<li>9</li>\n",
"\t<li>10</li>\n",
"</ol>\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1\n",
"\\item 2\n",
"\\item 3\n",
"\\item 4\n",
"\\item 5\n",
"\\item 6\n",
"\\item 7\n",
"\\item 8\n",
"\\item 9\n",
"\\item 10\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1\n",
"2. 2\n",
"3. 3\n",
"4. 4\n",
"5. 5\n",
"6. 6\n",
"7. 7\n",
"8. 8\n",
"9. 9\n",
"10. 10\n",
"\n",
"\n"
],
"text/plain": [
" [1] 1 2 3 4 5 6 7 8 9 10"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)\n",
"c(1:10)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<ol class=list-inline>\n",
"\t<li>10</li>\n",
"\t<li>9</li>\n",
"\t<li>8</li>\n",
"\t<li>7</li>\n",
"\t<li>6</li>\n",
"\t<li>5</li>\n",
"\t<li>4</li>\n",
"\t<li>3</li>\n",
"\t<li>2</li>\n",
"\t<li>1</li>\n",
"</ol>\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 10\n",
"\\item 9\n",
"\\item 8\n",
"\\item 7\n",
"\\item 6\n",
"\\item 5\n",
"\\item 4\n",
"\\item 3\n",
"\\item 2\n",
"\\item 1\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 10\n",
"2. 9\n",
"3. 8\n",
"4. 7\n",
"5. 6\n",
"6. 5\n",
"7. 4\n",
"8. 3\n",
"9. 2\n",
"10. 1\n",
"\n",
"\n"
],
"text/plain": [
" [1] 10 9 8 7 6 5 4 3 2 1"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"c(10:1) # 10 to 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr></hr>\n",
"<div class=\"alert alert-success alertsuccess\" style=\"margin-top: 20px\">\n",
"<h4> [Tip] # Comments</h4> \n",
"\n",
"Did you notice the **comment** after the **c(10:1)** above? Comments are very useful for describing your code. You can create your own comments by typing the **#** symbol and writing your comment after it. R ignores everything from the **#** to the end of the line, so the comment is never run as code.\n",
"\n",
"<p></p>\n",
"</div>\n",
"\n",
"<hr></hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id=\"ref4\"></a>\n",
"<h2 align=center>Strings in R</h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"R isn't just about numbers -- we can also work with strings. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"movie <- \"Toy Story\"\n",
"movie"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In R, a **character string** is text wrapped in **matching double (\") or single (') quotes**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create a **character vector** for the following **genres**:\n",
"- Animation\n",
"- Comedy\n",
"- Biography\n",
"- Horror\n",
"- Romance\n",
"- Sci-fi"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<ol class=list-inline>\n",
"\t<li>'Animation'</li>\n",
"\t<li>'Comedy'</li>\n",
"\t<li>'Biography'</li>\n",
"\t<li>'Horror'</li>\n",
"\t<li>'Romance'</li>\n",
"\t<li>'Sci-fi'</li>\n",
"</ol>\n"
],
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 'Animation'\n",
"\\item 'Comedy'\n",
"\\item 'Biography'\n",
"\\item 'Horror'\n",
"\\item 'Romance'\n",
"\\item 'Sci-fi'\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 'Animation'\n",
"2. 'Comedy'\n",
"3. 'Biography'\n",
"4. 'Horror'\n",
"5. 'Romance'\n",
"6. 'Sci-fi'\n",
"\n",
"\n"
],
"text/plain": [
"[1] \"Animation\" \"Comedy\" \"Biography\" \"Horror\" \"Romance\" \"Sci-fi\" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"genres <- c(\"Animation\", \"Comedy\", \"Biography\", \"Horror\", \"Romance\", \"Sci-fi\")\n",
"genres"
]
},
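{
"cell_type": "markdown",
"metadata": {},
"source": [
"A few base R functions are handy once you have a character vector. The cell below is an illustration rather than part of the original lesson: **`length()`** counts the items, **`nchar()`** counts the characters in each string, **`paste()`** joins strings with a space between them, and square brackets such as **`genres[1]`** pick out a single item by position."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"genres <- c(\"Animation\", \"Comedy\", \"Biography\", \"Horror\", \"Romance\", \"Sci-fi\")\n",
"length(genres)                                # number of items in the vector\n",
"nchar(genres)                                 # number of characters in each string\n",
"paste(\"Toy Story is an\", genres[1], \"movie\")  # genres[1] is the first item\n",
"'Comedy' == \"Comedy\"                          # both kinds of quotes make the same string"
]
},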
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Scaling R with big data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you learn more about R, if you are interested in exploring platforms that can help you run analyses at scale, you might want to sign up for a free account on [IBM Watson Studio](http://cocl.us/dsx_rp0101en), which allows you to run analyses in R with two Spark executors for free."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Excellent! You have just completed the R basics notebook! "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### About the Author: \n",
"Hi! It's [Marta Aghili](https://ca.linkedin.com/in/marta-aghili-2b184b71), the author of this notebook. I hope you found R easy to learn! There's lots more to learn about R but you're well on your way. Feel free to connect with me if you have any questions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n",
"Copyright &copy; [IBM Cognitive Class](https://cognitiveclass.ai). This notebook and its source code are released under the terms of the [MIT License](https://cognitiveclass.ai/mit-license/)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "conda-env-r-r"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.5.1"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 4
}