@raov5
Last active August 22, 2017 02:49
This notebook explores the main data structures in R.
{
"nbformat_minor": 1,
"metadata": {
"language_info": {
"codemirror_mode": "r",
"pygments_lexer": "r",
"name": "R",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"version": "3.3.2"
},
"kernelspec": {
"language": "R",
"name": "r-spark20",
"display_name": "R with Spark 2.0"
}
},
"cells": [
{
"execution_count": 3,
"source": "#@author: Venky Rao raove@us.ibm.com\n#@last edited: 13 Aug 2017\n#@source: materials, data and examples adapted from R in Action 2nd Edition by Dr. Robert Kabacoff",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
{
"source": "# Data Structures in R",
"metadata": {},
"cell_type": "markdown"
},
{
"execution_count": 31,
"source": "#the first step in data analysis is the creation of a dataset. In R, this consists of 2 steps:\n# 1. selecting a data structure to hold your data; and\n# 2. entering or importing your data into the data structure.\n#This notebook provides an overview of the key data structures in R.",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
{
"source": "## Vectors",
"metadata": {},
"cell_type": "markdown"
},
{
"execution_count": 3,
"source": "#Vectors\n#vectors are one-dimensional arrays that can hold numeric data, character data or logical data (can only be of 1 type)\n#the combine function \"c()\" is used to form the vector",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
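{
"execution_count": null,
"source": "#a minimal sketch (not part of the original notes) of the \"only 1 type\" rule for vectors:\n#c() silently coerces mixed inputs to the most general type (logical -> numeric -> character)\n#mixed_num and mixed_chr are hypothetical names used only for illustration\nmixed_num <- c(1, TRUE, FALSE) #the logicals become the numbers 1 and 0\nmixed_chr <- c(1, \"two\", TRUE) #everything becomes a character string\nclass(mixed_num)\nclass(mixed_chr)",
"outputs": [],
"cell_type": "code",
"metadata": {}
},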
{
"execution_count": 5,
"source": "#numeric vector\na <- c(1, 2, 5, 3, 6, -2, 4)\na",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 1\n\\item 2\n\\item 5\n\\item 3\n\\item 6\n\\item -2\n\\item 4\n\\end{enumerate*}\n",
"text/html": "<ol class=list-inline>\n\t<li>1</li>\n\t<li>2</li>\n\t<li>5</li>\n\t<li>3</li>\n\t<li>6</li>\n\t<li>-2</li>\n\t<li>4</li>\n</ol>\n",
"text/plain": "[1] 1 2 5 3 6 -2 4",
"text/markdown": "1. 1\n2. 2\n3. 5\n4. 3\n5. 6\n6. -2\n7. 4\n\n\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 6,
"source": "#character vector\nb <- c(\"one\", \"two\", \"three\")\nb",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 'one'\n\\item 'two'\n\\item 'three'\n\\end{enumerate*}\n",
"text/html": "<ol class=list-inline>\n\t<li>'one'</li>\n\t<li>'two'</li>\n\t<li>'three'</li>\n</ol>\n",
"text/plain": "[1] \"one\" \"two\" \"three\"",
"text/markdown": "1. 'one'\n2. 'two'\n3. 'three'\n\n\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 7,
"source": "#logical vector\nc <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)\nc",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item TRUE\n\\item TRUE\n\\item TRUE\n\\item FALSE\n\\item TRUE\n\\item FALSE\n\\end{enumerate*}\n",
"text/html": "<ol class=list-inline>\n\t<li>TRUE</li>\n\t<li>TRUE</li>\n\t<li>TRUE</li>\n\t<li>FALSE</li>\n\t<li>TRUE</li>\n\t<li>FALSE</li>\n</ol>\n",
"text/plain": "[1] TRUE TRUE TRUE FALSE TRUE FALSE",
"text/markdown": "1. TRUE\n2. TRUE\n3. TRUE\n4. FALSE\n5. TRUE\n6. FALSE\n\n\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 8,
"source": "#single-element vectors are called \"scalars\". They are used to hold constants\n#eg f<- 3; g <- \"US\"; h <- FALSE",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
{
"execution_count": 9,
"source": "#you can refer to elements of a vector using a numeric vector of positions within brackets. for example:\na[c(2, 4)] #refers to the 2nd and 4th elements of the vector \"a\"",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 2\n\\item 3\n\\end{enumerate*}\n",
"text/markdown": "1. 2\n2. 3\n\n\n",
"text/plain": "[1] 2 3",
"text/html": "<ol class=list-inline>\n\t<li>2</li>\n\t<li>3</li>\n</ol>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 10,
"source": "#some more examples:\na <- c(\"k\", \"j\", \"h\", \"a\", \"c\", \"m\")\na[3] #should return \"h\"",
"outputs": [
{
"data": {
"text/latex": "'h'",
"text/markdown": "'h'",
"text/plain": "[1] \"h\"",
"text/html": "'h'"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 11,
"source": "a[c(1, 3, 5)] #should return \"k\", \"h\" and \"c\"",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 'k'\n\\item 'h'\n\\item 'c'\n\\end{enumerate*}\n",
"text/markdown": "1. 'k'\n2. 'h'\n3. 'c'\n\n\n",
"text/plain": "[1] \"k\" \"h\" \"c\"",
"text/html": "<ol class=list-inline>\n\t<li>'k'</li>\n\t<li>'h'</li>\n\t<li>'c'</li>\n</ol>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 12,
"source": "a[2:6] #should return all elements from \"j\" to (and including) \"m\"",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 'j'\n\\item 'h'\n\\item 'a'\n\\item 'c'\n\\item 'm'\n\\end{enumerate*}\n",
"text/markdown": "1. 'j'\n2. 'h'\n3. 'a'\n4. 'c'\n5. 'm'\n\n\n",
"text/plain": "[1] \"j\" \"h\" \"a\" \"c\" \"m\"",
"text/html": "<ol class=list-inline>\n\t<li>'j'</li>\n\t<li>'h'</li>\n\t<li>'a'</li>\n\t<li>'c'</li>\n\t<li>'m'</li>\n</ol>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"source": "## Matrices",
"metadata": {},
"cell_type": "markdown"
},
{
"execution_count": 13,
"source": "#A matrix is a 2-dimensional array in which each element is of the same type (numeric, character or logical)\n#matrices are created using the matrix() function\n#general format\n# mymatrix <- matrix(vector, nrow = number_of_rows, ncol = number_of_columns,\n# byrow = T or F (default is F), \n# dimnames = list(char_vector_rownames, char_vector_colnames) )",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
{
"execution_count": 15,
"source": "#examples:\ny <- matrix(1:20, nrow = 5, ncol = 4) #creates a matrix with 5 rows and 4 columns. \n #populates it with the numbers 1 through 20\n #populates it by column (i.e. populates col 1 first, followed by col 2 and so on)\ny",
"outputs": [
{
"data": {
"text/latex": "\\begin{tabular}{llll}\n\t 1 & 6 & 11 & 16\\\\\n\t 2 & 7 & 12 & 17\\\\\n\t 3 & 8 & 13 & 18\\\\\n\t 4 & 9 & 14 & 19\\\\\n\t 5 & 10 & 15 & 20\\\\\n\\end{tabular}\n",
"text/markdown": "1. 1\n2. 2\n3. 3\n4. 4\n5. 5\n6. 6\n7. 7\n8. 8\n9. 9\n10. 10\n11. 11\n12. 12\n13. 13\n14. 14\n15. 15\n16. 16\n17. 17\n18. 18\n19. 19\n20. 20\n\n\n",
"text/plain": " [,1] [,2] [,3] [,4]\n[1,] 1 6 11 16 \n[2,] 2 7 12 17 \n[3,] 3 8 13 18 \n[4,] 4 9 14 19 \n[5,] 5 10 15 20 ",
"text/html": "<table>\n<tbody>\n\t<tr><td>1 </td><td> 6</td><td>11</td><td>16</td></tr>\n\t<tr><td>2 </td><td> 7</td><td>12</td><td>17</td></tr>\n\t<tr><td>3 </td><td> 8</td><td>13</td><td>18</td></tr>\n\t<tr><td>4 </td><td> 9</td><td>14</td><td>19</td></tr>\n\t<tr><td>5 </td><td>10</td><td>15</td><td>20</td></tr>\n</tbody>\n</table>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 16,
"source": "#another example:\nz <- matrix(1:20, nrow = 5, ncol = 4, byrow = T) #creates a matrix with 5 rows and 4 columns. \n #populates it with the numbers 1 through 20\n #populates it by row (i.e. populates row 1 first, followed by row 2 and so on)\nz",
"outputs": [
{
"data": {
"text/latex": "\\begin{tabular}{llll}\n\t 1 & 2 & 3 & 4\\\\\n\t 5 & 6 & 7 & 8\\\\\n\t 9 & 10 & 11 & 12\\\\\n\t 13 & 14 & 15 & 16\\\\\n\t 17 & 18 & 19 & 20\\\\\n\\end{tabular}\n",
"text/markdown": "1. 1\n2. 5\n3. 9\n4. 13\n5. 17\n6. 2\n7. 6\n8. 10\n9. 14\n10. 18\n11. 3\n12. 7\n13. 11\n14. 15\n15. 19\n16. 4\n17. 8\n18. 12\n19. 16\n20. 20\n\n\n",
"text/plain": " [,1] [,2] [,3] [,4]\n[1,] 1 2 3 4 \n[2,] 5 6 7 8 \n[3,] 9 10 11 12 \n[4,] 13 14 15 16 \n[5,] 17 18 19 20 ",
"text/html": "<table>\n<tbody>\n\t<tr><td> 1</td><td> 2</td><td> 3</td><td> 4</td></tr>\n\t<tr><td> 5</td><td> 6</td><td> 7</td><td> 8</td></tr>\n\t<tr><td> 9</td><td>10</td><td>11</td><td>12</td></tr>\n\t<tr><td>13</td><td>14</td><td>15</td><td>16</td></tr>\n\t<tr><td>17</td><td>18</td><td>19</td><td>20</td></tr>\n</tbody>\n</table>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 18,
"source": "#another example:\ncells <- c(1, 26, 24, 68) #create a vector called \"cells\"\nrnames <- c(\"R1\", \"R2\") #create a vector called \"rnames\"\ncnames <- c(\"C1\", \"C2\") #create a vector called \"cnames\"\nmymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = T, dimnames = list(rnames, cnames))\nmymatrix # 2x2 matrix, populated by row, with row and column names",
"outputs": [
{
"data": {
"text/latex": "\\begin{tabular}{r|ll}\n & C1 & C2\\\\\n\\hline\n\tR1 & 1 & 26\\\\\n\tR2 & 24 & 68\\\\\n\\end{tabular}\n",
"text/markdown": "1. 1\n2. 24\n3. 26\n4. 68\n\n\n",
"text/plain": " C1 C2\nR1 1 26\nR2 24 68",
"text/html": "<table>\n<thead><tr><th></th><th scope=col>C1</th><th scope=col>C2</th></tr></thead>\n<tbody>\n\t<tr><th scope=row>R1</th><td> 1</td><td>26</td></tr>\n\t<tr><th scope=row>R2</th><td>24</td><td>68</td></tr>\n</tbody>\n</table>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 20,
"source": "#another example this time populated by columns:\ncells <- c(1, 26, 24, 68) #create a vector called \"cells\"\nrnames <- c(\"R1\", \"R2\") #create a vector called \"rnames\"\ncnames <- c(\"C1\", \"C2\") #create a vector called \"cnames\"\nmymatrix <- matrix(cells, nrow = 2, ncol = 2, dimnames = list(rnames, cnames))\nmymatrix # 2x2 matrix, populated by columns, with row and column names",
"outputs": [
{
"data": {
"text/latex": "\\begin{tabular}{r|ll}\n & C1 & C2\\\\\n\\hline\n\tR1 & 1 & 24\\\\\n\tR2 & 26 & 68\\\\\n\\end{tabular}\n",
"text/markdown": "1. 1\n2. 26\n3. 24\n4. 68\n\n\n",
"text/plain": " C1 C2\nR1 1 24\nR2 26 68",
"text/html": "<table>\n<thead><tr><th></th><th scope=col>C1</th><th scope=col>C2</th></tr></thead>\n<tbody>\n\t<tr><th scope=row>R1</th><td> 1</td><td>24</td></tr>\n\t<tr><th scope=row>R2</th><td>26</td><td>68</td></tr>\n</tbody>\n</table>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 21,
"source": "#you can identify rows, columns and elements of a matrix by using brackets and subscripts:\n# ith row of a matrix X, would be X[i,]\n# jth row of a matrix X, would be X[,j]\n#ijth element of a matrix X, would be X[i,j]",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
{
"execution_count": 22,
"source": "#example:\nx <- matrix(1:10, nrow = 2) #2 row matrix populated by column with the numbers 1 through 10\nx",
"outputs": [
{
"data": {
"text/latex": "\\begin{tabular}{lllll}\n\t 1 & 3 & 5 & 7 & 9\\\\\n\t 2 & 4 & 6 & 8 & 10\\\\\n\\end{tabular}\n",
"text/markdown": "1. 1\n2. 2\n3. 3\n4. 4\n5. 5\n6. 6\n7. 7\n8. 8\n9. 9\n10. 10\n\n\n",
"text/plain": " [,1] [,2] [,3] [,4] [,5]\n[1,] 1 3 5 7 9 \n[2,] 2 4 6 8 10 ",
"text/html": "<table>\n<tbody>\n\t<tr><td>1 </td><td>3 </td><td>5 </td><td>7 </td><td> 9</td></tr>\n\t<tr><td>2 </td><td>4 </td><td>6 </td><td>8 </td><td>10</td></tr>\n</tbody>\n</table>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 23,
"source": "x[2,] #2nd row of x",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 2\n\\item 4\n\\item 6\n\\item 8\n\\item 10\n\\end{enumerate*}\n",
"text/markdown": "1. 2\n2. 4\n3. 6\n4. 8\n5. 10\n\n\n",
"text/plain": "[1] 2 4 6 8 10",
"text/html": "<ol class=list-inline>\n\t<li>2</li>\n\t<li>4</li>\n\t<li>6</li>\n\t<li>8</li>\n\t<li>10</li>\n</ol>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 24,
"source": "x[,5] #5th column of x",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 9\n\\item 10\n\\end{enumerate*}\n",
"text/markdown": "1. 9\n2. 10\n\n\n",
"text/plain": "[1] 9 10",
"text/html": "<ol class=list-inline>\n\t<li>9</li>\n\t<li>10</li>\n</ol>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 25,
"source": "x[1, 4] #element in the 1st row and 4th column (7)",
"outputs": [
{
"data": {
"text/latex": "7",
"text/markdown": "7",
"text/plain": "[1] 7",
"text/html": "7"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 26,
"source": "x[1, c(4,5)] #elements 4 and 5 in the first row. element 4 in row 1 = 7, element 5 in row 1 = 9",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 7\n\\item 9\n\\end{enumerate*}\n",
"text/markdown": "1. 7\n2. 9\n\n\n",
"text/plain": "[1] 7 9",
"text/html": "<ol class=list-inline>\n\t<li>7</li>\n\t<li>9</li>\n</ol>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 27,
"source": "#matrices are 2-dimensional but like vectors, can only contain elements of one type\n#if you need more than 2 dimensions, use arrays\n#if you need more than elements of one type, use data frames",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
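{
"execution_count": null,
"source": "#a minimal sketch of why data frames are suggested for mixed types; m_mixed and d_mixed are hypothetical names\n#the same two columns are stored once as a matrix and once as a data frame\nm_mixed <- cbind(id = 1:2, label = c(\"a\", \"b\")) #a matrix: the numbers are coerced to character\nd_mixed <- data.frame(id = 1:2, label = c(\"a\", \"b\")) #a data frame: each column keeps its own type\nclass(m_mixed[, \"id\"])\nclass(d_mixed$id)",
"outputs": [],
"cell_type": "code",
"metadata": {}
},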
{
"source": "## Arrays",
"metadata": {},
"cell_type": "markdown"
},
{
"execution_count": 28,
"source": "#Arrays are similar to vectors but can have more than 2-dimensions\n#they are created using the array() function; the general syntax is as follows:\n# myarray <- array(vector, dimensions, dimnames)\n# where vector = data for the array; dimensions = numeric vector giving the max index for each dimension; and\n# dimnames = optional list of dimension labels",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
{
"execution_count": 31,
"source": "#example: 3 dimensional array of numbers\ndim1 <- c(\"A1\", \"A2\")\ndim2 <- c(\"B1\", \"B2\", \"B3\")\ndim3 <- c(\"C1\", \"C2\", \"C3\", \"C4\")\nz <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))\nz",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 1\n\\item 2\n\\item 3\n\\item 4\n\\item 5\n\\item 6\n\\item 7\n\\item 8\n\\item 9\n\\item 10\n\\item 11\n\\item 12\n\\item 13\n\\item 14\n\\item 15\n\\item 16\n\\item 17\n\\item 18\n\\item 19\n\\item 20\n\\item 21\n\\item 22\n\\item 23\n\\item 24\n\\end{enumerate*}\n",
"text/markdown": "1. 1\n2. 2\n3. 3\n4. 4\n5. 5\n6. 6\n7. 7\n8. 8\n9. 9\n10. 10\n11. 11\n12. 12\n13. 13\n14. 14\n15. 15\n16. 16\n17. 17\n18. 18\n19. 19\n20. 20\n21. 21\n22. 22\n23. 23\n24. 24\n\n\n",
"text/plain": ", , C1\n\n B1 B2 B3\nA1 1 3 5\nA2 2 4 6\n\n, , C2\n\n B1 B2 B3\nA1 7 9 11\nA2 8 10 12\n\n, , C3\n\n B1 B2 B3\nA1 13 15 17\nA2 14 16 18\n\n, , C4\n\n B1 B2 B3\nA1 19 21 23\nA2 20 22 24\n",
"text/html": "<ol class=list-inline>\n\t<li>1</li>\n\t<li>2</li>\n\t<li>3</li>\n\t<li>4</li>\n\t<li>5</li>\n\t<li>6</li>\n\t<li>7</li>\n\t<li>8</li>\n\t<li>9</li>\n\t<li>10</li>\n\t<li>11</li>\n\t<li>12</li>\n\t<li>13</li>\n\t<li>14</li>\n\t<li>15</li>\n\t<li>16</li>\n\t<li>17</li>\n\t<li>18</li>\n\t<li>19</li>\n\t<li>20</li>\n\t<li>21</li>\n\t<li>22</li>\n\t<li>23</li>\n\t<li>24</li>\n</ol>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
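{
"execution_count": null,
"source": "#a minimal sketch of indexing a 3-dimensional array; the same array is rebuilt here under the hypothetical name arr\n#so that this cell runs on its own. indexing works as for matrices, with one subscript per dimension\narr <- array(1:24, c(2, 3, 4), dimnames = list(c(\"A1\", \"A2\"), c(\"B1\", \"B2\", \"B3\"), c(\"C1\", \"C2\", \"C3\", \"C4\")))\ndim(arr) #2 3 4\narr[1, 2, 3] #row A1, column B2, layer C3: 15\narr[\"A2\", \"B3\", \"C4\"] #the same idea using dimension names: 24\narr[, , \"C1\"] #fixing one layer returns a 2x3 matrix",
"outputs": [],
"cell_type": "code",
"metadata": {}
},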
{
"source": "## Data frames",
"metadata": {},
"cell_type": "markdown"
},
{
"execution_count": 35,
"source": "#Data frames are like tables of data that you might see in Microsoft Excel or SPSS.\n#They can contain different types of data (numeric, character, logical)\n#data frames are created using the data.frame() function\n# mydataframe <- data.frame(col1, col2, col3, ...) where col1, col2, col3 and so on are column vectors of any data type\n# names for each column can be provided with the \"names\" function",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
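{
"execution_count": null,
"source": "#a minimal sketch of the names() function mentioned above; d_scores and its column names are hypothetical\nd_scores <- data.frame(c(1, 2, 3), c(\"a\", \"b\", \"c\")) #columns passed without names get auto-generated ones\nnames(d_scores) <- c(\"id\", \"grade\") #replace them with readable column names\nnames(d_scores)\nd_scores",
"outputs": [],
"cell_type": "code",
"metadata": {}
},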
{
"execution_count": 36,
"source": "#creating a data frame: an example\npatientID <- c(1, 2, 3, 4)\nage <- c(25, 34, 28, 52)\ndiabetes <- c(\"Type1\", \"Type2\", \"Type1\", \"Type1\")\nstatus <- c(\"Poor\", \"Improved\", \"Excellent\", \"Poor\")\npatientdata <- data.frame(patientID, age, diabetes, status)\npatientdata",
"outputs": [
{
"data": {
"text/latex": "\\begin{tabular}{r|llll}\n patientID & age & diabetes & status\\\\\n\\hline\n\t 1 & 25 & Type1 & Poor \\\\\n\t 2 & 34 & Type2 & Improved \\\\\n\t 3 & 28 & Type1 & Excellent\\\\\n\t 4 & 52 & Type1 & Poor \\\\\n\\end{tabular}\n",
"text/html": "<table>\n<thead><tr><th scope=col>patientID</th><th scope=col>age</th><th scope=col>diabetes</th><th scope=col>status</th></tr></thead>\n<tbody>\n\t<tr><td>1 </td><td>25 </td><td>Type1 </td><td>Poor </td></tr>\n\t<tr><td>2 </td><td>34 </td><td>Type2 </td><td>Improved </td></tr>\n\t<tr><td>3 </td><td>28 </td><td>Type1 </td><td>Excellent</td></tr>\n\t<tr><td>4 </td><td>52 </td><td>Type1 </td><td>Poor </td></tr>\n</tbody>\n</table>\n",
"text/plain": " patientID age diabetes status \n1 1 25 Type1 Poor \n2 2 34 Type2 Improved \n3 3 28 Type1 Excellent\n4 4 52 Type1 Poor "
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 38,
"source": "#specifying elements of a data frame\npatientdata[1:2] # returns the first 2 columns of data",
"outputs": [
{
"data": {
"text/latex": "\\begin{tabular}{r|ll}\n patientID & age\\\\\n\\hline\n\t 1 & 25\\\\\n\t 2 & 34\\\\\n\t 3 & 28\\\\\n\t 4 & 52\\\\\n\\end{tabular}\n",
"text/html": "<table>\n<thead><tr><th scope=col>patientID</th><th scope=col>age</th></tr></thead>\n<tbody>\n\t<tr><td>1 </td><td>25</td></tr>\n\t<tr><td>2 </td><td>34</td></tr>\n\t<tr><td>3 </td><td>28</td></tr>\n\t<tr><td>4 </td><td>52</td></tr>\n</tbody>\n</table>\n",
"text/plain": " patientID age\n1 1 25 \n2 2 34 \n3 3 28 \n4 4 52 "
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 39,
"source": "patientdata[c(\"diabetes\", \"status\")] #returns data for the columns whose names are specified",
"outputs": [
{
"data": {
"text/latex": "\\begin{tabular}{r|ll}\n diabetes & status\\\\\n\\hline\n\t Type1 & Poor \\\\\n\t Type2 & Improved \\\\\n\t Type1 & Excellent\\\\\n\t Type1 & Poor \\\\\n\\end{tabular}\n",
"text/html": "<table>\n<thead><tr><th scope=col>diabetes</th><th scope=col>status</th></tr></thead>\n<tbody>\n\t<tr><td>Type1 </td><td>Poor </td></tr>\n\t<tr><td>Type2 </td><td>Improved </td></tr>\n\t<tr><td>Type1 </td><td>Excellent</td></tr>\n\t<tr><td>Type1 </td><td>Poor </td></tr>\n</tbody>\n</table>\n",
"text/plain": " diabetes status \n1 Type1 Poor \n2 Type2 Improved \n3 Type1 Excellent\n4 Type1 Poor "
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 40,
"source": "patientdata$age #returns the age variable in the data frame",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 25\n\\item 34\n\\item 28\n\\item 52\n\\end{enumerate*}\n",
"text/markdown": "1. 25\n2. 34\n3. 28\n4. 52\n\n\n",
"text/plain": "[1] 25 34 28 52",
"text/html": "<ol class=list-inline>\n\t<li>25</li>\n\t<li>34</li>\n\t<li>28</li>\n\t<li>52</li>\n</ol>\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 41,
"source": "#the $ notation is used to indicate a particular variable from a given data frame.\n#for example, if you want to tabulate the diabetes types with the status, you can do it as follows:\ntable(patientdata$diabetes, patientdata$status)",
"outputs": [
{
"data": {
"text/plain": " \n Excellent Improved Poor\n Type1 1 0 2\n Type2 0 1 0"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 43,
"source": "#typing patientdata$ can get tiresome so short cuts are available\n#you can use attach(), detach() or with() to simplify your code\n#for example:\nsummary(mtcars$mpg)\nplot(mtcars$mpg, mtcars$disp)\nplot(mtcars$mpg, mtcars$wt)\n#can also be written as:\nattach(mtcars)\n summary(mpg)\n plot(mpg, disp)\n plot(mpg, wt)\ndetach(mtcars)\n#can also be written as:\nwith(mtcars, {\n print(summary(mpg))\n plot(mpg, disp)\n plot(mpg, wt)\n})",
"outputs": [
{
"data": {
"text/plain": " Min. 1st Qu. Median Mean 3rd Qu. Max. \n 10.40 15.42 19.20 20.09 22.80 33.90 "
},
"output_type": "display_data",
"metadata": {}
},
{
"data": {
"image/png": ""
},
"output_type": "display_data",
"metadata": {}
},
{
"data": {
"text/plain": " Min. 1st Qu. Median Mean 3rd Qu. Max. \n 10.40 15.42 19.20 20.09 22.80 33.90 "
},
"output_type": "display_data",
"metadata": {}
},
{
"data": {
"image/png": ""
},
"output_type": "display_data",
"metadata": {}
},
{
"data": {
"image/png": ""
},
"output_type": "display_data",
"metadata": {}
},
{
"name": "stdout",
"output_type": "stream",
"text": " Min. 1st Qu. Median Mean 3rd Qu. Max. \n 10.40 15.42 19.20 20.09 22.80 33.90 \n"
},
{
"data": {
"image/png": ""
},
"output_type": "display_data",
"metadata": {}
},
{
"data": {
"image/png": ""
},
"output_type": "display_data",
"metadata": {}
},
{
"data": {
"image/png": ""
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
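{
"execution_count": null,
"source": "#a minimal sketch of one limitation of with(): assignments made inside the braces are local to that call\n#stats_local and stats_global are hypothetical names; the <<- operator saves to the global environment instead\nwith(mtcars, {\n stats_local <- summary(mpg) #exists only while with() is running\n stats_global <<- summary(mpg) #still available after with() returns\n})\nexists(\"stats_local\") #FALSE, assuming no object of that name was created elsewhere in the session\nstats_global",
"outputs": [],
"cell_type": "code",
"metadata": {}
},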
{
"source": "## Factors",
"metadata": {},
"cell_type": "markdown"
},
{
"execution_count": 1,
"source": "#Variables can be described as nominal, ordinal or continuous\n#nominal variables are categorical with no implied order. eg Diabetes (Type 1, Type 2)\n#ordinal variables are categorical that imply order but no amount. eg Status (poor, improved, excellent)\n#continuous variables can take on any value in a given range - they imply both order and amount. eg age (14.5, 22.8, etc)",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
{
"execution_count": 2,
"source": "#categorial variables (nominal and ordinal) are called Factors in R\n#let us see some examples of this below",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
{
"execution_count": 8,
"source": "diabetes <- c(\"Type1\", \"Type2\", \"Type1\", \"Type1\") #vector called Diabetes created using the combine c() function\ndiabetes <- factor(diabetes) #converts Diabetes into a Factor\ndiabetes # Type1 is coded as 1; Type2 is coded as 2. The assignment is alphabetical",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item Type1\n\\item Type2\n\\item Type1\n\\item Type1\n\\end{enumerate*}\n",
"text/html": "<ol class=list-inline>\n\t<li>Type1</li>\n\t<li>Type2</li>\n\t<li>Type1</li>\n\t<li>Type1</li>\n</ol>\n",
"text/plain": "[1] Type1 Type2 Type1 Type1\nLevels: Type1 Type2",
"text/markdown": "1. Type1\n2. Type2\n3. Type1\n4. Type1\n\n\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 9,
"source": "str(diabetes) #str() function returns the structure of an argument passed in ()\n#you will see that str(diabetes) returns 1 2 1 1",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Factor w/ 2 levels \"Type1\",\"Type2\": 1 2 1 1\n"
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 12,
"source": "#for factors represrnting ordinal variables, you add the parameter ordered = TRUE and the levels() function to the factor() function\nstatus <- c(\"Poor\", \"Improved\", \"Excellent\", \"Poor\")\nstatus <- factor(status, ordered = T, level = c(\"Poor\", \"Improved\", \"Excellent\"))\nstatus",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item Poor\n\\item Improved\n\\item Excellent\n\\item Poor\n\\end{enumerate*}\n",
"text/html": "<ol class=list-inline>\n\t<li>Poor</li>\n\t<li>Improved</li>\n\t<li>Excellent</li>\n\t<li>Poor</li>\n</ol>\n",
"text/plain": "[1] Poor Improved Excellent Poor \nLevels: Poor < Improved < Excellent",
"text/markdown": "1. Poor\n2. Improved\n3. Excellent\n4. Poor\n\n\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 13,
"source": "str(status)",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": " Ord.factor w/ 3 levels \"Poor\"<\"Improved\"<..: 1 2 3 1\n"
}
],
"cell_type": "code",
"metadata": {}
},
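{
"execution_count": null,
"source": "#a minimal sketch of what ordered = TRUE makes possible: ordered factors can be compared and sorted by level\n#status_ord is a hypothetical name used so that this cell runs on its own\nstatus_ord <- factor(c(\"Poor\", \"Improved\", \"Excellent\", \"Poor\"), ordered = TRUE, levels = c(\"Poor\", \"Improved\", \"Excellent\"))\nstatus_ord < \"Excellent\" #TRUE TRUE FALSE TRUE\nas.integer(status_ord) #the underlying codes: 1 2 3 1\nsort(status_ord) #sorted by level order, not alphabetically",
"outputs": [],
"cell_type": "code",
"metadata": {}
},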
{
"execution_count": 14,
"source": "#numeric values can be coded as factors using the levels and labels options\n#for example if gender was coded as 1 for male and 2 for female, then you could convert it into a factor as follows:\ngender <- c(1, 2)\ngender <- factor(gender, levels = c(1, 2), labels = c(\"Male\", \"Female\"))\ngender",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item Male\n\\item Female\n\\end{enumerate*}\n",
"text/html": "<ol class=list-inline>\n\t<li>Male</li>\n\t<li>Female</li>\n</ol>\n",
"text/plain": "[1] Male Female\nLevels: Male Female",
"text/markdown": "1. Male\n2. Female\n\n\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 15,
"source": "#some more examples:\npatientID <- c(1, 2, 3, 4)\nage <- c(25, 34, 28, 52)\ndiabetes <- c(\"Type1\", \"Type2\", \"Type1\", \"Type1\")\nstatus <- c(\"Poor\", \"Improved\", \"Excellent\", \"Poor\")\ndiabetes <- factor(diabetes)\nstatus <- factor(status, ordered = T, levels = c(\"Poor\", \"Improved\", \"Excellent\"))\npatientData <- data.frame(patientID, age, diabetes, status)\npatientData",
"outputs": [
{
"data": {
"text/latex": "\\begin{tabular}{r|llll}\n patientID & age & diabetes & status\\\\\n\\hline\n\t 1 & 25 & Type1 & Poor \\\\\n\t 2 & 34 & Type2 & Improved \\\\\n\t 3 & 28 & Type1 & Excellent\\\\\n\t 4 & 52 & Type1 & Poor \\\\\n\\end{tabular}\n",
"text/html": "<table>\n<thead><tr><th scope=col>patientID</th><th scope=col>age</th><th scope=col>diabetes</th><th scope=col>status</th></tr></thead>\n<tbody>\n\t<tr><td>1 </td><td>25 </td><td>Type1 </td><td>Poor </td></tr>\n\t<tr><td>2 </td><td>34 </td><td>Type2 </td><td>Improved </td></tr>\n\t<tr><td>3 </td><td>28 </td><td>Type1 </td><td>Excellent</td></tr>\n\t<tr><td>4 </td><td>52 </td><td>Type1 </td><td>Poor </td></tr>\n</tbody>\n</table>\n",
"text/plain": " patientID age diabetes status \n1 1 25 Type1 Poor \n2 2 34 Type2 Improved \n3 3 28 Type1 Excellent\n4 4 52 Type1 Poor "
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 16,
"source": "str(patientData)",
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "'data.frame':\t4 obs. of 4 variables:\n $ patientID: num 1 2 3 4\n $ age : num 25 34 28 52\n $ diabetes : Factor w/ 2 levels \"Type1\",\"Type2\": 1 2 1 1\n $ status : Ord.factor w/ 3 levels \"Poor\"<\"Improved\"<..: 1 2 3 1\n"
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 17,
"source": "summary(patientData)",
"outputs": [
{
"data": {
"text/plain": " patientID age diabetes status \n Min. :1.00 Min. :25.00 Type1:3 Poor :2 \n 1st Qu.:1.75 1st Qu.:27.25 Type2:1 Improved :1 \n Median :2.50 Median :31.00 Excellent:1 \n Mean :2.50 Mean :34.75 \n 3rd Qu.:3.25 3rd Qu.:38.50 \n Max. :4.00 Max. :52.00 "
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"source": "## Lists",
"metadata": {},
"cell_type": "markdown"
},
{
"execution_count": 18,
"source": "#Lists are the most complex of the R data types. A list is an odered collection of objects of any type\n# you can create a list using the list() function as follows:\n# mylist <- list(object1, object2, ...)\n# or you could name the objects in a list like: mylist <- list(name1 = object1, name2 = object2, ...)\n#let us look at some examples below:",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
},
{
"execution_count": 19,
"source": "g <- \"My First List\"\nh <- c(25, 26, 18, 39)\nj <- matrix(1:10, nrow = 5)\nk <- c(\"one\", \"two\", \"three\")\nmylist <- list(title = g, ages = h, j, k)\nmylist",
"outputs": [
{
"data": {
"text/plain": "$title\n[1] \"My First List\"\n\n$ages\n[1] 25 26 18 39\n\n[[3]]\n [,1] [,2]\n[1,] 1 6\n[2,] 2 7\n[3,] 3 8\n[4,] 4 9\n[5,] 5 10\n\n[[4]]\n[1] \"one\" \"two\" \"three\"\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 20,
"source": "mylist$title",
"outputs": [
{
"data": {
"text/latex": "'My First List'",
"text/html": "'My First List'",
"text/plain": "[1] \"My First List\"",
"text/markdown": "'My First List'"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 21,
"source": "mylist$ages",
"outputs": [
{
"data": {
"text/latex": "\\begin{enumerate*}\n\\item 25\n\\item 26\n\\item 18\n\\item 39\n\\end{enumerate*}\n",
"text/html": "<ol class=list-inline>\n\t<li>25</li>\n\t<li>26</li>\n\t<li>18</li>\n\t<li>39</li>\n</ol>\n",
"text/plain": "[1] 25 26 18 39",
"text/markdown": "1. 25\n2. 26\n3. 18\n4. 39\n\n\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 22,
"source": "mylist[[1]]",
"outputs": [
{
"data": {
"text/latex": "'My First List'",
"text/html": "'My First List'",
"text/plain": "[1] \"My First List\"",
"text/markdown": "'My First List'"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
{
"execution_count": 23,
"source": "mylist[[3]]",
"outputs": [
{
"data": {
"text/latex": "\\begin{tabular}{ll}\n\t 1 & 6\\\\\n\t 2 & 7\\\\\n\t 3 & 8\\\\\n\t 4 & 9\\\\\n\t 5 & 10\\\\\n\\end{tabular}\n",
"text/html": "<table>\n<tbody>\n\t<tr><td>1 </td><td> 6</td></tr>\n\t<tr><td>2 </td><td> 7</td></tr>\n\t<tr><td>3 </td><td> 8</td></tr>\n\t<tr><td>4 </td><td> 9</td></tr>\n\t<tr><td>5 </td><td>10</td></tr>\n</tbody>\n</table>\n",
"text/plain": " [,1] [,2]\n[1,] 1 6 \n[2,] 2 7 \n[3,] 3 8 \n[4,] 4 9 \n[5,] 5 10 ",
"text/markdown": "1. 1\n2. 2\n3. 3\n4. 4\n5. 5\n6. 6\n7. 7\n8. 8\n9. 9\n10. 10\n\n\n"
},
"output_type": "display_data",
"metadata": {}
}
],
"cell_type": "code",
"metadata": {}
},
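{
"execution_count": null,
"source": "#a minimal sketch of single versus double brackets on a list; demo_list is a hypothetical name\ndemo_list <- list(title = \"My First List\", ages = c(25, 26, 18, 39))\nclass(demo_list[\"ages\"]) #single brackets return a sub-list: \"list\"\nclass(demo_list[[\"ages\"]]) #double brackets return the component itself: \"numeric\"\ndemo_list[[\"ages\"]][2] #second element of the ages component: 26",
"outputs": [],
"cell_type": "code",
"metadata": {}
},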
{
"execution_count": 24,
"source": "#Lists are important for 2 reasons:\n#1. they allow you to organize and recall disparate information in a simple way\n#2. the results of many R functions return lists and it is upto you to pull out the components needed",
"outputs": [],
"cell_type": "code",
"metadata": {
"collapsed": true
}
}
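,
{
"execution_count": null,
"source": "#a minimal sketch of point 2, using lm() on the built-in mtcars data as one example of a function that returns a list\n#fit is a hypothetical name; the choice of lm() is an illustration, not something taken from the original notes\nfit <- lm(mpg ~ wt, data = mtcars)\nis.list(fit) #TRUE: the fitted model is a list with class \"lm\"\nnames(fit) #the components you can pull out\nfit$coefficients #the intercept and the slope for wt",
"outputs": [],
"cell_type": "code",
"metadata": {}
}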
],
"nbformat": 4
}