Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save charanhu/ab643648570e69bf559782f7e4eb5607 to your computer and use it in GitHub Desktop.
Save charanhu/ab643648570e69bf559782f7e4eb5607 to your computer and use it in GitHub Desktop.
Selection: 6
|
| | 0%
| In this lesson, we'll see how to extract elements from a vector based on some
| conditions that we specify.
...
|
|== | 3%
| For example, we may only be interested in the first 20 elements of a vector,
| or only the elements that are not NA, or only those that are positive or
| correspond to a specific variable of interest. By the end of this lesson,
| you'll know how to handle each of these scenarios.
...
|
|==== | 5%
| I've created for you a vector called x that contains a random ordering of 20
| numbers (from a standard normal distribution) and 20 NAs. Type x now to see
| what it looks like.
> x
[1] -0.85068248 -1.48563986 -0.91598703 -0.23367937 NA 0.63801302
[7] NA NA NA -0.02668931 NA NA
[13] 1.30983997 -3.51235543 NA -0.52339350 NA NA
[19] NA -0.71988909 NA NA NA NA
[25] NA -0.96860318 NA NA NA -0.09162274
[31] 1.57282019 -0.46906331 NA 1.52285277 0.22627761 1.32663565
[37] 0.15567008 NA -0.21661389 -0.71730483
| That's the answer I was looking for.
|
|===== | 8%
| The way you tell R that you want to select some particular elements (i.e. a
| 'subset') from a vector is by placing an 'index vector' in square brackets
| immediately following the name of the vector.
...
|
|======= | 10%
| For a simple example, try x[1:10] to view the first ten elements of x.
> x{1:10}
Error: unexpected '{' in "x{"
> x{1:10}
Error: unexpected '{' in "x{"
> x(1:10)
Error: could not find function "x"
> x[1:10]
[1] -0.85068248 -1.48563986 -0.91598703 -0.23367937 NA 0.63801302
[7] NA NA NA -0.02668931
| Excellent job!
|
|========= | 13%
| Index vectors come in four different flavors -- logical vectors, vectors of
| positive integers, vectors of negative integers, and vectors of character
| strings -- each of which we'll cover in this lesson.
...
|
|=========== | 15%
| Let's start by indexing with logical vectors. One common scenario when
| working with real-world data is that we want to extract all elements of a
| vector that are not NA (i.e. missing data). Recall that is.na(x) yields a
| vector of logical values the same length as x, with TRUEs corresponding to NA
| values in x and FALSEs corresponding to non-NA values in x.
...
|
|============= | 18%
| What do you think x[is.na(x)] will give you?
1: A vector of TRUEs and FALSEs
2: A vector with no NAs
3: A vector of all NAs
4: A vector of length 0
Selection: 3
| That's correct!
|
|============== | 21%
| Prove it to yourself by typing x[is.na(x)].
> x[is.na(x)]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
| Your dedication is inspiring!
|
|================ | 23%
| Recall that `!` gives us the negation of a logical expression, so !is.na(x)
| can be read as 'is not NA'. Therefore, if we want to create a vector called y
| that contains all of the non-NA values from x, we can use y <- x[!is.na(x)].
| Give it a try.
> y <- x[!is.na(x)]
| You got it right!
|
|================== | 26%
| Print y to the console.
> y
[1] -0.85068248 -1.48563986 -0.91598703 -0.23367937 0.63801302 -0.02668931
[7] 1.30983997 -3.51235543 -0.52339350 -0.71988909 -0.96860318 -0.09162274
[13] 1.57282019 -0.46906331 1.52285277 0.22627761 1.32663565 0.15567008
[19] -0.21661389 -0.71730483
| All that hard work is paying off!
|
|==================== | 28%
| Now that we've isolated the non-missing values of x and put them in y, we can
| subset y as we please.
...
|
|====================== | 31%
| Recall that the expression y > 0 will give us a vector of logical values the
| same length as y, with TRUEs corresponding to values of y that are greater
| than zero and FALSEs corresponding to values of y that are less than or equal
| to zero. What do you think y[y > 0] will give you?
1: A vector of all the positive elements of y
2: A vector of TRUEs and FALSEs
3: A vector of all NAs
4: A vector of length 0
5: A vector of all the negative elements of y
Selection: 1
| You are amazing!
|
|======================= | 33%
| Type y[y > 0] to see that we get all of the positive elements of y, which are
| also the positive elements of our original vector x.
> y[y > 0]
[1] 0.6380130 1.3098400 1.5728202 1.5228528 0.2262776 1.3266356 0.1556701
| Keep up the great work!
|
|========================= | 36%
| You might wonder why we didn't just start with x[x > 0] to isolate the
| positive elements of x. Try that now to see why.
> x[x > 0]
[1] NA 0.6380130 NA NA NA NA NA
[8] 1.3098400 NA NA NA NA NA NA
[15] NA NA NA NA NA NA 1.5728202
[22] NA 1.5228528 0.2262776 1.3266356 0.1556701 NA
| Excellent job!
|
|=========================== | 38%
| Since NA is not a value, but rather a placeholder for an unknown quantity,
| the expression NA > 0 evaluates to NA. Hence we get a bunch of NAs mixed in
| with our positive numbers when we do this.
...
|
|============================= | 41%
| Combining our knowledge of logical operators with our new knowledge of
| subsetting, we could do this -- x[!is.na(x) & x > 0]. Try it out.
> x[!is.na(x) & x > 0]
[1] 0.6380130 1.3098400 1.5728202 1.5228528 0.2262776 1.3266356 0.1556701
| Great job!
|
|=============================== | 44%
| In this case, we request only values of x that are both non-missing AND
| greater than zero.
...
|
|================================ | 46%
| I've already shown you how to subset just the first ten values of x using
| x[1:10]. In this case, we're providing a vector of positive integers inside
| of the square brackets, which tells R to return only the elements of x
| numbered 1 through 10.
...
|
|================================== | 49%
| Many programming languages use what's called 'zero-based indexing', which
| means that the first element of a vector is considered element 0. R uses
| 'one-based indexing', which (you guessed it!) means the first element of a
| vector is considered element 1.
...
|
|==================================== | 51%
| Can you figure out how we'd subset the 3rd, 5th, and 7th elements of x? Hint
| -- Use the c() function to specify the element numbers as a numeric vector.
> c(3, 5, 7)
[1] 3 5 7
| One more time. You can do it! Or, type info() for more options.
| Create a vector of indexes with c(3, 5, 7), then put that inside of the
| square brackets.
> x<- c(3, 5, 7)
| Give it another try. Or, type info() for more options.
| Create a vector of indexes with c(3, 5, 7), then put that inside of the
| square brackets.
> [x]]
Error: unexpected '[' in "["
> [x]
Error: unexpected '[' in "["
> x[c(3, 5, 7)]
[1] 7 NA NA
| You got it!
|
|====================================== | 54%
| It's important that when using integer vectors to subset our vector x, we
| stick with the set of indexes {1, 2, ..., 40} since x only has 40 elements.
| What happens if we ask for the zeroth element of x (i.e. x[0])? Give it a
| try.
> x[0]
numeric(0)
| You are doing so well!
|
|======================================= | 56%
| As you might expect, we get nothing useful. Unfortunately, R doesn't prevent
| us from doing this. What if we ask for the 3000th element of x? Try it out.
> x[3000]
[1] NA
| You're the best!
|
|========================================= | 59%
| Again, nothing useful, but R doesn't prevent us from asking for it. This
| should be a cautionary tale. You should always make sure that what you are
| asking for is within the bounds of the vector you're working with.
...
|
|=========================================== | 62%
| What if we're interested in all elements of x EXCEPT the 2nd and 10th? It
| would be pretty tedious to construct a vector containing all numbers 1
| through 40 EXCEPT 2 and 10.
...[x1:40, -2, -10]
|
|============================================= | 64%
| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us
| ONLY the 2nd and 10th elements of x, x[c(-2, -10)] gives us all elements of x
| EXCEPT for the 2nd and 10 elements. Try x[c(-2, -10)] now to see this.
> x[c(-2, -10)]
[1] 3 7
| You nailed it! Good job!
|
|=============================================== | 67%
| A shorthand way of specifying multiple negative numbers is to put the
| negative sign out in front of the vector of positive numbers. Type x[-c(2,
| 10)] to get the exact same result.
>
> x[-c(2,
+ 10)]
[1] 3 7
| Excellent work!
|
|================================================ | 69%
| So far, we've covered three types of index vectors -- logical, positive
| integer, and negative integer. The only remaining type requires us to
| introduce the concept of 'named' elements.
...
|
|================================================== | 72%
| Create a numeric vector with three named elements using vect <- c(foo = 11,
| bar = 2, norf = NA).
> vect <- c(foo = 11,
+ bar = 2, norf = NA)
| Keep working like that and you'll get there!
|
|==================================================== | 74%
| When we print vect to the console, you'll see that each element has a name.
| Try it out.
>
>
> vect <- c(foo = 11, bar = 2, norf = NA)
| You almost had it, but not quite. Try again. Or, type info() for more
| options.
| Type vect to view its contents.
> vect <- c(foo = 11, bar = 2, norf = NA)
| That's not the answer I was looking for, but try again. Or, type info() for
| more options.
| Type vect to view its contents.
> vect <-c(foo = 11, bar = 2, norf = NA)
| That's not the answer I was looking for, but try again. Or, type info() for
| more options.
| Type vect to view its contents.
> vect <- c(foo = 11, bar = 2, norf = NA)')
+ vect <- c(foo = 11, bar = 2, norf = NA)')
Error: unexpected string constant in:
"vect <- c(foo = 11, bar = 2, norf = NA)')
vect <- c(foo = 11, bar = 2, norf = NA)'"
> vect <- c(foo = 11, bar = 2, norf = NA)
| Almost! Try again. Or, type info() for more options.
| Type vect to view its contents.
> vect <- c(foo= 11, bar= 2, norf= NA)
| Not quite right, but keep trying. Or, type info() for more options.
| Type vect to view its contents.
> vect<-c(foo= 11, bar= 2, norf= NA)
| Almost! Try again. Or, type info() for more options.
| Type vect to view its contents.
> vect <-c(foo= 11, bar= 2, norf= NA)
| You're close...I can feel it! Try it again. Or, type info() for more options.
| Type vect to view its contents.
> vect <-c(foo = 11, bar = 2, norf = NA)
| Almost! Try again. Or, type info() for more options.
| Type vect to view its contents.
> vect <- c(foo = 11, bar = 2, norf = NA)
| One more time. You can do it! Or, type info() for more options.
| Type vect to view its contents.
>
> x[!is.na(x) & x > 0]
[1] 3 5 7
> vect[c(foo = 11, bar = 2, norf = NA)]
<NA> bar <NA>
NA 2 NA
| You almost had it, but not quite. Try again. Or, type info() for more
| options.
| Type vect to view its contents.
> vect
foo bar norf
11 2 NA
| Your dedication is inspiring!
|
|====================================================== | 77%
| We can also get the names of vect by passing vect as an argument to the
| names() function. Give that a try.
> name(vect)
Error: could not find function "name"
> names(vect)
[1] "foo" "bar" "norf"
| You're the best!
|
|======================================================== | 79%
| Alternatively, we can create an unnamed vector vect2 with c(11, 2, NA). Do
| that now.
> vect2[c(11,2,NA)]
Error: object 'vect2' not found
> vect2 <- c(11, 2, NA)
| Excellent work!
|
|========================================================= | 82%
| Then, we can add the `names` attribute to vect2 after the fact with
| names(vect2) <- c("foo", "bar", "norf"). Go ahead.
> names(vect2) <- c("foo", "bar", "norf")
| That's correct!
|
|=========================================================== | 85%
| Now, let's check that vect and vect2 are the same by passing them as
| arguments to the identical() function.
> identical(vect, vect2)
[1] TRUE
| You got it right!
|
|============================================================= | 87%
| Indeed, vect and vect2 are identical named vectors.
...
|
|=============================================================== | 90%
| Now, back to the matter of subsetting a vector by named elements. Which of
| the following commands do you think would give us the second element of vect?
1: vect[bar]
2: vect["2"]
3: vect["bar"]
Selection: 2
| Not quite, but you're learning! Try again.
| If we want the element named "bar" (i.e. the second element of vect), which
| command would get us that?
1: vect["2"]
2: vect["bar"]
3: vect[bar]
Selection: 2
| That's the answer I was looking for.
|
|================================================================= | 92%
| Now, try it out.
> vect["bar"]
bar
2
| Your dedication is inspiring!
|
|================================================================== | 95%
| Likewise, we can specify a vector of names with vect[c("foo", "bar")]. Try it
| out.
> vect[c("foo", "bar")]
foo bar
11 2
| You nailed it! Good job!
|
|==================================================================== | 97%
| Now you know all four methods of subsetting data from vectors. Different
| approaches are best in different scenarios and when in doubt, try it out!
...
|
|======================================================================| 100%
| Would you like to receive credit for completing this course on Coursera.org?
1: Yes
2: No
Selection: 1
What is your email address? siby.thomas@capgemini.com
What is your assignment token? BMrMEZf4ICtZTRCq
Grade submission succeeded!
| You got it!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment