
@nharrell04
Last active May 24, 2016 18:13
oztemp <- hw1_data[,hw1_data$Ozone > 31 && hw1_data$Temp > 90] ## hw1_data is the full data frame I'm working with
## I'm trying to get all of the rows where the value in the Ozone column is above 31 and the value in the Temp column is above 90
## instead oztemp is a data frame with 0 columns and 153 rows (there are 153 rows in the data frame)
@stevekrenzel

Really close. I'll use the cars data set here because it comes standard with R. cars has two columns, 'speed' (in mph) and 'dist' (stopping distance in ft), with 50 rows of data recorded for cars in the 1920s.

If you're in RStudio (or any R console), you can just type cars to see the data.

Let's find which cars are going faster than 20 mph:

cars$speed > 20

Which gives us:

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[15] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[29] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[43] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
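
A vector this long is hard to scan by eye. As a small aside, base R's which() returns the positions of the TRUE entries, and sum() counts them (TRUE counts as 1, FALSE as 0). The variable name fast_rows is just an illustrative helper:

```r
# Row numbers where the condition is TRUE
fast_rows <- which(cars$speed > 20)
print(fast_rows)
# [1] 44 45 46 47 48 49 50

# TRUE is treated as 1 and FALSE as 0, so sum() counts the matches
print(sum(cars$speed > 20))
# [1] 7
```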

Similarly, we can check which cars have a stopping distance greater than 90 ft:

cars$dist > 90

Giving us a different vector of:

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[15] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[29] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[43] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE

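So now we want to know which entries are TRUE in both of the vectors above. In your sample you used the && operator, but that only works on single values: && is the short-circuiting AND meant for a single TRUE/FALSE condition, like the test in an if statement (e.g. TRUE && FALSE is FALSE). Given whole vectors, older versions of R quietly used only the first element of each, and R 4.3.0 and later stop with an error. To AND two vectors together element by element, use the single &: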

cars$speed > 20 & cars$dist > 90

Giving:

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[15] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[29] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[43] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE
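
To see the two operators side by side on a small scale, here is a minimal sketch with two made-up vectors (a and b are illustrative names only). & works position by position and returns a vector of the same length, while && is for single values:

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

# Element-wise AND: one result per position
print(a & b)
# [1]  TRUE FALSE FALSE

# Scalar AND: for single TRUE/FALSE values, typically inside if ()
print(TRUE && FALSE)
# [1] FALSE

# The combined cars condition, turned into row numbers with which()
print(which(cars$speed > 20 & cars$dist > 90))
# [1] 47 48 49
```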

So now we've got a vector of TRUE/FALSE values, and you'd expect that using it as an index into the data gives back all of the rows that correspond to TRUE:

cars[cars$speed > 20 & cars$dist > 90]

Which gives us an error:

Error in `[.data.frame`(cars, cars$speed > 20 & cars$dist > 90) : 
  undefined columns selected
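
Why "undefined columns"? A data frame is a list of columns, so a single index inside [ ] picks columns, not rows. Our 50 TRUE/FALSE values therefore asked for columns 47 to 49 of a two-column data frame. A quick check that one index selects columns while [rows, ] selects rows:

```r
# With one index, [ ] selects columns (the result is still a data frame)
print(names(cars[2]))
# [1] "dist"
print(names(cars[c(FALSE, TRUE)]))
# [1] "dist"

# With [rows, cols], the first index selects rows
print(nrow(cars[1:3, ]))
# [1] 3
```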

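Data frames are indexed as [rows, columns], and with only one index R treats it as a column selection. We want all of the columns, so we'll add a comma after our row index and leave the column slot empty. (That's also the other half of the problem in your original code: your condition sits after the comma, in the column slot, which is why you got back 0 columns.)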

cars[cars$speed > 20 & cars$dist > 90,]

Giving us:

   speed dist
47    24   92
48    24   93
49    24  120
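
Leaving the slot after the comma empty means "all columns". If you only want some of them, put the column name(s) there instead. A short sketch (rows is just an illustrative helper variable):

```r
# Logical row index, reused for both selections below
rows <- cars$speed > 20 & cars$dist > 90

# A single column name returns a plain vector
print(cars[rows, "dist"])
# [1]  92  93 120

# A vector of names returns a data frame with those columns
print(cars[rows, c("speed", "dist")])
```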

And that's it. If you prefer the non-$ syntax, you could similarly do:

cars[cars['speed'] > 20 & cars['dist'] > 90,]

And get

   speed dist
47    24   92
48    24   93
49    24  120

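as well. One small difference: cars['speed'] is a one-column data frame, so comparing it gives a logical matrix rather than a vector; cars[['speed']] gives the plain vector, just like cars$speed.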
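
Applying this back to the original question needs one extra precaution. The sketch below assumes hw1_data has the same columns as R's built-in airquality data set (an assumption; the question doesn't say where hw1_data comes from, though airquality also has 153 rows). airquality's Ozone column contains NAs, NA > 31 is NA, and a logical NA used as a row index gives back a row full of NAs. Wrapping the condition in which() keeps only the rows where it is actually TRUE; base R's subset() is an alternative that also treats NA as FALSE:

```r
# Stand-in for hw1_data (assumption: same columns as airquality)
hw1_data <- airquality

# & instead of &&, condition in the row slot, and which() to drop NA rows
oztemp <- hw1_data[which(hw1_data$Ozone > 31 & hw1_data$Temp > 90), ]
print(oztemp)

# subset() gives the same rows; it also drops rows where the condition is NA
oztemp2 <- subset(hw1_data, Ozone > 31 & Temp > 90)
```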

@stevekrenzel

Also note that cars in the 1920s were hella unsafe.
