@tgirke
Created November 18, 2012 00:31
my_frame[!duplicated(my_frame[,2]),] # Removes rows with duplicated values in selected column.
my_frame[my_frame$y2 > my_frame$y3,] # Prints all rows of the data frame where the value in column 'y2' is greater than the value in column 'y3'. Comparison operators are: == (equal), != (not equal), >= (greater than or equal), etc. Logical operators are & (and), | (or) and ! (not).
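# A minimal runnable sketch of the two subsetting calls above. The data frame below is a hypothetical stand-in for 'my_frame', which the lines above assume already exists; the column names y1-y3 and all values are made up for illustration.
my_frame <- data.frame(y1 = c("a", "b", "c", "d"), y2 = c(5, 5, 2, 9), y3 = c(1, 7, 2, 3))
dedup <- my_frame[!duplicated(my_frame[, 2]), ] # Keeps the first row for each distinct value in column 2, so row 2 (the second 5) is dropped.
larger <- my_frame[my_frame$y2 > my_frame$y3, ] # Keeps rows 1 and 4, the only rows where y2 exceeds y3.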
x <- 0.5:10; x[x<1.0] <- -1/x[x<1.0] # Replaces all values in a vector that are below 1 with their negative reciprocal (-1/x). The same indexing pattern also works on a numeric data frame.
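# A hedged sketch of the same replacement applied to a data frame instead of a vector; 'df' and its values are hypothetical. Comparing a numeric data frame with a number yields a logical matrix, which can then index the cells to overwrite.
df <- data.frame(a = c(0.5, 2, 4), b = c(0.25, 8, 0.1))
small <- df < 1 # Logical matrix marking the cells below 1.
df[small] <- -1 / df[small] # Applies -1/x to the marked cells only, as in the vector code above.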
x <- data.frame(month=month.abb[1:12], AB=LETTERS[1:2], no1=1:48, no2=1:24); x[x$month == "Apr" & (x$no1 == x$no2 | x$no1 > x$no2),] # Prints all records of frame 'x' that contain 'Apr' AND have either equal values in columns 'no1' and 'no2' OR a greater value in column 'no1'. The shorter columns are recycled to the length of the longest one (48 rows).
x[x[,1] %in% c("Jun", "Aug"),] # Retrieves all rows whose value in column 1 matches any entry of a query vector.
x[grep("\\d{2}", as.character(x$no1), perl = TRUE),] # Prints all rows of a data frame where a regular expression matches (here all double-digit values in column 'no1'). Note that backslashes in the pattern need to be double escaped ('\\') in R.
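x[Reduce(`|`, lapply(x[,1:4], function(col) grepl("\\d{2}", as.character(col), perl = TRUE))),] # Same as above, but searches all columns (1-4). A for loop returns NULL in R and cannot be used as a value here, so the per-column matches are collected with lapply and combined with a logical OR.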
z <- data.frame(chip1=letters[1:25], chip2=letters[25:1], chip3=letters[1:25]); z; y <- apply(z, 1, function(x) sum(x == "m") > 2); z[y,] # Identifies in a data frame ('z') all rows that contain a certain number of identical fields (here more than two fields equal to 'm').
z <- data.frame(chip1=1:25, chip2=25:1, chip3=1:25); counted <- data.frame(z, count=apply(z[,1:3], 1, FUN = function(x) sum(x >= 5))); counted # Counts in each row of a data frame the number of fields that meet a threshold (here >= 5) and appends this information to the data frame. Rows that contain NA values yield NA as their count. To work around this, one can pass na.rm=TRUE to sum(), or replace the NA fields with a value that doesn't affect the result, e.g.: z[is.na(z)] <- 1.
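# A small sketch of how NA values affect the row counts above; 'zna' is a hypothetical copy of the data with one missing field. sum() returns NA for a row that contains NA unless na.rm = TRUE is set.
zna <- data.frame(chip1 = 1:25, chip2 = 25:1, chip3 = 1:25); zna[1, 2] <- NA
count_default <- apply(zna, 1, function(x) sum(x >= 5)) # Row 1 yields NA because one of its fields is missing.
count_narm <- apply(zna, 1, function(x) sum(x >= 5, na.rm = TRUE)) # Row 1 yields 0, since its remaining fields (1 and 1) are below 5.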
x <- data.frame(matrix(rep(c("P","A","M"), length.out=50),10,5)); x; index <- x == "P"; cbind(x, Pcount=rowSums(index)); x[rowSums(index)>=2,] # Example of how one can count occurrences of strings across rows. In this example the occurrences of "P" in a data frame of "PMA" values are counted by converting to a matrix of logical values and then counting the 'TRUE' occurrences with the 'rowSums' function, which does in this example the same as: 'cbind(x, count=apply(x=="P",1,sum))'.