Simpler logical subsetting by strings with `grep`
```{r, echo=FALSE}
library(magrittr)
```
# `%grep%` operator
`R`'s built-in `grep` function is awkward for interactive work: by default it returns the indices of matching elements rather than a logical vector.
The convenient `%in%` operator already tests membership in a vector, but only by exact match.
Real data analysis rarely offers a well-defined set of values to match against.
Free-form strings are much more common.
## Implementation
```{r}
`%grep%` <- function(pattern, x) grepl(pattern, x)
```
That's all.
## Usage
The `%grep%` operator reads like `%in%`, but instead of testing exact membership it looks for the pattern on its left inside each string on its right and returns one logical value per string.
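To make the difference from `%in%` concrete, here is a minimal sketch on a small vector. The `species` vector is a made-up stand-in for illustration, not part of the original examples.

```{r}
# Same definition as in the Implementation section
`%grep%` <- function(pattern, x) grepl(pattern, x)

# Hypothetical example vector, used only for illustration
species <- c("setosa", "versicolor", "virginica")

# %in% asks: is the left-hand value exactly equal to some element on the right?
in_result <- "set" %in% species

# %grep% asks: which elements on the right contain the left-hand pattern?
grep_result <- "set" %grep% species

print(in_result)    # a single logical value
print(grep_result)  # one logical value per element of `species`
```

Note that the result of `%grep%` has the length of its right-hand operand, which is what makes it usable directly as a row selector.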
Direct string matching (the string is still interpreted as a regular expression):
```{r}
# Select 'setosa' entries
iris['setosa' %grep% iris$Species, ] %>% head
```
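Because `grepl` treats its pattern as a regular expression, a "direct" string containing characters such as `.` can match more than intended. One possible workaround is a literal-matching companion operator. The name `%fgrep%` and the `ids` vector below are hypothetical, introduced only for this sketch; `fixed = TRUE` is a standard `grepl` argument.

```{r}
`%grep%` <- function(pattern, x) grepl(pattern, x)

# Hypothetical companion operator: match the pattern as a literal string
`%fgrep%` <- function(pattern, x) grepl(pattern, x, fixed = TRUE)

# Hypothetical example values
ids <- c("a.b", "axb")

# As a regular expression, '.' matches any single character
regex_hits <- "a.b" %grep% ids

# With fixed = TRUE, only the literal text "a.b" matches
literal_hits <- "a.b" %fgrep% ids

print(regex_hits)
print(literal_hits)
```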
Regular expressions:
```{r}
# Select all entries where species ends with 'a'
X <- iris['a$' %grep% iris$Species, ]
unique(X$Species)
```
## More complicated example
Because `%grep%` returns a logical vector, several matches can be combined into more complicated selections.
I always forget how to write a regular expression for "`one` but not `two`".
With the `%grep%` operator, we can use `R`'s logical operators (`&`, `|`, `!`) instead.
Let's create some sample data:
```{r}
# Create a small toy data set
set.seed(1)  # arbitrary fixed seed so the sample is reproducible
X <- data.frame(
  value = rnorm(100),
  fb = paste(
    sample(c('foo', 'bar'), 100, replace = TRUE),
    sample(c('foo', 'bar'), 100, replace = TRUE)))
X %>% head
```
And apply `%grep%`:
```{r}
# Select rows containing 'foo' but not 'bar' (that is, only `foo foo`)
X['foo' %grep% X$fb & !('bar' %grep% X$fb), ] %>% head
```
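For comparison, the single-regex version of "`foo` but not `bar`" can be written with a negative lookahead, which requires `perl = TRUE` in `grepl`. This is a sketch on a hypothetical vector `fb` covering the four possible combinations, and it checks that both approaches select the same elements.

```{r}
`%grep%` <- function(pattern, x) grepl(pattern, x)

# Hypothetical vector with every foo/bar combination
fb <- c("foo foo", "foo bar", "bar foo", "bar bar")

# Two simple patterns combined with logical operators
by_operator <- 'foo' %grep% fb & !('bar' %grep% fb)

# One pattern: from the start, 'bar' must not appear anywhere, then match 'foo'
by_regex <- grepl("^(?!.*bar).*foo", fb, perl = TRUE)

print(by_operator)
print(identical(by_operator, by_regex))
```

The combined form is longer to type but each piece stays readable, which is the main appeal for interactive work.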