@jthomasmock
Created April 12, 2018 20:49
code to find group data
---
title: 'Find the #Group'
output:
  html_document:
    df_print: paged
---
```{r}
library(tidyverse)
```
## Create the dataframe
Here is the dataframe with weird group inserts!
```{r}
# create a dataframe for reprex
df <- data.frame(col1 = c("#GROUP 1", 1, 4, "#GROUP 2", 7, 10, "#GROUP 3", 13, 16),
                 col2 = c(NA, 2, 5, NA, 8, 11, NA, 14, 17),
                 col3 = c(NA, 3, 6, NA, 9, 12, NA, 15, 18))
df
```
## Find where the groups are
We use `grep` to find the rows that contain a `#GROUP` marker, then combine the results into a dataframe that tells us which group corresponds to which marker row, and therefore the top and bottom data rows of each group. In this case, group 1 is rows 2 and 3 (because `#GROUP 1` is at row 1 and `#GROUP 2` is at row 4).
```{r}
# get an integer vector with the row positions of each `#GROUP` marker
group_range <- grep("#GROUP", df$col1)
# what rows have group in them?
group_range
```
## Combine into a dataframe
This lets us see which marker row (`group_range`) belongs to each `group_number`.
```{r}
# integer vector of group numbers, one per marker row
group_number <- seq_along(group_range)
# combine the group_number with group_range
group_df <- data.frame(group_number, group_range)
group_df
```
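The marker positions are enough to work out the data rows for every group at once, including the last group, which has no following marker to bound it. Below is a minimal sketch, assuming each group's data runs from the row just below its marker to the row just above the next marker (or to the end of the dataframe for the last group). The names `example_df`, `marker_rows`, and `range_df` are hypothetical and not part of the original code.

```{r}
# hypothetical copy of the example data so this chunk runs on its own
example_df <- data.frame(
  col1 = c("#GROUP 1", 1, 4, "#GROUP 2", 7, 10, "#GROUP 3", 13, 16),
  col2 = c(NA, 2, 5, NA, 8, 11, NA, 14, 17),
  col3 = c(NA, 3, 6, NA, 9, 12, NA, 15, 18)
)

# rows that hold a #GROUP marker
marker_rows <- grep("#GROUP", example_df$col1)

# first data row sits just below each marker;
# last data row sits just above the next marker,
# or is the final row of the dataframe for the last group
range_df <- data.frame(
  group_number = seq_along(marker_rows),
  row_top      = marker_rows + 1,
  row_bottom   = c(marker_rows[-1] - 1, nrow(example_df))
)
range_df
```

Each row of `range_df` could then be passed to a subsetting function instead of typing the row numbers by hand.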
## Function for finding top and bottom rows for each of the groups
Given the positions of a group's marker and the next group's marker in `group_df`, this prints the first and last rows of actual data for that group.
```{r}
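find_range_fun <- function(top_group, bottom_group, df_name) {
  # column 2 of df_name holds the marker rows (group_range)
  # the first data row is one below this group's marker
  print(paste0("The top number is ", df_name[top_group, 2] + 1))
  # the last data row is one above the next group's marker
  print(paste0("The bottom number is ", df_name[bottom_group, 2] - 1))
}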
```
### Test the function
Test the function for group 1
```{r}
# test function for group 1 (between just below #GROUP 1 and just above #GROUP 2)
find_range_fun(top_group = 1,
               bottom_group = 2,
               group_df)
```
And for group 2.
```{r}
# test function for group 2 (between just below #GROUP 2 and just above #GROUP 3)
find_range_fun(2, 3, group_df)
```
## Function for subsetting original dataframe
This function grabs the rows for one group and adds a column recording the group number.
```{r}
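# create a function that subsets a range of rows and labels them with a group number
# rep_length must equal the number of rows selected (row_bottom - row_top + 1)
find_fun <- function(row_top, row_bottom, grp_num, df_name, rep_length){
  cbind(df_name[row_top:row_bottom, ], group_var = rep(grp_num, rep_length))
}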
```
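Because `rep_length` always has to match the number of rows selected, it can be derived from the row bounds rather than passed in. Below is a minimal sketch of that idea. `find_fun_auto` and `example_df` are hypothetical names introduced here for illustration, not part of the original code.

```{r}
# hypothetical variant of find_fun that works out the repeat length itself
find_fun_auto <- function(row_top, row_bottom, grp_num, df_name) {
  rows <- row_top:row_bottom
  # repeat the group number once per selected row
  cbind(df_name[rows, ], group_var = rep(grp_num, length(rows)))
}

# hypothetical copy of the example data so this chunk runs on its own
example_df <- data.frame(
  col1 = c("#GROUP 1", 1, 4, "#GROUP 2", 7, 10, "#GROUP 3", 13, 16),
  col2 = c(NA, 2, 5, NA, 8, 11, NA, 14, 17),
  col3 = c(NA, 3, 6, NA, 9, 12, NA, 15, 18)
)

# rows 2 and 3 are the data rows of group 1
auto_subset <- find_fun_auto(row_top = 2, row_bottom = 3, grp_num = 1, df_name = example_df)
auto_subset
```

This removes one way for a call to go wrong, since a mismatched `rep_length` would otherwise make `cbind` fail or recycle unexpectedly.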
### Testing the function
This works for group 1 and group 2.
```{r}
# test the function for group 1 (rows 2 and 3)
group_1_subset <- find_fun(row_top = 2,
                           row_bottom = 3,
                           grp_num = 1,
                           df_name = df,
                           rep_length = 2)
group_1_subset
# test the function for group 2 (rows 5 and 6)
group_2_subset <- find_fun(row_top = 5,
                           row_bottom = 6,
                           grp_num = 2,
                           df_name = df,
                           rep_length = 2)
group_2_subset
```
## Look at raw DF
```{r}
# compare those results to raw df
df
```
## Compare to the subsetted data
```{r}
rbind(group_1_subset, group_2_subset)
```
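For comparison, the same labelled result can be produced for all groups in one pass. The sketch below swaps in a different technique from the row-range functions above: it takes a running count of the marker rows with `cumsum`, so every row inherits the number of the most recent `#GROUP` marker above it. It assumes every data row belongs to the marker immediately preceding it. The names `example_df`, `is_marker`, and `all_groups` are hypothetical.

```{r}
# hypothetical copy of the example data so this chunk runs on its own
example_df <- data.frame(
  col1 = c("#GROUP 1", 1, 4, "#GROUP 2", 7, 10, "#GROUP 3", 13, 16),
  col2 = c(NA, 2, 5, NA, 8, 11, NA, 14, 17),
  col3 = c(NA, 3, 6, NA, 9, 12, NA, 15, 18)
)

# TRUE for rows that hold a #GROUP marker
is_marker <- grepl("#GROUP", example_df$col1)

# each marker starts a new group, so the running count of markers
# seen so far is the group number of every row
group_var <- cumsum(is_marker)

# attach the group number, then drop the marker rows themselves
all_groups <- cbind(example_df, group_var)[!is_marker, ]
all_groups
```

Note that `col1` is still stored as text here because it originally shared a column with the marker strings.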