Created September 13, 2015 03:48
Regular expressions for selecting columns to read into R via `readr::read_delim`
# Example using regular expressions to build the col_types string for
# readr::read_delim (and wrappers such as readr::read_csv)
#
# function select_cols
# args:
#   clnms   a character vector of column names
#   rexprs  a character vector of regular expressions to search clnms for.
#           These rexprs select the columns from the .csv
#   types   a character vector of "l", "i", "d", "c" for logical, integer,
#           double, and character. See the readr documentation for more detail
# return:
#   a character string to pass to the col_types argument of readr::read_csv();
#   columns matched by no rexpr are marked "_" and are skipped on read
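select_cols <- function(clnms, rexprs, types) {
  if (length(rexprs) != length(types)) {
    stop("length(rexprs) != length(types)")
  }

  # start with every column skipped ("_"), then mark the matched columns;
  # a later rexpr overrides an earlier one if both match the same column
  cls <- rep("_", length(clnms))
  for (i in seq_along(rexprs)) {
    cls[grep(rexprs[i], clnms)] <- types[i]
  }

  paste(cls, collapse = "")
}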
# Example data
input_data <-
# Read in the names of the data set
input_data_clnms <-
  names(readr::read_csv(input_data, n_max = 1, col_names = TRUE)[1, ])
# Use the select_cols function to read in only the id, age, group,
# procedures 1 through 5, and the day of service for each procedure.
pick_these_columns <-
  select_cols(clnms  = input_data_clnms,
              rexprs = c("sid|procdos[1-5]$", "age|proc[1-5]$", "group"),
              types  = c("i", "d", "c"))
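# A minimal sketch of what the resulting col_types string might look like.
# The column names below are hypothetical (they are not taken from the real
# input_data), and the three assignments simply repeat, by hand, the
# grep-and-assign step that select_cols performs for each rexpr.
example_clnms <- c("sid", "age", "group", "proc1", "procdos1",
                   "proc6", "procdos6", "weight")
example_cls <- rep("_", length(example_clnms))
example_cls[grep("sid|procdos[1-5]$", example_clnms)] <- "i"
example_cls[grep("age|proc[1-5]$", example_clnms)]    <- "d"
example_cls[grep("group", example_clnms)]             <- "c"
example_col_types <- paste(example_cls, collapse = "")
example_col_types
# "idcdi___"  (proc6, procdos6, and weight are skipped)
# Note that "age" and "sid" are not anchored, so they would also match any
# column whose name merely contains them (e.g. "stage"); anchoring with
# "^age$" is one way to avoid that if such columns exist.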
# now read in the data
readr::read_csv(input_data, col_types = pick_these_columns)