@Plaudenslager
Created May 13, 2016 00:29
Clean multiple versions of text values in a data set in R
library(dplyr)
library(tidyr)
# sum_trialname contains product names, including three different versions of one product
# dropping everything after the first space gives consistent product names
# by default, the extract function captures the first run of alphanumeric characters and drops everything from the first non-alphanumeric character onward
# by default, the extract function also drops the original column (sum_trialname, in this case); remove=FALSE keeps it
# Create a new column with clean, consistent product names
clean_data <- extract(clean_data, sum_trialname, "Product", remove=FALSE)
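
# A minimal sketch of the same call on a tiny, hypothetical data frame.
# The names example_data and example_clean and the sample values are invented
# for illustration and are not from the original data set.
# It relies on extract's default regex, "([[:alnum:]]+)", which keeps only the
# first run of letters and digits in each value.
example_data <- data.frame(
  sum_trialname = c("Widget v1", "Widget 2015 trial", "Widget-B", "Gadget"),
  stringsAsFactors = FALSE
)
# tidyr::extract is spelled out in case another loaded package masks extract
example_clean <- tidyr::extract(example_data, sum_trialname, "Product", remove = FALSE)
# example_clean keeps sum_trialname and adds Product:
# "Widget", "Widget", "Widget", "Gadget"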