Created May 13, 2016 00:29
Plaudenslager/8078d48ce56e287a44b2a7f8d8c6a9f3
Clean multiple versions of text values in a data set in R
library(dplyr)
library(tidyr)

# sum_trialname contains product names, including three different versions of one product.
# Dropping everything after the first space (or other non-alphanumeric character) yields consistent product names.
# By default, extract() uses the regex "([[:alnum:]]+)", which captures the first run of alphanumeric
# characters and discards everything from the next non-alphanumeric character onward.
# By default, extract() also drops the original column (sum_trialname here); remove = FALSE keeps it.

# Create a new column, Product, with clean, consistent product names
clean_data <- extract(clean_data, sum_trialname, into = "Product", remove = FALSE)
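To see what the default regex does, here is a minimal self-contained sketch. The column name sum_trialname comes from the snippet above, but the product names ("WidgetA v1", "GadgetB", and so on) are hypothetical sample values, not the original data set.

```r
library(tidyr)

# Hypothetical sample data: three spellings of one product plus one other product.
clean_data <- data.frame(
  sum_trialname = c("WidgetA v1", "WidgetA-2016", "WidgetA (retest)", "GadgetB"),
  stringsAsFactors = FALSE
)

# With no regex argument, extract() uses "([[:alnum:]]+)", so each value is cut
# at its first space, hyphen, parenthesis, or other non-alphanumeric character.
# remove = FALSE keeps sum_trialname alongside the new Product column.
clean_data <- extract(clean_data, sum_trialname, into = "Product", remove = FALSE)

print(clean_data)

# The three variants now collapse to one name:
# unique(clean_data$Product) returns "WidgetA" "GadgetB"
print(unique(clean_data$Product))
```

One caveat with relying on the default pattern: a genuine product name that itself contains a space or hyphen (for example a hypothetical "Widget A") would also be truncated, to "Widget", so it is worth checking unique(clean_data$Product) after the extraction.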