Created August 17, 2018 03:29 by duttashi
Easy way to separate categorical and continuous variables from a data frame in R
# Make sure the data is read as a data frame and that the categorical variables are read as factors, not characters.
# A minimal reproducible example (reprex) is given below.
# Load the Adult dataset from the UCI Machine Learning Repository.
library(data.table)
dt <- fread("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
            header = FALSE, sep = ",", stringsAsFactors = TRUE)
# Coerce the data.table to a data.frame
dt <- as.data.frame(dt)
head(dt)
class(dt)
# Use sapply() to flag the factor columns, then subset on that logical vector.
# drop = FALSE keeps each result a data frame even if only one column matches.
dt.cat  <- dt[, sapply(dt, is.factor), drop = FALSE]
dt.cont <- dt[, !sapply(dt, is.factor), drop = FALSE]
str(dt.cat)
#> 'data.frame': 32561 obs. of 9 variables:
#>  $ V2 : Factor w/ 9 levels "?","Federal-gov",..: 8 7 5 5 5 5 5 7 5 5 ...
#>  $ V4 : Factor w/ 16 levels "10th","11th",..: 10 10 12 2 10 13 7 12 13 10 ...
#>  $ V6 : Factor w/ 7 levels "Divorced","Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
#>  $ V7 : Factor w/ 15 levels "?","Adm-clerical",..: 2 5 7 7 11 5 9 5 11 5 ...
#>  $ V8 : Factor w/ 6 levels "Husband","Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
#>  $ V9 : Factor w/ 5 levels "Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
#>  $ V10: Factor w/ 2 levels "Female","Male": 2 2 2 2 1 1 1 2 1 2 ...
#>  $ V14: Factor w/ 42 levels "?","Cambodia",..: 40 40 40 40 6 40 24 40 40 40 ...
#>  $ V15: Factor w/ 2 levels "<=50K",">50K": 1 1 1 1 1 1 1 2 2 2 ...
str(dt.cont)
#> 'data.frame': 32561 obs. of 6 variables:
#>  $ V1 : int 39 50 38 53 28 37 49 52 31 42 ...
#>  $ V3 : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
#>  $ V5 : int 13 13 9 7 13 14 5 9 14 13 ...
#>  $ V11: int 2174 0 0 0 0 0 0 0 14084 5178 ...
#>  $ V12: int 0 0 0 0 0 0 0 0 0 0 ...
#>  $ V13: int 40 13 40 40 40 40 16 45 50 40 ...
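The `sapply(dt, is.factor)` split only catches factor columns, so a text column that was read as character would silently land in the "continuous" subset. Since R 4.0.0, `read.csv()` and `data.frame()` default to `stringsAsFactors = FALSE`, so this can happen easily. The following is a minimal sketch, assuming you want character columns treated as categorical as well; the data frame `toy`, the helper `split_by_type()`, and the object `parts` are hypothetical names used only for illustration, not part of the original gist.

```r
# Small hypothetical data frame (not the Adult data): one character column,
# one factor column, one integer column and one double column.
toy <- data.frame(
  age    = c(39L, 50L, 38L),
  sex    = c("Male", "Male", "Female"),
  income = factor(c("<=50K", "<=50K", ">50K")),
  hours  = c(40, 13, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treats factor AND character columns as categorical
# and everything else as continuous.
split_by_type <- function(df) {
  is_cat <- vapply(df, function(col) is.factor(col) || is.character(col),
                   logical(1))
  list(
    # drop = FALSE keeps a data frame even when only one column matches
    categorical = df[, is_cat, drop = FALSE],
    continuous  = df[, !is_cat, drop = FALSE]
  )
}

parts <- split_by_type(toy)
str(parts$categorical)
str(parts$continuous)

# Base R alternative for the factor-only case: Filter() applied to a
# data frame returns a data frame holding the matching columns.
toy_factors <- Filter(is.factor, toy)
str(toy_factors)
```

Here `sex` and `income` end up in `parts$categorical`, while `age` and `hours` go to `parts$continuous`. `Filter(is.factor, toy)` keeps only `income`, because `sex` is still a character column.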