@gaolei786
Created December 7, 2012 06:40
Spam email identification
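library(rpart)  # provides rpart(), prune(), plotcp() and the plot/text/predict methods for rpart objects
spam <- read.table("https://raw.github.com/gaolei786/gaolei786.github.com/master/data/spam.csv", sep = ",", header = TRUE)
# If you use the Windows R GUI (Rgui), run setInternet2(TRUE) before reading the URL above; see http://cos.name/cn/topic/108840?replies=9#post-240472
# (recent R versions no longer need this call)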
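set.seed(102)
# Random split: 3065 rows for training, the remaining rows for testing.
# sample() changed its default algorithm in R 3.6.0, so newer R versions draw a different split than the one behind the counts below.
train <- sort(sample(nrow(spam), 3065))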
spam.train <- spam[train, ]
spam.test <- spam[-train, ]  # note the negative index: it drops the training rows, leaving the test set
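set.seed(200)  # rpart's internal cross-validation assigns rows to folds at random
rp <- rpart(spam ~ ., spam.train, parms = list(split = "information"), method = "class", cp = 0.001)  # grow a large tree, splitting on information gain
plot(rp)
plotcp(rp)  # cross-validated relative error against cp, used to pick the pruning level
rp1 <- prune(rp, cp = 0.0033)  # prune the tree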
plot(rp1, uniform = TRUE, compress = TRUE, margin = 0.05)
text(rp1, use.n = TRUE)
r2.train.class <- predict(rp1, type = "class")
table(predicted = r2.train.class, actual = spam.train$spam)
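(105 + 96) / (1747 + 1117 + 105 + 96)  # misclassification rate (training set): off-diagonal counts over all 3065 cases, about 6.6%
mean(r2.train.class != spam.train$spam)  # the same rate computed directly, without copying counts from the table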
r2.test.class <- predict(rp1, type = "class", newdata = spam.test)
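table(predicted = r2.test.class, actual = spam.test$spam)
(73 + 56) / (863 + 544 + 73 + 56)  # misclassification rate (test set): off-diagonal counts over all 1536 cases, about 8.4%
mean(r2.test.class != spam.test$spam)  # the same rate computed directly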
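The error rates in the script are the off-diagonal share of a confusion table: misclassified cases divided by all cases. Below is a minimal sketch of that arithmetic as a reusable helper. `confusion_error` is a hypothetical name, not part of rpart. The counts are the training-set numbers quoted in the script. Which off-diagonal cell holds 105 and which holds 96 is an assumption, but it does not change the result.

```r
# Hypothetical helper: misclassification rate from a square confusion table
# (rows = predicted, columns = actual, as built with table() in the script).
confusion_error <- function(tab) {
  stopifnot(nrow(tab) == ncol(tab))
  # correct predictions sit on the diagonal; every other cell is an error
  1 - sum(diag(tab)) / sum(tab)
}

# Training-set counts from the script: 1747 and 1117 correct,
# 105 and 96 misclassified (placement of the two error cells assumed).
train.tab <- matrix(c(1747, 96, 105, 1117), nrow = 2)
confusion_error(train.tab)  # 201 / 3065
```

The helper also accepts the `table` objects printed in the script directly, for example `confusion_error(table(predicted = r2.test.class, actual = spam.test$spam))`.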
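The script prunes at a fixed cp = 0.0033. One common way to choose cp from the table that `plotcp()` draws is the one-standard-error rule, which takes the simplest tree whose cross-validated error is within one standard error of the minimum. `plotcp()` marks that threshold with a horizontal line. Below is a sketch of the rule, not the author's method. It runs on rpart's bundled `kyphosis` data so it works without the download; passing `rp` from the script instead is the intended use. `pick_cp_1se` is a hypothetical helper.

```r
library(rpart)

# Hypothetical helper: cp chosen by the 1-SE rule from an rpart cptable.
pick_cp_1se <- function(fit) {
  tab <- fit$cptable                      # columns: CP, nsplit, rel error, xerror, xstd
  best <- which.min(tab[, "xerror"])      # row with the lowest cross-validated error
  limit <- tab[best, "xerror"] + tab[best, "xstd"]
  # rows are ordered by increasing tree size, so the first row under the limit
  # is the smallest tree within one standard error of the minimum
  tab[which(tab[, "xerror"] <= limit)[1], "CP"]
}

set.seed(200)  # cross-validation folds are random
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", cp = 0.001)
cp.1se <- pick_cp_1se(fit)
fit.pruned <- prune(fit, cp = cp.1se)
printcp(fit.pruned)
```

Choosing the row with the smallest `xerror` instead (`fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]`) is the other usual option. It tends to keep a larger tree.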