Contents

1 Data
2 Random Forest
3 Variable importance, Gini impurity
4 RRF
5
6 References


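A forest is a collection of trees; here, each tree is a decision tree. Both the input data and the variables are selected at random. Each tree in the forest receives a random input, and the results from the individual trees are combined by voting (majority rule) to classify an observation. The method is resistant to overfitting the data, it can be run with many variables without first removing any, and it also suits models with unbalanced classes. -- Reference: R 伎 觜一危 覿, 蟾蟆渚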
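The idea of growing trees on random inputs and combining them by voting can be sketched as follows. This is a simplified illustration, not the algorithm implemented by randomForest: it assumes the rpart package for the individual trees, uses the built-in iris data, and picks a random pair of variables per tree, whereas a random forest samples candidate variables at each split. The names n_trees, forest, votes and pred are made up for this example.

```r
# Simplified bagging + majority voting (illustration only, not randomForest itself)
library(rpart)  # assumed to be installed; it ships with standard R distributions

set.seed(1)
n_trees <- 25
vars <- setdiff(names(iris), "Species")

# Grow each tree on a bootstrap sample of rows and a random pair of variables
forest <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(nrow(iris), replace = TRUE)
  cols <- sample(vars, 2)
  rpart(reformulate(cols, response = "Species"),
        data = iris[rows, ], method = "class")
})

# Every tree predicts a class for every observation (one column per tree)
votes <- sapply(forest, function(tr) as.character(predict(tr, iris, type = "class")))

# The class with the most votes wins
pred <- apply(votes, 1, function(v) names(which.max(table(v))))

mean(pred == iris$Species)  # share of training observations classified correctly
```

Because the accuracy here is measured on the same rows used for training, it is optimistic; randomForest reports an out-of-bag (OOB) estimate instead.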

1 Data

This example uses the data from EXCEL 譟一覦覯 覦 糾覿 (http://www.kyobobook.co.kr/product/detailViewKor.laf?ejkGb=KOR&mallGb=KOR&barcode=9788983257000&orderClick=LAG&Kc=SETLBkserp11_15).
cname <- c("ID", "蟲襷る", "磯","碁一", "碁", "覦覓碁", "蟇一朱")
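# stringsAsFactors = TRUE: randomForest needs a factor response to run classification (R >= 4.0 no longer converts strings)
x <- read.table("c:\\data\\disc.txt", col.names = cname, stringsAsFactors = TRUE)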
head(x)
disc.txt

> head(x)
  ID  蟲襷る 磯  碁一  碁  覦覓碁  蟇一朱
1  1          A   48       9000          4        5        6
2  2          A   58       8000          6        4       20
3  3          A   52       7000          6        4       12
4  4          A   63       7000          6        4       15
5  5          A   59       8000          4        6        6
6  6          A   38      11000          5        4       10
> 

2 Random Forest

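library(randomForest)
# the response is a factor, so randomForest fits a classification forest
tree <- randomForest(蟲襷る ~ 磯 + 碁一 + 碁 + 覦覓碁 + 蟇一朱, data=x)
print(tree) # view results 
importance(tree) # MeanDecreaseGini for each variable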

Results
> print(tree) # view results 

Call:
 randomForest(formula = 蟲襷る ~ 磯 + 碁一 + 碁 + 覦覓碁 + 蟇一朱, data = x) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 5%
Confusion matrix:
   A B class.error
A 10 0         0.0
B  1 9         0.1
> importance(tree)
           MeanDecreaseGini
磯               2.624620
碁一         1.815804
碁         1.263035
覦覓碁           1.196015
蟇一朱           2.576659
> 
The variables that determine 蟲襷る rank in importance as 磯 > 蟇一朱 > 碁一 > 碁 > 覦覓碁.
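The error figures in the output above follow directly from the confusion matrix. As a quick check, the sketch below re-enters the four counts by hand and recomputes the overall OOB error and the per-class errors. The object names cm, oob_error and class_error are only for illustration.

```r
# Counts copied from the confusion matrix printed above
# (rows = actual class, columns = predicted class)
cm <- matrix(c(10, 1, 0, 9), nrow = 2,
             dimnames = list(actual = c("A", "B"), predicted = c("A", "B")))

oob_error   <- 1 - sum(diag(cm)) / sum(cm)  # misclassified share: 1 of 20 cases
class_error <- 1 - diag(cm) / rowSums(cm)   # per class: A = 0 of 10, B = 1 of 10

oob_error
class_error
```

The single misclassified case is a B observation predicted as A, which gives the 5% overall error and the 0.1 class error for B.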

rf <- randomForest(factor(t3)~diff_cnt+diff_time, data=x6, type="classification", importance=TRUE,na.action=na.omit)
pred <- predict(rf, newdata=test)
table(pred, test$t3)

data(iris)
set.seed(111)
ind <- sample(2, nrow(iris), replace = TRUE, prob=c(0.8, 0.2))
iris.rf <- randomForest(Species ~ ., data=iris[ind == 1,])
iris.pred <- predict(iris.rf, iris[ind == 2,])
table(observed = iris[ind==2, "Species"], predicted = iris.pred)

Plotting a tree..
# cforest() and prettytree() come from the party package, not rpart
install.packages("party")
library("party")
cf <- cforest(Species ~ ., data = iris) 
pt <- party:::prettytree(cf@ensemble[[1]], names(cf@data@get("input"))) 
pt 
nt <- new("BinaryTree") 
nt@tree <- pt 
nt@data <- cf@data 
nt@responses <- cf@responses 
nt 
plot(nt) 

install.packages("tree")
library(tree)
tr <- tree(Species ~ ., data=iris)
tr

3 Variable importance, Gini impurity

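Gini impurity
The Gini index is a measure of impurity. The probability of drawing an observation from the i-th category of the population and misclassifying it as belonging to the j-th category is P(i)P(j), where P(i) is the probability that an arbitrary observation belongs to the i-th category. Summing all of these misclassification probabilities gives

$$G = \sum_{i \neq j} P(i)P(j) = 1 - \sum_{i=1}^{c} P(i)^2$$

and this value serves as an estimate of the misclassification probability under this classification rule. Here c is the number of categories in the population.
--http://kostat.go.kr/attach/journal/4-1-3.PDF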
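As a small numerical illustration, the Gini impurity can be computed from a vector of class labels as one minus the sum of squared class proportions. The function name gini and the example label vectors are made up for this sketch; the last part checks that summing P(i)P(j) over all pairs with i different from j gives the same value.

```r
# Gini impurity of a set of class labels: 1 - sum of squared class proportions
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

gini(c(rep("A", 10), rep("B", 10)))  # evenly mixed two-class node: 0.5
gini(rep("A", 10))                   # pure node: 0

# Same quantity written as the sum of P(i) * P(j) over all pairs with i != j
p <- c(0.5, 0.3, 0.2)
pairwise <- sum(outer(p, p)) - sum(p^2)  # all pairs minus the i == j terms
all.equal(pairwise, 1 - sum(p^2))        # TRUE
```

A node is pure when its impurity is 0, and a split that lowers the impurity of the child nodes the most contributes the most to a variable's MeanDecreaseGini.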

Variable importance
imp <- data.frame(importance(model))
imp[order(imp$MeanDecreaseGini, decreasing=TRUE),]
varImpPlot(model)
The result is plotted in two ways: importance in terms of accuracy and importance in terms of Gini impurity.
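Both measures are only available when the forest is fitted with importance=TRUE; with the default settings only MeanDecreaseGini is reported. A minimal sketch, assuming the randomForest package is installed and using the built-in iris data in place of the author's data (the object name model mirrors the snippet above):

```r
library(randomForest)

set.seed(111)
# importance = TRUE also computes the permutation-based accuracy measure
model <- randomForest(Species ~ ., data = iris, importance = TRUE)

imp <- data.frame(importance(model))
imp[order(imp$MeanDecreaseGini, decreasing = TRUE), ]      # ranked by Gini-based importance
imp[order(imp$MeanDecreaseAccuracy, decreasing = TRUE), ]  # ranked by accuracy-based importance

varImpPlot(model)  # one panel per importance measure
```

The two rankings do not have to agree, so it is worth looking at both before dropping a variable.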

4 RRF

Regularized Random Forest
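install.packages(c("RRF", "caret"))
library("RRF")
library("caret") # provides confusionMatrix()
model <- RRF(factor(is_out) ~ ., data=training, type="classification", importance=TRUE)
pred <- predict(model, newdata=test2)
# confusionMatrix() expects two factors with the same levels
confusionMatrix(pred, factor(test2$is_out, levels=levels(pred)))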


5

6 References