Logistic Regression Analysis
2 Fitting the model in R

Read the data, fit a logistic regression with glm(), and attach the fitted probabilities to the data.
cname <- c("x1", "x2", "x3", "x4", "x5", "x6", "y")
german <- read.table("c:\\data\\german1.txt", col.names = cname)
options(contrasts = c("contr.treatment", "contr.poly"))

german.object <- glm(y ~ ., family = binomial, data = german)
summary(german.object)

p <- predict(german.object, german, type = "response")   # fitted probability for each row
german <- cbind(german, p)
head(german)

Here p is the fitted probability that y = 1, i.e. the model's estimate of the chance that an observation belongs to class 1.
> head(german)
   x1 x2  x3 x4 x5 x6 y          p
1 A11  6 A34  4 67  1 0 0.18170622
2 A12 48 A32  2 22  1 1 0.63312134
3 A14 12 A34  3 49  2 0 0.04812842
4 A11 42 A32  4 45  2 0 0.64393801
5 A11 24 A33  4 53  2 1 0.47102819
6 A14 36 A32  4 35  2 0 0.20127115
4 Variable selection

Forward selection, backward elimination, and stepwise (both-direction) selection with step().
cname <- c("x1", "x2", "x3", "x4", "x5", "x6", "y")
german <- read.table("c:\\data\\german1.txt", col.names = cname)

# initial (null) model
g.object <- glm(y ~ 1, family = binomial, data = german)

# forward selection
s.list <- list(upper = update(g.object, ~ . + x1 + x2 + x3 + x4 + x5 + x6))
step(g.object, scope = s.list, direction = "forward")

# backward elimination
#s.list <- list(lower = update(g.object, ~ . - x1 - x2 - x3 - x4 - x5 - x6))
#step(g.object, scope = s.list, direction = "backward")

# stepwise (both directions)
#s.list <- list(upper = update(g.object, ~ . + x1 + x2 + x3 + x4 + x5 + x6),
#               lower = update(g.object, ~ . - x1 - x2 - x3 - x4 - x5 - x6))
#step(g.object, scope = s.list, direction = "both")

Result:
> step(g.object, scope=s.list, direction="forward")
Start:  AIC=1223.73
y ~ 1

       Df Deviance    AIC
+ x1    3   1090.4 1098.4
+ x3    4   1161.3 1171.3
+ x2    1   1177.1 1181.1
+ x5    1   1213.1 1217.1
<none>      1221.7 1223.7
+ x6    1   1221.7 1225.7
+ x4    1   1221.7 1225.7

Step:  AIC=1098.39
y ~ x1

       Df Deviance    AIC
+ x2    1   1051.9 1061.9
+ x3    4   1053.2 1069.2
+ x5    1   1085.0 1095.0
<none>      1090.4 1098.4
+ x4    1   1090.2 1100.2
+ x6    1   1090.3 1100.3

Step:  AIC=1061.9
y ~ x1 + x2

       Df Deviance    AIC
+ x3    4   1022.6 1040.6
+ x5    1   1046.7 1058.7
<none>      1051.9 1061.9
+ x4    1   1051.5 1063.5
+ x6    1   1051.9 1063.9

Step:  AIC=1040.58
y ~ x1 + x2 + x3

       Df Deviance    AIC
+ x5    1   1019.3 1039.3
<none>      1022.6 1040.6
+ x6    1   1022.5 1042.5
+ x4    1   1022.5 1042.5

Step:  AIC=1039.27
y ~ x1 + x2 + x3 + x5

       Df Deviance    AIC
<none>      1019.3 1039.3
+ x4    1   1019.2 1041.2
+ x6    1   1019.3 1041.3

Call:  glm(formula = y ~ x1 + x2 + x3 + x5, family = binomial, data = german)

Coefficients:
(Intercept)        x1A12        x1A13        x1A14           x2
    0.66510     -0.53186     -1.07608     -1.89880      0.03434
      x3A31        x3A32        x3A33        x3A34           x5
   -0.09915     -0.94696     -0.93433     -1.52607     -0.01277

Degrees of Freedom: 999 Total (i.e. Null);  990 Residual
Null Deviance:      1222
Residual Deviance: 1019      AIC: 1039

The model finally selected is glm(formula = y ~ x1 + x2 + x3 + x5, family = binomial, data = german), the candidate with the smallest AIC. See the note on AIC below.
AIC is a criterion for choosing the best model that takes both goodness of fit and parsimony into account. Goodness of fit is measured by the likelihood, while parsimony reflects the idea that, among models that explain the data about equally well, the simpler model with fewer parameters should be preferred. Fit and simplicity are a trade-off: a more complex model almost always fits the training data better, so a criterion is needed that penalizes complexity. That criterion is the Akaike Information Criterion (AIC):

AIC = -2 log L + 2n

where n is the number of estimated parameters and log L is the maximized log-likelihood. Following Akaike, the model with the smallest AIC value is chosen; the deviance D plays the role of the goodness-of-fit term in this comparison.
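As a quick check on the formula, the AIC that R reports can be reproduced from the log-likelihood of the fitted object; a minimal sketch, assuming the selected model is stored in german.object as in the code below:

ll <- logLik(german.object)                 # log-likelihood; its "df" attribute is the number of parameters
-2 * as.numeric(ll) + 2 * attr(ll, "df")    # AIC computed by hand
AIC(german.object)                          # the same value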
cname <- c("x1", "x2", "x3", "x4", "x5", "x6", "y")
german <- read.table("c:\\data\\german1.txt", col.names = cname)
options(contrasts = c("contr.treatment", "contr.poly"))

german.object <- glm(formula = y ~ x1 + x2 + x3 + x5, family = binomial, data = german)
summary(german.object)

Result:
Call:
glm(formula = y ~ x1 + x2 + x3 + x5, family = binomial, data = german)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7997  -0.8007  -0.4747   0.9283   2.4241

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.665099   0.476311   1.396 0.162607
x1A12       -0.531864   0.185402  -2.869 0.004121 **
x1A13       -1.076082   0.336956  -3.194 0.001405 **
x1A14       -1.898800   0.206314  -9.203  < 2e-16 ***
x2           0.034342   0.006329   5.426 5.76e-08 ***
x3A31       -0.099153   0.474942  -0.209 0.834629
x3A32       -0.946961   0.370644  -2.555 0.010622 *
x3A33       -0.934327   0.433303  -2.156 0.031061 *
x3A34       -1.526069   0.393754  -3.876 0.000106 ***
x5          -0.012765   0.007091  -1.800 0.071821 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1221.7  on 999  degrees of freedom
Residual deviance: 1019.3  on 990  degrees of freedom
AIC: 1039.3

Number of Fisher Scoring iterations: 4

When there are many candidate variables, it can also help to look at each variable's distribution separately for each value of Y and keep the variables that separate the classes well.
# training: the result set pulled from the database
# plot the numeric columns only, split by the class variable y
library("RODBC")   # conn is assumed to be an existing RODBC connection
training <- sqlQuery(conn, "
  select cnt       x1
       , logkind   x4
       , genre     x5
       , diff_cnt  x6
       , diff_time x8
       , gamecd    x9
       , reg_hh    x10
       , case when t <= 2 then 1 else 0 end y
  from temp.dbo.out_data01
")

#install.packages("reshape")
#install.packages("ggplot2")
library("reshape")
library("ggplot2")

# TRUE/FALSE per column: is it numeric?
number_cols <- sapply(training, is.numeric)

# melt to long format; id = the class variable y
r.lng <- melt(training[, number_cols], id = "y")

# density of each numeric variable, one panel per variable, coloured by y
p <- ggplot(aes(x = value, group = y, colour = factor(y)), data = r.lng)
p + geom_density() + facet_wrap(~variable, scale = "free")

The result is a faceted density plot, one panel per numeric variable, with separate curves for y = 0 and y = 1.
Looking at these plots, x6 and x8 were chosen.
5 Odds and logit

pred <- predict(german.object, german, type = "response")   # fitted probability
logodds <- predict(german.object, german)                   # linear predictor: the log-odds

german <- cbind(german, pred)
german <- cbind(german, logodds)
head(german)

Result:
> head(german)
   x1 x2  x3 x4 x5 x6 y       pred    logodds
1 A11  6 A34  4 67  1 0 0.18091034 -1.5101920
2 A12 48 A32  2 22  1 1 0.63503244  0.5538676
3 A14 12 A34  3 49  2 0 0.04865313 -2.9731627
4 A11 42 A32  4 45  2 0 0.64246423  0.5860757
5 A11 24 A33  4 53  2 1 0.46964395 -0.1215737
6 A14 36 A32  4 35  2 0 0.19922819 -1.3911252

Take the second row as an example.
Its log-odds follows directly from the fitted coefficients:

             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.665099   0.476311   1.396 0.162607
x1A12       -0.531864   0.185402  -2.869 0.004121 **
x1A13       -1.076082   0.336956  -3.194 0.001405 **
x1A14       -1.898800   0.206314  -9.203  < 2e-16 ***
x2           0.034342   0.006329   5.426 5.76e-08 ***
x3A31       -0.099153   0.474942  -0.209 0.834629
x3A32       -0.946961   0.370644  -2.555 0.010622 *
x3A33       -0.934327   0.433303  -2.156 0.031061 *
x3A34       -1.526069   0.393754  -3.876 0.000106 ***
x5          -0.012765   0.007091  -1.800 0.071821 .

For row 2 (x1 = A12, x2 = 48, x3 = A32, x5 = 22) the linear predictor is
0.665099 - 0.531864 + 48 * 0.034342 - 0.946961 - 0.012765 * 22 = 0.5538676
Converting this log-odds into a probability:

x <- 0.665099 - 0.531864 + 48 * 0.034342 - 0.946961 - 0.012765 * 22
1 / (1 + exp(-x))       # 0.6350307
exp(x) / (1 + exp(x))   # 0.6350307, the same value

Applying the logistic function to the log-odds reproduces the fitted probability of the model.
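The same numbers can be read straight out of the data frame built above; a small check, assuming the pred and logodds columns are still attached to german:

german$logodds[2]           # 0.5538676, matches the hand-computed linear predictor
german$pred[2]              # 0.63503244, the fitted probability
plogis(german$logodds[2])   # plogis() is the logistic function: log-odds -> probability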
6 Exporting the results

Export the result table to Excel.
install.packages("xlsx") library("xlsx") write.xlsx(german, "c:\\data\\german_result.xlsx") or
企暑慨襦 覲旧
write.table(german, 'clipboard', sep='\t') [edit]
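If the xlsx package (which needs Java) is not available, a plain CSV export works just as well; the file name below is only an example:

write.csv(german, "c:\\data\\german_result.csv", row.names = FALSE)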
7 Odds ratio

> exp(coef(german.object))
(Intercept)       x1A12       x1A13       x1A14          x2       x3A31       x3A32       x3A33
  1.9446836   0.5875087   0.3409286   0.1497482   1.0349388   0.9056041   0.3879180   0.3928503
      x3A34          x5
  0.2173886   0.9873158

Interpretation: x1A12 multiplies the odds of y = 1 by 0.5875087 relative to the reference level of x1; equivalently, the odds are reduced by a factor of 1/0.5875087.
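Confidence intervals for the odds ratios can be obtained by exponentiating confidence intervals for the coefficients; a sketch using base R (the profile-likelihood intervals can take a moment to compute):

exp(confint(german.object))           # profile-likelihood 95% intervals on the odds-ratio scale
exp(confint.default(german.object))   # quicker Wald-type intervals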
8 Confusion matrix

# manual counts (alternative to the SDMTools functions below)
#T1 <- length(german[german$y==1 & german$pred >  0.5, ]$x1)
#T0 <- length(german[german$y==0 & german$pred >  0.5, ]$x1)
#F1 <- length(german[german$y==1 & german$pred <= 0.5, ]$x1)
#F0 <- length(german[german$y==0 & german$pred <= 0.5, ]$x1)

#install.packages("SDMTools")
library("SDMTools")
confusion.matrix(german$y, german$pred, threshold = 0.5)

Result:
> confusion.matrix(german$y, german$pred, threshold = 0.5)
     obs
pred    0   1
   0  629 177
   1   71 123
attr(,"class")
[1] "confusion.matrix"
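The same table can also be built with base R alone, which makes it clear how the counts arise; a sketch assuming the pred column added in section 5:

pred_class <- as.integer(german$pred > 0.5)          # classify with a 0.5 cut-off
table(predicted = pred_class, observed = german$y)   # rows: predicted class, columns: observed class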
The derived rates can be computed in R as follows:

library("SDMTools")
mat <- confusion.matrix(german$y, german$pred, threshold = 0.5)
mat
omission(mat)
sensitivity(mat)
specificity(mat)
prop.correct(mat)

Result:
> mat
     obs
pred    0   1
   0  629 177
   1   71 123
attr(,"class")
[1] "confusion.matrix"
> omission(mat)
[1] 0.59
> sensitivity(mat)
[1] 0.41
> specificity(mat)
[1] 0.8985714
> prop.correct(mat)
[1] 0.752

All of these measures can also be obtained in a single call:
> accuracy(german$y, german$pred, threshold = 0.5)
  threshold       AUC omission.rate sensitivity specificity prop.correct     Kappa
1       0.5 0.6542857          0.59        0.41   0.8985714        0.752 0.3432203

To find the threshold that maximizes the proportion of correct classifications:
# search over a grid of thresholds for the one with the highest prop.correct
rs <- c()
for (i in seq(0, 1, 0.01)) {
  rs <- c(rs, accuracy(german$y, german$pred, threshold = i)$prop.correct)
}
max_val <- max(rs)
for (i in seq(0, 1, 0.01)) {
  tmp <- accuracy(german$y, german$pred, threshold = i)$prop.correct
  if (max_val == tmp) {
    print(paste("threshold = ", i))
  }
}
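The same search can be written without explicit loops; a sketch, again using german$y and german$pred as above:

th  <- seq(0, 1, 0.01)
acc <- sapply(th, function(i) accuracy(german$y, german$pred, threshold = i)$prop.correct)
th[which.max(acc)]   # first threshold attaining the maximum proportion correct
max(acc)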
9 ROC (Receiver Operating Characteristic) curve

install.packages("ROCR")
library("ROCR")

cname <- c("x1", "x2", "x3", "x4", "x5", "x6", "y")
german <- read.table("c:\\data\\german1.txt", col.names = cname)
options(contrasts = c("contr.treatment", "contr.poly"))
german.object <- glm(formula = y ~ x1 + x2 + x3 + x5, family = binomial, data = german)

pred <- predict(german.object, german, type = "response")
logodds <- predict(german.object, german)
german <- cbind(german, pred)
german <- cbind(german, logodds)

roc  <- prediction(german$pred, german$y)
perf <- performance(roc, "tpr", "fpr")   # "tpr": hit rate, "fpr": false-alarm rate
plot(perf)
#plot(perf, print.cutoffs.at = seq(0, 1, by = 0.1))
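ROCR can also report the area under the ROC curve from the same prediction object; a short sketch:

auc <- performance(roc, "auc")
unlist(auc@y.values)   # area under the ROC curve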
11 Significance tests

install.packages("logistf")
library("logistf")

# penalized (Firth) maximum likelihood fit with profile-likelihood intervals
lr2 <- logistf(y ~ x1 + x2 + x3 + x5, data = german)
summary(lr2)

Result:
> summary(lr2)
logistf(formula = y ~ x1 + x2 + x3 + x5, data = german)

Model fitted by Penalized ML
Confidence intervals and p-values by Profile Likelihood

                   coef    se(coef)  lower 0.95   upper 0.95       Chisq            p
(Intercept)  0.64472876 0.474759083 -0.27043182  1.581359116  1.89900357 1.681899e-01
x1A12       -0.52660853 0.185120846 -0.88988344 -0.166823398  8.25198090 4.070755e-03
x1A13       -1.04548754 0.334474867 -1.72559520 -0.417128337 11.03319913 8.949457e-04
x1A14       -1.87794257 0.205266409 -2.28709164 -1.483111213         Inf 0.000000e+00
x2           0.03387929 0.006308928  0.02165981  0.046313848 29.94733537 4.439416e-08
x3A31       -0.09194231 0.473477203 -1.01911361  0.824239056  0.03855106 8.443407e-01
x3A32       -0.92696865 0.369386547 -1.66277580 -0.221680910  6.67102543 9.799279e-03
x3A33       -0.90779881 0.431809628 -1.76384501 -0.081117723  4.64170188 3.120404e-02
x3A34       -1.49870282 0.392307541 -2.27839153 -0.748208241 15.50420478 8.232193e-05
x5          -0.01252255 0.007063063 -0.02651766  0.001115517  3.23324577 7.215754e-02

Likelihood ratio test=200.5529 on 9 df, p=0, n=1000
Wald test = 152.8214 on 9 df, p = 0

Covariance-Matrix:
              [,1]          [,2]          [,3]          [,4]          [,5]
 [1,]  0.225396187 -2.116841e-02 -0.0178520944 -1.475598e-02 -1.030299e-03
 [2,] -0.021168410  3.426973e-02  0.0158947775  1.663656e-02 -4.993463e-05
 [3,] -0.017852094  1.589478e-02  0.1118734364  1.582290e-02  1.008258e-04
 [4,] -0.014755985  1.663656e-02  0.0158228986  4.213430e-02 -6.926150e-05
 [5,] -0.001030299 -4.993463e-05  0.0001008258 -6.926150e-05  3.980257e-05
 [6,] -0.129133562  4.653294e-03  0.0017777035  2.271459e-03  1.822720e-04
 [7,] -0.134642541  3.932850e-03  0.0010335670  1.458292e-04  2.134383e-04
 [8,] -0.118628622 -2.470301e-03  0.0005935660 -5.484439e-03 -2.417103e-05
 [9,] -0.126861265  4.418244e-03  0.0018526164 -2.134237e-03  2.052452e-04
[10,] -0.001743630  7.292805e-05 -0.0000437410  1.938701e-05 -1.300694e-07
              [,6]          [,7]          [,8]          [,9]         [,10]
 [1,] -1.291336e-01 -1.346425e-01 -1.186286e-01 -0.1268612649 -1.743630e-03
 [2,]  4.653294e-03  3.932850e-03 -2.470301e-03  0.0044182436  7.292805e-05
 [3,]  1.777704e-03  1.033567e-03  5.935660e-04  0.0018526164 -4.374100e-05
 [4,]  2.271459e-03  1.458292e-04 -5.484439e-03 -0.0021342366  1.938701e-05
 [5,]  1.822720e-04  2.134383e-04 -2.417103e-05  0.0002052452 -1.300694e-07
 [6,]  2.241807e-01  1.258971e-01  1.240776e-01  0.1263132224 -8.388392e-05
 [7,]  1.258971e-01  1.364464e-01  1.240557e-01  0.1260949748  7.408724e-05
 [8,]  1.240776e-01  1.240557e-01  1.864596e-01  0.1247551153 -9.031551e-05
 [9,]  1.263132e-01  1.260950e-01  1.247551e-01  0.1539052064 -1.431441e-04
[10,] -8.388392e-05  7.408724e-05 -9.031551e-05 -0.0001431441  4.988686e-05

A joint Wald test of the regression terms can also be run with the survey package:
install.packages("survey") library("survey") regTermTest(german.object, ~x1 + x2 + x3 + x5) 蟆郁骸
> regTermTest(german.object, ~ x1 + x2 + x3 + x5)
Wald test for x1 x2 x3 x5
 in glm(formula = y ~ x1 + x2 + x3 + x5, family = binomial, data = german)
F =  17.2046  on  9  and  990  df: p= < 2.22e-16

Since the p-value is far below any usual significance level, the data give no support to the null hypothesis that these coefficients are all zero; the selected terms are jointly significant.
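Alongside the Wald test, a likelihood-ratio (deviance) test of the ordinary glm fit can be run with anova(); a sketch, where null.object is a name introduced here for the intercept-only model:

anova(german.object, test = "Chisq")   # sequential deviance tests as each term enters

null.object <- glm(y ~ 1, family = binomial, data = german)
anova(null.object, german.object, test = "Chisq")   # overall test against the intercept-only model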
12 Response variable with three or more categories

When the response has three or more categories, a multinomial logistic regression can be fitted with nnet::multinom() (a self-contained version with the iris data is sketched below).

library(nnet)

# multinomial logistic regression
model <- multinom(Species ~ ., data = training)
#head(fitted(model))   # fitted class probabilities

# predicted class for the test set
pred <- predict(model, newdata = test, type = "class")

library(caret)
confusionMatrix(pred, test$Species)
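The training and test objects above are not defined on this page; a minimal self-contained sketch with the built-in iris data (the 100/50 split and the seed are arbitrary choices):

library(nnet)
library(caret)

set.seed(1)
idx      <- sample(nrow(iris), 100)   # 100 rows for training, the remaining 50 for testing
training <- iris[idx, ]
test     <- iris[-idx, ]

model <- multinom(Species ~ ., data = training)
pred  <- predict(model, newdata = test, type = "class")
confusionMatrix(pred, test$Species)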