Tests for the Difference of Means
1 SE, Standard Error

The length of a stick was measured five times.
    x <- c(76.2, 76.3, 76.1, 76.3, 76.4)
    se <- sd(x) / sqrt(length(x))   # standard error of the mean
    se                              # about 0.051

Suppose we have the math mock-exam scores of 2,000 students.
    set.seed(1000)
    x <- rnorm(2000, mean = 70, sd = 10)
    summary(x)

Result:

    > summary(x)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      36.38   63.33   69.89   69.94   76.54   99.19

The standard error is:
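    mu <- c()
    for (i in 1:100) {
      mu <- c(mean(sample(x, 5)), mu)   # mean of a random sample of 5 scores
    }
    se <- sd(mu)   # the standard deviation of the sample means estimates the standard error
    mean(mu)       # 69.82971
    se             # about 4.33 (theory: 10 / sqrt(5) = 4.47)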
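As a rough check on the idea above, the following sketch compares the standard deviation of many simulated sample means with the formula sd / sqrt(n). The number of repeated samples (10,000) and the variable names are assumptions chosen for illustration; they are not from the original page.

```r
# Simulated population: 2,000 scores, generated the same way as in the text.
set.seed(1000)
x <- rnorm(2000, mean = 70, sd = 10)

n    <- 5        # size of each sample
reps <- 10000    # number of repeated samples (assumed value)

# Draw many samples of size n and keep the mean of each one.
sample_means <- replicate(reps, mean(sample(x, n)))

se_empirical <- sd(sample_means)    # spread of the sample means
se_formula   <- sd(x) / sqrt(n)     # sd / sqrt(n), using the sd of all 2,000 scores

c(empirical = se_empirical, formula = se_formula)
```

The two numbers should be close to each other (both near 10 / sqrt(5)); exact values depend on the R version's random number generator.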
2 Tests for Comparing Means
3 Overview of the t-test
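4 One Sample t-test

In 2011, the students of Grade 2, Class 1 played games for an average of 2.1 hours per day. In 2012, 10 students were selected at random and their gaming time was surveyed. Does 2012 differ from 2011?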
    x <- c(3.3, 2.8, 3.0, 2.7, 2.7, 2.0, 1.9, 3.4, 1.4, 1.4)
    t.test(x, mu = 2.1)

Normality test
    > shapiro.test(x)

            Shapiro-Wilk normality test

    data:  x
    W = 0.9085, p-value = 0.2711
    > t.test(x, mu=2.1)

            One Sample t-test

    data:  x
    t = 1.5454, df = 9, p-value = 0.1567
    alternative hypothesis: true mean is not equal to 2.1
    95 percent confidence interval:
     1.933026 2.986974
    sample estimates:
    mean of x
         2.46

With p-value = 0.1567, the null hypothesis that the mean is 2.1 hours is not rejected at the 0.05 level.
If mu is omitted, t.test() tests against a mean of 0:

    > t.test(x)

            One Sample t-test

    data:  x
    t = 10.6, df = 9, p-value = 2.269e-06
    alternative hypothesis: true mean is not equal to 0
    95 percent confidence interval:
     1.93 2.99
    sample estimates:
    mean of x
         2.46
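The statistic reported by t.test() can be reproduced by hand, which may make the output easier to read. This is a minimal sketch using the same data; the variable names (mu0, t_stat, and so on) are illustrative assumptions.

```r
# Daily gaming hours of the 10 sampled students (same data as above).
x   <- c(3.3, 2.8, 3.0, 2.7, 2.7, 2.0, 1.9, 3.4, 1.4, 1.4)
mu0 <- 2.1                    # hypothesized mean (the 2011 average)
n   <- length(x)

se     <- sd(x) / sqrt(n)             # standard error of the mean
t_stat <- (mean(x) - mu0) / se        # t = (sample mean - mu0) / SE
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value

# 95% confidence interval for the mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

t_stat   # 1.5454 in the t.test() output above
p_val    # 0.1567 in the t.test() output above
ci
```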
5 Two Sample t-test
    x1 <- c(15, 10, 13, 7, 9, 8, 21, 9, 14, 8)
    x2 <- c(15, 14, 12, 8, 14, 7, 16, 10, 15, 12)

Normality test: at the 0.05 significance level, normality is not rejected for either x1 or x2.
    > shapiro.test(x1)

            Shapiro-Wilk normality test

    data:  x1
    W = 0.8666, p-value = 0.09131

    > shapiro.test(x2)

            Shapiro-Wilk normality test

    data:  x2
    W = 0.9125, p-value = 0.2986

Are the variances equal?
    > var.test(x1, x2)

            F test to compare two variances

    data:  x1 and x2
    F = 1.9791, num df = 9, denom df = 9, p-value = 0.3237
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.491579 7.967821
    sample estimates:
    ratio of variances
              1.979094

With p-value = 0.3237, equal variances are not rejected, so the t-test below uses var.equal=TRUE.
    > t.test(x1, x2, var.equal=TRUE)

            Two Sample t-test

    data:  x1 and x2
    t = -0.5331, df = 18, p-value = 0.6005
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -4.446765  2.646765
    sample estimates:
    mean of x mean of y
         11.4      12.3
One-sided test (alternative: the mean of x1 is less than the mean of x2):

    > t.test(x1, x2, alternative="less", var.equal=TRUE)

            Two Sample t-test

    data:  x1 and x2
    t = -0.5331, df = 18, p-value = 0.3002
    alternative hypothesis: true difference in means is less than 0
    95 percent confidence interval:
         -Inf 2.027436
    sample estimates:
    mean of x mean of y
         11.4      12.3
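For reference, the equal-variance (pooled) t statistic used above can be computed directly. The sketch below uses the same x1 and x2; the intermediate names (sp2, t_stat, p_two, p_less) are assumptions introduced for illustration.

```r
x1 <- c(15, 10, 13, 7, 9, 8, 21, 9, 14, 8)
x2 <- c(15, 14, 12, 8, 14, 7, 16, 10, 15, 12)
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)

df     <- n1 + n2 - 2
t_stat <- (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

p_two  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value
p_less <- pt(t_stat, df)             # one-sided p-value, alternative = "less"

t_stat   # -0.5331 in the output above
p_two    # 0.6005 in the output above
p_less   # 0.3002 in the output above
```

When the variances cannot be assumed equal, calling t.test(x1, x2) without var.equal runs the Welch test instead, which adjusts the degrees of freedom.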
6 Paired t-test
    > t.test(x1, x2, var.equal=TRUE, paired=TRUE)   # var.equal has no effect when paired=TRUE

            Paired t-test

    data:  x1 and x2
    t = -0.9612, df = 9, p-value = 0.3616
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.018069  1.218069
    sample estimates:
    mean of the differences
                       -0.9
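A paired t-test is equivalent to a one-sample t-test on the pairwise differences. The following sketch illustrates this with the same x1 and x2; the name d for the differences is an assumption.

```r
x1 <- c(15, 10, 13, 7, 9, 8, 21, 9, 14, 8)
x2 <- c(15, 14, 12, 8, 14, 7, 16, 10, 15, 12)

d <- x1 - x2                      # pairwise differences

paired_res <- t.test(x1, x2, paired = TRUE)
diff_res   <- t.test(d, mu = 0)   # one-sample test on the differences

# Both approaches give the same t statistic and p-value.
c(paired = unname(paired_res$statistic), one_sample = unname(diff_res$statistic))
c(paired = paired_res$p.value, one_sample = diff_res$p.value)
```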
7 Analysis of Variance (ANOVA)
8 One-way ANOVA

    # http://code.google.com/p/sonya/source/browse/trunk/r-project/sample/PlantGrowth.csv
    plantGrowth <- read.csv("c:\\data\\PlantGrowth.csv")
    head(plantGrowth)
    boxplot(weight ~ group, data = plantGrowth)
    out <- lm(weight ~ group, data = plantGrowth)
    summary(out)
    anova(out)

    > summary(out)

    Call:
    lm(formula = weight ~ group, data = plantGrowth)

    Residuals:
        Min      1Q  Median      3Q     Max
    -1.0710 -0.4180 -0.0060  0.2627  1.3690

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   5.0320     0.1971  25.527   <2e-16 ***
    grouptrt1    -0.3710     0.2788  -1.331   0.1944
    grouptrt2     0.4940     0.2788   1.772   0.0877 .
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.6234 on 27 degrees of freedom
    Multiple R-squared:  0.2641,    Adjusted R-squared:  0.2096
    F-statistic: 4.846 on 2 and 27 DF,  p-value: 0.01591

    > anova(out)
    Analysis of Variance Table

    Response: weight
              Df  Sum Sq Mean Sq F value  Pr(>F)
    group      2  3.7663  1.8832  4.8461 0.01591 *
    Residuals 27 10.4921  0.3886
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
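The F value in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. The sketch below recomputes it by hand. It uses R's built-in PlantGrowth dataset, which is assumed here to contain the same data as the CSV file above; the variable names are illustrative.

```r
data(PlantGrowth)                 # built-in dataset (assumed to match the CSV)
w <- PlantGrowth$weight
g <- PlantGrowth$group

k <- nlevels(g)                   # number of groups
n <- length(w)                    # total number of observations

group_mean <- ave(w, g)           # each observation's own group mean

ss_between <- sum((group_mean - mean(w))^2)   # variation explained by group
ss_within  <- sum((w - group_mean)^2)         # residual variation

f_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_val  <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)

c(SS_between = ss_between, SS_within = ss_within, F = f_stat, p = p_val)
```

If the built-in data matches the CSV, these values should agree with the Sum Sq, F value, and Pr(>F) columns of the table above.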
    par(mfrow = c(2, 2))   # show the four diagnostic plots in a 2 x 2 grid
    plot(out)

Normality: p-value = 0.4379, so normality of the residuals is not rejected.
    > shapiro.test(resid(out))

            Shapiro-Wilk normality test

    data:  resid(out)
    W = 0.9661, p-value = 0.4379

Homoscedasticity: p-value = 0.1714, so the null hypothesis is not rejected; the variances can be treated as equal.
    > library("lmtest")   # provides bptest() and dwtest()
    > bptest(out)

            studentized Breusch-Pagan test

    data:  out
    BP = 3.5273, df = 2, p-value = 0.1714

Independence
    > dwtest(out)   # from the lmtest package

            Durbin-Watson test

    data:  out
    DW = 2.704, p-value = 0.9502
    alternative hypothesis: true autocorrelation is greater than 0
Method 1: Dunnett. Each treatment is compared against the control, which shows the pairs whose means do not differ.
    install.packages("multcomp")
    library("multcomp")
    out <- lm(weight ~ group, data = PlantGrowth)
    dunnett <- glht(out, linfct = mcp(group = "Dunnett"))   # group here is PlantGrowth$group
    summary(dunnett)
    plot(dunnett)

The 95% confidence intervals include 0.

    > summary(dunnett)

             Simultaneous Tests for General Linear Hypotheses

    Multiple Comparisons of Means: Dunnett Contrasts

    Fit: lm(formula = weight ~ group, data = PlantGrowth)

    Linear Hypotheses:
                     Estimate Std. Error t value Pr(>|t|)
    trt1 - ctrl == 0  -0.3710     0.2788  -1.331    0.323
    trt2 - ctrl == 0   0.4940     0.2788   1.772    0.153
    (Adjusted p values reported -- single-step method)
Method 2: Tukey. All pairs of groups are compared.

    install.packages("multcomp")
    library("multcomp")
    out <- lm(weight ~ group, data = PlantGrowth)
    tukey <- glht(out, linfct = mcp(group = "Tukey"))   # group here is PlantGrowth$group
    summary(tukey)
    plot(tukey)

    > summary(tukey)

             Simultaneous Tests for General Linear Hypotheses

    Multiple Comparisons of Means: Tukey Contrasts

    Fit: lm(formula = weight ~ group, data = plantGrowth)

    Linear Hypotheses:
                     Estimate Std. Error t value Pr(>|t|)
    trt1 - ctrl == 0  -0.3710     0.2788  -1.331    0.391
    trt2 - ctrl == 0   0.4940     0.2788   1.772    0.198
    trt2 - trt1 == 0   0.8650     0.2788   3.103    0.012 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    (Adjusted p values reported -- single-step method)
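If installing multcomp is not an option, base R's TukeyHSD() offers a comparable all-pairs comparison. This is an alternative technique, not the glht() approach used above, and the sketch assumes the built-in PlantGrowth dataset matches the data on this page.

```r
data(PlantGrowth)   # built-in dataset (assumed to match the data above)

# TukeyHSD() needs an aov fit rather than an lm fit.
fit <- aov(weight ~ group, data = PlantGrowth)
tk  <- TukeyHSD(fit, conf.level = 0.95)
tk

# The result for the factor is a matrix with columns diff, lwr, upr, and "p adj".
res <- tk$group
res["trt2-trt1", ]   # the only pair flagged as significant in the output above
```

Calling plot(tk) draws the confidence intervals; a pair differs at the chosen level when its interval excludes 0.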
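9 Two-way ANOVA

    # http://code.google.com/p/sonya/source/browse/trunk/r-project/sample/warpbreaks.csv?r=653
    warpbreaks <- read.csv("c:\\data\\warpbreaks.csv", stringsAsFactors = TRUE)   # read wool and tension as factors

Check the level order of the categorical variables wool and tension.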
    > levels(warpbreaks$wool)
    [1] "A" "B"
    > levels(warpbreaks$tension)
    [1] "L" "M" "H"
    > warpbreaks$tension <- factor(warpbreaks$tension, levels = c("L", "M", "H"))
    > levels(warpbreaks$tension)
    [1] "L" "M" "H"

ANOVA
    out <- lm(breaks ~ wool*tension, data = warpbreaks)
    summary(out)

    > summary(out)

    Call:
    lm(formula = breaks ~ wool * tension, data = warpbreaks)

    Residuals:
         Min       1Q   Median       3Q      Max
    -19.5556  -6.8889  -0.6667   7.1944  25.4444

    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)      44.556      3.647  12.218 2.43e-16 ***
    woolB           -16.333      5.157  -3.167 0.002677 **
    tensionM        -20.556      5.157  -3.986 0.000228 ***
    tensionH        -20.000      5.157  -3.878 0.000320 ***
    woolB:tensionM   21.111      7.294   2.895 0.005698 **
    woolB:tensionH   10.556      7.294   1.447 0.154327
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 10.94 on 48 degrees of freedom
    Multiple R-squared:  0.3778,    Adjusted R-squared:  0.3129
    F-statistic: 5.828 on 5 and 48 DF,  p-value: 0.0002772
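Normality: p-value = 0.8162, so normality of the residuals is not rejected.

    > shapiro.test(resid(out))

            Shapiro-Wilk normality test

    data:  resid(out)
    W = 0.9869, p-value = 0.8162

Homoscedasticity: p-value = 0.0006307, so the null hypothesis is rejected; the variances are not equal. Apply log() or sqrt() to the response variable breaks.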
    > library("lmtest")   # provides bptest() and dwtest()
    > bptest(out)

            studentized Breusch-Pagan test

    data:  out
    BP = 21.5744, df = 5, p-value = 0.0006307

Independence
    > dwtest(out)

            Durbin-Watson test

    data:  out
    DW = 2.2376, p-value = 0.575
    alternative hypothesis: true autocorrelation is greater than 0
Refit with log(breaks). After the transformation, none of the three tests rejects its null hypothesis at the 0.05 level.

    > out <- lm(log(breaks) ~ wool*tension, data = warpbreaks)
    > shapiro.test(resid(out))

            Shapiro-Wilk normality test

    data:  resid(out)
    W = 0.9729, p-value = 0.2583

    > bptest(out)

            studentized Breusch-Pagan test

    data:  out
    BP = 4.8045, df = 5, p-value = 0.4402

    > dwtest(out)

            Durbin-Watson test

    data:  out
    DW = 2.06, p-value = 0.3167
    alternative hypothesis: true autocorrelation is greater than 0
Multiple comparisons (Tukey) on the additive model:

    library("multcomp")
    out <- lm(log(breaks) ~ wool + tension, data = warpbreaks)
    tukey1 <- glht(out, linfct = mcp(wool = "Tukey"))
    tukey2 <- glht(out, linfct = mcp(tension = "Tukey"))
    summary(tukey1)
    summary(tukey2)

    > summary(tukey1)

             Simultaneous Tests for General Linear Hypotheses

    Multiple Comparisons of Means: Tukey Contrasts

    Fit: lm(formula = log(breaks) ~ wool + tension, data = warpbreaks)

    Linear Hypotheses:
               Estimate Std. Error t value Pr(>|t|)
    B - A == 0  -0.1522     0.1063  -1.431    0.159
    (Adjusted p values reported -- single-step method)

    > summary(tukey2)

             Simultaneous Tests for General Linear Hypotheses

    Multiple Comparisons of Means: Tukey Contrasts

    Fit: lm(formula = log(breaks) ~ wool + tension, data = warpbreaks)

    Linear Hypotheses:
               Estimate Std. Error t value Pr(>|t|)
    M - L == 0  -0.2871     0.1302  -2.205  0.08018 .
    H - L == 0  -0.4893     0.1302  -3.758  0.00133 **
    H - M == 0  -0.2022     0.1302  -1.553  0.27550
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    (Adjusted p values reported -- single-step method)
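The Tukey comparisons above use the additive model, while the earlier fits include the wool:tension interaction. One way to check whether the interaction matters on the log scale is to compare the two nested models with anova(). This is a sketch using R's built-in warpbreaks dataset, which is assumed to match the CSV file used above; the names fit_add, fit_int, and cmp are illustrative.

```r
data(warpbreaks)   # built-in dataset (assumed to match the CSV)

fit_add <- lm(log(breaks) ~ wool + tension, data = warpbreaks)   # main effects only
fit_int <- lm(log(breaks) ~ wool * tension, data = warpbreaks)   # with interaction

# Sequential ANOVA table for the interaction model:
# one row each for wool, tension, wool:tension, and Residuals.
anova(fit_int)

# F test of the interaction: does adding wool:tension reduce the
# residual sum of squares more than chance would explain?
cmp <- anova(fit_add, fit_int)
cmp
```

A small p-value in the second row of cmp would suggest keeping the interaction, in which case comparisons of main effects alone should be read with care.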
10 Analysis of Covariance (ANCOVA)