DataBaser.Net: Tests for Proportions

1 z statistic
2 Quick margin-of-error calculation
3 Confidence interval with the modified Wald method
4 1-sample test for a proportion
5 Test for proportions of n groups
6 Incidence rate (Exact Poisson tests)

1 z statistic

For a 90% confidence interval, z = 1.645
For a 95% confidence interval, z = 1.960
For a 99% confidence interval, z = 2.576
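
The critical values above can be reproduced with qnorm. This is a minimal sketch; the helper name z_crit is an assumption made for illustration and does not appear elsewhere on this page.

```r
# Two-sided critical value z for a given confidence level.
# z_crit is a hypothetical helper name used only for this illustration.
z_crit <- function(conf.level) qnorm(1 - (1 - conf.level) / 2)

z_values <- sapply(c(0.90, 0.95, 0.99), z_crit)
print(round(z_values, 3))
```

Rounded to three decimals, the values agree with 1.645, 1.960 and 2.576.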

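2 Quick margin-of-error calculation

Of 1,000 randomly selected citizens, 300 were in favor of issue A. What is the margin of error?

1/sqrt(1000) --> ±3.2% (a rule of thumb: at 95% confidence and p = 0.5, 1.96 * sqrt(0.25/n) is roughly 1/sqrt(n))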
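
To see how close the shortcut is, the sketch below compares it with the ordinary normal-approximation margin of error that uses the observed proportion 0.3. The variable names are assumptions chosen for illustration.

```r
n <- 1000
p_hat <- 300 / n

# Rule of thumb: roughly the 95% margin of error in the worst case p = 0.5
quick_moe <- 1 / sqrt(n)

# Normal-approximation (Wald) margin of error using the observed proportion
wald_moe <- qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

print(round(c(quick = quick_moe, wald = wald_moe), 4))
```

The shortcut gives about 0.032 and the version based on p = 0.3 gives about 0.028, so the quick figure is slightly conservative here.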

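3 Confidence interval with the modified Wald method

Margin of error: z * sqrt(p' * (1-p') / (n+z^2)), where p' = (x + z^2/2) / (n+z^2) is the adjusted proportion, x is the number of successes, and the interval is p' ± the margin.
For the example above, using the raw proportion 0.3 as an approximation of p': 1.96 * sqrt(0.3 * (1-0.3) / (1000+1.96^2)), which is about 0.028 (±2.8%).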
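
A minimal sketch of the full interval, assuming the modified Wald method here means the Agresti-Coull adjustment (add z^2/2 successes and z^2 trials before applying the Wald formula). The function name modified_wald_ci is hypothetical.

```r
# Modified Wald (Agresti-Coull) interval for a binomial proportion.
# modified_wald_ci is a hypothetical helper name used for illustration.
modified_wald_ci <- function(x, n, conf.level = 0.95) {
  z <- qnorm(1 - (1 - conf.level) / 2)
  n_adj <- n + z^2                 # adjusted sample size
  p_adj <- (x + z^2 / 2) / n_adj   # adjusted proportion
  moe <- z * sqrt(p_adj * (1 - p_adj) / n_adj)
  c(estimate = p_adj, lower = p_adj - moe, upper = p_adj + moe)
}

ci <- modified_wald_ci(300, 1000)
print(round(ci, 4))
```

For 300 out of 1,000 the interval runs from roughly 0.272 to 0.329. With n this large the adjustment barely moves the estimate away from 0.3.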

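4 1-sample test for a proportion

Classroom A has 100 students, 86 of whom are right-handed. In Korea, 94% of people are right-handed. Is the proportion of right-handed students in classroom A the same as in Korea as a whole?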

> prop.test(86,100,p=0.94)

	1-sample proportions test with continuity correction

data:  86 out of 100, null probability 0.94
X-squared = 9.9734, df = 1, p-value = 0.001588
alternative hypothesis: true p is not equal to 0.94
95 percent confidence interval:
 0.7728837 0.9185961
sample estimates:
   p 
0.86

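At the 0.05 significance level, the null hypothesis is rejected in favor of the alternative (p-value = 0.001588 < 0.05).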
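
The X-squared value reported by prop.test can be checked by hand. The sketch below assumes the statistic is a chi-square goodness-of-fit sum over the two cells (right-handed, not right-handed) with a Yates continuity correction of 0.5; the variable names are illustrative.

```r
x <- 86; n <- 100; p0 <- 0.94

observed <- c(x, n - x)
expected <- c(n * p0, n * (1 - p0))

# Yates continuity correction: subtract 0.5 from each absolute deviation
x_squared <- sum((abs(observed - expected) - 0.5)^2 / expected)
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)

print(c(X.squared = x_squared, p.value = p_value))
```

This reproduces X-squared = 9.9734 and p-value = 0.001588 from the output above.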

Reference: http://www.r-bloggers.com/one-proportion-z-test-in-r/

# One-proportion z-test. Note: the default alternative is "less" (one-sided).
z.test <- function(x, n, p = NULL, conf.level = 0.95, alternative = "less") {
  ts.z <- NULL
  p.val <- NULL
  phat <- x / n
  qhat <- 1 - phat
  if (!is.null(p)) {
    # p0 is known (population value or H0): use it for the standard error
    SE.phat <- sqrt(p * (1 - p) / n)
    ts.z <- (phat - p) / SE.phat
    # Pick the tail(s) matching the alternative hypothesis.
    # The two-sided case uses -abs(z) so the p-value can never exceed 1.
    p.val <- switch(alternative,
      less      = pnorm(ts.z),
      greater   = pnorm(ts.z, lower.tail = FALSE),
      two.sided = 2 * pnorm(-abs(ts.z)),
      stop("alternative must be \"less\", \"greater\" or \"two.sided\""))
  } else {
    # Only the sample is available: use phat for the standard error
    # and skip the hypothesis test
    SE.phat <- sqrt(phat * qhat / n)
  }
  # Confidence interval: phat +/- z * SE.phat
  z.crit <- qnorm(1 - (1 - conf.level) / 2)
  cint <- phat + c(-1, 1) * z.crit * SE.phat
  return(list(estimate = phat, ts.z = ts.z, p.val = p.val, cint = cint))
}

z.test(86,100,p=0.94)

> z.test(86,100,p=0.94)
$estimate
[1] 0.86

$ts.z
[1] -3.368608

$p.val
[1] 0.0003777444

$cint
[1] 0.8134534 0.9065466
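
The z-test and prop.test are closely related: without the continuity correction, the X-squared statistic is the square of z. A short self-contained check, with illustrative variable names:

```r
x <- 86; n <- 100; p0 <- 0.94

# z statistic using the standard error under the null hypothesis
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)

# Same test through prop.test, with the continuity correction switched off
chisq <- unname(prop.test(x, n, p = p0, correct = FALSE)$statistic)

print(c(z = z, z.squared = z^2, X.squared = chisq))
```

The remaining gap between this result and the X-squared = 9.9734 shown earlier comes from the continuity correction that prop.test applies by default.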

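5 Test for proportions of n groups

A survey found that 100 of 300 people in city A and 170 of 400 people in city B support candidate D. Can the support rate for candidate D be considered the same in city A and city B?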

분자 <- c(100, 170)  # numerators: supporters in city A and city B
분모 <- c(300, 400)  # denominators: respondents in city A and city B
prop.test(분자, 분모)

> prop.test(분자, 분모)

	2-sample test for equality of proportions with continuity correction

data:  분자 out of 분모
X-squared = 5.6988, df = 1, p-value = 0.01698
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.16664176 -0.01669158
sample estimates:
   prop 1    prop 2 
0.3333333 0.4250000

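Interpreting the result

This tests whether the proportion of some event can be considered equal in two groups.
Hypotheses
- Null hypothesis: there is no difference.
- Alternative hypothesis: there is a difference. --> With p-value = 0.01698, the null hypothesis is rejected at the 0.05 significance level (the alternative is supported) but cannot be rejected at the 0.01 level.
95% confidence interval for the difference (A - B): -0.16664176 ~ -0.01669158, which excludes 0. Shifted by city B's estimate: 0.4250000-0.16664176 ~ 0.4250000-0.01669158 = 0.2583582 ~ 0.4083084, to be compared with city A's 100/300. This shifted range is only a rough guide, not a confidence interval for city A's proportion.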
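
The confidence interval for the difference can be rebuilt by hand. The sketch below assumes an unpooled Wald interval widened by a continuity correction of 0.5 * (1/n1 + 1/n2); the variable names are illustrative.

```r
x <- c(100, 170)   # supporters in city A and city B
n <- c(300, 400)   # respondents in city A and city B
p_hat <- x / n

delta <- p_hat[1] - p_hat[2]                 # estimated difference A - B
se <- sqrt(sum(p_hat * (1 - p_hat) / n))     # unpooled standard error
correction <- 0.5 * sum(1 / n)               # continuity correction term
width <- qnorm(0.975) * se + correction

ci <- c(delta - width, delta + width)
print(round(ci, 5))
```

Both limits are negative, so the interval excludes 0, which agrees with the test result at the 0.05 level.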

In Excel..

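6 Incidence rate (Exact Poisson tests)

For count data, the same vectors can be passed to poisson.test, which treats 분자 as the event counts and 분모 as the time base: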

> poisson.test(분자, 분모)

	Comparison of Poisson rates

data:  분자 time base: 분모
count1 = 100, expected count1 = 115.71, p-value = 0.05656
alternative hypothesis: true rate ratio is not equal to 1
95 percent confidence interval:
 0.6064139 1.0099403
sample estimates:
rate ratio 
 0.7843137
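
The two-sample comparison can be understood as a binomial test: conditional on the total number of events, the first count is binomial with success probability equal to the first group's share of the total time base. A sketch under that assumption, with illustrative variable names:

```r
counts <- c(100, 170)     # event counts in the two groups
exposure <- c(300, 400)   # time bases of the two groups

# Under H0 (rate ratio = 1), the share of events expected in group 1
share <- exposure[1] / sum(exposure)
expected_count1 <- sum(counts) * share

# Conditional on the total, count1 is binomial with probability `share`
p_value <- binom.test(counts[1], sum(counts), p = share)$p.value

print(c(expected.count1 = expected_count1, p.value = p_value))
```

The expected count matches the 115.71 reported above. Note that this exact test does not reject equality at the 0.05 level, while prop.test on the same numbers does.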

One sample

> poisson.test(83, 100)

	Exact Poisson test

data:  83 time base: 100
number of events = 83, time base = 100, p-value = 0.09854
alternative hypothesis: true event rate is not equal to 1
95 percent confidence interval:
 0.6610904 1.0289099
sample estimates:
event rate 
      0.83

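Null hypothesis: the population event rate (λ) equals the rate under the null hypothesis (here 1, the default).
Alternative hypothesis: the population event rate (λ) differs from the rate under the null hypothesis.
At the 0.05 significance level, the null hypothesis is not rejected (p-value = 0.09854).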
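
The exact confidence interval for a single rate can be reproduced from gamma quantiles. A minimal sketch, assuming the usual exact limits for a Poisson mean divided by the time base; the variable names are illustrative.

```r
events <- 83
time_base <- 100
alpha <- 0.05

# Exact limits for a Poisson mean via gamma quantiles, scaled to a rate
lower <- qgamma(alpha / 2, shape = events) / time_base
upper <- qgamma(1 - alpha / 2, shape = events + 1) / time_base

print(round(c(rate = events / time_base, lower = lower, upper = upper), 4))
```

The interval contains 1, which is consistent with not rejecting the null hypothesis at the 0.05 level.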
