Contents

1 蠏覿 蠍一
2 瑚骸 蟯螻 焔 譟郁唄
3 覲覲(Box-Cox覲)
4 蠏覿
5 譴蠏覿
6
7 覩碁
8 蟲語
9 れ螻旧(multicollinearity)
10 蠏讌
11 蠏 觜蟲
12 る 蠏覿
13 蟆郁骸 伎
14 譴 蠏螻(beta)
15 predict
16 襷一る 貊貊 谿伎 伎豺 蟇 蠏覿蠍
17 coef 襤郁規螳
18 confidence, prediction level
19 R-squared
20 Adjusted R-squared
21 as.lm.nls
22 觜覈 覦覯
23 れ姶 螻
24 t-sql襦 蟲 蠏覿
25 Two-Stage Least Squares
26 谿瑚襭


1 蠏覿 蠍一 #

  • '蠏(regression)' 螳 19瑚鍵 襷 蟲 覓狩糾 螻(F.Galton) 豌 伎.
  • 蠏朱 蠏.
  • 覲 伎 蟯螻襯 企 蠏覦 蟲螻 襴暑襯 轟 螳 磯ジ 譬覲 螳 豸″ 蠍磯
  • 蠏覿 襦 レ 譯手 覦朱伎 覲 瑚骸蟯螻襯 螳 覲 伎 蟯螻襯 覿
  • 壱襯 蠏碁る慨覃 蠏覿 蟆語 襷 蟆語 蟆一 . (/ 蟯蟯螻)
  • 危企 Y 豸′ 蠏 覈語 蟲豢 蟆 蠏覿 譯朱.
  • 谿(residual)覿: 覈瑚讀
  • 蠏讌: 轟危 伎豺 覈 豢 レ 譯朱 蟯豸′ 蟆豢
  • 覈覲螳 覲朱 螳
  • る覲 願碓 讌願碓 蟯朱, 讌覲 一危一 螳 0 1襷 豬 覩碁襯 讌 一危磯ゼ 覲伎 .
  • 譬覲(y) 蠏覿 螳
  • 蠏螻 譴ク谿
    • y = a + bx 賊 1 : 68.0%螳 蟲螳伎 れ伎.
    • y = a + bx 賊 2 : 95.5%螳 蟲螳伎 れ伎.
    • y = a + bx 賊 3 : 99.7%螳 蟲螳伎 れ伎.

2 瑚骸 蟯螻 焔 譟郁唄 #

  • 譬 覲 襴 覲螳 蟯 蟯螻螳 伎 . (襴 覲螳 覲覃 譬 覲 覲伎 )
  • 襴 覲螳 譬 覲覲企 襾殊 譟伎伎 . (螳 一)
  • 譬 覲 襴 覲螳 蟯 蟯螻螳 3 覲襦 る 讌螳 伎 . (3 覲螳 譬 覲襯 る伎 )

3 覲覲(Box-Cox覲) #

覯 れ螻 螳.
library("TeachingDemos")
y <- rlnorm(500, 3, 2)
par(mfrow=c(2,2))
qqnorm(y)
qqnorm(bct(y,1/2))
qqnorm(bct(y,0))
hist(bct(y,0))

lm02.jpg

library("car")
y <- rlnorm(500, 3, 2)
hist(box.cox(y,0))

4 蠏覿 #

蠏殊 磯 蠏覿
#x: 蠏殊
x <- c(2,3,4,6,8,9,10,12,13,15,17,19,20,23,25,26,27,29,30,33)
#y: 磯
y <- c(1500,1640,1950,2700,2580,3200,3750,2960,4200,3960,4310,3990,4010,4560,4800,5300,5050,5450,5950,5650)
result <- lm(y~x)
summary(result)

蟆郁骸
Call:
lm(formula = y ~ x)

Residuals:
   Min     1Q Median     3Q    Max 
-470.0 -290.8 -168.0  293.9  789.4 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1708.117    173.489   9.846 1.13e-08 ***
x            130.960      9.089  14.409 2.52e-11 ***
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1 

Residual standard error: 386.6 on 18 degrees of freedom
Multiple R-squared: 0.9202,     Adjusted R-squared: 0.9158 
F-statistic: 207.6 on 1 and 18 DF,  p-value: 2.522e-11 

蟆郁骸伎
  • 蠏: y = 130.960 * x + 1708.117
  • : 18
  • 蟆一螻(coefficient of determination): 0.9202(譟一 蟆一螻: 0.9158) 覈語 92%襯 る 詩.
  • Pr(>|t|) : F蟆 蟆郁骸 (p-value) = 2.52e-11
    • 蠏覓願: 蠏 覩瑚 .
    • 襴所: 蠏 覩瑚 .
    • p-value = 2.522e-11企襦 譴 0.05 襴所 豈 蠏手碓 . 讀, 蠏 覩瑚 .
  • 谿覿
    • 觜 觜(durbin-watson ration; れ姶螳 螻伎蟯 覓伎^)
    • 覦碁 蟆(bartlett's test; れ姶 覿一 蠏殊 譟一)
    • 谿: Residual standard error: 386.6
  • 蠏殊螳 50 磯? 130.960 * 50 + 1708.117 = 8256.117 襷企.
  • 蠏 伎 谿/譴ク谿 = 譴谿, -2.5 <= 譴谿 <= 2.5 覃 豺, 覯襯 覯企覃 伎豺 螳レ煙 .

5 譴蠏覿 #

螻給讌蠍 蟆郁骸螻 , , 豌伎 譴蠏覿

一危
#x1:, x2:, x3:豌伎, y1:螻給讌蠍 
x1 <- c(28,46,39,25,34,29,38,23,42,27,35,39,38,32,25) 
x2 <- c(146,169,160,156,161,168,154,153,160,152,155,154,157,162,142)
x3 <- c(34,57,48,38,47,50,54,40,62,39,46,54,57,53,32)
y1 <- c(22,36,24,22,27,29,26,23,31,24,23,27,31,25,23)
data <- data.frame(y1,x1,x2,x3)

蟯覿
#螳 蟯谿(壱) 覃 pairs(data)覃 . 
cov.wt(data, cor=T)

> cov.wt(data, cor=T)
$cov
         y1       x1       x2       x3
y1 16.31429 21.07143 20.01429 28.91429
x1 21.07143 48.66667 26.71429 53.64286
x2 20.01429 26.71429 52.25714 45.60000
x3 28.91429 53.64286 45.60000 82.54286

$center
       y1        x1        x2        x3 
 26.20000  33.33333 156.60000  47.40000 

$n.obs
[1] 15

$cor
          y1        x1        x2        x3
y1 1.0000000 0.7478150 0.6854618 0.7879319
x1 0.7478150 1.0000000 0.5297304 0.8463624
x2 0.6854618 0.5297304 1.0000000 0.6943082
x3 0.7879319 0.8463624 0.6943082 1.0000000

> 
蟆郁骸伎
  • x1()螻 x3(豌伎) 蟯螻螳 螳 . 0.8463624
  • x1()螻 x2() 蟯螻螳 螳 . 0.5297304
  • 螻給讌蠍(y1) 蟯螻螳 豌伎(x3), (x1), (x2) 企.

pairs 蟆郁骸
lm01.jpg

蠏覿
summary(lm(y1~x1+x2+x3))
> summary(lm(y1~x1+x2+x3))

Call:
lm(formula = y1 ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.9976 -1.3822  0.3611  1.1549  3.9291 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.2173    17.6038  -0.751    0.469
x1            0.2014     0.1842   1.093    0.298
x2            0.1710     0.1316   1.300    0.220
x3            0.1249     0.1667   0.750    0.469

Residual standard error: 2.532 on 11 degrees of freedom
Multiple R-squared: 0.6913,     Adjusted R-squared: 0.6072 
F-statistic: 8.213 on 3 and 11 DF,  p-value: 0.003774 

> 
蟆郁骸伎
  • 蟆一螻: 0.6913 (譟一: 0.6072), 蠏 69%襯 る.
  • p-value: 0.003774朱 譴 0.05 襴所れ 豈. 讀, 蠏 .
  • 蠏: y1 = -13.2173 + x1 * 0.2014 + x2 * 0.1710 + x3 * 0.1249
  • Pr(>|t|)螳 覈 0.05襯 譴企. 覲襦 譬.
  • 覈覲 螳 る覲 ル(t)
    • (x1) = 1.093
    • (x2) = 1.300
    • 豌伎(x3) = 0.750
    • レ 螻給讌蠍(y1) 螳 襷 ルレ 殊螻 .
  • ク蠏螻 (F)
    • (x1) = 0.298
    • (x2) = 0.220
    • 豌伎(x3) = 0.469
    • 覈 0.05襯 . れ螻旧煙 伎 . 蟲語 讌 覺 .

6 #

  • 覲襯 覈 覃?
    • 豢 覦螳 觜讌.
    • y 豸′ ク襯 螳蟆螻, れ姶覿一 豢豺 螻朱螳蟆 .
    • れ螻旧(multicollinearity) 覓語 覦
  • 覲覦覯
    • 讌(forward selection)
    • 讌蟇(backward elimination)
    • (讌, 讌蟇)
  • 襦讌ろ 蠏覿 [http](http://databaser.net/moniwiki/wiki.php/%EB%A1%9C%EC%A7%80%EC%8A%A4%ED%8B%B1%ED%9A%8C%EA%B7%80%EB%B6%84%EC%84%9D#s-4) 覯 蠍一 .

企蟆 企 .
library("MASS")
fit <- lm(y1~x1+x2+x3,data=mydata)
step <- stepAIC(fit, direction="both")
step$anova # display results 

7 覩碁 #

  • 襷 焔, 手骸 螳 覲螳 蠏 伎 覃 覩碁襯 豢螳伎 .
  • p譬襯 覯譯手 朱 p-1螳 覩碁螳 .
  • 焔 2螳讌 0, 1襦 蠏碁 誤覃 讌襷 殊 れ螻 螳 伎 .
  • y = x + x1 + x2 + x3 + x4 + x5 + x6
	x1	x2	x3	x4	x5	x6
	0	0	0	0	0	0
	1	0	0	0	0	0
	0	1	0	0	0	0
覈	0	0	1	0	0	0
蠍	0	0	0	1	0	0
	0	0	0	0	1	0
	0	0	0	0	0	1

8 蟲語 #

螳 螳 伎 覲 讌觸朱 蟆郁骸 レ 殊 蟆曙郁 .

蟲語 覈
m1 <- lm(formula=Height~Girth + Volume, data=trees)
summary(m1)

> summary(m1)

Call:
lm(formula = Height ~ Girth + Volume, data = trees)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.7855 -3.3649  0.5683  2.3747 11.6910 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  83.2958     9.0866   9.167 6.33e-10 ***
Girth        -1.8615     1.1567  -1.609   0.1188    
Volume        0.5756     0.2208   2.607   0.0145 *  
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

Residual standard error: 5.056 on 28 degrees of freedom
Multiple R-squared:  0.4123,	Adjusted R-squared:  0.3703 
F-statistic:  9.82 on 2 and 28 DF,  p-value: 0.0005868

蟲語 覈
m2 <- lm(formula=Height~Girth + Volume + Girth:Volume, data=trees)
summary(m2)

> summary(m2)

Call:
lm(formula = Height ~ Girth + Volume + Girth:Volume, data = trees)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.7781 -3.5574 -0.1512  2.3631 10.5879 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  75.40148    8.49147   8.880  1.7e-09 ***
Girth        -2.29632    1.03601  -2.217 0.035270 *  
Volume        1.86095    0.47932   3.882 0.000604 ***
Girth:Volume -0.05608    0.01909  -2.938 0.006689 ** 
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

Residual standard error: 4.482 on 27 degrees of freedom
Multiple R-squared:  0.5546,	Adjusted R-squared:  0.5051 
F-statistic: 11.21 on 3 and 27 DF,  p-value: 5.898e-05

覲 蟲語 螻燕蠍(Girth*trees). A, B 蟆郁骸 螳.

A
fitted(m2)

B
coef(m2)[1] + coef(m2)[2]*trees$Girth + coef(m2)[3]*trees$Volume + coef(m2)[4]*trees$Girth*trees$Volume

9 れ螻旧(multicollinearity) #

  • る覲 伎 蟯螻螳 螳 伎 れ螻旧煙 り 襷.
  • 蟯覿(R: cov.wt())伎 蟯螻螳 1 螳蟾 る覲襯 覯襴磯.
  • れ螻旧 讌
    • る覲 x1, x2, x3, x4螳 蟆曙
    • x1 ~ x2 + x3 + x4 -> 蟯螻 R1 -> (tolerance) = 1 - R1^2
    • x2 ~ x1 + x3 + x4 -> 蟯螻 R2 -> (tolerance) = 1 - R2^2
    • x3 ~ x1 + x2 + x4 -> 蟯螻 R3 -> (tolerance) = 1 - R3^2
    • x4 ~ x1 + x2 + x3 -> 蟯螻 R4 -> (tolerance) = 1 - R4^2
    • (1 - Rj^2) = 螳 Rj螳 1 螳蟾磯襦 xj るジ る覲 蟆壱朱 .
    • 覿壱曙綾螻(VIF: variance inflation factor) = 1 / (1 - Ri^2)螳 10覲企 覃 れ螻旧煙 譟伎り 蟆渚朱 .
    • VIP = 1 / (1 - Rj^2) = 1 / (1 - 0.7432^2) = 2.233869

覿壱曙綾螻(VIF) Design 殊企襴 vif()襯 . れ 蠏 蟆郁骸企.
#install.packages("Design",dependencies=T)
library("Design")
result <- lm(y1~x1+x2+x3)
vif(result)

> library("Design")
蟲 れ Hmisc襯 襦譴

れ れ襯 豌螳: 'Hmisc'


        The following object(s) are masked from package:base :

         format.pval,
         round.POSIXt,
         trunc.POSIXt,
         units 

蟲 れ survival襯 襦譴
蟲 れ splines襯 襦譴

れ れ襯 豌螳: 'survival'


        The following object(s) are masked from package:Hmisc :

         untangle.specials 

Design library by Frank E Harrell Jr

Type library(help='Design'), ?Overview, or ?Design.Overview')
to see overall documentation.


れ れ襯 豌螳: 'Design'


        The following object(s) are masked from package:survival :

         Surv,
         survfit 

> result <- lm(y1~x1+x2+x3)
> vif(result)
      x1       x2       x3 
3.607546 1.975832 5.010688 
> 
蟆郁骸伎
  • x1 VIF: 3.607546
  • x2 VIF: 1.975832
  • x3 VIF: 5.010688
  • VIF螳 10 讌 朱襦 れ螻旧煙 蟆朱 .

10 蠏讌 #

谿..
  • 蠏覿襯 磯殊狩螻,
  • 覿一 殊螻,
  • 豪 豢碁ゼ 覲伎伎 .

讀, 谿 蠏, 焔一, 襴曙煙 襷譟燕伎 .

par(mfrow=c(2,2))
plot(lm(formula = y1 ~ x1 + x2 + x3 + x1:x2))
regression_diagnosis.png

  • 譬: 豸♀, 谿 --> 0朱覿 螻襯願 覿螻 螳?
  • 一: 譴 谿 Q-Q Plot --> 蠏煙 螳?
  • 譬: 豸♀, ろ 谿(譴 谿) 螻炎啓 --> 讌 螳蟾 .. 覿一 殊讌 伎 .
  • 壱: 覯讌 ろ 谿 plot --> rs = (k + 1 / n) * 2; k = predictor , n = sample_size; rs覲企 螳覃 outlier


> out <- lm(formula = y1 ~ x1 + x2 + x3 + x1:x2)
> shapiro.test(resid(out))

        Shapiro-Wilk normality test

data:  resid(out) 
W = 0.9669, p-value = 0.8099

> 
蟆郁骸 覲 p-value = 0.8099襦 '蠏覿 谿願 '朱 蠏覓願れ 譟磯 讌讌. 讀, 谿 蠏覿襯 磯ジ.

焔一
觚襦-螳 ろ(Breusch- Pagan Test)襦 焔一語 覲碁.
> library("lmtest")
> bptest(out)

        studentized Breusch-Pagan test

data:  out 
BP = 5.6201, df = 4, p-value = 0.2294

> 
p-value螳 0.2294襦 譴 0.05 蠏覓願 讌讌. 讀, 焔一企. 襷 企(heterogenous variance) 蟆曙 螳譴豕螻(weighted least squares; nls()) . 覲覲 牛 蟲 .
> out <- lm(formula = y1 ~ x1 + x2 + x3 + x1:x2, weights=I(x1^(2)))
> summary(out)

Call:
lm(formula = y1 ~ x1 + x2 + x3 + x1:x2, weights = I(x1^(2)))

Residuals:
     Min       1Q   Median       3Q      Max 
-106.194  -34.272    4.358   30.456   85.645 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 206.98239   62.41049   3.316  0.00779 **
x1           -6.74461    1.84209  -3.661  0.00438 **
x2           -1.25865    0.40492  -3.108  0.01109 * 
x3            0.44112    0.13692   3.222  0.00915 **
x1:x2         0.04199    0.01105   3.800  0.00349 **
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1 

Residual standard error: 61.32 on 10 degrees of freedom
Multiple R-squared: 0.8834,     Adjusted R-squared: 0.8368 
F-statistic: 18.94 on 4 and 10 DF,  p-value: 0.0001167 

襴曙
> library("lmtest")
> dwtest(out)

        Durbin-Watson test

data:  out 
DW = 2.3524, p-value = 0.7083
alternative hypothesis: true autocorrelation is greater than 0 

> 
  • d糾
    • 0 : 蠍一蟯, p-value = 1
    • 2 : 襴, p-value = 0
    • 4 : 蠍一蟯, p-value = -1
  • 旧朱 DW螳 1覲企 蟇磯 3覲企 覃 蠍一蟯 ろ り 朱, 1.5~2.5 伎 蟆曙 襴曙企手
  • 襴.

谿瑚:

11 蠏 觜蟲 #

data(airquality)
lm1<-lm(Ozone~.,airquality) # full model
lm2<-lm(Ozone~Solar.R+Wind +Month+Day,airquality) # reduced model
anova(lm2,lm1)

12 る 蠏覿 #

蠍一 蠏碁 lm() る 蠏覿 覦覯襷 覲伎.
y2 <- x1 #x1 譬覲 螳螻...
summary(lm(cbind(y1,y2)~x2+x3))

> y2 <- x1
> summary(lm(cbind(y1,y2)~x2+x3))
Response y1 :

Call:
lm(formula = y1 ~ x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5060 -1.5862  0.2502  0.8539  5.3777 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -9.8745    17.4765  -0.565   0.5825  
x2            0.1493     0.1311   1.139   0.2770  
x3            0.2678     0.1043   2.567   0.0247 *
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1 

Residual standard error: 2.552 on 12 degrees of freedom
Multiple R-squared: 0.6578,     Adjusted R-squared: 0.6008 
F-statistic: 11.53 on 2 and 12 DF,  p-value: 0.001605 


Response y2 :

Call:
lm(formula = y2 ~ x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4716 -1.9151 -0.2964  1.9562  7.1935 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  16.5997    27.1675   0.611 0.552589    
x2           -0.1079     0.2038  -0.529 0.606186    
x3            0.7095     0.1622   4.375 0.000904 ***
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1 

Residual standard error: 3.967 on 12 degrees of freedom
Multiple R-squared: 0.7228,     Adjusted R-squared: 0.6766 
F-statistic: 15.65 on 2 and 12 DF,  p-value: 0.0004537 


> 

13 蟆郁骸 伎 #

覘 企蟆蟾讌 伎 讌?? 企 蟇 螳朱.
> summary(model)

Call:
lm(formula = 糾 ~  + 蟆一, data = result)

Residuals:
   Min     1Q Median     3Q    Max 
-5.348 -2.274 -1.276  2.954  5.673 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  53.6832    14.1808   3.786  0.00431 **
          0.6073     0.1984   3.062  0.01353 * 
蟆一         -1.9346     0.9144  -2.116  0.06348 . 
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1 

Residual standard error: 3.721 on 9 degrees of freedom
Multiple R-squared: 0.8289,     Adjusted R-squared: 0.7909 
F-statistic:  21.8 on 2 and 9 DF,  p-value: 0.0003544 
  • 蠏讌 y=糾, x1=, x2=蟆一 企朱, y = 53.6832 + 0.6073x1 + -1.9346x2 朱 豢.
  • -1.9346x2朱 襷企(-)螳 螳讌る 蟆 蟆一 襦 糾煙 觜讌る 蟆 覩誤.
  • '硫1=0'企朱 蠏覓願れ T 蟆糾 3.062企, 願 P螳 0.01353朱 譴 0.05 蠏覓願れ 蠍郁蟆 . 讀, 覩誤 螳企.
  • 蟆一螻 R2 anova(覿磯) 蟆郁骸 R2 = (541.69 + 61.97)/(541.69 + 61.97 + 124.59) = 0.8289186405願, 譟一 R2螳 0.7909 企. 讀, 糾煙 煙螻 蟆一 2螳 覲 83%螳 る伎.
  • F 蟆糾 21.8企, F覿 留 = 0.05, v1=2, v2=9 螳 4.26企. 讀, 21.8 > 4.26 企襦 蠏覓願(硫1=硫2=0) 蠍郁. 讀, 覩誤 螳企. p-value螳 る襦 F覿襯 覲 . 譴 0.05 > 0.0003544 企襦 蠏覓願れ 蠍郁覃 .
  • p-value螳 朱 襴所れ 讌讌螻, 貉れ覃 蠏覓願れ 讌讌. 0.0003544 蠏覓願れ 襷り 螳 蟆曙 詞伎 蟆糾覲企 蠏豪 蟆郁骸螳 襯企. 讀, 糾 豺 蠏手碓 蟆一 る(1譬 る)螳 襯.
  • 譴 0.05朱 蟆 95%襷 蠏 螳蟾 朱 螳螻, 蠏螳 覃襴 伎 5% 觜朱 螳る 蟆企.

> anova(model) #, 螻燕, 螻燕蠏 F螳, 
Analysis of Variance Table

Response: 糾
          Df Sum Sq Mean Sq F value    Pr(>F)    
       1 541.69  541.69 39.1297 0.0001487 ***
蟆一       1  61.97   61.97  4.4762 0.0634803 .  
Residuals  9 124.59   13.84                      
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1 

螻燕螻燕蠏F觜
603.6611301.8321.802950
谿124.59913.84
728.2511

  • F 蟆糾 21.8企, F覿 留 = 0.05, v1=2, v2=9 螳 4.26企. 讀, 21.8 > 4.26 企襦 蠏覓願(硫1=硫2=0) 蠍郁. 讀, 覩誤 螳企.
  • 蟆一螻 R2 R2 = 603.66/728.25 = 0.82891864 企.
  • 襦 . ろ . 螳 讌 碁 覲危 覲語 螳襯 覩誤.

谿瑚: p-value 觸願鍵
fit <- lm(Volume ~ Girth + Height, data=trees)
f <- summary(fit)$fstatistic
pf(f[1],f[2],f[3],lower.tail=F)

谿瑚: 95% 襤郁規螳
confint(model, level=0.95)

                2.5 %     97.5 %
(Intercept)  502.5676  524.66261
poly(q, 3)1 1919.2739 2232.52494
poly(q, 3)2 -264.6292   48.62188
poly(q, 3)3  707.3999 1020.65097

14 譴 蠏螻(beta) #

  • B : 觜譴 蠏螻
  • beta : 譴 蠏螻

model <- lm(Girth ~ Height + Volume, data=trees)
summary(model)

蟆郁骸
> summary(model)

Call:
lm(formula = Girth ~ Height + Volume, data = trees)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.34288 -0.56696 -0.08628  0.80283  1.11642 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.81637    1.97320   5.482 7.45e-06 ***
Height      -0.04548    0.02826  -1.609    0.119    
Volume       0.19518    0.01096  17.816  < 2e-16 ***
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

Residual standard error: 0.7904 on 28 degrees of freedom
Multiple R-squared:  0.9408,	Adjusted R-squared:  0.9366 
F-statistic: 222.5 on 2 and 28 DF,  p-value: < 2.2e-16

螳螳 螻(B) る, 企 覲螳 レ 殊讌 .
企 蟆曙 beta襯 覲企企, lm蟆郁骸 れ . れ螻 螳 覃 .
#install.packages("QuantPsyc")
library("QuantPsyc")
lm.beta(model)

蟆郁骸
> lm.beta(model)
     Height      Volume 
-0.09235166  1.02236871 
Volume 蟇一 覈 レ 殊海. 覓朱 beta襯 覲願鍵 Height 覲 覈語 蟇 蟆企.


15 predict #

y <- seq(1: nrow(x1))
dau <- x1$dau
fit <- lm(dau ~ poly(y, 3))
summary(fit)

plot(dau~y)
lines(y, predict(fit, data.frame(y=y)), col="purple")

newdata <- data.frame(y = seq(139,145)) 
predict(fit, newdata)

out <- lm(dau~y + I(y^2) + I(y^3)) 
summary(out)

p <- 145
3.131e+06 - p*2.637e+04 + (p^2) * 2.438e+02 - (p^3)*1.138e+00

16 襷一る 貊貊 谿伎 伎豺 蟇 蠏覿蠍 #

plot(y ~ x, data=df)
notin <- identify(df$x, df$y, labels=row.names(df)) 

fit <- lm(y ~ x - 1, data=df[!rownames(df) %in% notin,])
summary(fit)

plot(y ~ x, data=df[!rownames(df) %in% notin,])
abline(fit)

17 coef 襤郁規螳 #

se <- sqrt(diag(vcov(bu.model)))[2]
ci.upr <-  coef(bu.model)[2] + 1.96*se
ci.lwr <-  coef(bu.model)[2]  - 1.96*se

18 confidence, prediction level #

  • interval = "confidence" 蠏 蟆, 讀, れ姶(intercept) 螻れ (0朱 螳) 覯螳 prediction 觜 譬.
  • interval = "prediction" single value 蟆, 讀, れ姶(intercept)螳 螻ろ伎
# Make up some data
x <- seq(0, 1, length.out = 20)
y <- x^2 + rnorm(20)

# fit a quadratic
model <- lm(y ~ I(x^2))
fitted <- predict(model, interval = "confidence", level=.95) 

# plot the data and the fitted line
plot(x, y)
lines(x, fitted[, "fit"])

# now the confidence bands
lines(x, fitted[, "lwr"], lty = "dotted")
lines(x, fitted[, "upr"], lty = "dotted")

谿瑚

19 R-squared #

R^2 覩
  • 蠏 る 覲
  • れ螳螻 豸♀ 蟯蟯螻(R^2 殊伎 蟯螻 螻)

蠏 る 覲
d <- data.frame(y=(1:10)^2,x=1:10)
model <- lm(y~x,data=d)
d$prediction <- predict(model,newdata=d)

#R-squared
1-sum((d$prediction-d$y)^2)/sum((mean(d$y)- d$y)^2)

襴暑螳 襷讌覃 R^2螳 貉れ 蟆渚レ . 蠏碁 譟一 R^2襯 .
襴暑襯 蠏螳 觜蟲 譟一 R^2襦 觜蟲伎 .
譟一 R^2 れ螻 螳 蟲.
m <- lm(Girth~Height, trees)
summary(m)

> summary(m)

Call:
lm(formula = Girth ~ Height, data = trees)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.2386 -1.9205 -0.0714  2.7450  4.5384 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -6.18839    5.96020  -1.038  0.30772   
Height       0.25575    0.07816   3.272  0.00276 **
---
Signif. codes:  0 *** 0.001 ** 0.01 * 0.05 . 0.1   1

Residual standard error: 2.728 on 29 degrees of freedom
Multiple R-squared:  0.2697,	Adjusted R-squared:  0.2445 
F-statistic: 10.71 on 1 and 29 DF,  p-value: 0.002758

20 Adjusted R-squared #

n <- length(trees$Girth) #覲語
p <- 1                   #襴暑
R2 <- cor(trees$Girth, fitted(m))^2 #R-squared
1 - ((n-1)*(1-R2)/(n-p-1)) #Adjusted R-squared

21 as.lm.nls #

as.lm.nls <- function(object, ...) {
    if (!inherits(object, "nls")) {
        w <- paste("expected object of class nls but got object of class:", 
			paste(class(object), collapse = " "))
		warning(w)
	}

	gradient <- object$m$gradient()
	if (is.null(colnames(gradient))) {
		colnames(gradient) <- names(object$m$getPars())
	}

	response.name <- if (length(formula(object)) == 2) "0" else 
		as.character(formula(object)[[2]])

	lhs <- object$m$lhs()
	L <- data.frame(lhs, gradient)
	names(L)[1] <- response.name

	fo <- sprintf("%s ~ %s - 1", response.name, 
		paste(colnames(gradient), collapse = "+"))
	fo <- as.formula(fo, env = as.proto.list(L))

	do.call("lm", list(fo, offset = substitute(fitted(object))))

}

predict(as.lm.nls(fit), data.frame(x=xx), interval="prediction", level=.99)

22 觜覈 覦覯 #

x <- c(3,4,5,6,7)
y <- c(4.8, 6, 7.2, 7.8, 8.2)
fit1 <- lm(y ~ x)
summary(fit1)

plot(x,y)
abline(fit1)

library("zyp")
fit2 <- zyp.sen(y ~ x)
fit2
abline(coef=coef(fit2), col="red")
np_regression.png
--觜螳 觜覈 覦覯

23 れ姶 螻 #

願碓 蠏狩 蠏谿, 襷れ 蟲.
y <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
x <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
model <- lm(y ~ x)

library("DMwR")
regr.eval(y, fitted(model))
plot(x,y)
abline(model)

> regr.eval(y, fitted(model))
mae        mse       rmse       mape 
0.41276110 0.24190200 0.49183534 0.08399855 

24 t-sql襦 蟲 蠏覿 #

select
        (count(*)*sum(x*y)-(sum(x)*sum(y)))/(count(*)*sum(x*x)-(sum(x)*sum(x))) slope
    ,   avg(y)-((count(*)*sum(x*y))-(sum(x)*sum(y)))/((count(*)*sum(power(x,2)))-power(sum(x),2))*avg(x) intercept
from tbl_xy

create view v_test
as
select 98 x, 1 y union all
select 90, 2 union all
select 90, 3 union all
select 65, 4 union all
select 66, 5 union all
select 67, 6 union all
select 13, 7 union all
select 44, 8 union all
select 27, 9 
go

declare 
    @x nvarchar(255)
,   @y nvarchar(255)
,   @tablenm nvarchar(255)
,   @where nvarchar(1000)
,   @a nvarchar(255)
,   @b nvarchar(255)
,   @sql nvarchar(max);

set @x = 'x';
set @y = 'y';
set @tablenm = 'select * from v_test';
set @where = '';
set @sql = '
    declare 
        @cnt int
    ,   @a decimal(38,5)
    ,   @b decimal(38,5)
    ,   @c decimal(38,5)
    ,   @d decimal(38,5)
    ,   @xavg decimal(38,5)
    ,   @yavg decimal(38,5)
    ,   @a1 decimal(38,5)
    ,   @b1 decimal(38,5);

    select 
        @cnt = count(*) 
    ,   @a = sum(convert(decimal(38,5), ' + @x + '))
    ,   @b = sum(convert(decimal(38,5), ' + @y + '))
    ,   @c = sum(convert(decimal(38,5), power(' + @x + ', 2)))
    ,   @d = sum(convert(decimal(38,5), ' + @x + '*' + @y + '))
    ,   @xavg = avg(convert(decimal(38,5), ' + @x + '))
    ,   @yavg = avg(convert(decimal(38,5), ' + @y + '))
    from (' + @tablenm + ') as t;

    set @a1 = ((@b*@c) - (@a*@d)) / (@cnt*@c - power(@a, 2));
    set @b1 = ((@cnt*@d) - (@a*@b)) / (@cnt*@c - power(@a, 2));

    select
        蠏
    ,   round(r2, 4) [R-squared]
    ,   round(1-(1-r2)*(@cnt-1)/(@cnt-2),4) [Adjusted R-squared]
    ,   round(corr, 4) 蟯螻
    from (
        select
            ''y = ''+ convert(nvarchar, @a1) + '' + '' + convert(nvarchar, @b1) +  ''x'' 蠏    
        ,   1 - var(' + @y + '-(@a1 + @b1*' + @x + ')) / 
                    var(' + @y + ') r2
        ,   ((1.00/(@cnt - 1)*sum((' + @x + '-@xavg) * (' + @y + '-@yavg))) / 
                    sqrt(var(' + @x + ') * var(' + @y + '))) corr
        from (' + @tablenm + ') as t
        where 1=1 ' + @where + '
    ) t

    select
        row_number() over(order by ' + @x + ') seq
    ,   ' + @x + ' x
    ,   ' + @y + ' y
    ,   power(' + @x + ', 2) x2
    ,   ' + @x + '*' + @y + ' xy
    ,   @a1 + @b1*' + @x + ' estimatedvalue_linear
    --,   @a1 * log(@a) + @b1 *' + @x + ' estimatedvalue_log
    ,   ' + @y + '-(@a1 + @b1*' + @x + ') diffvalue
    from (' + @tablenm + ') as t
    where 1=1 ' + @where + '
    order by 2;
    ';
print @sql;
exec(@sql);

lm01.gif

lm02.gif

--蠏碁 豢豌:http://www.sqlservercentral.com/articles/SQLCLR/71942/

25 Two-Stage Least Squares #

y ~ x 覈語 . ( 覈語 れ姶螳 譬 )
蠏碁磯 x k 譬讌一 . y 蟯螻.

蠏碁覃..
  • x' ~ k 螳 x襯 豢(x')螻,
  • y ~ x' 螳 豢 x x'襯 覈語 k.
覃 れ姶螳 譴企り .
企蟆 2螻襦 regression 蟆 Two-Stage Least Squares(2SLS)手 .

蠍一, k 蟲 覲(instrument variable)手 .

26 谿瑚襭 #