Contents

1 overview
2 data preparation
3 simple regression
4 multi regression
5 outliers


1 overview

Linear regression is easy to fit with lm(). To see how the fit actually works, this post also writes the cost function by hand and minimizes it with optim().

2 data preparation

We use the trees dataset. Its column names are capitalized, which is tedious to type, so convert them to lowercase.
df <- trees
colnames(df) <- tolower(colnames(df))
head(df)

The data is ready.

3 simple regression

Regression with a single independent variable is called simple regression. It is easy to fit with lm():
lm.out <- lm(volume ~ girth, data=df)
summary(lm.out)

Result:
> summary(lm.out)

Call:
lm(formula = volume ~ girth, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-8.065 -3.107  0.152  3.495  9.587 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
girth         5.0659     0.2474   20.48  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.252 on 29 degrees of freedom
Multiple R-squared:  0.9353,	Adjusted R-squared:  0.9331 
F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16

Fitting by least squares gives the following equation:

volume = 5.0659 * girth - 36.9435

5.0659 is the weight (slope) and -36.9435 is the bias (intercept).
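For simple regression, lm() does not need an iterative search. The least-squares weight and bias have a closed form, w = cov(x, y) / var(x) and b = mean(y) - w * mean(x). The sketch below recomputes the coefficients above from that formula. The variable names are only for illustration.

```r
# Recreate the data frame used in this post
df <- trees
colnames(df) <- tolower(colnames(df))

x <- df$girth
y <- df$volume

# Closed-form least-squares solution for y = w * x + b
w <- cov(x, y) / var(x)
b <- mean(y) - w * mean(x)

c(w = w, b = b)

# Compare with lm(); coef() returns (Intercept) first, then the slope
coef(lm(volume ~ girth, data = df))
```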

Now let's build this ourselves. The hypothesis is

H(x) = wx + b

and the function to minimize, the cost function, is

cost(w,b) = avg((H(x) - y)^2)

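This is the mean of the squared differences between the predicted values H(x) and the actual values y. The goal is to make that difference as small as possible, that is, to find the w and b that minimize cost(w,b). Let's write this cost function and minimize it with optim(). Note that optim() minimizes by default.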

cost.f <- function(par, x, y){
  w <- par[1]
  b <- par[2]
  H <- w*x + b
  mean((H - y)^2)
}

result <- optim(par = c(0, 0), cost.f, x = df$girth, y = df$volume)
result

Result:
> result
$par
[1]   5.066008 -36.945325

$value
[1] 16.91299

$counts
function gradient 
     103       NA 

$convergence
[1] 0

$message
NULL

The result gives

  • w = 5.066008
  • b = -36.945325

which is almost exactly the lm() result, volume = 5.0659 * girth - 36.9435.
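The gradient entry in $counts is NA because optim() was given only the cost function. Its default Nelder-Mead method therefore searches without derivatives. The squared-error cost is smooth, so its gradient can be passed through the gr argument and used by a gradient-based method. The sketch below assumes method = "BFGS". The gradient function cost.gr is a hypothetical helper written for this example.

```r
df <- trees
colnames(df) <- tolower(colnames(df))

# Same cost as above: mean squared error of H(x) = w * x + b
cost.f <- function(par, x, y) {
  H <- par[1] * x + par[2]
  mean((H - y)^2)
}

# Hypothetical helper: analytic gradient of cost.f with respect to (w, b)
cost.gr <- function(par, x, y) {
  r <- par[1] * x + par[2] - y   # residuals H(x) - y
  c(2 * mean(r * x),             # d cost / d w
    2 * mean(r))                 # d cost / d b
}

# Extra arguments (x, y) are passed on to both fn and gr
result.bfgs <- optim(par = c(0, 0), fn = cost.f, gr = cost.gr,
                     method = "BFGS", x = df$girth, y = df$volume)
result.bfgs$par
result.bfgs$counts
```

With an analytic gradient, $counts also reports how many gradient evaluations were used.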

4 multi regression

Regression with two or more independent variables is called multiple regression. As before, we fit it both with lm() and by writing and minimizing a cost function. With lm(), the result is:
lm.out <- lm(volume ~ girth + height, data=df)
summary(lm.out)

Result:
> summary(lm.out)

Call:
lm(formula = volume ~ girth + height, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.4065 -2.6493 -0.2876  2.2003  8.4847 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
girth         4.7082     0.2643  17.816  < 2e-16 ***
height        0.3393     0.1302   2.607   0.0145 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-squared:  0.948,	Adjusted R-squared:  0.9442 
F-statistic:   255 on 2 and 28 DF,  p-value: < 2.2e-16

Now we need the cost function. With n independent variables, the hypothesis grows to

H(x) = w1*x1 + w2*x2 + ... + wn*xn + b

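Writing out every term is tedious, but in matrix form it becomes compact:

H(X) = XW + b

Here X has one row per observation and one column per independent variable, and W is a column with one weight per variable, so the product is XW (each row of X times W). With W and X written as matrices, building H() is as simple as in simple regression. If b is folded into X as an extra column of ones, with its weight placed in W, this becomes

H(X) = XW

To build this cost function in R, add b to the data as a column filled with 1.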

b <- 1
tmp <- cbind(b, df)
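R can also build this design matrix automatically. model.matrix() expands a formula into a matrix whose first column, (Intercept), is all ones. That column plays the role of the b column added with cbind() above. A minimal sketch for comparison:

```r
df <- trees
colnames(df) <- tolower(colnames(df))

# Design matrix for volume ~ girth + height: columns (Intercept), girth, height
X <- model.matrix(volume ~ girth + height, data = df)
head(X)
dim(X)   # 31 rows, 3 columns
```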

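The par argument of optim() supplies the initial values. Because they are multiplied with the data (an inner product), their shape has to follow the rules of matrix multiplication. For example, suppose

  • M1 is a 3x2 matrix (3 rows, 2 columns)
  • M2 is a 3x2 matrix (3 rows, 2 columns)

For the product M1M2 to exist, the number of columns of M1 must equal the number of rows of M2. Here M1 has 2 columns and M2 has 3 rows, so M1M2 cannot be computed. Our data matrix has 31 rows and 3 columns (b, girth, height). There are 3 initial values, so they are arranged as a 3x1 matrix, and the product is a 31x1 matrix with one prediction per tree. In R the matrix product is the %*% operator, so the cost function is written as follows.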
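cost.f <- function(par, X, y){
  W <- as.matrix(par)   # 3x1 weight matrix (b, girth, height)
  X <- as.matrix(X)     # 31x3 data matrix
  y <- as.matrix(y)     # 31x1; a one-column data.frame here would make mean() return NA
  mean((X %*% W - y)^2)
}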
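The matrix-shape rule can be checked directly before optimizing. The sketch below rebuilds tmp as above. It shows that a 31x3 data matrix times a 3x1 weight matrix gives one prediction per row, and that two 3x2 matrices cannot be multiplied. M1 and M2 are illustrative matrices, not part of the original code.

```r
df <- trees
colnames(df) <- tolower(colnames(df))
b <- 1
tmp <- cbind(b, df)            # columns: b, girth, height, volume

X <- as.matrix(tmp[1:3])       # 31x3: b, girth, height
W <- as.matrix(c(0, 0, 0))     # 3x1: one weight per column of X
dim(X %*% W)                   # 31x1: one prediction per tree

# Two 3x2 matrices: M1 has 2 columns but M2 has 3 rows, so %*% fails
M1 <- matrix(1:6, nrow = 3)
M2 <- matrix(1:6, nrow = 3)
ok <- tryCatch({ M1 %*% M2; TRUE }, error = function(e) FALSE)
ok

# t(M1) is 2x3, so t(M1) %*% M2 is defined and gives a 2x2 result
dim(t(M1) %*% M2)
```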

With the cost function defined, run the optimization.
result <- optim(par = c(0, 0, 0), cost.f, X = tmp[1:3], y = tmp[4])
result

Result:
> result
$par
[1] -57.9996505   4.7082120   0.3394336

$value
[1] 13.61037

$counts
function gradient 
     238       NA 

$convergence
[1] 0

$message
NULL

The result is almost identical to lm(): intercept -57.9877, girth 4.7082, height 0.3393.
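As in the simple case, this optimum has a closed form. In the matrix notation above, the least-squares weights solve the normal equations (X^T X) W = X^T y. The sketch below applies solve() to the same tmp data. It solves the linear system directly rather than inverting X^T X.

```r
df <- trees
colnames(df) <- tolower(colnames(df))
b <- 1
tmp <- cbind(b, df)

X <- as.matrix(tmp[1:3])   # b (column of ones), girth, height
y <- as.matrix(tmp[4])     # volume

# Normal equations: (X^T X) W = X^T y; crossprod(X) is t(X) %*% X
W <- solve(crossprod(X), crossprod(X, y))
W

coef(lm(volume ~ girth + height, data = df))
```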

5 outliers

Consider the following data, which contains an outlier.
set.seed(123)
mydata <- within(data.frame(x=1:10), y <- rnorm(x, mean=x))
mydata$y[2] <- 20
plot(mydata)
Source: http://www.alastairsanderson.com/R/tutorials/robust-regression-in-R/
[Figure: outlier.png, scatter plot of mydata; the point at x = 2 is the outlier]

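Because lm() fits by least squares, it is sensitive to outliers: squaring the residuals gives the single large residual at x = 2 a large influence on the fit. Fitting with lm() gives the following: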
lm.out <- lm(y~x, data=mydata)
plot(mydata)
abline(lm.out, col="red")
[Figure: lm.png, mydata with the lm() fit drawn in red]
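You can measure that sensitivity by refitting lm() without the outlier (row 2) and comparing the slopes. A minimal sketch on the same simulated data:

```r
set.seed(123)
mydata <- within(data.frame(x = 1:10), y <- rnorm(x, mean = x))
mydata$y[2] <- 20   # the outlier

fit.all   <- lm(y ~ x, data = mydata)        # least squares on all 10 points
fit.clean <- lm(y ~ x, data = mydata[-2, ])  # same model without row 2

rbind(with.outlier = coef(fit.all), without.outlier = coef(fit.clean))
```

Removing a single point changes the slope substantially, which is the behavior the plot shows.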

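Let's try a method other than least squares. Instead of minimizing the mean of the squared differences between the actual and predicted values, we minimize the mean of their absolute differences, the mean absolute deviation (MAD). Minimizing this cost is also known as least absolute deviations regression.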
cost.f <- function(par, x, y){
  w <- par[1]
  b <- par[2]
  H <- w*x + b
  mean(abs(H - y))
}

result <- optim(par = c(0, 0), cost.f, x = mydata$x, y = mydata$y)
result

Result:
> result
$par
[1] 0.8849192 0.7047651

$value
[1] 2.37167

$counts
function gradient 
     249       NA 

$convergence
[1] 0

$message
NULL
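The absolute-value cost has kinks wherever a residual is zero, and optim()'s default Nelder-Mead method can stop short on such non-smooth surfaces. One cheap sanity check is to restart optim() from the returned parameters and see whether the cost drops further. The sketch below redefines the MAD cost and data so it runs on its own.

```r
set.seed(123)
mydata <- within(data.frame(x = 1:10), y <- rnorm(x, mean = x))
mydata$y[2] <- 20

# Mean absolute deviation cost for H(x) = w * x + b
cost.f <- function(par, x, y) {
  H <- par[1] * x + par[2]
  mean(abs(H - y))
}

result  <- optim(par = c(0, 0), cost.f, x = mydata$x, y = mydata$y)
# Restart from the first answer; a noticeably lower value would mean
# the first run had not really reached the minimum
restart <- optim(par = result$par, cost.f, x = mydata$x, y = mydata$y)

c(first = result$value, restart = restart$value)
```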

Let's compare the fits on a plot.
plot(mydata)
abline(lm.out, col="red")
# abline(a = intercept, b = slope); result$par[1] is w (slope), result$par[2] is b (intercept)
abline(a = result$par[2], b = result$par[1], col = "blue")
[Figure: mad.png, mydata with the lm() fit in red and the MAD fit in blue]

At a glance, this looks better than the lm() result, but it is still not entirely satisfying.
For reference, the robust regression result is shown below; the black line is the rlm() fit.
library(MASS)
rlm.out <- rlm(y~x, data=mydata)
abline(rlm.out)
[Figure: rlm.png, the previous plot with the rlm() fit added in black]

Nice: this one fits the data well.
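rlm() fits an M-estimator (Huber's psi function by default) by iteratively reweighted least squares. The final weights are stored in the fitted object's w component. Inspecting them shows how the outlier is handled. A short sketch:

```r
library(MASS)

set.seed(123)
mydata <- within(data.frame(x = 1:10), y <- rnorm(x, mean = x))
mydata$y[2] <- 20

rlm.out <- rlm(y ~ x, data = mydata)   # Huber M-estimator by default

# Final IWLS weights: points with large residuals get weights below 1
round(rlm.out$w, 3)
coef(rlm.out)
```

Instead of discarding the outlier, rlm() keeps it in the fit with a much smaller weight. The least-squares fit gives every point the same weight.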