#title Structural Equation Modeling
[[TableOfContents]]

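These notes follow the lavaan package [http://lavaan.ugent.be/tutorial/index.html tutorial]. The name lavaan is short for "latent variable analysis". A latent variable is an abstract construct that is not measured directly but is estimated from observed variables. For an explanation of the kinds of variables in a structural equation model, see [http://blog.naver.com/lucifer246?Redirect=Log&logNo=173521493 here].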

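==== Structural equation models in the social sciences ====
http://www.ktcloudware.com/seminar/down/06.pdf
 * Measurement error
  * Most variables used in the social sciences contain measurement error
  * Handled with latent variables
 * Construct validity of measurement
  * Concepts used in the social sciences are mostly abstract
  * Assessed with confirmatory factor analysis (CFA)
 * Causal models
  * Causal relations among concepts, as hypothesized by theory
  * Modeled with a structural equation model (SEM)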
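
The measurement-error point can be illustrated with a small simulation. This is a minimal sketch in base R and is not taken from the seminar slides: the sample size, the latent correlation of 0.6, and the error standard deviation of 1 are made-up values chosen only for illustration.

{{{
set.seed(1)
n <- 5000

# Two latent (unobserved) variables with a true correlation of about 0.6
eta1 <- rnorm(n)
eta2 <- 0.6 * eta1 + sqrt(1 - 0.6^2) * rnorm(n)

# Observed indicators = latent score + independent measurement error
x <- eta1 + rnorm(n, sd = 1)
y <- eta2 + rnorm(n, sd = 1)

r_latent   <- cor(eta1, eta2)  # correlation between the constructs
r_observed <- cor(x, y)        # correlation between the noisy measurements

# Reliability: share of observed variance that comes from the latent variable
reliability_x <- var(eta1) / var(x)

print(c(latent = r_latent, observed = r_observed, reliability = reliability_x))
}}}

The observed correlation comes out clearly smaller than the latent one. This attenuation is what latent variable models are meant to correct for.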

==== Confirmatory factor analysis (CFA) ====
This example uses the {{{HolzingerSwineford1939}}} data set; see [http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/lavaan/html/HolzingerSwineford1939.html here] for a description of the data.

{{{
library("lavaan")
data(HolzingerSwineford1939)
str(HolzingerSwineford1939)
}}}

{{{
> library("lavaan")
> data(HolzingerSwineford1939)
> str(HolzingerSwineford1939)
'data.frame':	301 obs. of  15 variables:
 $ id    : int  1 2 3 4 5 6 7 8 9 11 ...
 $ sex   : int  1 2 2 1 2 2 1 2 2 2 ...
 $ ageyr : int  13 13 13 13 12 14 12 12 13 12 ...
 $ agemo : int  1 7 1 2 2 1 1 2 0 5 ...
 $ school: Factor w/ 2 levels "Grant-White",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ grade : int  7 7 7 7 7 7 7 7 7 7 ...
 $ x1    : num  3.33 5.33 4.5 5.33 4.83 ...
 $ x2    : num  7.75 5.25 5.25 7.75 4.75 5 6 6.25 5.75 5.25 ...
 $ x3    : num  0.375 2.125 1.875 3 0.875 ...
 $ x4    : num  2.33 1.67 1 2.67 2.67 ...
 $ x5    : num  5.75 3 1.75 4.5 4 3 6 4.25 5.75 5 ...
 $ x6    : num  1.286 1.286 0.429 2.429 2.571 ...
 $ x7    : num  3.39 3.78 3.26 3 3.7 ...
 $ x8    : num  5.75 6.25 3.9 5.3 6.3 6.65 6.2 5.15 4.65 4.55 ...
 $ x9    : num  6.36 7.92 4.42 4.86 5.92 ...
> 
}}}

{{{
model <- "
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9
"
fit <- cfa(model, data = HolzingerSwineford1939)
summary(fit, fit.measures = TRUE)
}}}

{{{
> summary(fit, fit.measures = TRUE)
lavaan (0.5-15) converged normally after  35 iterations

  Number of observations                           301

  Estimator                                         ML
  Minimum Function Test Statistic               85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Model test baseline model:

  Minimum Function Test Statistic              918.852
  Degrees of freedom                                36
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.931
  Tucker-Lewis Index (TLI)                       0.896

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3737.745
  Loglikelihood unrestricted model (H1)      -3695.092

  Number of free parameters                         21
  Akaike (AIC)                                7517.490
  Bayesian (BIC)                              7595.339
  Sample-size adjusted Bayesian (BIC)         7528.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.092
  90 Percent Confidence Interval          0.071  0.114
  P-value RMSEA <= 0.05                          0.001

Standardized Root Mean Square Residual:

  SRMR                                           0.065

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)
Latent variables:
  visual =~
    x1                1.000
    x2                0.554    0.100    5.554    0.000
    x3                0.729    0.109    6.685    0.000
  textual =~
    x4                1.000
    x5                1.113    0.065   17.014    0.000
    x6                0.926    0.055   16.703    0.000
  speed =~
    x7                1.000
    x8                1.180    0.165    7.152    0.000
    x9                1.082    0.151    7.155    0.000

Covariances:
  visual ~~
    textual           0.408    0.074    5.552    0.000
    speed             0.262    0.056    4.660    0.000
  textual ~~
    speed             0.173    0.049    3.518    0.000

Variances:
    x1                0.549    0.114
    x2                1.134    0.102
    x3                0.844    0.091
    x4                0.371    0.048
    x5                0.446    0.058
    x6                0.356    0.043
    x7                0.799    0.081
    x8                0.488    0.074
    x9                0.566    0.071
    visual            0.809    0.145
    textual           0.979    0.112
    speed             0.384    0.086

> 
}}}
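The fit measures in this output can be recomputed by hand from the reported chi-square statistics and log-likelihood. The following is a sketch using the standard textbook formulas; the variable names are made up for illustration, and the RMSEA line assumes N (not N - 1) in the denominator, which may differ between programs and versions.

{{{
# Values copied from the summary() output above
chisq_user <- 85.306;  df_user <- 24
chisq_base <- 918.852; df_base <- 36
n_obs      <- 301
loglik     <- -3737.745
n_par      <- 21

# Degrees of freedom: unique variances/covariances of 9 indicators
# minus the number of free parameters
p <- 9
df_check <- p * (p + 1) / 2 - n_par

# Comparative Fit Index and Tucker-Lewis Index
cfi <- 1 - max(chisq_user - df_user, 0) /
           max(chisq_base - df_base, chisq_user - df_user, 0)
tli <- (chisq_base / df_base - chisq_user / df_user) /
       (chisq_base / df_base - 1)

# RMSEA (assumption: N in the denominator; some programs use N - 1)
rmsea <- sqrt(max(chisq_user - df_user, 0) / (df_user * n_obs))

# Information criteria
aic <- -2 * loglik + 2 * n_par
bic <- -2 * loglik + log(n_obs) * n_par

print(round(c(df = df_check, CFI = cfi, TLI = tli,
              RMSEA = rmsea, AIC = aic, BIC = bic), 3))
}}}

The results agree with the printed values to three decimals (df 24, CFI 0.931, TLI 0.896, RMSEA 0.092, AIC 7517.490, BIC 7595.339).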

{{{
library(semPlot)
semPaths(fit, 
         what="std",
         edge.label.cex = 0.6,
         sizeMan=5,
         sizeLat=5,
         curve=0.4
)
}}}
attachment:구조방정식/sem3.png

The same diagram can also be drawn with {{{qgraph.lavaan}}} as shown below, but that function is reportedly going to be removed.
{{{
library(qgraph)
qgraph.lavaan(fit, layout="tree", titles=F,
              vsize.man=5,
              vsize.lat=5,
              filetype="",
              include=4,
              curve=-0.4,
              edge.label.cex=0.6)
}}}
==== Structural equation modeling (SEM) ====
This example uses the {{{PoliticalDemocracy}}} data set; see [http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/lavaan/html/PoliticalDemocracy.html here] for a description of the data.

{{{
library("lavaan")
data(PoliticalDemocracy)
model <- "
  # measurement model
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
"
fit <- sem(model, data = PoliticalDemocracy)
summary(fit, standardized = TRUE)
}}}

{{{
> summary(fit, standardized = TRUE)
lavaan (0.5-15) converged normally after  68 iterations

  Number of observations                            75

  Estimator                                         ML
  Minimum Function Test Statistic               38.125
  Degrees of freedom                                35
  P-value (Chi-square)                           0.329

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
Latent variables:
  ind60 =~
    x1                1.000                               0.670    0.920
    x2                2.180    0.139   15.742    0.000    1.460    0.973
    x3                1.819    0.152   11.967    0.000    1.218    0.872
  dem60 =~
    y1                1.000                               2.223    0.850
    y2                1.257    0.182    6.889    0.000    2.794    0.717
    y3                1.058    0.151    6.987    0.000    2.351    0.722
    y4                1.265    0.145    8.722    0.000    2.812    0.846
  dem65 =~
    y5                1.000                               2.103    0.808
    y6                1.186    0.169    7.024    0.000    2.493    0.746
    y7                1.280    0.160    8.002    0.000    2.691    0.824
    y8                1.266    0.158    8.007    0.000    2.662    0.828

Regressions:
  dem60 ~
    ind60             1.483    0.399    3.715    0.000    0.447    0.447
  dem65 ~
    ind60             0.572    0.221    2.586    0.010    0.182    0.182
    dem60             0.837    0.098    8.514    0.000    0.885    0.885

Covariances:
  y1 ~~
    y5                0.624    0.358    1.741    0.082    0.624    0.296
  y2 ~~
    y4                1.313    0.702    1.871    0.061    1.313    0.273
    y6                2.153    0.734    2.934    0.003    2.153    0.356
  y3 ~~
    y7                0.795    0.608    1.308    0.191    0.795    0.191
  y4 ~~
    y8                0.348    0.442    0.787    0.431    0.348    0.109
  y6 ~~
    y8                1.356    0.568    2.386    0.017    1.356    0.338

Variances:
    x1                0.082    0.019                      0.082    0.154
    x2                0.120    0.070                      0.120    0.053
    x3                0.467    0.090                      0.467    0.239
    y1                1.891    0.444                      1.891    0.277
    y2                7.373    1.374                      7.373    0.486
    y3                5.067    0.952                      5.067    0.478
    y4                3.148    0.739                      3.148    0.285
    y5                2.351    0.480                      2.351    0.347
    y6                4.954    0.914                      4.954    0.443
    y7                3.431    0.713                      3.431    0.322
    y8                3.254    0.695                      3.254    0.315
    ind60             0.448    0.087                      1.000    1.000
    dem60             3.956    0.921                      0.800    0.800
    dem65             0.172    0.215                      0.039    0.039

> 
}}}
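The Std.lv and Std.all columns can be related to the unstandardized estimates with a little arithmetic. The sketch below uses only numbers copied from the output above; the variable names are made up for illustration, and the results match the printed columns only up to rounding of the inputs.

{{{
# Values copied from the summary(fit, standardized = TRUE) output above
b_dem60_ind60 <- 1.483  # unstandardized regression dem60 ~ ind60
var_ind60     <- 0.448  # variance of ind60
sd_dem60      <- 2.223  # SD of dem60 (Std.lv of y1, whose loading is fixed to 1)
lambda_x2     <- 2.180  # unstandardized loading ind60 =~ x2

sd_ind60 <- sqrt(var_ind60)

# Std.lv of a loading: rescale so that the latent variable has variance 1
std_lv_x2 <- lambda_x2 * sd_ind60

# Standardized regression coefficient: b * SD(predictor) / SD(outcome)
std_dem60_ind60 <- b_dem60_ind60 * sd_ind60 / sd_dem60

# Indirect and total effect of ind60 on dem65 (unstandardized)
indirect <- 1.483 * 0.837    # ind60 -> dem60 -> dem65
total    <- 0.572 + indirect # direct + indirect

# R-squared of dem65 from its standardized residual variance
r2_dem65 <- 1 - 0.039

print(round(c(std_lv_x2 = std_lv_x2, std_dem60_ind60 = std_dem60_ind60,
              indirect = indirect, total = total, r2_dem65 = r2_dem65), 3))
}}}

The first two values reproduce the 1.460 and 0.447 shown in the output, within rounding.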

{{{
library(semPlot)
semPaths(fit, 
         what="std",
         edge.label.cex = 0.6,
         sizeMan=5,
         sizeLat=5,
         curve=0.4, 
         edge.color="black"
)
}}}
attachment:구조방정식/sem4.png
==== intercept, group, starting value, fitting function, invariance ====
===== intercept =====
{{{
model <- "
# three-factor model
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9

# intercepts
x1 ~ 1
x2 ~ 1
x3 ~ 1
x4 ~ 1
x5 ~ 1
x6 ~ 1
x7 ~ 1
x8 ~ 1
x9 ~ 1
"
fit <- cfa(model, data = HolzingerSwineford1939, meanstructure = TRUE)
summary(fit, fit.measures = TRUE)

library(semPlot)
semPaths(fit, 
         what="std",
         edge.label.cex = 0.6,
         sizeMan=5,
         sizeLat=5,
         curve=0.4, 
         edge.color="black"
)
}}}
attachment:구조방정식/sem5.png
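
The intercept model can be checked against the raw data. This is a sketch that assumes lavaan is installed. It relies on two points: {{{meanstructure = TRUE}}} adds the observed-variable intercepts even without explicit {{{x1 ~ 1}}} lines, and with every intercept free the ML estimates should coincide with the sample means. The names {{{vars}}}, {{{ints}}} and {{{sample_means}}} are made up for illustration.

{{{
library(lavaan)

model <- "
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9
"
# No explicit 'x1 ~ 1' lines: meanstructure = TRUE adds the intercepts
fit <- cfa(model, data = HolzingerSwineford1939, meanstructure = TRUE)

# Pull the estimated intercepts of the observed variables
pe   <- parameterEstimates(fit)
vars <- paste0("x", 1:9)
ints <- pe[pe$op == "~1" & pe$lhs %in% vars, c("lhs", "est")]

# Compare with the plain sample means
sample_means <- colMeans(HolzingerSwineford1939[, vars])
print(cbind(intercept   = ints$est[match(vars, ints$lhs)],
            sample_mean = sample_means))
}}}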

===== group =====
{{{
model <- "
# three-factor model
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9
"
fit <- cfa(model, data = HolzingerSwineford1939, group="school")
summary(fit, fit.measures = TRUE)
}}}

===== starting value =====
{{{
model <- "
# three-factor model
visual  =~ x1 + 0.5*x2 + c(0.6, 0.8)*x3
textual =~ x4 + start(c(1.2, 0.6))*x5 + a*x6
speed   =~ x7 + x8 + x9
"
fit <- cfa(model, data = HolzingerSwineford1939, group="school")
summary(fit, fit.measures = TRUE)
}}}
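In this syntax, a plain numeric multiplier such as 0.5*x2 fixes the loading at that value rather than supplying a starting value. A vector such as c(0.6, 0.8)*x3 fixes the loading at a different value in each group. Starting values are given with start(): start(c(1.2, 0.6))*x5 leaves the x5 loading free and sets its starting value separately for each group. a*x6 attaches the label a to the x6 loading; to label each group explicitly, write c(a1, a2)*x6.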
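
One way to see how lavaan reads each modifier is to build the parameter table without fitting the model. This is a sketch that assumes lavaan is installed. It uses {{{lavaanify()}}} with two groups on a shortened version of the model (the label on x6 is left out to keep the table simple), and the name {{{loadings}}} is made up for illustration.

{{{
library(lavaan)

model <- "
visual  =~ x1 + 0.5*x2 + c(0.6, 0.8)*x3
textual =~ x4 + start(c(1.2, 0.6))*x5 + x6
"
# Build the parameter table for two groups; nothing is estimated here
pt <- lavaanify(model, ngroups = 2, auto.fix.first = TRUE, auto.var = TRUE)

# free == 0: parameter is fixed at ustart
# free >  0: parameter is estimated, using ustart (if any) as starting value
loadings <- pt[pt$op == "=~", c("lhs", "rhs", "group", "free", "ustart")]
print(loadings)
}}}

In the printed table the x2 and x3 rows should show free = 0, while the x5 rows stay free and carry 1.2 and 0.6 as user-supplied starting values.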


===== fitting function =====
{{{
HS.model <- '  visual =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, 
           data = HolzingerSwineford1939, 
           group = "school",
           group.equal = c("loadings"))
summary(fit)
}}}
Besides loadings, the group.equal argument accepts the following values:
 * intercepts: the intercepts of the observed variables
 * means: the intercepts/means of the latent variables
 * residuals: the residual variances of the observed variables
 * residual.covariances: the residual covariances of the observed variables
 * lv.variances: the (residual) variances of the latent variables
 * lv.covariances: the (residual) covariances of the latent variables
 * regressions: all regression coefficients in the model

If you omit the group.equal argument, all parameters are freely estimated in each group (but the model structure is the same).

But what if you want to constrain a whole group of parameters (say all factor loadings and intercepts) across groups, except for one or two parameters that need to stay free in all groups? For this scenario, you can use the argument group.partial, containing the names of those parameters that need to remain free. For example:

{{{
fit <- cfa(HS.model, 
           data = HolzingerSwineford1939, 
           group = "school",
           group.equal = c("loadings", "intercepts"),
           group.partial = c("visual=~x2", "x7~1"))
}}}

===== invariance =====
{{{
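library(semTools)
# Note: recent semTools releases mark measurementInvariance() as deprecated
# in favor of measEq.syntax(), so this call may emit a deprecation message.
measurementInvariance(HS.model, data = HolzingerSwineford1939, group = "school")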
}}}
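The same sequence of invariance tests can be run by hand with {{{group.equal}}}. This is a sketch that assumes lavaan is installed; the object names {{{fit.configural}}}, {{{fit.weak}}} and {{{fit.strong}}} are made up for illustration.

{{{
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# Configural: same structure, all parameters free in each group
fit.configural <- cfa(HS.model, data = HolzingerSwineford1939,
                      group = "school")

# Weak (metric) invariance: equal loadings across groups
fit.weak <- cfa(HS.model, data = HolzingerSwineford1939,
                group = "school", group.equal = "loadings")

# Strong (scalar) invariance: equal loadings and intercepts
fit.strong <- cfa(HS.model, data = HolzingerSwineford1939,
                  group = "school",
                  group.equal = c("loadings", "intercepts"))

# Chi-square difference tests between the nested models
print(lavTestLRT(fit.configural, fit.weak, fit.strong))
}}}

A significant chi-square difference between two adjacent models suggests that the added equality constraints do not hold across the schools.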

==== Example 1 ====
Reference: this example follows the document at http://r-project.kr/content/r%EB%A1%9C-%ED%95%98%EB%8A%94-%EA%B5%AC%EC%A1%B0%EB%B0%A9%EC%A0%95%EC%8B%9D-lavaan2amos.
{{{
library(lavaan)

> str(ch9.ex1)
'data.frame':	8 obs. of  5 variables:
 $ attitude: int  2 3 3 4 4 4 4 5
 $ loyalty : int  2 3 3 4 4 5 4 5
 $ price   : int  4 4 3 3 2 2 1 1
 $ quality : int  2 3 2 3 3 4 3 5
 $ design  : int  2 3 4 2 5 3 2 4
> ch9.ex1
  attitude loyalty price quality design
1        2       2     4       2      2
2        3       3     4       3      3
3        3       3     3       2      4
4        4       4     3       3      2
5        4       4     2       3      5
6        4       5     2       4      3
7        4       4     1       3      2
8        5       5     1       5      4
> path.model <- "
+     #regressions
+     attitude ~ price + quality + design
+     loyalty ~ attitude
+ 
+     #residual covariances
+     price ~~ quality
+     price ~~ design
+     quality ~~ design
+ "
> path.example <- lavaan(path.model, data=ch9.ex1, auto.var=T, auto.fix.first=T, fixed.x=F)
> summary(path.example)
lavaan (0.5-15) converged normally after  43 iterations

  Number of observations                             8

  Estimator                                         ML
  Minimum Function Test Statistic                1.718
  Degrees of freedom                                 3
  P-value (Chi-square)                           0.633

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)
Regressions:
  attitude ~
    price            -0.382    0.133   -2.869    0.004
    quality           0.459    0.159    2.883    0.004
    design            0.063    0.109    0.579    0.562
  loyalty ~
    attitude          1.064    0.135    7.906    0.000

Covariances:
  price ~~
    quality          -0.688    0.440   -1.563    0.118
    design           -0.313    0.431   -0.725    0.468
  quality ~~
    design            0.234    0.355    0.660    0.509

Variances:
    attitude          0.097    0.048
    loyalty           0.106    0.053
    price             1.250    0.625
    quality           0.859    0.430
    design            1.109    0.555

> summary(path.example, fit.measures=T)
lavaan (0.5-15) converged normally after  43 iterations

  Number of observations                             8

  Estimator                                         ML
  Minimum Function Test Statistic                1.718
  Degrees of freedom                                 3
  P-value (Chi-square)                           0.633

Model test baseline model:

  Minimum Function Test Statistic               40.609
  Degrees of freedom                                10
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.140

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)                -36.520
  Loglikelihood unrestricted model (H1)        -35.662

  Number of free parameters                         12
  Akaike (AIC)                                  97.041
  Bayesian (BIC)                                97.994
  Sample-size adjusted Bayesian (BIC)           62.535

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent Confidence Interval          0.000  0.481
  P-value RMSEA <= 0.05                          0.641

Standardized Root Mean Square Residual:

  SRMR                                           0.021

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)
Regressions:
  attitude ~
    price            -0.382    0.133   -2.869    0.004
    quality           0.459    0.159    2.883    0.004
    design            0.063    0.109    0.579    0.562
  loyalty ~
    attitude          1.064    0.135    7.906    0.000

Covariances:
  price ~~
    quality          -0.688    0.440   -1.563    0.118
    design           -0.313    0.431   -0.725    0.468
  quality ~~
    design            0.234    0.355    0.660    0.509

Variances:
    attitude          0.097    0.048
    loyalty           0.106    0.053
    price             1.250    0.625
    quality           0.859    0.430
    design            1.109    0.555

> summary(path.example, standardized=T)
lavaan (0.5-15) converged normally after  43 iterations

  Number of observations                             8

  Estimator                                         ML
  Minimum Function Test Statistic                1.718
  Degrees of freedom                                 3
  P-value (Chi-square)                           0.633

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
Regressions:
  attitude ~
    price            -0.382    0.133   -2.869    0.004   -0.382   -0.498
    quality           0.459    0.159    2.883    0.004    0.459    0.497
    design            0.063    0.109    0.579    0.562    0.063    0.078
  loyalty ~
    attitude          1.064    0.135    7.906    0.000    1.064    0.942

Covariances:
  price ~~
    quality          -0.688    0.440   -1.563    0.118   -0.688   -0.663
    design           -0.313    0.431   -0.725    0.468   -0.313   -0.265
  quality ~~
    design            0.234    0.355    0.660    0.509    0.234    0.240

Variances:
    attitude          0.097    0.048                      0.097    0.132
    loyalty           0.106    0.053                      0.106    0.113
    price             1.250    0.625                      1.250    1.000
    quality           0.859    0.430                      0.859    1.000
    design            1.109    0.555                      1.109    1.000

> library(qgraph)
Warning message:
package ‘qgraph’ was built under R version 3.0.3 
> qgraph.lavaan(path.example, layout="spring",
+               vsize.man=8,
+               vsize.lat=8,
+               filetype="",
+               include=4,
+               curve=-0.4,
+               edge.label.cex=0.6)
}}}
attachment:구조방정식/sem.png
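
The transcript above prints {{{ch9.ex1}}} but does not show how it was created. The sketch below rebuilds the data frame from the printed values and checks that the ML (divide by n) variances and covariances of the three exogenous variables match the estimates in the output; the helper names {{{exo}}} and {{{ml_cov}}} are made up for illustration.

{{{
# Rebuild the data frame from the values printed above
ch9.ex1 <- data.frame(
  attitude = c(2L, 3L, 3L, 4L, 4L, 4L, 4L, 5L),
  loyalty  = c(2L, 3L, 3L, 4L, 4L, 5L, 4L, 5L),
  price    = c(4L, 4L, 3L, 3L, 2L, 2L, 1L, 1L),
  quality  = c(2L, 3L, 2L, 3L, 3L, 4L, 3L, 5L),
  design   = c(2L, 3L, 4L, 2L, 5L, 3L, 2L, 4L)
)

# ML uses the divisor n rather than n - 1, so rescale cov() accordingly
n      <- nrow(ch9.ex1)
exo    <- c("price", "quality", "design")
ml_cov <- cov(ch9.ex1[, exo]) * (n - 1) / n

print(round(ml_cov, 3))
}}}

The diagonal reproduces the reported variances (1.250, 0.859, 1.109) and the off-diagonal entries reproduce the reported covariances (-0.688, -0.313, 0.234).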
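 * Brand attitude (attitude) is modeled as depending on price, quality, and design (standardized coefficients in parentheses):
  * A lower price goes with a more favorable attitude. (-0.5)
  * Higher quality goes with a more favorable attitude. (0.5)
  * Design has little effect. (0.08)
 * Loyalty (loyalty) is strongly influenced by brand attitude. (0.94)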
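
Because the path model has no direct paths from price, quality, or design to loyalty, their effect on loyalty runs entirely through attitude and can be computed as a product of coefficients. This is a sketch using only numbers copied from the output above; the variable names are made up for illustration.

{{{
# Coefficients copied from the summary() output above
a_unstd <- c(price = -0.382, quality = 0.459, design = 0.063)  # attitude ~ ...
b_unstd <- 1.064                                               # loyalty ~ attitude
a_std   <- c(price = -0.498, quality = 0.497, design = 0.078)
b_std   <- 0.942

# Indirect effect on loyalty =
#   (effect on attitude) * (effect of attitude on loyalty)
indirect_unstd <- a_unstd * b_unstd
indirect_std   <- a_std * b_std

# Explained variance from the standardized residual variances
r2 <- 1 - c(attitude = 0.132, loyalty = 0.113)

print(round(rbind(indirect_unstd, indirect_std), 3))
print(round(r2, 3))
}}}

With only 8 observations, these numbers are best read as an illustration of the mechanics rather than as stable estimates.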

==== Example 2 ====
http://www.inside-r.org/packages/cran/qgraph/docs/qgraph.lavaan
{{{
## Not run:
## The industrialization and Political Democracy Example 
# Example from lavaan::sem help file:
require("lavaan")
     ## Bollen (1989), page 332
     model <- ' 
       # latent variable definitions
          ind60 =~ x1 + x2 + x3
          dem60 =~ y1 + y2 + y3 + y4
          dem65 =~ y5 + equal("dem60=~y2")*y6 
                      + equal("dem60=~y3")*y7 
                      + equal("dem60=~y4")*y8
 
       # regressions
         dem60 ~ ind60
         dem65 ~ ind60 + dem60
 
       # residual correlations
         y1 ~~ y5
         y2 ~~ y4 + y6
         y3 ~~ y7
         y4 ~~ y8
         y6 ~~ y8
     '
 
     fit <- sem(model, data=PoliticalDemocracy)
 
# Plot standardized model (numerical):
qgraph.lavaan(fit,layout="tree",vsize.man=5,vsize.lat=10,
    filetype="",include=4,curve=-0.4,edge.label.cex=0.6)
 
# Plot standardized model (graphical):
qgraph.lavaan(fit,layout="tree",vsize.man=5,vsize.lat=10,
    filetype="",include=8,curve=-0.4,edge.label.cex=0.6)
 
# Create output document:
qgraph.lavaan(fit,layout="spring",vsize.man=5,vsize.lat=10,
    filename="lavaan")
 
## End(Not run)
}}}
attachment:구조방정식/sem2.png
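
The {{{equal("...")}}} modifier used above is older syntax. The same equality constraints can be written by giving the paired loadings a shared label. This is a sketch that assumes lavaan is installed and that the installed version treats identical labels as equality constraints; the plotting calls are left out.

{{{
library(lavaan)

model <- '
  # latent variable definitions; shared labels a, b, c force equal loadings
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + a*y2 + b*y3 + c*y4
    dem65 =~ y5 + a*y6 + b*y7 + c*y8

  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60

  # residual correlations
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'
fit <- sem(model, data = PoliticalDemocracy)

# Rows that share a label should show identical estimates
pe <- parameterEstimates(fit)
print(pe[pe$label %in% c("a", "b", "c"), c("lhs", "op", "rhs", "label", "est")])
}}}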

==== References ====
 * http://sachaepskamp.com/semPlot/examples --> semPlot Example
 * [http://blog.naver.com/PostView.nhn?blogId=gracestock_1&logNo=120200806864&categoryNo=12&parentCategoryNo=0&viewDate=&currentPage=4&postListTopCurrentPage=1&userTopListOpen=true&userTopListCount=5&userTopListManageOpen=false&userTopListCurrentPage=4 An explanation of structural equation modeling, Hidden Grace]
 * http://lavaan.ugent.be/tutorial/index.html
 * https://personality-project.org/r/r.sem.html
 * http://www.r-project.org/conferences/useR-2010/slides/Rosseel.pdf
 * http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-sems.pdf
 * attachment:구조방정식/R구조방정식.zip
 * http://r-project.kr/content/r%EB%A1%9C-%ED%95%98%EB%8A%94-%EA%B5%AC%EC%A1%B0%EB%B0%A9%EC%A0%95%EC%8B%9D-lavaan2amos
 * http://datawaffle.com/f_BigData/4332
 * http://kimcw.khu.ac.kr/contents/bbs/bbs_content.html?bbs_cls_cd=007&cid=13031810035712&bbs_type=B
 * http://www.ktcloudware.com/seminar/down/06.pdf