WEBVTT
Kind: captions
Language: en

00:00:00.060 --> 00:00:02.850
We will now take a look at the estimation of

00:00:02.850 --> 00:00:06.240
factor models, and particularly the confirmatory factor analysis model.

00:00:06.240 --> 00:00:12.450
This is important to understand because sometimes your factor analysis results

00:00:12.450 --> 00:00:17.910
indicate that the model doesn't fit the data, and that is indicated by the chi-square statistic.

00:00:17.910 --> 00:00:24.690
Then you have to understand what to do. And to understand what to do, you have to understand

00:00:24.690 --> 00:00:30.840
what the factor analysis actually does and what kind of relationships it models in the data.

00:00:30.840 --> 00:00:35.370
So let's take a look at how confirmatory factor analysis models are estimated.

00:00:35.370 --> 00:00:41.430
The idea in confirmatory factor analysis model estimation is that you apply tracing rules. So

00:00:41.430 --> 00:00:46.470
this is the same thing that you apply in mediation models or in a regression model

00:00:46.470 --> 00:00:52.050
if you estimate it from a correlation matrix. We have a factor model here

00:00:52.050 --> 00:01:00.510
and we can specify that the correlations between a1 and a2, a1 and b1, and a1 with

00:01:00.510 --> 00:01:05.160
itself - which is the variance - are functions of these model parameters.

00:01:05.160 --> 00:01:10.350
We use the Phi letter - Greek letter Phi - for the factor correlation. That's

00:01:10.350 --> 00:01:15.750
a convention, and then we use lambdas for factor loadings. That's also a convention.

00:01:15.750 --> 00:01:18.630
And all these lambdas are different lambdas, so they have different values.

00:01:18.630 --> 00:01:28.080
So the correlation between a1 and a2 is given by whatever different paths we can take from a1 to a2.
So we

00:01:28.080 --> 00:01:34.410
can go up here and then we go down, and that's one path, and there are no other paths from a1

00:01:34.410 --> 00:01:39.840
to a2. So we multiply everything along the way. So we have one factor loading

00:01:39.840 --> 00:01:47.520
and then we have another factor loading, and that's lambda a1 times lambda a2, and

00:01:47.520 --> 00:01:51.570
that's the correlation between a1 and a2, assuming that these are standardized estimates.

00:01:51.570 --> 00:02:02.640
Then a1 b1 is calculated similarly. The path is - we go from a1 to A, then we take the

00:02:02.640 --> 00:02:12.360
factor correlation, and then we go from B to b1. So that's the correlation between a1 and b1.

00:02:13.330 --> 00:02:20.470
The variance of a1 - we have two different ways to go somewhere and come back. So we can go to A

00:02:20.470 --> 00:02:28.900
and come back, and we can go to the error term E and come back. So that's the variance of a1.

00:02:28.900 --> 00:02:37.330
And how we estimate this model, again, is that we calculate the model-implied correlations between all

00:02:37.330 --> 00:02:43.120
indicators and we try to adjust the model so that the correlations match the observed data.

00:02:43.120 --> 00:02:52.720
Here we have positive degrees of freedom. So we are estimating altogether 13 different

00:02:52.720 --> 00:02:58.060
things from the data. So we have six factor loadings. We have six error terms and then

00:02:58.060 --> 00:03:06.670
we have one correlation. So six plus six plus one is 13, and we have 21 units of information

00:03:06.670 --> 00:03:14.920
because we have 21 unique elements in the correlation matrix of six indicators. So we have six variances

00:03:14.920 --> 00:03:23.050
and then we have 15 unique correlations. So these don't count because they're not unique.

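NOTE
A minimal sketch of the counting described above, written in Python. The function and variable names are illustrative assumptions; only the counts (six indicators, six loadings, six error terms, one factor correlation) come from the video.
```python
# Illustrative counting only; names are assumptions, not from the video.
def unique_elements(n_indicators: int) -> int:
    """Unique elements of a symmetric correlation matrix, diagonal included."""
    return n_indicators * (n_indicators + 1) // 2
#
n_indicators = 6
n_loadings = 6              # one factor loading per indicator
n_error_terms = 6           # one error term per indicator
n_factor_correlations = 1   # the correlation between the two factors
#
n_free_parameters = n_loadings + n_error_terms + n_factor_correlations
n_information = unique_elements(n_indicators)   # 6 variances + 15 correlations
degrees_of_freedom = n_information - n_free_parameters
print(n_free_parameters, n_information, degrees_of_freedom)
```
The difference between the units of information and the number of estimated parameters is the degrees of freedom of the model.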

00:03:23.050 --> 00:03:29.530
The degrees of freedom are 8, which means that we have positive degrees of freedom and

00:03:29.530 --> 00:03:35.080
the model is then overidentified. That means that we cannot

00:03:35.080 --> 00:03:42.640
typically solve it exactly. So we cannot find a set of model-implied correlations for these

00:03:44.380 --> 00:03:48.790
correlations so that every correlation would match the observed correlation.

00:03:48.790 --> 00:03:57.370
So we cannot solve it. We just have to find a way to quantify the difference

00:03:57.370 --> 00:04:02.350
between the implied correlations and the observed correlations. We could take a sum

00:04:02.350 --> 00:04:08.050
of squares, which would be the unweighted least squares estimator. Typically we take a weighted

00:04:08.050 --> 00:04:14.140
sum of squares of these implied correlations minus the observed correlations, and a particular set of

00:04:14.140 --> 00:04:20.680
weights produces the maximum likelihood estimator for this particular model.

00:04:20.680 --> 00:04:30.220
So the idea is that we find the model parameters so that the implied correlations are as close

00:04:30.220 --> 00:04:36.040
to the observed correlations as possible. To do that, there are some other things

00:04:36.040 --> 00:04:42.220
that we need to consider before we can actually estimate the model. That relates

00:04:42.220 --> 00:04:46.690
to identification and scale setting, which I'll describe in the next video.
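NOTE
A rough Python sketch of the estimation idea in this video, under stated assumptions: a two-factor model with three indicators per factor, standardized estimates, and made-up parameter values. All function names and numbers are hypothetical, and only the unweighted sum of squares is shown; no optimizer and no maximum likelihood weights are included.
```python
# Sketch only: parameter values and all names are made-up assumptions.
def implied_correlations(loadings_a, loadings_b, error_variances, phi):
    """Model-implied correlation matrix of a two-factor model via tracing rules."""
    loadings = list(loadings_a) + list(loadings_b)
    factor = ["A"] * len(loadings_a) + ["B"] * len(loadings_b)
    n = len(loadings)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Variance: to the factor and back, plus to the error term and back.
                matrix[i][j] = loadings[i] ** 2 + error_variances[i]
            elif factor[i] == factor[j]:
                # Same factor: up one loading, down the other.
                matrix[i][j] = loadings[i] * loadings[j]
            else:
                # Different factors: loading, factor correlation, loading.
                matrix[i][j] = loadings[i] * phi * loadings[j]
    return matrix
#
def uls_discrepancy(implied, observed):
    """Unweighted sum of squared differences over the unique elements."""
    n = len(observed)
    return sum((implied[i][j] - observed[i][j]) ** 2
               for i in range(n) for j in range(i + 1))
#
lam_a, lam_b = [0.8, 0.7, 0.6], [0.9, 0.5, 0.4]
errors = [1.0 - lam ** 2 for lam in lam_a + lam_b]   # standardized indicators
# Pretend these are observed correlations (generated from known parameters here).
observed = implied_correlations(lam_a, lam_b, errors, phi=0.3)
exact = implied_correlations(lam_a, lam_b, errors, phi=0.3)
worse = implied_correlations(lam_a, lam_b, errors, phi=0.5)
print(uls_discrepancy(exact, observed))
print(uls_discrepancy(worse, observed))
```
An estimator would search over the 13 parameters for the values that make this discrepancy as small as possible; with real data it generally cannot reach zero.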