WEBVTT
Kind: captions
Language: en
00:00:00.060 --> 00:00:02.850
We will now take a look at estimation of
00:00:02.850 --> 00:00:06.240
factor models and particularly the
confirmatory factor analysis model.
00:00:06.240 --> 00:00:12.450
This is important to understand because
sometimes your factor analysis results
00:00:12.450 --> 00:00:17.910
indicate that the model doesn't fit the data,
which is indicated by the chi-square statistic.
00:00:17.910 --> 00:00:24.690
Then you have to understand what to do. And to
understand what to do, you have to understand
00:00:24.690 --> 00:00:30.840
what factor analysis actually does and what
kind of relationships it models in the data.
00:00:30.840 --> 00:00:35.370
So let's take a look at how confirmatory
factor analysis models are estimated.
00:00:35.370 --> 00:00:41.430
The idea in confirmatory factor analysis model
estimation is that you apply tracing rules. So
00:00:41.430 --> 00:00:46.470
this is the same thing that you apply in
mediation models or in a regression model
00:00:46.470 --> 00:00:52.050
if you estimate it from a correlation
matrix. We have a factor model here
00:00:52.050 --> 00:01:00.510
and we can specify that the correlations
between a1 and a2, a1 and b1, and a1 with
00:01:00.510 --> 00:01:05.160
itself - which is the variance - are
functions of these model parameters.
00:01:05.160 --> 00:01:10.350
We use the Phi letter - Greek letter
Phi - for the factor correlation. That's
00:01:10.350 --> 00:01:15.750
a convention, and then we use lambdas for
factor loadings. That's also a convention.
00:01:15.750 --> 00:01:18.630
And all these lambdas are different
lambdas. So they have different values.
00:01:18.630 --> 00:01:28.080
So the correlation between a1 and a2 comes from whatever
different paths we can take from a1 to a2. So we
00:01:28.080 --> 00:01:34.410
can go up here and then we go down and that's
one path and there are no other paths from a1
00:01:34.410 --> 00:01:39.840
to a2. So we multiply everything along
the way. So we have one factor loading
00:01:39.840 --> 00:01:47.520
and then we have another factor loading
and that's lambda a1 times lambda a2, and
00:01:47.520 --> 00:01:51.570
that's the correlation of a1 and a2, assuming
that these are standardized estimates.
00:01:51.570 --> 00:02:02.640
Then a1 b1 is calculated similarly. The path
is: we go from a1 to A, then we take the factor
00:02:02.640 --> 00:02:12.360
correlation, and then we go from B to b1. So
that's the correlation between a1 and b1.
00:02:13.330 --> 00:02:20.470
For the variance of a1, we have two different ways
to go somewhere and come back. So we can go to A
00:02:20.470 --> 00:02:28.900
and come back, and we can go to the error
term E and come back. So that's the variance of a1.
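The tracing rules described so far can be written out as plain arithmetic. This is only a minimal sketch: the loading and factor correlation values are hypothetical numbers picked for illustration, and the model is assumed to be standardized, so the error variance of a1 is taken to be one minus the squared loading.

```python
# Hypothetical standardized parameter values, chosen only for illustration.
lam_a1, lam_a2, lam_b1 = 0.8, 0.7, 0.6  # factor loadings (lambdas)
phi = 0.5                                # correlation between factors A and B

# a1 -> A -> a2: one path, multiply the two loadings along the way.
corr_a1_a2 = lam_a1 * lam_a2

# a1 -> A -> (factor correlation) -> B -> b1: multiply all three.
corr_a1_b1 = lam_a1 * phi * lam_b1

# Variance of a1: sum of the two round trips, one through factor A
# and one through the error term E.
# Assumption: standardized solution, so error variance = 1 - loading squared.
err_var_a1 = 1 - lam_a1 ** 2
var_a1 = lam_a1 * lam_a1 + err_var_a1
```

With standardized estimates the two round trips for a1 add up to a variance of one, which is why the diagonal of the implied correlation matrix is one.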
00:02:28.900 --> 00:02:37.330
And how we estimate this model, again, is that
we calculate the model-implied correlations for all
00:02:37.330 --> 00:02:43.120
indicators and we try to adjust the model so
that these correlations match the observed data.
00:02:43.120 --> 00:02:52.720
Here we have positive degrees of freedom.
So we are estimating altogether 13 different
00:02:52.720 --> 00:02:58.060
things from the data. So we have six factor
loadings. We have six error terms and then
00:02:58.060 --> 00:03:06.670
we have one correlation. So six plus six plus
one is 13 and we have 21 units of information
00:03:06.670 --> 00:03:14.920
because we have 21 unique elements in the correlation
matrix of 6 indicators. So we have 6 variances
00:03:14.920 --> 00:03:23.050
and then we have 15 unique correlations. So
these don't count because they're not unique.
00:03:23.050 --> 00:03:29.530
The degrees of freedom are 8, which means that
we have positive degrees of freedom and
00:03:29.530 --> 00:03:35.080
the model is then overidentified.
That means that we cannot
00:03:35.080 --> 00:03:42.640
typically solve it exactly. So we cannot find
a set of model parameters so that every
00:03:44.380 --> 00:03:48.790
model-implied correlation
would match the observed correlation.
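The counting behind the degrees of freedom can be sketched as a small function. The function name and arguments are hypothetical, and the sketch assumes the setup described in this video: each indicator has one loading and one error term, and the only other free parameters are the factor correlations.

```python
def cfa_degrees_of_freedom(n_indicators, n_factor_correlations):
    """Units of information minus free parameters.

    Assumes one loading and one error term per indicator, with
    factor correlations as the only other free parameters.
    """
    n_loadings = n_indicators
    n_error_terms = n_indicators
    n_parameters = n_loadings + n_error_terms + n_factor_correlations

    # Unique elements of a symmetric matrix: variances on the diagonal
    # plus the correlations on one side of the diagonal.
    n_information = n_indicators * (n_indicators + 1) // 2

    return n_information - n_parameters


# Six indicators and one factor correlation: 21 - 13.
df = cfa_degrees_of_freedom(n_indicators=6, n_factor_correlations=1)
```

A positive result means there are more unique correlations and variances to reproduce than there are parameters to adjust.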
00:03:48.790 --> 00:03:57.370
So we cannot solve it. We have to just
find a way to quantify the difference
00:03:57.370 --> 00:04:02.350
between the implied correlations and the
observed correlations. We could take a sum
00:04:02.350 --> 00:04:08.050
of squared differences, which would be the unweighted least
squares estimator. Typically we take a weighted
00:04:08.050 --> 00:04:14.140
sum of these implied correlations minus the
observed correlations and a particular set of
00:04:14.140 --> 00:04:20.680
weights produces the maximum likelihood
estimator for this particular model.
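The unweighted sum of squares mentioned here can be sketched as follows. This is an illustration only: the function names, loadings, and factor correlation are hypothetical, the model is assumed to be standardized with three indicators per factor, and the weighting that gives the maximum likelihood estimator is not shown.

```python
def implied_correlations(loadings, factor_of, phi):
    """Model-implied correlation matrix for a standardized two-factor model."""
    n = len(loadings)
    implied = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Assumption: error variance = 1 - loading squared,
                # so every implied variance is exactly one.
                implied[i][j] = 1.0
            else:
                # Same factor: loading times loading.
                # Different factors: also multiply by the factor correlation.
                between = 1.0 if factor_of[i] == factor_of[j] else phi
                implied[i][j] = loadings[i] * between * loadings[j]
    return implied


def uls_discrepancy(implied, observed):
    """Sum of squared differences over the unique matrix elements."""
    n = len(implied)
    return sum(
        (implied[i][j] - observed[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1)  # lower triangle including the diagonal
    )


# Hypothetical parameter values, for illustration only.
loadings = [0.8, 0.7, 0.6, 0.9, 0.5, 0.7]
factor_of = ["A", "A", "A", "B", "B", "B"]
implied = implied_correlations(loadings, factor_of, phi=0.5)

# Pretend the observed a1-a2 correlation is 0.1 higher than implied.
observed = [row[:] for row in implied]
observed[1][0] += 0.1
observed[0][1] += 0.1

fit = uls_discrepancy(implied, observed)
```

An estimation routine would adjust the loadings and the factor correlation to make this number as small as possible.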
00:04:20.680 --> 00:04:30.220
So the idea is that we find the model parameters
so that the implied correlations are as close
00:04:30.220 --> 00:04:36.040
to the observed correlations as possible.
To do that, there are some other things
00:04:36.040 --> 00:04:42.220
that we need to consider before we can
actually estimate the model. That relates
00:04:42.220 --> 00:04:46.690
to identification and scale setting
that I'll describe in the next video.