WEBVTT
Kind: captions
Language: en

00:00:00.030 --> 00:00:04.590
Confirmatory factor analysis differs 
from exploratory factor analysis in  

00:00:04.590 --> 00:00:09.210
that in a confirmatory factor analysis 
the researcher specifies the factors for  

00:00:09.210 --> 00:00:14.370
the data instead of having the computer 
discover what the underlying factors are.

00:00:14.370 --> 00:00:20.820
So confirmatory factor analysis requires that you 
specify the expected result and then the computer will  

00:00:20.820 --> 00:00:25.500
tell you if that result fits your data and 
will estimate the factor loadings for you.

00:00:25.500 --> 00:00:29.970
This is a more flexible approach to factor 
analysis than an exploratory analysis.

00:00:29.970 --> 00:00:35.580
In this video I will show you on a 
conceptual level what confirmatory factor  

00:00:35.580 --> 00:00:39.780
analysis does and how it can be applied 
to different kinds of scenarios.

00:00:39.780 --> 00:00:50.190
Our data has six indicators. We have indicators a1 
through a3 that are supposed to measure construct  

00:00:50.190 --> 00:00:54.540
A, so we can see that there is variation 
in these indicators due to construct A  

00:00:54.540 --> 00:00:59.580
and there is also some random noise - the E 
here - that's the unreliability of these  

00:00:59.580 --> 00:01:05.040
indicators, and then we have some variance 
components that are reliable. So for example  

00:01:05.040 --> 00:01:11.850
if we measure a3 multiple times there is 
a specific part of a3 that is reliable  

00:01:11.850 --> 00:01:18.750
but it is specific to a3. So for example if we 
ask whether a company is innovative or not, that  

00:01:18.750 --> 00:01:24.930
measures innovativeness, but it can also measure 
something else. So the unreliability is not  

00:01:24.930 --> 00:01:28.230
the only source of measurement error 
but there's also some item uniqueness.
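
NOTE
A minimal sketch of these variance components, assuming Python with numpy and
pandas; the loadings and the A-B correlation below are made-up illustration
values, not taken from the video.
  import numpy as np
  import pandas as pd
  rng = np.random.default_rng(1)
  n = 1000
  A = rng.normal(size=n)                 # construct A
  B = 0.3 * A + rng.normal(size=n)       # construct B, somewhat correlated with A
  def indicator(construct):
      unique = rng.normal(size=n)        # reliable but item-specific variance
      noise = rng.normal(size=n)         # random noise, the unreliability "E"
      return 0.7 * construct + 0.4 * unique + 0.5 * noise
  data = pd.DataFrame({name: indicator(A) for name in ["a1", "a2", "a3"]}
                      | {name: indicator(B) for name in ["b1", "b2", "b3"]})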

00:01:28.230 --> 00:01:34.680
And the idea of a confirmatory factor 
analysis is that we specify a factor model  

00:01:34.680 --> 00:01:40.110
for these data ourselves. So for example 
we would say that because these three  

00:01:40.110 --> 00:01:46.680
indicators a1 through a3 are supposed 
to measure construct A - we assign them  

00:01:46.680 --> 00:01:53.730
to factor A and then we assign these to factor 
B, and then each indicator gets an error term.

00:01:53.730 --> 00:02:00.480
Then the factor analysis takes the variance 
of those indicators apart into variance that  

00:02:00.480 --> 00:02:06.120
can be attributed to the factors and variance 
that can be attributed to the error terms. Like so.

00:02:06.120 --> 00:02:12.330
So now we have a factor solution here. All 
variation that is due to the concept A  

00:02:12.330 --> 00:02:18.060
goes to factor A, all variation that is due to 
the concept B goes to factor B, and all the  

00:02:18.060 --> 00:02:23.880
item uniqueness and unreliability go to the 
error terms, which are assumed to be uncorrelated.

00:02:23.880 --> 00:02:30.240
So these are uncorrelated, distinct sources of 
variation for each indicator, and then we have  

00:02:30.240 --> 00:02:32.790
the two common factors. So that's the ideal case.
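
NOTE
A minimal sketch of this two-factor specification, assuming the Python package
semopy (which accepts lavaan-style model syntax) and a pandas DataFrame named
data with columns a1-a3 and b1-b3, such as the simulated one above.
  import semopy
  # a1-a3 are assigned to factor A and b1-b3 to factor B; each indicator gets
  # its own error term, the error terms are uncorrelated, and the two factors
  # are typically allowed to covary by default.
  desc = """
  A =~ a1 + a2 + a3
  B =~ b1 + b2 + b3
  """
  model = semopy.Model(desc)
  model.fit(data)
  print(model.inspect())  # loadings, error variances, and the A-B covariance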

00:02:32.790 --> 00:02:38.820
Sometimes your data are not 
as great as you would like and  

00:02:38.820 --> 00:02:44.040
confirmatory factor analysis allows you to 
also model problems in your measurement.

00:02:44.040 --> 00:02:51.750
So for example if we have this kind of scenario. 
There is variation again due to construct A,  

00:02:51.750 --> 00:02:59.190
variation due to construct B, and then we have 
unreliability - the black circles here - and  

00:02:59.190 --> 00:03:06.210
we have unique aspects of each indicator, but 
there is also some variation in a3 and  

00:03:06.210 --> 00:03:12.570
b1 that correlates. The variance components 
are here. These letters have no particular  

00:03:12.570 --> 00:03:16.920
meaning by the way. They're just letters 
to indicate that these are different circles.

00:03:16.920 --> 00:03:26.910
So a3 and b1 correlate for some reason other 
than measuring A and measuring B, which could  

00:03:26.910 --> 00:03:34.740
possibly be correlated. If we fit a confirmatory 
factor analysis model then this variation that  

00:03:34.740 --> 00:03:41.670
is shared by a3 and b1 actually goes to the 
factors. The reason why it goes to the factors is  

00:03:41.670 --> 00:03:48.840
that these error terms are constrained to 
be uncorrelated. And what happens now is  

00:03:48.840 --> 00:03:55.320
that our factors - which are supposed to represent 
construct A and construct B - are contaminated by  

00:03:55.320 --> 00:04:02.790
this secondary source of variation that 
is present in a3 and b1, and as a consequence  

00:04:02.790 --> 00:04:08.700
the correlation between factors A and B will be 
overestimated and your results will be biased.

00:04:08.700 --> 00:04:16.230
And this is also the case in exploratory 
factor analysis. So if we have this kind of  

00:04:16.230 --> 00:04:22.800
minor factor that influences a3 and b1, and we  

00:04:22.800 --> 00:04:28.620
only get two factors then the factor correlation 
between those two factors will be inflated. If  

00:04:28.620 --> 00:04:34.680
we were to run exploratory analysis then the 
exploratory analysis could identify that there  

00:04:34.680 --> 00:04:40.740
is a third factor that loads on a3 and 
b1, but because it's just two indicators it's  

00:04:40.740 --> 00:04:45.720
also possible that the exploratory analysis 
wouldn't identify that factor for us.

00:04:45.720 --> 00:04:52.230
So what can we do in this kind of situation? 
Confirmatory analysis allows us to also model  

00:04:52.230 --> 00:04:59.550
correlated error. So instead of specifying 
that these error terms of a3 and b1 are  

00:04:59.550 --> 00:05:05.640
uncorrelated we can say that it's possible 
that a3 and b1 correlate for some other  

00:05:05.640 --> 00:05:10.770
reason. So we relax the constraint. We 
specify that these two can be correlated  

00:05:10.770 --> 00:05:19.200
and then the variation in a3 and b1 that 
is shared between these indicators but  

00:05:19.200 --> 00:05:25.710
not with others - so it's not part of 
the factors - then gets to escape into these  

00:05:25.710 --> 00:05:31.290
error terms, and then we again get a clean 
estimate of the correlation between factors A and B.
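
NOTE
A sketch of relaxing that constraint in the same hypothetical semopy setup,
assuming ~~ between two indicators specifies a residual covariance (as in
lavaan): the only change is one extra line for the a3 and b1 error terms.
  import semopy
  desc = """
  A =~ a1 + a2 + a3
  B =~ b1 + b2 + b3
  a3 ~~ b1
  """
  # The freed residual covariance lets the secondary variance shared by a3 and
  # b1 stay out of the factors, so the A-B correlation is estimated more cleanly.
  model = semopy.Model(desc)
  model.fit(data)
  print(model.inspect())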

00:05:31.290 --> 00:05:39.090
But this is something that many people do. So your 
statistical software will tell you that the model  

00:05:39.090 --> 00:05:44.280
doesn't fit the data perfectly and it'll also 
tell you that you could free some correlations  

00:05:44.280 --> 00:05:50.190
to make the model fit better but that's a bit 
dangerous unless you know what you're doing.

00:05:50.190 --> 00:05:58.200
You should only add this kind of correlated error 
if you have a good theoretical reason to do so. So  

00:05:58.200 --> 00:06:02.670
the fact that your statistical software tells you 
that you could do something to increase  

00:06:02.670 --> 00:06:11.280
the model fit is not in itself a reason to do it. 
It's an indication that you could do something and  

00:06:11.280 --> 00:06:17.670
you should consider it. It's not a definite 
guideline that you should actually do that.

00:06:17.670 --> 00:06:25.260
So under which scenario, then, is 
it a good idea to allow the error terms of two  

00:06:25.260 --> 00:06:31.080
indicators to correlate? For example if our 
indicators would look like that. So we would  

00:06:31.080 --> 00:06:37.035
have indicators about innovativeness. So here 
factor A is innovativeness and factor B  

00:06:37.035 --> 00:06:42.630
is productivity. So we would have questions about 
innovativeness and questions about productivity.

00:06:42.630 --> 00:06:51.390
Then we realize that okay, so a3 is 
'our personnel is innovative' and b1 is  

00:06:51.390 --> 00:06:56.040
'our personnel is productive'. So both 
of these actually have this  

00:06:56.040 --> 00:07:01.020
personnel dimension as well. So they 
don't measure only innovativeness and  

00:07:01.020 --> 00:07:06.600
productivity; they also measure how high 
quality the personnel in the company are.

00:07:06.600 --> 00:07:11.070
So then we realize that okay so there 
is a secondary dimension that these two  

00:07:11.070 --> 00:07:16.230
indicators measure and then we can add the 
error correlation here. But you also have  

00:07:16.230 --> 00:07:22.470
to justify it. So it's not enough to say 
that the statistical software tells us that the model  

00:07:22.470 --> 00:07:27.720
fits better if we do something; you have 
to justify it also in non-statistical terms.

00:07:27.720 --> 00:07:32.730
This is the same thing as with outliers 
- you don't delete an observation because it  

00:07:32.730 --> 00:07:38.100
is different; you have to explain why it's 
different in non-statistical terms. The same  

00:07:38.100 --> 00:07:45.090
thing when you eliminate indicators from a scale. 
So your statistical software will tell you that  

00:07:45.090 --> 00:07:51.990
sometimes eliminating an item from a scale will 
make Cronbach's alpha go up, but that's not  

00:07:51.990 --> 00:07:57.660
a reason to eliminate an item. You should also 
look at non-statistical criteria. So what does  

00:07:57.660 --> 00:08:03.990
the item look like? Is there a good reason why 
we think it's less reliable? Because this kind  

00:08:03.990 --> 00:08:10.530
of suggestion by your software could also 
be just a random correlation between two random  

00:08:10.530 --> 00:08:16.470
elements - so a random correlation between these E's 
- and then you would be misspecifying the model.

00:08:16.470 --> 00:08:23.640
Another way - perhaps a bit better way - 
to accomplish the same thing is to specify this  

00:08:23.640 --> 00:08:30.060
secondary factor. So instead of saying that 
these two errors are correlated, we could say  

00:08:30.060 --> 00:08:36.690
that these indicators a3 and b1 actually are 
also measuring something else. So we add this  

00:08:36.690 --> 00:08:44.160
secondary factor here, and this is a bit more 
appealing approach because then you  

00:08:44.160 --> 00:08:52.020
have to explicitly interpret what this 
factor means. And it's a lot easier to free  

00:08:52.020 --> 00:08:57.360
correlations without explaining what they 
actually are - what the interpretation of  

00:08:57.360 --> 00:09:02.670
the correlation between these error terms is. 
It's a lot easier to do that without an explanation  

00:09:02.670 --> 00:09:08.970
than adding a factor. So if you add a factor then 
your reviewers will ask you to explain it and  

00:09:08.970 --> 00:09:13.710
you always should so it's a good idea to have 
the factor instead of having the correlation.

00:09:13.710 --> 00:09:21.780
Mathematically both of these accomplish the 
exact same thing. They allow the unique aspect  

00:09:21.780 --> 00:09:26.700
of a3 and b1 that is correlated 
to escape from the error terms.
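
NOTE
A sketch of the minor-factor alternative in the same hypothetical setup. A
two-indicator factor usually needs constraints to be identified; the fixed
loadings (1*) and zero covariances (0*) below assume semopy supports lavaan's
value-fixing syntax, which should be checked against the version you use.
  import semopy
  desc = """
  A =~ a1 + a2 + a3
  B =~ b1 + b2 + b3
  U =~ 1*a3 + 1*b1
  U ~~ 0*A
  U ~~ 0*B
  """
  # U is the secondary (e.g. personnel) factor shared by a3 and b1; keeping it
  # uncorrelated with A and B means it carries only the a3-b1 overlap, which is
  # mathematically equivalent to the correlated-error specification above.
  model = semopy.Model(desc)
  model.fit(data)
  print(model.inspect())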

00:09:26.700 --> 00:09:35.100
This example of adding a minor factor can be  

00:09:35.100 --> 00:09:40.980
also extended to another scenario. So 
these are just the same indicators again.

00:09:40.980 --> 00:09:51.720
We can have this kind of scenario. So what's 
the scenario here? Indicators a1  

00:09:51.720 --> 00:09:58.080
through a3 measure A. Indicators b1 to b3 
measure B. Then there's unreliability - the  

00:09:58.080 --> 00:10:02.460
E's - and then there is some variation 
that is shared by all the indicators.

00:10:02.460 --> 00:10:11.130
That variation could be for example variation due 
to the measurement method. So this is a scenario  

00:10:11.130 --> 00:10:16.200
where you would have common method variance. So 
the R would be here - the variation due to the  

00:10:16.200 --> 00:10:23.100
method or the common method variance - and then 
if we estimate the factor model with a1 a2 a3  

00:10:23.100 --> 00:10:29.130
loading on A and the b's loading on B 
- then all variation due to the method escapes  

00:10:29.130 --> 00:10:34.680
to the B factor and the A factor, and the factor 
correlation will be greatly overestimated.

00:10:34.680 --> 00:10:40.290
So in this kind of scenario it is 
possible to also specify a secondary  

00:10:40.290 --> 00:10:47.370
factor. So we can specify this method 
factor here and the idea is that all  

00:10:47.370 --> 00:10:51.810
the indicators load on the factors 
they're supposed to measure - the  

00:10:51.810 --> 00:10:57.540
factors representing the constructs - and a 
factor representing the measurement process.
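
NOTE
A sketch of this method-factor specification in the same hypothetical setup:
every indicator loads on its construct and on an orthogonal method factor M
(the zero covariances use the same assumed value-fixing syntax as above).
  import semopy
  desc = """
  A =~ a1 + a2 + a3
  B =~ b1 + b2 + b3
  M =~ a1 + a2 + a3 + b1 + b2 + b3
  M ~~ 0*A
  M ~~ 0*B
  """
  # M is meant to absorb variance shared by all indicators because of the
  # measurement method; in practice M can be hard to tell apart from a single
  # common factor, so this model can be very unstable to estimate.
  model = semopy.Model(desc)
  model.fit(data)
  print(model.inspect())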

00:10:57.540 --> 00:11:07.050
So this looks really good. It looks too good to be 
true. This is not a panacea for method variance  

00:11:07.050 --> 00:11:12.730
problems. This kind of model is 
problematic to estimate. The reason for  

00:11:12.730 --> 00:11:22.540
that is that a high correlation between A 
and B is nearly indistinguishable from a1  

00:11:22.540 --> 00:11:29.230
a2 a3 b1 b2 b3 just being caused by one 
factor. So they are empirically nearly  

00:11:29.230 --> 00:11:34.210
impossible to distinguish so this kind 
of model is very unstable to estimate.

00:11:34.210 --> 00:11:41.290
In practice these models have been shown 
to be problematic even with simulated  

00:11:41.290 --> 00:11:48.040
datasets but there's one way that this kind 
of model can work and it's if you add these  

00:11:48.040 --> 00:11:52.390
marker indicators. So sometimes you see 
in published papers that they use marker  

00:11:52.390 --> 00:11:58.300
indicators. The idea of marker indicators is 
that you have indicators that are unrelated  

00:11:58.300 --> 00:12:05.410
to the factors that you're modeling, so 
a1 and b1 are unrelated to these m1 and m2.

00:12:05.410 --> 00:12:12.880
For example if you use innovativeness and 
productivity and then you have questions  

00:12:12.880 --> 00:12:17.110
on a one to seven scale - you could 
have a marker indicator of whether  

00:12:17.110 --> 00:12:20.650
the person likes jazz music or not. 
I've actually seen that being used.

00:12:20.650 --> 00:12:27.970
The idea is that how much you like jazz 
music is completely unrelated to the  

00:12:27.970 --> 00:12:34.690
innovativeness and productivity of your company 
but if the jazz music indicator correlates with  

00:12:34.690 --> 00:12:41.650
these indicators then we can assume that that 
correlation is purely due to the measurement  

00:12:41.650 --> 00:12:48.850
method because jazz music liking and innovation 
really are two completely different things.
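
NOTE
A sketch of adding marker indicators to the method factor, assuming the data
also contain two hypothetical marker columns m1 and m2 (e.g. the jazz items)
that are unrelated to the constructs and load only on the method factor M.
  import semopy
  desc = """
  A =~ a1 + a2 + a3
  B =~ b1 + b2 + b3
  M =~ m1 + m2 + a1 + a2 + a3 + b1 + b2 + b3
  M ~~ 0*A
  M ~~ 0*B
  """
  # Any correlation between the marker items and the substantive indicators is
  # attributed to M, the measurement method, because liking jazz is assumed to
  # be unrelated to innovativeness (A) and productivity (B).
  model = semopy.Model(desc)
  model.fit(data)
  print(model.inspect())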