WEBVTT
Kind: captions
Language: en
00:00:00.060 --> 00:00:05.520
Let's take a look at an empirical example of
exploratory factor analysis. To do that we need
00:00:05.520 --> 00:00:10.980
some data and our data comes from a research
paper by Mesquita and Lazzarini from 2008.
00:00:10.980 --> 00:00:16.290
This is an interesting paper because the
authors present the full correlation matrix
00:00:16.290 --> 00:00:21.990
of all the indicators in the paper. That
means that we can replicate everything
00:00:21.990 --> 00:00:26.730
that the authors do using the correlation matrix
and we also get the same results for all the
00:00:26.730 --> 00:00:31.440
analyses. So this is a completely transparent
paper that we can replicate ourselves.
00:00:31.440 --> 00:00:37.170
This article uses confirmatory factor analysis and
structural regression models but we can equally
00:00:37.170 --> 00:00:42.570
well do an exploratory factor analysis to see
if we get the same result as the authors did.
00:00:42.570 --> 00:00:50.910
So this is the data set that we have. It's
Table 1, descriptive statistics and correlations,
00:00:50.910 --> 00:00:57.420
except that instead of the scale level it is on the
indicator level. We will be using all questions
00:00:57.420 --> 00:01:03.630
that are measured on the one to seven scale
to eliminate any scaling issues from the data.
00:01:03.630 --> 00:01:11.400
So we'll have five scales, these five here,
and the indicators are: three indicators for
00:01:11.400 --> 00:01:16.050
horizontal governance, three indicators
of vertical governance, three indicators
00:01:16.050 --> 00:01:22.050
of collective sourcing, two indicators for export
orientation and three indicators for investment.
00:01:22.050 --> 00:01:26.700
Whether these indicators measure what
the authors claim they do measure is a
00:01:26.700 --> 00:01:31.050
question that we will not address
in this video. We'll just take a
00:01:31.050 --> 00:01:36.420
look at whether for example these export
orientation indicators can be argued to
00:01:36.420 --> 00:01:40.680
measure something together that is
distinct from the other indicators.
00:01:40.680 --> 00:01:48.630
So we have 14 variables and we want to assess
whether they measure five distinct things.
00:01:48.630 --> 00:01:55.440
In an exploratory factor analysis - when we start
the analysis we have to define how many factors
00:01:55.440 --> 00:02:04.470
we extract. One way to make that decision is
to use a tool called a scree plot. The idea of
00:02:04.470 --> 00:02:11.880
a scree plot is that we extract components
from the data and then we have a variable
00:02:11.880 --> 00:02:16.290
here that quantifies how much of
the variance each component explains.
00:02:16.290 --> 00:02:23.160
One rule of thumb for choosing the
number of factors is that we can
00:02:23.160 --> 00:02:29.670
choose 5 factors based on a pivot point.
So a clear pivot point when the curve
00:02:29.670 --> 00:02:34.230
starts to go flat means that that's the
number of factors that we should extract.
00:02:34.230 --> 00:02:41.370
Another rule of thumb is that we go as long
as we get eigenvalues greater than one,
00:02:41.370 --> 00:02:49.080
which would be 4 factors. But here we know
that this set of indicators is supposed to
00:02:49.080 --> 00:02:53.940
measure 5 distinct things so we can use the
best rule of thumb which is our theory and
00:02:53.940 --> 00:03:00.270
theory states that we take 5 factors. Because we
have 5 different things that we want to measure.
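A minimal sketch of the numbers behind a scree plot and the eigenvalue-greater-than-one rule. The correlation matrix here is a small hypothetical one made up for illustration, not the matrix from the paper, and the function name `scree_values` is also just an assumption for this example.

```python
import numpy as np


def scree_values(corr):
    """Return the eigenvalues of a correlation matrix, largest first."""
    # eigvalsh is meant for symmetric matrices and returns ascending order.
    eigenvalues = np.linalg.eigvalsh(corr)
    return eigenvalues[::-1]


# Hypothetical 4-indicator correlation matrix: two pairs of related items.
corr = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])

eigenvalues = scree_values(corr)

# Share of the total variance that each component explains
# (these are the heights you would plot in a scree plot).
share = eigenvalues / eigenvalues.sum()

# Eigenvalue-greater-than-one rule of thumb.
n_kaiser = int(np.sum(eigenvalues > 1.0))

print("eigenvalues:", np.round(eigenvalues, 3))
print("variance share:", np.round(share, 3))
print("components with eigenvalue > 1:", n_kaiser)
```

With real data you would pass in the full indicator correlation matrix, and as the video says, theory can still override what either rule of thumb suggests.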
00:03:00.270 --> 00:03:06.810
So we apply factor analysis. We request
5 factors using these 14 indicators.
00:03:06.810 --> 00:03:11.310
We get the result printout. So
what does the printout tell us?
00:03:11.310 --> 00:03:15.900
There are different sections. There are 3
different sections. The first section is the
00:03:15.900 --> 00:03:22.440
factor loadings. So these statistics tell
how strongly the indicators are related to
00:03:22.440 --> 00:03:27.990
each factor and how much uniqueness there is in
the indicators that the factors don't explain.
00:03:27.990 --> 00:03:35.250
The second section is the variance explained,
how much of the variation each factor explains,
00:03:35.250 --> 00:03:40.770
and then finally in the table or in the bottom
section we have different model quality indices.
00:03:40.770 --> 00:03:47.190
I don't typically interpret these
model quality indices myself because if I want
00:03:47.190 --> 00:03:52.140
to really know if the model fits the data
well or not I will do it with confirmatory
00:03:52.140 --> 00:03:58.440
factor analysis based techniques which have
a lot more diagnostic options available.
00:03:58.440 --> 00:04:04.110
So in practice we interpret the factor loading
pattern, how strong the individual loadings are
00:04:04.110 --> 00:04:09.930
and how much variation the factors explain.
If you want to do more diagnostics then it's
00:04:09.930 --> 00:04:13.710
better to move into the confirmatory
factor analysis family of techniques.
00:04:13.710 --> 00:04:19.350
So the factor loadings here provide
us some information. They provide us
00:04:19.350 --> 00:04:23.940
information about how strongly each
indicator is related to its factor.
00:04:23.940 --> 00:04:29.460
The factor loadings are regressions of
items on factors. So a loading is a regression
00:04:29.460 --> 00:04:33.390
path, a directional path. Because
this is a standardized
00:04:33.390 --> 00:04:37.530
factor analysis solution and the
factors are uncorrelated in this
00:04:37.530 --> 00:04:44.400
factor solution, which they are by default,
the loadings also equal correlations.
00:04:44.400 --> 00:04:54.390
So this last item correlates 0.75 with the second
factor. Then we also have the uniqueness here, or
00:04:54.390 --> 00:05:02.220
the communality, h-squared, which tells how much
of the variation of the indicator all the factors
00:05:02.220 --> 00:05:09.210
explain together, and the uniqueness, how much of the
variance of the indicator remains unexplained.
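A minimal sketch of how communality and uniqueness follow from the loadings when the solution is standardized and the factors are uncorrelated. The loading values are hypothetical, chosen only so that the last item loads 0.75 on the second factor as in the example above.

```python
import numpy as np

# Hypothetical standardized loadings: 3 indicators (rows), 2 uncorrelated factors (columns).
loadings = np.array([
    [0.80, 0.10],
    [0.70, 0.20],
    [0.15, 0.75],
])

# With uncorrelated factors each loading is a correlation, so the squared
# loadings in a row add up to the variance the factors explain together.
communality = (loadings ** 2).sum(axis=1)

# Whatever the factors do not explain is the uniqueness.
uniqueness = 1.0 - communality

for i, (h2, u2) in enumerate(zip(communality, uniqueness), start=1):
    print(f"item {i}: communality = {h2:.3f}, uniqueness = {u2:.3f}")
```

The row sums only work this way for uncorrelated factors. With correlated factors the factor correlations also enter the calculation.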
00:05:09.210 --> 00:05:16.800
Sometimes the uniqueness is interpreted as
evidence or a measure of unreliability. So
00:05:16.800 --> 00:05:23.550
if the uniqueness is 30% we say that the indicator's
error variance is 30%. 70% is the reliable
00:05:23.550 --> 00:05:29.250
variance. The problem with that approach
is that the uniqueness also captures other
00:05:29.250 --> 00:05:33.690
sources of unique variation that are not
random noise. So for example there's
00:05:33.690 --> 00:05:41.130
probably something unique in the total quality
management item that is not related to the other
00:05:41.130 --> 00:05:46.380
investment items but that would be reliable
if we ever asked the same question again.
00:05:46.380 --> 00:05:53.910
So factor analysis puts the
unreliability variance, that is the
00:05:53.910 --> 00:05:58.020
random error, and the unique variance into
one and the same number and there is really no
00:05:58.020 --> 00:06:02.370
way of taking them apart. So that's
one weakness of factor analysis.
00:06:03.810 --> 00:06:09.750
The variance explained here shows that the first
factor explains most of the variation but this
00:06:09.750 --> 00:06:15.720
is an unrotated solution so we don't really pay
much attention to this except for one thing. So
00:06:15.720 --> 00:06:24.450
we can do a Harman's single factor test which
you sometimes see reported in papers and the
00:06:24.450 --> 00:06:32.850
Harman's test involves checking whether
the first factor explains a majority of
00:06:32.850 --> 00:06:36.630
the data - of the variance in the data - and
whether it dominates over the other factors.
00:06:36.630 --> 00:06:41.460
So we can see here the first factor is 25
percent the second factor is 16 percent.
00:06:41.460 --> 00:06:45.840
We can't say that the first factor would
explain most of the data. We can't say that
00:06:45.840 --> 00:06:51.000
it will dominate over the other factors because
25 and 16 percent are still in the same ballpark.
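A minimal sketch of the arithmetic behind this check. The 50 percent threshold and the function names are assumptions for illustration, and only the two shares mentioned above (25 and 16 percent) are taken from the video. The loading matrix is hypothetical.

```python
import numpy as np


def variance_shares(loadings):
    """Share of total indicator variance explained by each factor.

    Assumes standardized indicators, so the total variance equals the
    number of indicators (rows).
    """
    return (loadings ** 2).sum(axis=0) / loadings.shape[0]


def first_factor_is_majority(shares, threshold=0.5):
    """Assumed reading of the check: does the first factor explain most of the variance?"""
    return shares[0] > threshold


# Hypothetical unrotated loadings, only to show how shares are computed.
example_loadings = np.array([
    [0.80, 0.10],
    [0.70, 0.20],
    [0.15, 0.75],
])
example_shares = variance_shares(example_loadings)

# The two shares reported in the video for the first and second factor.
reported = [0.25, 0.16]
is_majority = first_factor_is_majority(reported)

print("example shares:", np.round(example_shares, 3))
print("first factor explains a majority:", is_majority)
```

This is only a descriptive check, not a statistical test, which is the point made next in the video.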
00:06:51.000 --> 00:06:57.630
Harman's single factor test has a
bit of a misleading name because
00:06:57.630 --> 00:07:01.770
it's not really a statistical test and
it's not even a very good diagnostic
00:07:01.770 --> 00:07:09.840
because it will probably detect only
very severe method variance problems.
00:07:09.840 --> 00:07:16.050
Nevertheless it's something that you can easily
check from the results of exploratory factor
00:07:16.050 --> 00:07:22.410
analysis. If you want to do more rigorous tests of
method variance then you can apply confirmatory
00:07:22.410 --> 00:07:27.750
factor analysis based techniques that allow you
many more degrees of freedom in what you can do.
00:07:27.750 --> 00:07:33.330
Let's take a look at the factor loadings. The
idea of factor loadings is that they should
00:07:33.330 --> 00:07:39.660
show a pattern. So we should see that the
first three indicators, for example,
00:07:39.660 --> 00:07:44.430
which are supposed to
measure one thing, load on one factor
00:07:44.430 --> 00:07:49.740
and one factor only, and then the measures
of the other constructs should not load on
00:07:49.740 --> 00:07:54.630
that factor. That's not the case here and the
reason why it's not the case is that this is
00:07:54.630 --> 00:08:00.210
an unrotated factor solution. So typically in
a factor analysis when we extract the factors
00:08:00.210 --> 00:08:05.670
we take the first factor that explains the
majority of the data and if the constructs
00:08:05.670 --> 00:08:11.640
that cause the data are correlated then the
first factor contains a little bit of every
00:08:11.640 --> 00:08:17.460
construct. So all indicators load on
it highly and we can't really interpret it.
00:08:17.460 --> 00:08:24.030
So we do a factor rotation and factor rotation
simplifies the factor analysis results. It also
00:08:24.030 --> 00:08:32.760
makes another nice - has another nice feature.
Factor rotation can relax the constraint that
00:08:32.760 --> 00:08:41.640
all the factors are uncorrelated when we do the
factor analysis. The zero-correlation constraint
00:08:41.640 --> 00:08:47.730
is there for a technical reason
and it doesn't make any theoretical
00:08:47.730 --> 00:08:54.360
sense if we are studying constructs that we think
are related. So if we think that the constructs
00:08:54.360 --> 00:08:59.550
are related causally or otherwise we cannot
assume that the constructs are uncorrelated.
00:08:59.550 --> 00:09:03.900
Therefore imposing a constraint that the two factors that
00:09:03.900 --> 00:09:08.100
are supposed to represent those constructs
are uncorrelated doesn't make any sense.
00:09:08.100 --> 00:09:14.250
That's another reason why we rotate the
factors, which relaxes that constraint.
00:09:14.250 --> 00:09:20.010
The factor rotation simplifies the
result and after rotation we can
00:09:20.010 --> 00:09:24.900
see that the first three indicators go
to one factor the second three another
00:09:24.900 --> 00:09:29.730
factor. So we have a nice pattern
where each group
00:09:29.730 --> 00:09:33.480
of indicators loads on one factor
only and there are no cross-loadings.
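A minimal sketch of how a rotation can simplify a loading pattern. This uses varimax, which is an orthogonal rotation, so it keeps the factors uncorrelated. Relaxing that constraint as described above needs an oblique rotation, which is not shown here. The video does not say which rotation was used, and the loading matrix below is hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix with the varimax criterion (orthogonal rotation).

    Returns the rotated loadings and the rotation matrix.
    """
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target matrix of the varimax criterion.
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        target = rotated ** 3 - rotated @ column_ss / n_items
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # closest orthogonal matrix to the target direction
        new_criterion = s.sum()
        if new_criterion < criterion * (1.0 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


# Hypothetical simple structure: items 1-2 on factor 1, items 3-4 on factor 2.
simple = np.array([
    [0.80, 0.00],
    [0.70, 0.00],
    [0.00, 0.75],
    [0.00, 0.60],
])

# Mix the factors by 30 degrees to mimic a hard-to-read unrotated solution.
angle = np.deg2rad(30.0)
mix = np.array([
    [np.cos(angle), -np.sin(angle)],
    [np.sin(angle), np.cos(angle)],
])
unrotated = simple @ mix

rotated, rotation = varimax(unrotated)

print("unrotated loadings:")
print(np.round(unrotated, 2))
print("rotated loadings:")
print(np.round(rotated, 2))
```

An orthogonal rotation like this one leaves each indicator's communality unchanged. It only redistributes the explained variance between the factors, and the sign and order of the rotated columns are arbitrary.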
00:09:33.480 --> 00:09:40.260
So this would be evidence that these indicators
- for example these three indicators - measure
00:09:40.260 --> 00:09:46.980
the same thing together and it is distinct
from what these other indicators may measure.
00:09:46.980 --> 00:09:50.610
So you want to have this kind of
pattern and it is an indication of
00:09:50.610 --> 00:09:56.340
validity. Of course it doesn't guarantee
validity because it doesn't tell us what
00:09:56.340 --> 00:10:01.020
these indicators have in common but it's
some kind of indirect evidence that there
00:10:01.020 --> 00:10:06.420
could be one underlying construct driving
the correlations between these indicators.
00:10:06.420 --> 00:10:14.580
Another thing that we look at from these factor
loadings is their magnitude. So that's what we do
00:10:14.580 --> 00:10:21.840
when we assess the results. And this is an example
from Yli-Renko's article. They have a table of
00:10:21.840 --> 00:10:27.930
factor loadings. They have the measurement
items. They have labeled the factors. So
00:10:27.930 --> 00:10:33.900
usually you label the factors with the constructs
names and then you look at the loadings.
00:10:33.900 --> 00:10:41.400
So the factor loadings here are interpreted as
evidence of reliability. So the square of a factor
00:10:41.400 --> 00:10:47.220
loading is an estimate of the reliability of the
indicator and then we also have these statistics,
00:10:47.220 --> 00:10:53.880
the z-statistics, that are used for testing
whether the loading differs from zero or not.
00:10:53.880 --> 00:11:01.350
I don't think the null hypothesis that the loading
is zero is very relevant. So you really want to
00:11:01.350 --> 00:11:09.120
know whether the indicators are reliable enough -
not whether their reliability differs from zero.
00:11:09.120 --> 00:11:15.240
So this is not a very useful test but
people still sometimes present it. The
00:11:15.240 --> 00:11:19.320
first indicator here is not tested.
The reason for this is that this is
00:11:19.320 --> 00:11:22.500
from a confirmatory factor analysis
and there's a technical reason
00:11:22.500 --> 00:11:28.350
why the first indicator is not tested
here. I'll explain that in another video.
00:11:28.350 --> 00:11:35.940
Then the authors say that the standardized
loadings are all above 0.57 and the cutoff
00:11:35.940 --> 00:11:42.270
is 0.4. The commonly used cutoff is 0.7
but you can probably find somebody who has
00:11:42.270 --> 00:11:48.090
presented a lower cutoff if you do that kind of
cherry picking. Really, normally we want the
00:11:48.090 --> 00:11:54.690
loading to be at least 0.7. But reliability again is
a matter of degree, it's not a matter of yes
00:11:54.690 --> 00:12:00.750
or no and you have to then assess what the
unreliability means for your study results.
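A minimal sketch of why the cutoff matters, using the rule stated above that the square of a standardized loading estimates the indicator's reliability. The function name is an assumption for illustration, and the three loadings are the values mentioned in the video.

```python
def indicator_reliability(loading):
    """Estimated reliability of an indicator: the squared standardized loading."""
    return loading ** 2


# Loadings mentioned in the video: a lenient cutoff, the lowest
# reported loading, and the commonly used cutoff.
for loading in (0.4, 0.57, 0.7):
    reliability = indicator_reliability(loading)
    print(f"loading {loading:.2f}: reliability about {reliability:.2f}, "
          f"unreliable share about {1 - reliability:.2f}")
```

So a loading of 0.7 means that roughly half of the indicator's variance is reliable, while a loading of 0.4 leaves most of the variance unexplained by the factor.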