WEBVTT
Kind: captions
Language: en
00:00:00.060 --> 00:00:03.810
Factor analysis answers the question
of what indicators have in common.
00:00:03.810 --> 00:00:10.410
In an exploratory factor analysis you start by
first extracting all common variance from the
00:00:10.410 --> 00:00:17.340
data and then you take that out from the data
- then you extract another factor and so on.
00:00:17.340 --> 00:00:23.400
This process leads to a factor analysis
solution that is often uninterpretable
00:00:23.400 --> 00:00:28.650
because most of the indicators will load
highly on the first factor and then have
00:00:28.650 --> 00:00:32.070
a mixture of positive and negative
loadings on the remaining factors.
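The sequential extraction described above can be sketched in plain Python. This is a minimal illustration, not the lecture's own procedure: the correlation matrix `corr` is made up (items 1-3 correlate 0.6 with each other, items 4-6 likewise, and 0.3 across the groups), and principal-component style extraction by power iteration is assumed as a stand-in because the extraction method is not specified here. All function names are hypothetical.

```python
import math

# Made-up correlation matrix for six indicators (an assumption for illustration):
# items 1-3 and items 4-6 correlate 0.6 within a group and 0.3 across groups.
corr = [[1.0 if i == j else (0.6 if i // 3 == j // 3 else 0.3)
         for j in range(6)] for i in range(6)]


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    n = len(matrix)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


def extract_factors(matrix, n_factors):
    """Extract one factor at a time, removing each from the matrix before the next."""
    n = len(matrix)
    residual = [row[:] for row in matrix]
    factors = []
    for _ in range(n_factors):
        eigenvalue, v = dominant_eigenpair(residual)
        factors.append([x * math.sqrt(eigenvalue) for x in v])  # loadings
        # Take this factor's contribution out before extracting the next one.
        residual = [[residual[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
                    for i in range(n)]
    return factors


first, second = extract_factors(corr, 2)
for item, (a, b) in enumerate(zip(first, second), start=1):
    print(f"item {item}: factor 1 = {a:+.2f}, factor 2 = {b:+.2f}")
```

In this toy example every item loads about 0.72 on the first factor, while the second factor separates the two groups only by the sign of a loading of about 0.47.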
00:00:32.070 --> 00:00:38.010
To make the factor analysis results more
interpretable we do a factor rotation.
00:00:38.010 --> 00:00:45.270
What the factor rotation achieves is that it
takes the indicators first so let's assume we
00:00:45.270 --> 00:00:56.790
have here six indicators and items 1, 2 and 3 vary
together. Items 4, 5 and 6 vary together. So this
00:00:56.790 --> 00:01:03.750
is the score of person number 1 and this is the
score of person number 2 here. And when we do a
00:01:03.750 --> 00:01:11.280
factor analysis - first we extract the first factor
then the factor analysis starts from the origin
00:01:11.280 --> 00:01:19.350
here and it asks the question: in which direction
are the data? And it will indicate that all
00:01:19.350 --> 00:01:25.680
the data are in that direction here. So all
the data are to the right and down a bit.
00:01:25.680 --> 00:01:33.630
Then the next thing that we do is that we
eliminate the influence of this factor. So
00:01:33.630 --> 00:01:39.990
we basically shift all these observations
sideways so that they have a zero value
00:01:39.990 --> 00:01:47.520
on this variation - this factor - and then we
extract another factor asking which direction
00:01:47.520 --> 00:01:55.590
the observations are or the variables are - then
the answer is that they are either up or down.
00:01:55.590 --> 00:02:03.900
So they are orthogonal to this factor here
and with two persons we can use these two
00:02:03.900 --> 00:02:10.980
factors to pinpoint the location of each
indicator. So this indicator item one
00:02:10.980 --> 00:02:17.070
is this much along the first factor and
then that much along the second factor.
00:02:17.070 --> 00:02:22.290
This also indicates that the first factor shows
the overall direction and the second factor is
00:02:22.290 --> 00:02:29.760
usually positive or negative depending on
which direction we go along that factor - do we
00:02:29.760 --> 00:02:35.550
go up or down. And the problem of course is
that if we have to summarize - if we want to
00:02:35.550 --> 00:02:41.130
summarize this data - then we would say that
this group of indicators is in this direction
00:02:41.130 --> 00:02:47.280
and the other group is in that direction - so
the factor analysis really doesn't reflect that
00:02:47.280 --> 00:02:53.160
dimensionality even if it allows us to summarize
these indicators by giving them coordinates.
00:02:53.160 --> 00:02:58.980
So the problem is that the first
factor explains a little bit of
00:02:58.980 --> 00:03:03.630
every indicator and then the second factor
has positive or negative loadings and they
00:03:03.630 --> 00:03:07.980
don't really explain where the data are
in a way that is easier to interpret.
00:03:07.980 --> 00:03:13.890
The purpose of a factor rotation is that we
try to reorient the factor analysis solution
00:03:13.890 --> 00:03:22.440
so that indicators load highly on one factor
and one factor only. So we try to maximize
00:03:22.440 --> 00:03:28.290
each indicator's largest factor loading
and minimize all other factor loadings.
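One concrete version of this goal is the varimax criterion, used here only because it is easy to sketch for two factors. The example below is a rough illustration under stated assumptions: the unrotated loadings are hypothetical numbers, the search is a simple one-degree grid rather than a real rotation algorithm, and the function names are made up.

```python
import math
from statistics import pvariance

# Hypothetical unrotated loadings (factor 1, factor 2) for six items:
# everything loads on factor 1, factor 2 only separates the groups by sign.
unrotated = [(0.72, 0.47)] * 3 + [(0.72, -0.47)] * 3


def rotate(loadings, degrees):
    """Coordinates of each item on axes turned counterclockwise by `degrees`."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def varimax_criterion(loadings):
    """Sum over the two factors of the variance of the squared loadings."""
    return sum(pvariance([row[k] ** 2 for row in loadings]) for k in range(2))


# Try every whole-degree rotation and keep the one with the highest criterion.
best_angle = max(range(0, 91),
                 key=lambda d: varimax_criterion(rotate(unrotated, d)))
rotated = rotate(unrotated, best_angle)

print("best angle:", best_angle)
for item, (a, b) in enumerate(rotated, start=1):
    print(f"item {item}: factor 1 = {a:+.2f}, factor 2 = {b:+.2f}")
```

With these numbers the best angle is 45 degrees, and each item ends up loading about 0.84 in absolute value on one factor and about 0.18 on the other. The sign of a factor is arbitrary.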
00:03:28.290 --> 00:03:34.110
It also makes the variances more
equal. So here, before rotation, the first factor
00:03:34.110 --> 00:03:39.990
actually explains more variation
than the second factor
00:03:39.990 --> 00:03:42.570
because all the
indicators are in this direction.
00:03:42.570 --> 00:03:49.350
So there are different techniques and the
techniques come in two variants. We have
00:03:49.350 --> 00:03:54.420
oblique and orthogonal rotation. Orthogonal rotation
keeps the factors uncorrelated.
00:03:54.420 --> 00:04:02.220
So we kind of take the factor solution here
and then we rotate it around the zero axis
00:04:02.220 --> 00:04:07.500
like that. So we rotate those two arrows so
that they point more toward the clusters of
00:04:07.500 --> 00:04:14.250
the observations. Like so. So we rotate it a
bit about 45 degrees or a bit less and then
00:04:14.250 --> 00:04:20.730
now the first factor points in the direction of
items 1, 2 and 3 and the second factor points to
00:04:20.730 --> 00:04:29.850
items 4, 5 and 6. But these factors still don't
point exactly to where the items are because
00:04:29.850 --> 00:04:34.380
we are constraining that the factors must be
uncorrelated. So this is a 90 degree angle.
00:04:34.380 --> 00:04:40.830
When we relax that assumption we can
actually draw the lines so that the
00:04:40.830 --> 00:04:46.230
factors are correlated - when this factor is
higher then this factor can be higher as
00:04:46.230 --> 00:04:53.400
well and now the arrows show that the first
three items are in this direction, the second
00:04:53.400 --> 00:04:57.540
three items are in that direction
and that's the idea of factor rotation.
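The link between the angle of the two factor axes and the factor correlation can be checked numerically. This is a geometric sketch only, not the oblimin algorithm: it assumes two item clusters at hypothetical coordinates and uses the standard vector picture in which the correlation between two factors is the cosine of the angle between their axes.

```python
import math

# Hypothetical directions of the two item clusters in the two-factor plane.
cluster_a = (0.72, 0.47)   # items 1, 2 and 3
cluster_b = (0.72, -0.47)  # items 4, 5 and 6


def angle_between(u, v):
    """Angle in degrees between two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return math.degrees(math.acos(dot / (math.hypot(*u) * math.hypot(*v))))


# Oblique solution: each factor axis points straight at one cluster.
oblique_angle = angle_between(cluster_a, cluster_b)
factor_correlation = math.cos(math.radians(oblique_angle))

# Orthogonal solution: the axes are forced to stay 90 degrees apart.
orthogonal_correlation = math.cos(math.radians(90.0))

print(f"oblique axes: {oblique_angle:.1f} degrees apart, "
      f"factor correlation {factor_correlation:.2f}")
print(f"orthogonal axes: 90.0 degrees apart, "
      f"factor correlation {orthogonal_correlation:.2f}")
```

With these made-up coordinates the axes end up roughly 66 degrees apart, which corresponds to a factor correlation of about 0.40, whereas the 90 degree constraint fixes the correlation at zero.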
00:04:57.540 --> 00:05:04.860
So you reorient the factor analysis solution
to make it simpler to interpret.
00:05:04.860 --> 00:05:11.640
So do you have to understand what exactly
this or that factor rotation does? The answer
00:05:11.640 --> 00:05:16.080
is no because there is a simple
rule of thumb that you can apply.
00:05:16.080 --> 00:05:23.610
The rule of thumb is to always
use oblimin rotation because it's
00:05:23.610 --> 00:05:28.290
theoretically the most appealing for
many scenarios and particularly it is
00:05:28.290 --> 00:05:33.600
an oblique rotation. If your factors
are supposed to represent constructs
00:05:33.600 --> 00:05:37.950
that are correlated - which is the
case if we make a theory about those
00:05:37.950 --> 00:05:44.280
constructs - then constraining the factors to
be uncorrelated doesn't make any sense.
00:05:44.280 --> 00:05:51.900
Varimax rotation is often the default and
it's an orthogonal rotation so you should never
00:05:51.900 --> 00:05:58.260
use that one. The reason why varimax
is the default is because of history.
00:05:58.260 --> 00:06:05.100
Factor analysis has decades of history and
when the factor analysis was introduced we
00:06:05.100 --> 00:06:10.290
really didn't have computers so people
were doing hand calculations and the
00:06:10.290 --> 00:06:16.440
varimax rotation is much simpler to calculate
than the oblimin rotation. But nowadays the
00:06:16.440 --> 00:06:24.600
computer will do both of these for you
instantaneously so the amount of computation
00:06:24.600 --> 00:06:28.890
is a non-issue. You should really go with
the direct oblimin instead of anything else.
00:06:28.890 --> 00:06:39.120
And when you look at articles they actually
report that oblimin is used. This is a pretty
00:06:39.120 --> 00:06:45.840
nice way of reporting a factor analysis from
this information systems research paper. So
00:06:45.840 --> 00:06:51.510
the authors report that they conducted
exploratory factor analysis. They did
00:06:51.510 --> 00:06:56.970
oblimin rotation. They also explained why they
did oblimin rotation: because they want
00:06:56.970 --> 00:07:01.770
the factors to be correlated, and you only
need one sentence and two lines for that.
00:07:01.770 --> 00:07:06.540
So that's a really nice way of reporting that
you actually did factor analysis correctly.