WEBVTT
Kind: captions
Language: en

00:00:00.060 --> 00:00:03.810
Factor analysis answers the question 
of what indicators have in common.

00:00:03.810 --> 00:00:10.410
In an exploratory factor analysis, you start by
first extracting as much common variance as possible from the

00:00:10.410 --> 00:00:17.340
data. Then you take that factor out of the data,
then you extract another factor, and so on.

00:00:17.340 --> 00:00:23.400
This process leads to a factor analysis
solution that is often uninterpretable

00:00:23.400 --> 00:00:28.650
because most of the indicators will load 
highly on the first factor and then have  

00:00:28.650 --> 00:00:32.070
a mixture of positive and negative 
loadings on the remaining factors.  

00:00:32.070 --> 00:00:38.010
To make the factor analysis results more 
interpretable we do a factor rotation.

00:00:38.010 --> 00:00:45.270
To see what factor rotation achieves, let's
first look at the indicators. Let's assume we

00:00:45.270 --> 00:00:56.790
have six indicators here: items 1, 2 and 3 vary
together, and items 4, 5 and 6 vary together. This

00:00:56.790 --> 00:01:03.750
axis is the score of person number 1, and this is the
score of person number 2. When we do a

00:01:03.750 --> 00:01:11.280
factor analysis, we first extract the first factor:
the factor analysis starts from the origin

00:01:11.280 --> 00:01:19.350
here and asks in which direction
the data lie. The answer is that all

00:01:19.350 --> 00:01:25.680
the data lie in this direction here: all
the data are to the right and down a bit.

00:01:25.680 --> 00:01:33.630
Then the next thing we do is
eliminate the influence of this factor. So

00:01:33.630 --> 00:01:39.990
we basically shift all these observations
sideways so that they have a zero value

00:01:39.990 --> 00:01:47.520
on this factor, and then we
extract another factor, asking in which direction

00:01:47.520 --> 00:01:55.590
the observations, or variables, lie. This time
the answer is that they are either up or down.

00:01:55.590 --> 00:02:03.900
So the second factor is orthogonal to the first one,
and with two persons we can use these two

00:02:03.900 --> 00:02:10.980
factors to pinpoint the location of each
indicator. So indicator item 1

00:02:10.980 --> 00:02:17.070
is this much along the first factor and
that much along the second factor.

00:02:17.070 --> 00:02:22.290
This also shows that the first factor captures
the overall direction, and the second factor

00:02:22.290 --> 00:02:29.760
usually has positive or negative loadings, depending on
which way an item lies along that factor: do we

00:02:29.760 --> 00:02:35.550
go up or down? The problem, of course, is
that if we want to

00:02:35.550 --> 00:02:41.130
summarize these data, we would say that
this group of indicators is in this direction

00:02:41.130 --> 00:02:47.280
and the other group is in that direction. The
unrotated solution doesn't reflect that

00:02:47.280 --> 00:02:53.160
structure, even though it allows us to summarize
these indicators by giving them coordinates.

00:02:53.160 --> 00:02:58.980
So the problem is that the first
factor explains a little bit of

00:02:58.980 --> 00:03:03.630
every indicator, and the second factor
has positive or negative loadings, and they

00:03:03.630 --> 00:03:07.980
don't really show where the data are
in a way that is easy to interpret.

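NOTE
The extract-and-remove procedure described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the lecture's own computation: the six-item correlation matrix is made up to mimic two groups of items, the function name `extract_sequentially` is hypothetical, and each factor is taken as the dominant eigenvector of whatever correlation is left (principal-component-style extraction). That simplifies a real common factor analysis, which would first replace the diagonal with communality estimates.
```python
import numpy as np
# Hypothetical correlation matrix (made-up numbers): items 1-3 correlate 0.6
# with each other, items 4-6 correlate 0.6 with each other, and items in
# different groups correlate 0.3.
R = np.full((6, 6), 0.3)
R[:3, :3] = 0.6
R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)
def extract_sequentially(R, n_factors):
    """Extract one factor at a time, removing each before extracting the next."""
    residual = R.copy()
    columns = []
    for _ in range(n_factors):
        eigvals, eigvecs = np.linalg.eigh(residual)   # eigenvalues in ascending order
        load = eigvecs[:, -1] * np.sqrt(eigvals[-1])  # dominant direction of what is left
        if load[0] < 0:                               # eigenvector signs are arbitrary,
            load = -load                              # so make item 1 load positively
        columns.append(load)
        residual = residual - np.outer(load, load)    # take this factor out of the data
    return np.column_stack(columns)
L = extract_sequentially(R, 2)
print(np.round(L, 2))
```
With these numbers, every item loads about 0.72 on the first factor, while the second factor loads about +0.47 on items 1 to 3 and about -0.47 on items 4 to 6: the first factor captures the overall direction and the second is bipolar, which is the hard-to-interpret pattern described above.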
00:03:07.980 --> 00:03:13.890
The purpose of factor rotation is to
reorient the factor analysis solution

00:03:13.890 --> 00:03:22.440
so that indicators load highly on one factor
and one factor only. So we try to maximize

00:03:22.440 --> 00:03:28.290
each indicator's largest factor loading
and minimize all its other factor loadings.

00:03:28.290 --> 00:03:34.110
It also makes the variances explained by the factors more
equal. Before rotation, the first factor

00:03:34.110 --> 00:03:39.990
explains much more variation than
the second factor

00:03:39.990 --> 00:03:42.570
because all the
indicators lie roughly in its direction.

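NOTE
The claim that rotation evens out the variances can be checked with a small Python sketch. The loadings are hypothetical (all items load 0.72 on the first factor; items 1 to 3 load +0.47 and items 4 to 6 load -0.47 on the second), and the 45-degree orthogonal rotation and the helper names `rotate` and `variance_explained` are illustrative assumptions. The variance a factor explains is taken as the sum of its squared loadings.
```python
import numpy as np
# Hypothetical unrotated loadings for six items: all load 0.72 on factor 1,
# items 1-3 load +0.47 and items 4-6 load -0.47 on factor 2.
L = np.array([[0.72, 0.47]] * 3 + [[0.72, -0.47]] * 3)
def rotate(L, degrees):
    """Orthogonal rotation of a two-factor loading matrix by the given angle."""
    t = np.deg2rad(degrees)
    T = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t), np.cos(t)]])
    return L @ T
def variance_explained(L):
    """Sum of squared loadings per factor (column)."""
    return (L ** 2).sum(axis=0)
before = variance_explained(L)             # roughly [3.11, 1.33]
after = variance_explained(rotate(L, 45))  # roughly [2.22, 2.22]
print(before, after, before.sum(), after.sum())
```
Before rotation the first factor explains about 3.11 and the second about 1.33; after rotating by 45 degrees each explains about 2.22. The total stays the same because an orthogonal rotation only redistributes variance between the factors.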
00:03:42.570 --> 00:03:49.350
So there are different techniques, and they
come in two variants: we have

00:03:49.350 --> 00:03:54.420
oblique and orthogonal rotation. Orthogonal rotation
keeps the factors uncorrelated.

00:03:54.420 --> 00:04:02.220
So we take the factor solution here
and rotate it around the origin

00:04:02.220 --> 00:04:07.500
like that. We rotate those two arrows so
that they point more toward the clusters of

00:04:07.500 --> 00:04:14.250
the observations. Like so. We rotate it by
about 45 degrees or a bit less, and

00:04:14.250 --> 00:04:20.730
now the first factor points in the direction of
items 1, 2 and 3, and the second factor points to

00:04:20.730 --> 00:04:29.850
items 4, 5 and 6. But these factors still don't
point exactly to where the items are because

00:04:29.850 --> 00:04:34.380
we are constraining the factors to be
uncorrelated. So this is a 90-degree angle.

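NOTE
For two factors, an orthogonal rotation is a single angle, so varimax can be illustrated by brute force: try angles and keep the one that maximizes the varimax criterion (the variance of the squared loadings within each factor, summed over factors). This is a hedged sketch, not how statistical software computes varimax; real implementations use iterative algorithms that handle any number of factors and often apply Kaiser normalization, which is omitted here. The loadings and helper names are hypothetical.
```python
import numpy as np
# Hypothetical unrotated loadings: all items load 0.72 on factor 1,
# items 1-3 load +0.47 and items 4-6 load -0.47 on factor 2.
L = np.array([[0.72, 0.47]] * 3 + [[0.72, -0.47]] * 3)
def rotate(L, degrees):
    """Orthogonal rotation: both axes turn together and stay 90 degrees apart."""
    t = np.deg2rad(degrees)
    T = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t), np.cos(t)]])
    return L @ T
def varimax_criterion(L):
    """Raw varimax: sum over factors of the variance of the squared loadings."""
    return np.var(L ** 2, axis=0).sum()
# Brute-force search over rotation angles (workable for two factors only).
angles = np.arange(0.0, 90.0, 0.5)
best = max(angles, key=lambda a: varimax_criterion(rotate(L, a)))
rotated = rotate(L, best)
print(best)                  # 45.0 for these symmetric loadings
print(np.round(rotated, 2))  # main loadings about 0.84 in absolute value, cross-loadings about 0.18
```
For these symmetric loadings the best angle is 45 degrees. Each item then loads about 0.84 on its own factor, but cross-loadings of about 0.18 remain, because the 90-degree constraint keeps the axes from pointing exactly at both clusters.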
00:04:34.380 --> 00:04:40.830
When we relax that assumption, we can
draw the lines so that the

00:04:40.830 --> 00:04:46.230
factors are correlated: when this factor is
higher, this factor can be higher as

00:04:46.230 --> 00:04:53.400
well. Now the arrows show that the first
three items are in this direction and the second

00:04:53.400 --> 00:04:57.540
three items are in that direction,
and that's the idea of factor rotation.

00:04:57.540 --> 00:05:04.860
You reorient the factor analysis solution
to make it simpler to interpret.

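NOTE
An oblique rotation can be sketched the same way for two factors, except that the two axes get separate angles and may end up correlated. The sketch minimizes the direct quartimin criterion, which is direct oblimin with its parameter set to zero, by brute-force search over both angles. Statistical software uses more efficient iterative algorithms; the loadings, the 1-degree grid, and the helper names are hypothetical. The pattern loadings are each item's coordinates along the oblique axes, and the factor correlation is the cosine of the angle between the axes.
```python
import numpy as np
# Hypothetical unrotated loadings: all items load 0.72 on factor 1,
# items 1-3 load +0.47 and items 4-6 load -0.47 on factor 2.
L = np.array([[0.72, 0.47]] * 3 + [[0.72, -0.47]] * 3)
def oblique_solution(L, deg1, deg2):
    """Pattern loadings and factor correlations when the two factor axes are
    unit vectors at angles deg1 and deg2 (not necessarily 90 degrees apart)."""
    t1, t2 = np.deg2rad(deg1), np.deg2rad(deg2)
    T = np.array([[np.cos(t1), np.cos(t2)],
                  [np.sin(t1), np.sin(t2)]])  # columns are the factor axes
    pattern = L @ np.linalg.inv(T.T)          # each item's coordinates along the axes
    phi = T.T @ T                             # correlations between the factors
    return pattern, phi
def quartimin(pattern):
    """Direct quartimin for two factors: penalizes items loading on both factors."""
    return np.sum(pattern[:, 0] ** 2 * pattern[:, 1] ** 2)
# Brute-force search over both axis angles (workable for two factors only).
angles = np.arange(-89.0, 90.0, 1.0)
best, best_q = None, np.inf
for d1 in angles:
    for d2 in angles:
        if abs(np.sin(np.deg2rad(d2 - d1))) < 0.1:  # skip nearly parallel axes
            continue
        q = quartimin(oblique_solution(L, d1, d2)[0])
        if q < best_q:
            best, best_q = (d1, d2), q
pattern, phi = oblique_solution(L, *best)
print(best)                  # axes at roughly +33 and -33 degrees
print(np.round(pattern, 2))  # cross-loadings close to zero
print(round(phi[0, 1], 2))   # factor correlation of roughly 0.41
```
Here the axes settle at roughly plus and minus 33 degrees, so each axis points at one cluster of items, the cross-loadings nearly vanish, and the factors correlate at about 0.41. Compared with the orthogonal solution, allowing correlated factors buys a cleaner loading pattern.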
00:05:04.860 --> 00:05:11.640
So do you have to understand what exactly
each of these rotations does? The answer

00:05:11.640 --> 00:05:16.080
is no, because there is a simple
rule of thumb that you can apply.

00:05:16.080 --> 00:05:23.610
The rule of thumb is to always
use oblimin rotation, because it's

00:05:23.610 --> 00:05:28.290
theoretically the most appealing for
many scenarios and, in particular, it is

00:05:28.290 --> 00:05:33.600
an oblique rotation. If your factors
are supposed to represent constructs

00:05:33.600 --> 00:05:37.950
that are correlated, which is the
case if we make a theory about those

00:05:37.950 --> 00:05:44.280
constructs, then constraining the factors to
be uncorrelated doesn't make any sense.

00:05:44.280 --> 00:05:51.900
Varimax rotation is often the default, and
it's an orthogonal rotation, so you should never

00:05:51.900 --> 00:05:58.260
use it. The reason varimax
is the default is history.

00:05:58.260 --> 00:06:05.100
Factor analysis has decades of history, and
when factor analysis was introduced we

00:06:05.100 --> 00:06:10.290
really didn't have computers, so people
were doing hand calculations, and

00:06:10.290 --> 00:06:16.440
varimax rotation is much simpler to calculate
than oblimin rotation. But nowadays the

00:06:16.440 --> 00:06:24.600
computer will do both of these for you
instantaneously, so the amount of computation

00:06:24.600 --> 00:06:28.890
is a non-issue. You should really go with
direct oblimin instead of anything else.

00:06:28.890 --> 00:06:39.120
And when you look at articles, they actually
report that oblimin is used. This is a pretty

00:06:39.120 --> 00:06:45.840
nice way of reporting a factor analysis, from
this information systems research paper. So

00:06:45.840 --> 00:06:51.510
the authors report that they conducted
an exploratory factor analysis. They used

00:06:51.510 --> 00:06:56.970
oblimin rotation. They also explained why they
used oblimin rotation: they wanted to allow

00:06:56.970 --> 00:07:01.770
the factors to be correlated, and you only
need one sentence, two lines, for that.

00:07:01.770 --> 00:07:06.540
So that's a really nice way of reporting that
you actually did the factor analysis correctly.