WEBVTT
Kind: captions
Language: en

00:00:00.060 --> 00:00:03.810
Factor analysis answers the question of what indicators have in common.

00:00:03.810 --> 00:00:10.410
In an exploratory factor analysis you start by extracting a first factor that captures as much common variance in the

00:00:10.410 --> 00:00:17.340
data as possible, then you take that out of the data, then you extract another factor, and so on.

00:00:17.340 --> 00:00:23.400
This process leads to a factor analysis solution that is often uninterpretable

00:00:23.400 --> 00:00:28.650
because most of the indicators will load highly on the first factor and then have

00:00:28.650 --> 00:00:32.070
a mixture of positive and negative loadings on the remaining factors.

00:00:32.070 --> 00:00:38.010
To make the factor analysis results more interpretable we do a factor rotation.

00:00:38.010 --> 00:00:45.270
To see what factor rotation achieves, take the indicators first. Let's assume we

00:00:45.270 --> 00:00:56.790
have six indicators here: items 1, 2 and 3 vary together, and items 4, 5 and 6 vary together. So this

00:00:56.790 --> 00:01:03.750
is the score of person number 1 and this is the score of person number 2 here. And when we do a

00:01:03.750 --> 00:01:11.280
factor analysis, we first extract the first factor: the factor analysis starts from the origin

00:01:11.280 --> 00:01:19.350
here and asks in which direction the data are. And it will indicate that all

00:01:19.350 --> 00:01:25.680
the data are in that direction here. So all the data are to the right and down a bit.

00:01:25.680 --> 00:01:33.630
Then the next thing we do is eliminate the influence of this factor.
So

00:01:33.630 --> 00:01:39.990
we basically shift all these observations sideways so that they have a zero value

00:01:39.990 --> 00:01:47.520
on this dimension, this factor, and then we extract another factor, asking in which direction

00:01:47.520 --> 00:01:55.590
the observations, or rather the variables, are. The answer is that they are either up or down.

00:01:55.590 --> 00:02:03.900
So that direction is orthogonal to this factor here, and with two persons we can use these two

00:02:03.900 --> 00:02:10.980
factors to pinpoint the location of each indicator. So this indicator, item one,

00:02:10.980 --> 00:02:17.070
is this much along the first factor and then that much along the second factor.

00:02:17.070 --> 00:02:22.290
This also indicates that the first factor shows the overall direction and the second factor is

00:02:22.290 --> 00:02:29.760
usually positive or negative depending on which direction we go along that factor: do we

00:02:29.760 --> 00:02:35.550
go up or down. And the problem, of course, is that if we want to

00:02:35.550 --> 00:02:41.130
summarize these data, we would say that this group of indicators is in this direction

00:02:41.130 --> 00:02:47.280
and the other group is in that direction. So the factor analysis really doesn't reflect that

00:02:47.280 --> 00:02:53.160
dimensionality, even if it allows us to summarize these indicators and give them coordinates.

00:02:53.160 --> 00:02:58.980
So the problem is that the first factor explains a little bit of

00:02:58.980 --> 00:03:03.630
every indicator, and then the second factor has positive or negative loadings, and they

00:03:03.630 --> 00:03:07.980
don't really explain where the data are in a way that is easy to interpret.
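The extract-and-remove procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the correlation matrix is made up (items 1 to 3 correlate 0.6 with each other, as do items 4 to 6, and the two groups correlate 0.3), the helper names are hypothetical, and power iteration on the full correlation matrix is a principal-component-style simplification rather than a proper common factor extraction.

```python
import math


def make_corr(within=0.6, between=0.3, n=6):
    """Hypothetical correlation matrix: two groups of three items each."""
    half = n // 2
    return [
        [1.0 if i == j else within if (i < half) == (j < half) else between
         for j in range(n)]
        for i in range(n)
    ]


def dominant_component(matrix, iters=500):
    """Power iteration: find the direction that captures the most variance."""
    n = len(matrix)
    v = [1.0 - 0.1 * i for i in range(n)]  # arbitrary starting direction
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n)
    )
    return eigenvalue, v


def extract_loadings(corr, n_factors=2):
    """Extract factors one at a time, removing each before finding the next."""
    n = len(corr)
    residual = [row[:] for row in corr]
    columns = []
    for _ in range(n_factors):
        eigenvalue, v = dominant_component(residual)
        load = [math.sqrt(eigenvalue) * x for x in v]
        columns.append(load)
        # Take this factor's contribution out of the data (deflation).
        for i in range(n):
            for j in range(n):
                residual[i][j] -= load[i] * load[j]
    # Transpose so that each row holds one item's loadings.
    return [list(row) for row in zip(*columns)]


loadings = extract_loadings(make_corr())
for item, (f1, f2) in enumerate(loadings, start=1):
    print(f"item {item}: factor 1 = {f1:+.2f}, factor 2 = {f2:+.2f}")
```

With these assumed correlations, every item loads about 0.72 on the first factor, while the second factor loads about +0.47 on items 1 to 3 and about -0.47 on items 4 to 6, which is the hard-to-interpret pattern described above.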
00:03:07.980 --> 00:03:13.890
The purpose of factor rotation is to reorient the factor analysis solution

00:03:13.890 --> 00:03:22.440
so that each indicator loads highly on one factor and one factor only. So we try to maximize

00:03:22.440 --> 00:03:28.290
each indicator's largest factor loading and minimize all its other factor loadings.

00:03:28.290 --> 00:03:34.110
It also makes the variance explained by each factor more equal. So here the first factor

00:03:34.110 --> 00:03:39.990
actually explains more variation than

00:03:39.990 --> 00:03:42.570
the second factor, because all the indicators are in this direction.

00:03:42.570 --> 00:03:49.350
There are different rotation techniques, and they come in two variants. We have

00:03:49.350 --> 00:03:54.420
oblique and orthogonal rotation. Orthogonal rotation keeps the factors uncorrelated.

00:03:54.420 --> 00:04:02.220
So we kind of take the factor solution here and then we rotate it around the origin

00:04:02.220 --> 00:04:07.500
like that. So we rotate those two arrows so that they point more toward the clusters of

00:04:07.500 --> 00:04:14.250
the observations. Like so. So we rotate it a bit, about 45 degrees or a bit less, and then

00:04:14.250 --> 00:04:20.730
now the first factor points in the direction of items 1, 2 and 3, and the second factor points to

00:04:20.730 --> 00:04:29.850
items 4, 5 and 6. But these factors still don't point exactly to where the items are because

00:04:29.850 --> 00:04:34.380
we are constraining the factors to be uncorrelated. So this is a 90-degree angle.

00:04:34.380 --> 00:04:40.830
When we relax that constraint, as oblique rotation does, we can actually draw the lines.
So the

00:04:40.830 --> 00:04:46.230
factors are correlated: when this factor is higher, then that factor can be higher as

00:04:46.230 --> 00:04:53.400
well. And now the arrows show that the first three items are in this direction and the second

00:04:53.400 --> 00:04:57.540
three items are in that direction, and that's the idea of factor rotation.

00:04:57.540 --> 00:05:04.860
So you reorient the factor analysis solution to make it simpler to interpret.

00:05:04.860 --> 00:05:11.640
So do you have to understand exactly what this or that rotation does? The answer

00:05:11.640 --> 00:05:16.080
is no, because there is a simple rule of thumb that you can apply.

00:05:16.080 --> 00:05:23.610
The rule of thumb is to always use oblimin rotation because it's

00:05:23.610 --> 00:05:28.290
theoretically the most appealing for many scenarios, and in particular it is

00:05:28.290 --> 00:05:33.600
an oblique rotation. If your factors are supposed to represent constructs

00:05:33.600 --> 00:05:37.950
that are correlated - which is the case if we make a theory about those

00:05:37.950 --> 00:05:44.280
constructs - then constraining the factors to be uncorrelated doesn't make any sense.

00:05:44.280 --> 00:05:51.900
Varimax rotation is often the default, and it's an orthogonal rotation, so you should never

00:05:51.900 --> 00:05:58.260
use that one. The reason why varimax is the default is history.

00:05:58.260 --> 00:06:05.100
Factor analysis has decades of history, and when factor analysis was introduced we

00:06:05.100 --> 00:06:10.290
really didn't have computers, so people were doing hand calculations, and the

00:06:10.290 --> 00:06:16.440
varimax rotation is much simpler to calculate than the oblimin rotation. But nowadays the

00:06:16.440 --> 00:06:24.600
computer will do both of these for you instantaneously, so the amount of computation

00:06:24.600 --> 00:06:28.890
is a non-issue.
You should really go with direct oblimin instead of anything else.

00:06:28.890 --> 00:06:39.120
And when you look at articles, they actually report that oblimin is used. This is a pretty

00:06:39.120 --> 00:06:45.840
nice way of reporting a factor analysis, from this information systems research paper. So

00:06:45.840 --> 00:06:51.510
the authors report that they conducted an exploratory factor analysis. They used

00:06:51.510 --> 00:06:56.970
oblimin rotation. They also explained why they used oblimin rotation: because they wanted

00:06:56.970 --> 00:07:01.770
the factors to be allowed to correlate, and you only need one sentence and two lines for that.

00:07:01.770 --> 00:07:06.540
So that's a really nice way of reporting that you actually did the factor analysis correctly.
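The geometric difference between orthogonal and oblique rotation discussed earlier can be sketched with plain Python. Everything here is an assumption made for illustration: the unrotated loadings are invented round numbers, the helper names are hypothetical, and the oblique axes are simply placed through the two item groups by hand. This is not the oblimin algorithm, which finds the axes by optimizing a criterion; it only shows what relaxing the 90-degree constraint buys.

```python
import math

# Assumed unrotated loadings (illustrative numbers only): every item loads
# 0.72 on factor 1, and the two item groups have opposite signs on factor 2.
unrotated = [(0.72, 0.47)] * 3 + [(0.72, -0.47)] * 3


def unit(degrees):
    """Unit vector at the given angle in the unrotated factor space."""
    t = math.radians(degrees)
    return (math.cos(t), math.sin(t))


def centroid_axis(group):
    """Unit vector pointing from the origin through a group's centroid."""
    a = sum(x for x, _ in group) / len(group)
    b = sum(y for _, y in group) / len(group)
    length = math.hypot(a, b)
    return (a / length, b / length)


def pattern_loadings(loadings, axis1, axis2):
    """Write each item as p1*axis1 + p2*axis2 and return the (p1, p2) pairs."""
    det = axis1[0] * axis2[1] - axis2[0] * axis1[1]
    result = []
    for a, b in loadings:
        p1 = (a * axis2[1] - axis2[0] * b) / det
        p2 = (axis1[0] * b - a * axis1[1]) / det
        result.append((p1, p2))
    return result


def factor_correlation(axis1, axis2):
    """Cosine of the angle between the two factor axes."""
    return axis1[0] * axis2[0] + axis1[1] * axis2[1]


# Orthogonal rotation: turn the axes to +45 and -45 degrees (90 degrees apart).
orth_axes = (unit(45), unit(-45))
orthogonal = pattern_loadings(unrotated, *orth_axes)

# Oblique rotation: let each axis point straight at one group of items.
obl_axes = (centroid_axis(unrotated[:3]), centroid_axis(unrotated[3:]))
oblique = pattern_loadings(unrotated, *obl_axes)

for name, rotated, axes in [("orthogonal", orthogonal, orth_axes),
                            ("oblique", oblique, obl_axes)]:
    print(f"{name}: factor correlation = {factor_correlation(*axes):.2f}")
    for item, (p1, p2) in enumerate(rotated, start=1):
        print(f"  item {item}: factor 1 = {p1:+.2f}, factor 2 = {p2:+.2f}")
```

With these assumed numbers, the orthogonal axes leave every item with a main loading of about 0.84 and a cross-loading of about 0.18, because the two item groups are less than 90 degrees apart. Once the axes are allowed to correlate (about 0.40 here), each item loads on one factor only.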