WEBVTT Kind: captions Language: en 00:00:00.060 --> 00:00:03.810 Factor analysis answers the question of what indicators have in common. 00:00:03.810 --> 00:00:10.410 In an exploratory factor analysis you start by first extracting all common variance from the 00:00:10.410 --> 00:00:17.340 data and then you take that out from the data - then you extract another factor and so on. 00:00:17.340 --> 00:00:23.400 This process leads to a factor analysis solution that is often uninterpretable 00:00:23.400 --> 00:00:28.650 because most of the indicators will load highly on the first factor and then have 00:00:28.650 --> 00:00:32.070 a mixture of positive and negative loadings on the remaining factors. 00:00:32.070 --> 00:00:38.010 To make the factor analysis results more interpretable we do a factor rotation. 00:00:38.010 --> 00:00:45.270 To see what the factor rotation achieves, let's take the indicators first. So let's assume we 00:00:45.270 --> 00:00:56.790 have here six indicators and items 1 2 3 vary together. Items 4 5 & 6 vary together. So this 00:00:56.790 --> 00:01:03.750 is the score of person number 1 and this is the score of person number 2 here. And when we do a 00:01:03.750 --> 00:01:11.280 factor analysis - first we extract the first factor - then the factor analysis starts from the origin 00:01:11.280 --> 00:01:19.350 here and it asks the question in which direction the data are. And it will indicate that all 00:01:19.350 --> 00:01:25.680 the data are in that direction here. So all the data are to the right and down a bit. 00:01:25.680 --> 00:01:33.630 Then the next thing that we do is that we eliminate the influence of this factor.
So 00:01:33.630 --> 00:01:39.990 we basically shift all these observations sideways so that they have a zero value 00:01:39.990 --> 00:01:47.520 on this dimension - this factor - and then we extract another factor asking in which direction 00:01:47.520 --> 00:01:55.590 the observations are or the variables are - then the answer is that they are either up or down. 00:01:55.590 --> 00:02:03.900 So they are orthogonal to this factor here and with two persons we can use these two 00:02:03.900 --> 00:02:10.980 factors to pinpoint the location of each indicator. So this indicator item one 00:02:10.980 --> 00:02:17.070 is this much along the first factor and then that much along the second factor. 00:02:17.070 --> 00:02:22.290 This also indicates that the first factor shows the overall direction and the second factor is 00:02:22.290 --> 00:02:29.760 usually positive or negative depending on which direction we go along that factor - do we 00:02:29.760 --> 00:02:35.550 go up or down. And the problem of course is that if we have to summarize - if we want to 00:02:35.550 --> 00:02:41.130 summarize this data - then we would say that this group of indicators is in this direction 00:02:41.130 --> 00:02:47.280 and the other group is in that direction - so the factor analysis really doesn't reflect that 00:02:47.280 --> 00:02:53.160 dimensionality even if it allows us to summarize these indicators by giving them coordinates. 00:02:53.160 --> 00:02:58.980 So the problem is that the first factor explains a little bit of 00:02:58.980 --> 00:03:03.630 every indicator and then the second factor has positive or negative loadings and they 00:03:03.630 --> 00:03:07.980 don't really explain where the data are in a way that is easy to interpret. 00:03:07.980 --> 00:03:13.890 The purpose of a factor rotation is that we try to reorient the factor analysis solution 00:03:13.890 --> 00:03:22.440 so that indicators load highly on one factor and one factor only.
So we try to maximize 00:03:22.440 --> 00:03:28.290 each indicator's largest factor loading and minimize all other factor loadings. 00:03:28.290 --> 00:03:34.110 It also makes the variances more equal. So here 00:03:34.110 --> 00:03:39.990 the second factor actually explains more variation than 00:03:39.990 --> 00:03:42.570 the first factor because all the indicators are in this direction. 00:03:42.570 --> 00:03:49.350 So there are different techniques and the techniques come in two variants. We have 00:03:49.350 --> 00:03:54.420 oblique and orthogonal rotation. Orthogonal rotation keeps the factors uncorrelated. 00:03:54.420 --> 00:04:02.220 So we kind of take the factor solution here and then we rotate it around the origin 00:04:02.220 --> 00:04:07.500 like that. So we rotate those two arrows so that they point more toward the clusters of 00:04:07.500 --> 00:04:14.250 the observations. Like so. So we rotate it a bit, about 45 degrees or a bit less, and then 00:04:14.250 --> 00:04:20.730 now the first factor points in the direction of items 1 2 3 and the second factor points to 00:04:20.730 --> 00:04:29.850 items 4 5 and 6. But these factors still don't point exactly to where the items are because 00:04:29.850 --> 00:04:34.380 we are constraining that the factors must be uncorrelated. So this is a 90 degree angle. 00:04:34.380 --> 00:04:40.830 When we relax that assumption we can actually draw the lines so that the 00:04:40.830 --> 00:04:46.230 factors are correlated: when this factor is higher then this factor can be higher as 00:04:46.230 --> 00:04:53.400 well and now the arrows show that the first three items are in this direction and the second 00:04:53.400 --> 00:04:57.540 three items are in that direction and that's the idea of factor rotation. 00:04:57.540 --> 00:05:04.860 So you reorient the factor analysis solution to make it simpler to interpret.
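The rotation idea described above can be illustrated with a small numerical sketch. The loadings below are hypothetical numbers chosen to mimic the picture in the lecture (six items, two clusters); they are not taken from the lecture or from any data set. The 45 degree angle and the cluster-based oblique axes are picked by hand for illustration, so this is not the varimax or oblimin algorithm, which search for the transformation by optimizing a criterion.

```python
import numpy as np

# Hypothetical unrotated loadings (assumed values): every item loads on
# factor 1, and factor 2 separates items 1-3 from items 4-6 by sign.
unrotated = np.array([
    [0.7, -0.3],
    [0.7, -0.3],
    [0.7, -0.3],
    [0.7, 0.3],
    [0.7, 0.3],
    [0.7, 0.3],
])


def rotation_matrix(degrees):
    """2 x 2 orthogonal rotation matrix for the given angle."""
    theta = np.deg2rad(degrees)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])


def explained(loadings):
    """Sum of squared loadings per factor (variance explained)."""
    return (loadings ** 2).sum(axis=0)


# Orthogonal rotation: the axes stay at 90 degrees, so the factors
# remain uncorrelated. The angle is hand-picked, not optimized.
orthogonal = unrotated @ rotation_matrix(-45)

# Oblique rotation: let each axis point straight at one item cluster.
# The columns of `axes` are unit vectors toward the cluster centers.
axes = np.column_stack([unrotated[:3].mean(axis=0),
                        unrotated[3:].mean(axis=0)])
axes = axes / np.linalg.norm(axes, axis=0)

# Pattern loadings P satisfy unrotated = P @ axes.T
pattern = unrotated @ np.linalg.inv(axes.T)
# Correlation between the two oblique factors
factor_corr = axes.T @ axes

print("unrotated variance per factor: ", np.round(explained(unrotated), 2))
print("orthogonal variance per factor:", np.round(explained(orthogonal), 2))
print("orthogonal loadings:\n", np.round(orthogonal, 2))
print("oblique pattern loadings:\n", np.round(pattern, 2))
print("factor correlation:\n", np.round(factor_corr, 2))
```

In this toy setup the orthogonal rotation evens out the variance explained by the two factors but leaves some cross-loading on every item, while the oblique solution removes the cross-loadings at the price of a nonzero correlation between the factors. Both solutions reproduce the same item correlations as the unrotated loadings.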
00:05:04.860 --> 00:05:11.640 So do you have to understand what exactly each factor rotation does? The answer 00:05:11.640 --> 00:05:16.080 is no because there is a simple rule of thumb that you can apply. 00:05:16.080 --> 00:05:23.610 The rule of thumb is to always use oblimin rotation because it's 00:05:23.610 --> 00:05:28.290 theoretically the most appealing for many scenarios and particularly it is 00:05:28.290 --> 00:05:33.600 an oblique rotation. If your factors are supposed to represent constructs 00:05:33.600 --> 00:05:37.950 that are correlated - which is the case if we make a theory about those 00:05:37.950 --> 00:05:44.280 constructs - then constraining the factors to be uncorrelated doesn't make any sense. 00:05:44.280 --> 00:05:51.900 Varimax rotation is often the default and it's an orthogonal rotation so you should never 00:05:51.900 --> 00:05:58.260 use that one. The reason why varimax is the default is because of history. 00:05:58.260 --> 00:06:05.100 Factor analysis has decades of history and when factor analysis was introduced we 00:06:05.100 --> 00:06:10.290 really didn't have computers so people were doing hand calculations and the 00:06:10.290 --> 00:06:16.440 varimax rotation is much simpler to calculate than the oblimin rotation. But nowadays the 00:06:16.440 --> 00:06:24.600 computer will do both of these for you instantaneously so the amount of computation 00:06:24.600 --> 00:06:28.890 is a non-issue. You should really go with the direct oblimin instead of anything else. 00:06:28.890 --> 00:06:39.120 And when you look at articles they actually report that oblimin is used. This is a pretty 00:06:39.120 --> 00:06:45.840 nice way of reporting a factor analysis from this information systems research paper. So 00:06:45.840 --> 00:06:51.510 the authors report that they conducted exploratory factor analysis. They did 00:06:51.510 --> 00:06:56.970 oblimin rotation.
They also explained why they did oblimin rotation: because they want 00:06:56.970 --> 00:07:01.770 the factors to be correlated. And you only need one sentence and two lines for that. 00:07:01.770 --> 00:07:06.540 So that's a really nice way of reporting that you actually did factor analysis correctly.