WEBVTT
Kind: captions
Language: en

00:00:00.060 --> 00:00:05.520
Let's take a look at an empirical example of exploratory factor analysis. To do that we need

00:00:05.520 --> 00:00:10.980
some data, and our data comes from a research paper by Mesquita and Lazzarini from 2008.

00:00:10.980 --> 00:00:16.290
This is an interesting paper because the authors present the full correlation matrix

00:00:16.290 --> 00:00:21.990
of all the indicators in the paper. That means that we can replicate everything

00:00:21.990 --> 00:00:26.730
that the authors do using the correlation matrix, and we also get the same result for all the

00:00:26.730 --> 00:00:31.440
analyses. So this is a completely transparent paper that we can replicate ourselves.

00:00:31.440 --> 00:00:37.170
This article uses confirmatory factor analysis and structural regression models, but we can equally

00:00:37.170 --> 00:00:42.570
well do an exploratory factor analysis to see if we get the same result as the authors did.

00:00:42.570 --> 00:00:50.910
So this is the data set that we have. It's the Table 1 descriptive statistics and correlations,

00:00:50.910 --> 00:00:57.420
except instead of on a scale level it is on the indicator level. We will be using all questions

00:00:57.420 --> 00:01:03.630
that are measured on the one to seven scale to eliminate any scaling issues from the data.

00:01:03.630 --> 00:01:11.400
So we'll have five scales. These five here, and the indicators are three indicators for

00:01:11.400 --> 00:01:16.050
horizontal governance, three indicators of vertical governance, three indicators

00:01:16.050 --> 00:01:22.050
of collective sourcing, two indicators for export orientation and three indicators for investment.

00:01:22.050 --> 00:01:26.700
Whether these indicators measure what the authors claim they do measure is a

00:01:26.700 --> 00:01:31.050
question that we will not address in this video.
We'll just take a

00:01:31.050 --> 00:01:36.420
look at whether, for example, these export orientation indicators can be argued to

00:01:36.420 --> 00:01:40.680
measure something together that is distinct from the other indicators.

00:01:40.680 --> 00:01:48.630
So we have 14 variables and we want to assess whether they measure five distinct things.

00:01:48.630 --> 00:01:55.440
In an exploratory factor analysis, when we start the analysis we have to define how many factors

00:01:55.440 --> 00:02:04.470
we extract. One way to make that decision is to use a tool called a scree plot. The idea of

00:02:04.470 --> 00:02:11.880
a scree plot is that we extract components from the data and then we have a variable

00:02:11.880 --> 00:02:16.290
here that quantifies how many variables' worth of the variance each component explains.

00:02:16.290 --> 00:02:23.160
There are some rules of thumb for choosing the number of factors. We can either

00:02:23.160 --> 00:02:29.670
choose 5 factors based on a pivot point. A clear pivot point where the curve

00:02:29.670 --> 00:02:34.230
starts to go flat means that that's the number of factors that we should extract.

00:02:34.230 --> 00:02:41.370
Another rule of thumb is that we go on as long as we get eigenvalues greater than one,

00:02:41.370 --> 00:02:49.080
which would be 4 factors. But here we know that this set of indicators is supposed to

00:02:49.080 --> 00:02:53.940
measure 5 distinct things, so we can use the best rule of thumb, which is our theory, and

00:02:53.940 --> 00:03:00.270
theory states that we take 5 factors, because we have 5 different things that we want to measure.

00:03:00.270 --> 00:03:06.810
So we apply factor analysis. We request 5 factors using these 14 indicators.

00:03:06.810 --> 00:03:11.310
We get the result printout. So what does the printout tell us?

00:03:11.310 --> 00:03:15.900
There are 3 different sections.
The first section is the

00:03:15.900 --> 00:03:22.440
factor loadings. These statistics tell how strongly the indicators are related to

00:03:22.440 --> 00:03:27.990
each factor and how much uniqueness there is in the indicators that the factors don't explain.

00:03:27.990 --> 00:03:35.250
The second section is the variance explained: how much of the variation each factor explains.

00:03:35.250 --> 00:03:40.770
And then finally, in the bottom section, we have different model quality indices.

00:03:40.770 --> 00:03:47.190
I don't typically interpret these model quality indices myself, because if I want

00:03:47.190 --> 00:03:52.140
to really know if the model fits the data well or not I will do it with the confirmatory

00:03:52.140 --> 00:03:58.440
factor analysis based techniques, which have a lot more diagnostic options available.

00:03:58.440 --> 00:04:04.110
So in practice we interpret the factor loading pattern, how strong the individual loadings are,

00:04:04.110 --> 00:04:09.930
and how much variation the factors explain. If you want to do more diagnostics then it's

00:04:09.930 --> 00:04:13.710
better to move into the confirmatory factor analysis family of techniques.

00:04:13.710 --> 00:04:19.350
So the factor loadings here provide us some information. They provide us

00:04:19.350 --> 00:04:23.940
information about how strongly each indicator is related to its factor.

00:04:23.940 --> 00:04:29.460
The factor loadings are regressions of items on factors. So it's a regression

00:04:29.460 --> 00:04:33.390
path, a directional path. Because this is a standardized

00:04:33.390 --> 00:04:37.530
factor analysis solution and the factors are uncorrelated in this

00:04:37.530 --> 00:04:44.400
factor solution, which they are by default, the loadings also equal correlations.

00:04:44.400 --> 00:04:54.390
So this last item correlates 0.75 with the second factor.
Then we also have the uniqueness here, or

00:04:54.390 --> 00:05:02.220
the communality, h-squared, which tells how much of the variation of the indicator all the factors

00:05:02.220 --> 00:05:09.210
explain together, and the uniqueness, how much of the variance of the indicator remains unexplained.

00:05:09.210 --> 00:05:16.800
Sometimes the uniqueness is interpreted as evidence or a measure of unreliability. So if the

00:05:16.800 --> 00:05:23.550
uniqueness is 30%, we say that the indicator's error variance is 30% and 70% is the reliable

00:05:23.550 --> 00:05:29.250
variance. The problem with that approach is that the uniqueness also captures other

00:05:29.250 --> 00:05:33.690
sources of unique variation that are not random noise. So for example there's

00:05:33.690 --> 00:05:41.130
probably something unique in the total quality management item that is not related to the other

00:05:41.130 --> 00:05:46.380
investment items but that would be reliable if we ever asked the same question again.

00:05:46.380 --> 00:05:53.910
So factor analysis puts the unreliability variance, the

00:05:53.910 --> 00:05:58.020
random error, and the unique variance into one and the same number, and there is really no

00:05:58.020 --> 00:06:02.370
way of taking them apart. So that's one weakness of factor analysis.

00:06:03.810 --> 00:06:09.750
The variance explained here shows that the first factor explains the largest share of the variation, but this

00:06:09.750 --> 00:06:15.720
is an unrotated solution so we don't really pay much attention to this, except for one thing.

00:06:15.720 --> 00:06:24.450
We can do Harman's single factor test, which you sometimes see reported in papers.

00:06:24.450 --> 00:06:32.850
Harman's test involves checking whether the first factor explains a majority of

00:06:32.850 --> 00:06:36.630
the variance in the data and whether it dominates over the other factors.
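As a rough sketch of how the communality, the uniqueness and the variance explained come out of a loading table, assuming a standardized solution with uncorrelated factors: the loadings below are made up for illustration and are not the ones on the printout, and reading "majority" as more than half of the total variance is an assumption as well.

```python
import numpy as np

# Hypothetical standardized loadings: 4 indicators (rows) on 2 uncorrelated
# factors (columns). Made-up numbers, not taken from the printout.
loadings = np.array([
    [0.75, 0.10],
    [0.70, 0.20],
    [0.15, 0.80],
    [0.05, 0.60],
])

# Communality: share of each indicator's variance that all factors explain together.
communality = (loadings ** 2).sum(axis=1)

# Uniqueness: the remainder, which mixes random error with reliable unique variance.
uniqueness = 1.0 - communality

# Share of the total variance explained by each factor
# (each standardized indicator contributes a variance of 1).
variance_share = (loadings ** 2).sum(axis=0) / loadings.shape[0]

# Harman-style check, assuming "majority" means more than half of the variance.
first_factor_dominates = variance_share[0] > 0.5

print("communality:   ", np.round(communality, 3))
print("uniqueness:    ", np.round(uniqueness, 3))
print("variance share:", np.round(variance_share, 3))
print("first factor dominates:", bool(first_factor_dominates))
```

The split of the uniqueness into random error and reliable unique variance cannot be recovered from these numbers, which is the weakness described above.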
00:06:36.630 --> 00:06:41.460
So we can see here the first factor is 25 percent and the second factor is 16 percent.

00:06:41.460 --> 00:06:45.840
We can't say that the first factor would explain most of the variance in the data. We can't say that

00:06:45.840 --> 00:06:51.000
it would dominate over the other factors, because 25 and 16 percent are still in the same ballpark.

00:06:51.000 --> 00:06:57.630
The name of Harman's single factor test is a bit misleading, because

00:06:57.630 --> 00:07:01.770
it's not really a statistical test and it's not even a very good diagnostic,

00:07:01.770 --> 00:07:09.840
because it will probably detect only very severe method variance problems.

00:07:09.840 --> 00:07:16.050
Nevertheless it's something that you can easily check from the results of exploratory factor

00:07:16.050 --> 00:07:22.410
analysis. If you want to do more rigorous tests of method variance then you can apply confirmatory

00:07:22.410 --> 00:07:27.750
factor analysis based techniques that allow you much more freedom in what you can do.

00:07:27.750 --> 00:07:33.330
Let's take a look at the factor loadings. The idea of factor loadings is that they should

00:07:33.330 --> 00:07:39.660
show a pattern. So we should see that the indicators, for example the

00:07:39.660 --> 00:07:44.430
first three indicators, which are supposed to measure one thing, should load on one factor

00:07:44.430 --> 00:07:49.740
and one factor only, and then the measures of the other constructs should not load on

00:07:49.740 --> 00:07:54.630
that factor. That's not the case here, and the reason why it's not the case is that this is

00:07:54.630 --> 00:08:00.210
an unrotated factor solution.
So typically in a factor analysis, when we extract the factors,

00:08:00.210 --> 00:08:05.670
we take the first factor that explains the most variance in the data, and if the constructs

00:08:05.670 --> 00:08:11.640
that cause the data are correlated, then the first factor contains a little bit of every

00:08:11.640 --> 00:08:17.460
construct. So all indicators load on it highly and we can't really interpret it.

00:08:17.460 --> 00:08:24.030
So we do a factor rotation, and factor rotation simplifies the factor analysis results. It also

00:08:24.030 --> 00:08:32.760
has another nice feature. Factor rotation can relax the constraint that

00:08:32.760 --> 00:08:41.640
all the factors are uncorrelated when we do the factor analysis. The zero correlation constraint

00:08:41.640 --> 00:08:47.730
is something that we have for a technical reason, and it doesn't make any theoretical

00:08:47.730 --> 00:08:54.360
sense if we are studying constructs that we think are related. So if we think that the constructs

00:08:54.360 --> 00:08:59.550
are related, causally or otherwise, we cannot assume that the constructs are uncorrelated.

00:08:59.550 --> 00:09:03.900
Therefore imposing the constraint that the factors that

00:09:03.900 --> 00:09:08.100
are supposed to represent those constructs are uncorrelated doesn't make any sense.

00:09:08.100 --> 00:09:14.250
That's another reason why we rotate the factors: rotation relaxes that constraint.

00:09:14.250 --> 00:09:20.010
The factor rotation simplifies the result, and after rotation we can

00:09:20.010 --> 00:09:24.900
see that the first three indicators go to one factor and the second three to another

00:09:24.900 --> 00:09:29.730
factor. So we have a nice pattern where each group

00:09:29.730 --> 00:09:33.480
of indicators loads on one factor only and there are no cross-loadings.
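To make the idea of rotation concrete, here is a minimal two-factor sketch with made-up loadings. It is only an illustration under several assumptions: it uses an orthogonal rotation scored with the varimax criterion and found by a brute-force search over whole-degree angles, which is not how statistical packages implement rotation, and it does not relax the uncorrelated-factors constraint (that would require an oblique rotation).

```python
import numpy as np

def rotation_matrix(theta):
    """2x2 orthogonal rotation by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return np.var(loadings ** 2, axis=0).sum()

# Hypothetical simple structure: three indicators per factor, no cross-loadings.
simple = np.array([
    [0.80, 0.00], [0.70, 0.00], [0.75, 0.00],
    [0.00, 0.80], [0.00, 0.70], [0.00, 0.60],
])

# Mimic an unrotated solution by turning the axes 30 degrees:
# every indicator now loads on both factors.
unrotated = simple @ rotation_matrix(np.deg2rad(30))

# Brute-force search over whole-degree angles for the rotation that
# maximizes the varimax criterion.
angles = np.deg2rad(np.arange(0, 90))
best = max(angles, key=lambda t: varimax_criterion(unrotated @ rotation_matrix(t)))
rotated = unrotated @ rotation_matrix(best)

print("unrotated loadings:\n", np.round(unrotated, 2))
print("rotated loadings:\n", np.round(rotated, 2))
```

The rotated table has one clear loading per indicator again. The sign and the order of the factors are arbitrary, and the communalities are the same before and after rotation.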
00:09:33.480 --> 00:09:40.260
So this would be evidence that these indicators, for example these three indicators, measure

00:09:40.260 --> 00:09:46.980
the same thing together and that it is distinct from what these other indicators may measure.

00:09:46.980 --> 00:09:50.610
So you want to have this kind of pattern, and it is an indication of

00:09:50.610 --> 00:09:56.340
validity. Of course it doesn't guarantee validity, because it doesn't tell us what

00:09:56.340 --> 00:10:01.020
these indicators have in common, but it's some kind of indirect evidence that there

00:10:01.020 --> 00:10:06.420
could be one underlying construct driving the correlations between these indicators.

00:10:06.420 --> 00:10:14.580
Another thing that we look at from these factor loadings is their magnitude. So that's what we do

00:10:14.580 --> 00:10:21.840
when we assess the results. This is an example from Yli-Renko's article. They have a table of

00:10:21.840 --> 00:10:27.930
factor loadings. They have the measurement items. They have labeled the factors. So

00:10:27.930 --> 00:10:33.900
usually you label the factors with the constructs' names and then you look at the loadings.

00:10:33.900 --> 00:10:41.400
So the factor loadings here are interpreted as evidence of reliability. The square of a factor

00:10:41.400 --> 00:10:47.220
loading is an estimate of the reliability of the indicator, and then we also have these statistics,

00:10:47.220 --> 00:10:53.880
the z-statistic that is used for testing the significance, whether the loading is zero or not.

00:10:53.880 --> 00:11:01.350
I don't think the null hypothesis that the loading is zero is very relevant. You really want to

00:11:01.350 --> 00:11:09.120
know whether the indicators are reliable enough, not whether their reliability differs from zero.

00:11:09.120 --> 00:11:15.240
So this is not a very useful test, but people still sometimes present it.
The

00:11:15.240 --> 00:11:19.320
first indicator here is not tested. The reason for this is that this is

00:11:19.320 --> 00:11:22.500
from a confirmatory factor analysis and there's a technical reason

00:11:22.500 --> 00:11:28.350
why the first indicator is not tested here. I'll explain that in another video.

00:11:28.350 --> 00:11:35.940
Then the authors say that the standardized loadings are all above 0.57 and the cutoff

00:11:35.940 --> 00:11:42.270
is 0.4. The commonly used cutoff is 0.7, but you can probably find somebody who has

00:11:42.270 --> 00:11:48.090
presented a lower cutoff if you do that kind of cherry picking. Normally we want the

00:11:48.090 --> 00:11:54.690
loading to be 0.7, but reliability again is a matter of degree, not a matter of yes

00:11:54.690 --> 00:12:00.750
or no, and you then have to assess what the unreliability means for your study results.
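A small sketch of what these loading values imply, assuming the squared standardized loading is taken as the indicator reliability estimate as described above. The three loadings are the ones mentioned in the talk; the variable names are only illustrative.

```python
# Loadings mentioned in the talk: the 0.4 cutoff the authors cite, the 0.57
# they report, and the commonly used 0.7 cutoff.
loadings = [0.4, 0.57, 0.7]

# Squared standardized loading as a rough estimate of indicator reliability.
reliabilities = [(loading, loading ** 2) for loading in loadings]

for loading, reliability in reliabilities:
    print(f"loading {loading:.2f} -> reliability estimate {reliability:.2f}, "
          f"unexplained share {1 - reliability:.2f}")
```

Even at the common 0.7 cutoff, roughly half of the indicator's variance is left unexplained by the factor, which is why reliability is better read as a matter of degree.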