WEBVTT
Kind: captions
Language: en

00:00:00.030 --> 00:00:04.290
Factor analysis extracts underlying
dimensions from the data and answers

00:00:04.290 --> 00:00:06.960
the question of what the indicators have in common.

00:00:06.960 --> 00:00:11.790
Sometimes factor analysis nevertheless
doesn't give you the solution that you

00:00:11.790 --> 00:00:17.940
expect and then you have to understand
why that would happen. To do so you have

00:00:17.940 --> 00:00:21.450
to understand what exactly the
factor analysis is doing. And

00:00:21.450 --> 00:00:26.610
in this video I will provide a conceptual
explanation of exploratory factor analysis.

00:00:26.610 --> 00:00:31.020
The idea of a factor analysis is that
there are different variance components

00:00:31.020 --> 00:00:37.350
in the data. Typically there is... If we have
a measurement occasion there is variance caused

00:00:37.350 --> 00:00:45.000
by the construct. So we have indicators a1,
a2 and a3 that are supposedly valid measures

00:00:45.000 --> 00:00:53.190
of construct A. And we have b1, b2 and b3 that
are supposedly valid measures of construct B.

00:00:53.190 --> 00:01:00.180
Then each indicator also has this
random noise, or unreliability, and some

00:01:00.180 --> 00:01:04.350
unique aspects. So if these are survey
questions - then the survey questions

00:01:04.350 --> 00:01:09.090
measure the construct - they could measure
something else and then there's unreliability.

00:01:09.090 --> 00:01:16.890
In factor analysis we add a latent - one
or more latent variables - to this model.

00:01:16.890 --> 00:01:20.670
So these are observed variables and we
try to explain the correlations between

00:01:20.670 --> 00:01:24.480
the observed variables by using a
smaller number of latent variables.

00:01:24.480 --> 00:01:31.740
For example we add one factor here that we think
explains the intercorrelations between these

00:01:31.740 --> 00:01:41.250
items.
And there are two strategies: exploratory
analysis where we allow the computer to specify

00:01:41.250 --> 00:01:45.630
the factors and confirmatory analysis where
we specify the factor structure ourselves.

00:01:45.630 --> 00:01:51.300
The factor analysis model also - it's
a statistical model - so it's a set

00:01:51.300 --> 00:01:58.590
of equations and here is the model. So
we are saying that all these indicators

00:01:58.590 --> 00:02:05.880
a1 a2 a3 b1 b2 b3 are a function of the
factor times a factor loading for which

00:02:05.880 --> 00:02:09.870
we use the Greek letter lambda plus
some error that we don't observe.

00:02:09.870 --> 00:02:15.900
So it's a regression equation basically. The
only difference is that we only observe the

00:02:15.900 --> 00:02:19.650
dependent variable. We don't observe
the key independent variable. So this

00:02:19.650 --> 00:02:25.200
is a latent variable. If it was an observed
variable then we could just regress all

00:02:25.200 --> 00:02:29.100
indicators on the factor but we can't
because the factor is not observed.

00:02:29.100 --> 00:02:34.440
So these are the factor loadings and
these are the item uniquenesses. It's

00:02:34.440 --> 00:02:41.130
important to note that factor analysis cannot
separate unreliability from some other unique

00:02:41.130 --> 00:02:47.670
variance. So if the a1 indicator has some
unique aspects Q then you cannot separate

00:02:47.670 --> 00:02:55.200
it from unreliability. And that's - basically
with any reliability statistic this applies.

00:02:55.200 --> 00:03:01.530
So if your indicators have variation that is
unique from other indicators but still reliable

00:03:01.530 --> 00:03:09.180
so it's not random noise it's some variation -
then it cannot be distinguished from unreliability.

00:03:09.180 --> 00:03:15.090
We assume that all variance that cannot be
explained by the other items, or unique

00:03:15.090 --> 00:03:19.830
variance, is unreliability.
So that's the workaround for that limitation.

00:03:19.830 --> 00:03:24.420
So we had exploratory analysis and
confirmatory analysis. The idea of

00:03:24.420 --> 00:03:30.150
exploratory factor analysis is that the
computer first gives - tries to explain

00:03:30.150 --> 00:03:35.250
the data with one factor. So it estimates
a one factor model - one factor explains all

00:03:35.250 --> 00:03:42.840
correlations between the indicators. Then we
eliminate the variance explained by that factor

00:03:42.840 --> 00:03:49.320
from the data and then we fit the same single
factor model again on the residual variance

00:03:49.320 --> 00:03:54.960
and we repeat this until there is no more
covariance between indicators to explain.

00:03:54.960 --> 00:04:00.570
So what does the process look like? We
have the data here. So we have this A

00:04:00.570 --> 00:04:05.790
variance here, due to the construct, B variance
- due to the construct. And we want to know how

00:04:05.790 --> 00:04:09.960
much of the variation of these indicators is
due to the A construct and the B construct.

00:04:09.960 --> 00:04:17.580
We first fit a single factor model. And let's
say that the single factor now picks up all

00:04:17.580 --> 00:04:24.420
the A variance. So all the A variance goes to the
factor and the remaining variance will go to the

00:04:24.420 --> 00:04:30.120
error term. So we take apart the variation
in the observed variables - we assign some to

00:04:30.120 --> 00:04:35.640
the factors and some to the error terms. And
this model doesn't fit really well because

00:04:35.640 --> 00:04:39.840
these errors are assumed to be uncorrelated
but we can see here that because this error

00:04:39.840 --> 00:04:44.460
term takes the B variance and this as well they
are actually correlated. So the factor analysis

00:04:44.460 --> 00:04:50.970
wouldn't stop here because there is evidence that
there are still correlations after this factor.

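As a rough sketch, the model and the reason the single factor fit leaves correlated errors can be written out as below. The symbols x_i, F and e_i are notation chosen here for illustration, and the sketch assumes a factor with unit variance that is uncorrelated with the errors; none of this notation is taken from the video itself.

```latex
% Single factor model for indicator i (\lambda_i is the factor loading)
x_i = \lambda_i F + e_i, \qquad i \in \{a_1, a_2, a_3, b_1, b_2, b_3\}

% Assuming Var(F) = 1 and mutually uncorrelated errors, the model implies
\operatorname{Cov}(x_i, x_j) = \lambda_i \lambda_j \qquad (i \neq j)

% Residual covariance left over after the first factor is removed
r_{ij} = \operatorname{Cov}(x_i, x_j) - \lambda_i \lambda_j

% If the first factor picks up only the A variance, then
% \lambda_{b_1} = \lambda_{b_2} = 0 and the B indicators stay correlated:
r_{b_1 b_2} = \operatorname{Cov}(x_{b_1}, x_{b_2}) \neq 0
```

A nonzero residual such as the last one is the evidence of remaining correlation that leads to fitting another factor.
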
00:04:50.970 --> 00:05:00.300
So we take the A variance we put it aside here
and then we - from the remaining data we fit

00:05:00.300 --> 00:05:07.830
another factor. It picks up the B variance
here and the B variance here and then the

00:05:07.830 --> 00:05:13.860
remaining indicators here are uncorrelated
in which case the factor analysis stops.

00:05:13.860 --> 00:05:19.440
So that's how factor analysis works. We pick
up some variation then we continue with the

00:05:19.440 --> 00:05:24.330
remaining, we pick up some more variation, until all
the remaining indicators are uncorrelated in

00:05:24.330 --> 00:05:29.670
which case the factor analysis has
discovered two factors. These two

00:05:29.670 --> 00:05:36.150
factors explain the intercorrelations between the
variables completely. The remaining variance

00:05:36.150 --> 00:05:40.890
in the data is simply unique features
of these indicators or unreliability.

00:05:40.890 --> 00:05:45.780
So that's the conceptual idea.
We extract variation then we do

00:05:45.780 --> 00:05:49.770
it over and over until there are
no more covariances to extract.
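
The extract-and-repeat idea described above can be sketched in code. This is only a minimal illustration under stated assumptions: the video does not name an estimation method, so the sketch uses a principal-axis-style one-factor fit at each step; the loadings in `true_loadings` are made-up population values; and the function names `fit_single_factor` and `extract_factors` are hypothetical, not from any library.

```python
import numpy as np


def off_diagonal_max(matrix):
    """Largest absolute off-diagonal entry, i.e. covariance still unexplained."""
    mask = ~np.eye(matrix.shape[0], dtype=bool)
    return np.abs(matrix[mask]).max()


def fit_single_factor(corr, max_iter=2000, tol=1e-12):
    """Fit one factor to `corr` (principal-axis style) and return its loadings."""
    reduced = corr.copy()
    off = corr - np.diag(np.diag(corr))
    # Starting guess for each communality: largest absolute correlation in the row.
    communality = np.abs(off).max(axis=1)
    for _ in range(max_iter):
        # Replace the diagonal with the current communality estimates.
        np.fill_diagonal(reduced, communality)
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        vec = eigvecs[:, -1]
        if vec.sum() < 0:  # eigenvector sign is arbitrary; make loadings positive
            vec = -vec
        loadings = np.sqrt(max(eigvals[-1], 0.0)) * vec
        new_communality = loadings ** 2
        converged = np.abs(new_communality - communality).max() < tol
        communality = new_communality
        if converged:
            break
    return loadings


def extract_factors(corr, tol=1e-6):
    """Extract factors one at a time until the residuals are uncorrelated."""
    n_items = corr.shape[0]
    residual = corr.copy()
    factors = []
    while off_diagonal_max(residual) > tol and len(factors) < n_items:
        loadings = fit_single_factor(residual)
        factors.append(loadings)
        # Put the explained part aside and continue with what remains.
        residual = residual - np.outer(loadings, loadings)
    loading_matrix = np.column_stack(factors) if factors else np.empty((n_items, 0))
    return loading_matrix, residual


# Hypothetical population loadings: a1-a3 load on factor A, b1-b3 on factor B.
true_loadings = np.array([
    [0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
    [0.0, 0.7], [0.0, 0.6], [0.0, 0.5],
])
corr = true_loadings @ true_loadings.T
np.fill_diagonal(corr, 1.0)  # correlation matrix with unit variances

loadings, residual = extract_factors(corr)
print("factors found:", loadings.shape[1])
print(np.round(loadings, 3))
```

With this made-up correlation matrix the first pass picks up the A indicators, the second pass picks up the B indicators, and the loop then stops because the residual correlations are essentially zero. Real software typically estimates all factors jointly and applies a rotation, so this sketch shows the conceptual process only.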