WEBVTT Kind: captions Language: en 00:00:00.030 --> 00:00:04.290 Factor analysis extracts underlying dimensions from the data and answers 00:00:04.290 --> 00:00:06.960 the question of what the indicators have in common. 00:00:06.960 --> 00:00:11.790 Sometimes, however, factor analysis doesn't give you the solution that you 00:00:11.790 --> 00:00:17.940 expect, and then you have to understand why that would happen. To do so you have 00:00:17.940 --> 00:00:21.450 to understand what exactly the factor analysis is doing. And 00:00:21.450 --> 00:00:26.610 in this video I will provide a conceptual explanation of exploratory factor analysis. 00:00:26.610 --> 00:00:31.020 The idea of factor analysis is that there are different variance components 00:00:31.020 --> 00:00:37.350 in the data. Typically, on a measurement occasion, there is variance caused 00:00:37.350 --> 00:00:45.000 by the construct. So we have indicators a1, a2 and a3 that are supposedly valid measures 00:00:45.000 --> 00:00:53.190 of construct A. And we have b1, b2 and b3 that are supposedly valid measures of construct B. 00:00:53.190 --> 00:01:00.180 Then each indicator also has random noise, or unreliability, and some 00:01:00.180 --> 00:01:04.350 unique aspects. So if these are survey questions, then the survey questions 00:01:04.350 --> 00:01:09.090 measure the construct, they could measure something else, and then there's unreliability. 00:01:09.090 --> 00:01:16.890 In factor analysis we add one or more latent variables to this model. 00:01:16.890 --> 00:01:20.670 So these are observed variables and we try to explain the correlations between 00:01:20.670 --> 00:01:24.480 the observed variables by using a smaller number of latent variables. 00:01:24.480 --> 00:01:31.740 For example, we add one factor here that we think explains the intercorrelations between these 00:01:31.740 --> 00:01:41.250 items.
And there are two strategies: exploratory analysis, where we let the computer specify 00:01:41.250 --> 00:01:45.630 the factors, and confirmatory analysis, where we specify the factor structure ourselves. 00:01:45.630 --> 00:01:51.300 The factor analysis model is a statistical model, so it's a set 00:01:51.300 --> 00:01:58.590 of equations, and here is the model. So we are saying that all these indicators 00:01:58.590 --> 00:02:05.880 a1, a2, a3, b1, b2, b3 are a function of the factor times a factor loading, for which 00:02:05.880 --> 00:02:09.870 we use the Greek letter lambda, plus some error that we don't observe. 00:02:09.870 --> 00:02:15.900 So it's basically a regression equation. The only difference is that we only observe the 00:02:15.900 --> 00:02:19.650 dependent variable. We don't observe the key independent variable. So this 00:02:19.650 --> 00:02:25.200 is a latent variable. If it were an observed variable, then we could just regress all 00:02:25.200 --> 00:02:29.100 indicators on the factor, but we can't because the factor is not observed. 00:02:29.100 --> 00:02:34.440 So these are the factor loadings and these are the item uniquenesses. It's 00:02:34.440 --> 00:02:41.130 important to note that factor analysis cannot separate unreliability from other unique 00:02:41.130 --> 00:02:47.670 variance. So if the a1 indicator has some unique aspect Q, then you cannot separate 00:02:47.670 --> 00:02:55.200 it from unreliability. And this applies to basically any reliability statistic. 00:02:55.200 --> 00:03:01.530 So if your indicators have variation that is unique from other indicators but still reliable, 00:03:01.530 --> 00:03:09.180 so it's not random noise but some real variation, then it cannot be distinguished from unreliability. 00:03:09.180 --> 00:03:15.090 We assume that all variance that cannot be explained by the other items, that is, the unique 00:03:15.090 --> 00:03:19.830 variance, is unreliability. So that's the workaround for that limitation.
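The single-factor model described above can be written out as a short sketch. The symbols $x_j$, $F$, $\lambda_j$ and $\varepsilon_j$ are notation assumed here for illustration; the transcript itself only names lambda for the loading. The covariance line additionally assumes that the factor is uncorrelated with the errors and that the errors are uncorrelated with each other.

```latex
% One equation per indicator j in {a1, a2, a3, b1, b2, b3}:
% observed indicator = loading * latent factor + unobserved error
x_j = \lambda_j F + \varepsilon_j

% Implied variance split for each indicator (common part + unique part)
\operatorname{Var}(x_j) = \lambda_j^2 \operatorname{Var}(F) + \operatorname{Var}(\varepsilon_j)

% Implied covariance between two different indicators j and k
\operatorname{Cov}(x_j, x_k) = \lambda_j \lambda_k \operatorname{Var}(F), \qquad j \neq k
```

In this notation, $\operatorname{Var}(\varepsilon_j)$ is the item uniqueness, which lumps together reliable unique variance and random noise.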
00:03:19.830 --> 00:03:24.420 So we had exploratory analysis and confirmatory analysis. The idea of 00:03:24.420 --> 00:03:30.150 exploratory factor analysis is that the computer first tries to explain 00:03:30.150 --> 00:03:35.250 the data with one factor. So it estimates a one-factor model, in which one factor explains all 00:03:35.250 --> 00:03:42.840 correlations between the indicators. Then we eliminate the variance explained by that factor 00:03:42.840 --> 00:03:49.320 from the data, and then we fit the same single-factor model again on the residual variance, 00:03:49.320 --> 00:03:54.960 and we repeat this until there is no more covariance between indicators to explain. 00:03:54.960 --> 00:04:00.570 So what does the process look like? We have the data here. So we have this A 00:04:00.570 --> 00:04:05.790 variance here, due to the A construct, and B variance, due to the B construct. And we want to know how 00:04:05.790 --> 00:04:09.960 much of the variation of these indicators is due to the A construct and the B construct. 00:04:09.960 --> 00:04:17.580 We first fit a single-factor model. And let's say that the single factor now picks up all 00:04:17.580 --> 00:04:24.420 the A variance. So all the A variance goes to the factor and the remaining variance will go to the 00:04:24.420 --> 00:04:30.120 error term. So we take apart the variation in the observed variables: we assign some to 00:04:30.120 --> 00:04:35.640 the factor and some to the error terms. And this model doesn't fit really well, because 00:04:35.640 --> 00:04:39.840 these errors are assumed to be uncorrelated, but we can see here that because this error 00:04:39.840 --> 00:04:44.460 term takes the B variance and this one as well, they are actually correlated. So the factor analysis 00:04:44.460 --> 00:04:50.970 wouldn't stop here, because there is evidence that there are still correlations left after this factor.
00:04:50.970 --> 00:05:00.300 So we take the A variance, we put it aside here, and then from the remaining data we fit 00:05:00.300 --> 00:05:07.830 another factor. It picks up the B variance here and the B variance here, and then the 00:05:07.830 --> 00:05:13.860 remaining indicators here are uncorrelated, in which case the factor analysis stops. 00:05:13.860 --> 00:05:19.440 So that's how factor analysis works. We pick up some variation, then we continue with the 00:05:19.440 --> 00:05:24.330 remainder and pick up some more variation, until all the remaining indicators are uncorrelated, in 00:05:24.330 --> 00:05:29.670 which case the factor analysis has discovered two factors. These two 00:05:29.670 --> 00:05:36.150 factors explain the intercorrelations between the variables completely. The remaining variance 00:05:36.150 --> 00:05:40.890 in the data is simply unique features of these indicators or unreliability. 00:05:40.890 --> 00:05:45.780 So that's the conceptual idea. We extract variation, then we do 00:05:45.780 --> 00:05:49.770 it over and over until there's no more covariance to extract.
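The extract-then-repeat idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the loadings are made-up values for a hypothetical two-construct example, the correlations are the ones that model implies exactly (no sampling noise), and the single-factor fit uses a simple least-squares update on the off-diagonal correlations, which is one possible estimator and not necessarily what any particular statistics package does. All function names are hypothetical.

```python
# Hypothetical loadings for indicators a1..a3 (construct A) and b1..b3 (construct B).
NAMES = ["a1", "a2", "a3", "b1", "b2", "b3"]
LOAD_A = [0.8, 0.7, 0.6, 0.0, 0.0, 0.0]
LOAD_B = [0.0, 0.0, 0.0, 0.7, 0.6, 0.5]
P = len(NAMES)


def implied_correlations():
    """Correlation matrix implied by the two uncorrelated factors above."""
    return [[1.0 if i == j else LOAD_A[i] * LOAD_A[j] + LOAD_B[i] * LOAD_B[j]
             for j in range(P)] for i in range(P)]


def fit_one_factor(resid, sweeps=500):
    """Fit one factor to the off-diagonal entries by least squares.

    Each loading is updated in turn to the value that best reproduces
    that indicator's correlations with the others (assumed estimator).
    """
    load = [0.5] * P
    for _ in range(sweeps):
        for i in range(P):
            num = sum(resid[i][j] * load[j] for j in range(P) if j != i)
            den = sum(load[j] ** 2 for j in range(P) if j != i)
            load[i] = num / den if den > 0 else 0.0
    return load


def max_offdiag(m):
    """Largest absolute correlation left between two different indicators."""
    return max(abs(m[i][j]) for i in range(P) for j in range(P) if i != j)


def extract_factors(corr, tol=1e-6):
    """Fit a factor, remove what it explains, repeat until nothing is left."""
    resid = [row[:] for row in corr]
    factors = []
    while max_offdiag(resid) > tol and len(factors) < P:
        load = fit_one_factor(resid)
        factors.append(load)
        for i in range(P):
            for j in range(P):
                if i != j:
                    resid[i][j] -= load[i] * load[j]
    return factors


factors = extract_factors(implied_correlations())
for k, load in enumerate(factors, start=1):
    print("factor", k, dict(zip(NAMES, (round(abs(x), 3) for x in load))))
```

With these illustrative loadings the loop stops after two factors, because once the second factor is removed no correlation between indicators remains. With real sample data the residual correlations never reach exactly zero, so a stopping rule would need a more lenient criterion than the tolerance used here.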