WEBVTT
Kind: captions
Language: en

00:00:00.030 --> 00:00:04.290
Factor analysis extracts underlying 
dimensions from the data and answers  

00:00:04.290 --> 00:00:06.960
the question of what the indicators have in common.

00:00:06.960 --> 00:00:11.790
Sometimes factor analysis nevertheless 
doesn't give you the solution that you  

00:00:11.790 --> 00:00:17.940
expect and then you have to understand 
why that would happen. To do so you have  

00:00:17.940 --> 00:00:21.450
to understand what exactly the 
factor analysis is doing. And  

00:00:21.450 --> 00:00:26.610
in this video I will provide a conceptual 
explanation of exploratory factor analysis.

00:00:26.610 --> 00:00:31.020
The idea of a factor analysis is that 
there are different variance components  

00:00:31.020 --> 00:00:37.350
in the data. Typically there is... If we have 
a measurement occasion there is variance caused  

00:00:37.350 --> 00:00:45.000
by the construct. So we have indicators a1 
a2 and a3 that are supposedly valid measures  

00:00:45.000 --> 00:00:53.190
of construct A. And we have b1 b2 and b3 that 
are supposedly valid measures of construct B.

00:00:53.190 --> 00:01:00.180
Then each indicator also has this 
random noise, or unreliability, and some

00:01:00.180 --> 00:01:04.350
unique aspects. So if these are survey 
questions - then the survey questions  

00:01:04.350 --> 00:01:09.090
measure the construct - they could measure 
something else and then there's unreliability.

00:01:09.090 --> 00:01:16.890
In factor analysis we add a latent - one 
or more latent variables - to this model.  

00:01:16.890 --> 00:01:20.670
So these are observed variables and we 
try to explain the correlation between  

00:01:20.670 --> 00:01:24.480
the observed variables by using a 
smaller number of latent variables.

00:01:24.480 --> 00:01:31.740
For example we add one factor here that we think 
explains the intercorrelations between these

00:01:31.740 --> 00:01:41.250
items. And there are two strategies: exploratory
analysis where we allow the computer to specify  

00:01:41.250 --> 00:01:45.630
the factors and confirmatory analysis where 
we specify the factor structure ourselves.

00:01:45.630 --> 00:01:51.300
The factor analysis model also - it's 
a statistical model - so it's a set  

00:01:51.300 --> 00:01:58.590
of equations and here is the model. So 
we are saying that all these indicators  

00:01:58.590 --> 00:02:05.880
a1, a2, a3, b1, b2, b3 are a function of the
factor times a factor loading, for which

00:02:05.880 --> 00:02:09.870
we use the Greek letter lambda plus 
some error that we don't observe.
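
NOTE
A hedged sketch of the model just described, written for a single factor. The symbols x_i (an indicator), F (the factor), and e_i (the unobserved error) are assumed names for illustration; only lambda comes from the video.
```latex
x_i = \lambda_i F + e_i, \qquad i \in \{a_1, a_2, a_3, b_1, b_2, b_3\}
```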

00:02:09.870 --> 00:02:15.900
So it's a regression equation basically. The 
only difference is that we only observe the  

00:02:15.900 --> 00:02:19.650
dependent variable. We don't observe 
the key independent variable. So this  

00:02:19.650 --> 00:02:25.200
is a latent variable. If it were an observed
variable then we could just regress all  

00:02:25.200 --> 00:02:29.100
indicators on the factor but we can't 
because the factor is not observed.

00:02:29.100 --> 00:02:34.440
So these are the factor loadings and
these are the item uniquenesses. It's

00:02:34.440 --> 00:02:41.130
important to note that factor analysis cannot 
separate unreliability from some other unique  

00:02:41.130 --> 00:02:47.670
variance. So if the a1 indicator has some 
unique aspects Q then you cannot separate  

00:02:47.670 --> 00:02:55.200
it from unreliability. And that's - basically 
with any reliability statistic this applies.

00:02:55.200 --> 00:03:01.530
So if your indicators have variation that is 
unique from other indicators but still reliable  

00:03:01.530 --> 00:03:09.180
so it's not random noise it's some variation - 
then it cannot be distinguished from unreliability.

00:03:09.180 --> 00:03:15.090
We assume that all variance that cannot be
explained by the other items, that is, unique

00:03:15.090 --> 00:03:19.830
variance, is unreliability. So that's
the workaround for that limitation.
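
NOTE
One way to write the limitation described above, as a sketch under stated assumptions. Here x_i is an indicator, F the factor, and lambda_i its loading; the split of the error e_i into a reliable unique part s_i and random noise \varepsilon_i is notation introduced for illustration, not taken from the video, and F, s_i, and \varepsilon_i are assumed to be mutually uncorrelated.
```latex
e_i = s_i + \varepsilon_i, \qquad
\operatorname{Var}(x_i) = \lambda_i^2 \operatorname{Var}(F)
  + \underbrace{\operatorname{Var}(s_i) + \operatorname{Var}(\varepsilon_i)}_{\text{only this sum is estimated}}
```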

00:03:19.830 --> 00:03:24.420
So we had exploratory analysis and 
confirmatory analysis. The idea of  

00:03:24.420 --> 00:03:30.150
exploratory factor analysis is that the 
computer first gives - tries to explain  

00:03:30.150 --> 00:03:35.250
the data with one factor. So it estimates 
a one-factor model - one factor explains all

00:03:35.250 --> 00:03:42.840
correlations between the indicators. Then we 
eliminate the variance explained by that factor  

00:03:42.840 --> 00:03:49.320
from the data and then we fit the same single 
factor model again on the residual variance  

00:03:49.320 --> 00:03:54.960
and we repeat this until there is no more 
covariance between indicators to explain.

00:03:54.960 --> 00:04:00.570
So what does the process look like? We 
have the data here. So we have this A  

00:04:00.570 --> 00:04:05.790
variance here, due to the A construct, and B variance,
due to the B construct. And we want to know how

00:04:05.790 --> 00:04:09.960
much of the variation of these indicators is
due to the A construct and the B construct.

00:04:09.960 --> 00:04:17.580
We first fit a single factor model. And let's
say that the single factor now picks up all

00:04:17.580 --> 00:04:24.420
the A variance. So all the A variance goes to the
factor and the remaining variance will go to the  

00:04:24.420 --> 00:04:30.120
error term. So we take apart the variation
in the observed variables - we assign some to

00:04:30.120 --> 00:04:35.640
the factors and some to the error terms. And 
this model doesn't fit really well because  

00:04:35.640 --> 00:04:39.840
these errors are assumed to be uncorrelated 
but we can see here that because this error  

00:04:39.840 --> 00:04:44.460
term takes the B variance and this as well they 
are actually correlated. So the factor analysis  

00:04:44.460 --> 00:04:50.970
wouldn't stop here because there is evidence that 
there are still correlations after this factor.

00:04:50.970 --> 00:05:00.300
So we take the A variance we put it aside here 
and then we - from the remaining data we fit  

00:05:00.300 --> 00:05:07.830
another factor. It picks up the B variance 
here and the B variance here and then the  

00:05:07.830 --> 00:05:13.860
remaining indicators here are uncorrelated 
in which case the factor analysis stops.
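
NOTE
A minimal sketch of the extraction loop described above, not a definitive implementation. Assumptions: the loadings 0.8 and 0.7 are made-up values, the two constructs are uncorrelated, the population correlation matrix stands in for sample data, and each single factor is fitted with a principal-axis-style iteration, which the video does not specify. All function names are hypothetical.
```python
import numpy as np
# Hypothetical population loadings: a1-a3 load on construct A, b1-b3 on construct B.
names = ["a1", "a2", "a3", "b1", "b2", "b3"]
true_loadings = np.array([
    [0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
    [0.0, 0.7], [0.0, 0.7], [0.0, 0.7],
])
# Implied correlation matrix for unit-variance indicators and uncorrelated constructs.
R = true_loadings @ true_loadings.T
np.fill_diagonal(R, 1.0)
def off_diagonal(m):
    """Return a copy of m with the diagonal set to zero."""
    out = m.copy()
    np.fill_diagonal(out, 0.0)
    return out
def fit_one_factor(resid, n_iter=50):
    """Fit a single factor to the off-diagonal part of resid (principal-axis style)."""
    off = off_diagonal(resid)
    communality = np.abs(off).max(axis=1)  # starting guess for the diagonal
    for _ in range(n_iter):
        reduced = off + np.diag(communality)
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        lam = np.sqrt(max(eigvals[-1], 0.0)) * eigvecs[:, -1]
        communality = lam ** 2
    return lam if lam.sum() >= 0 else -lam  # the eigenvector sign is arbitrary
def extract_factors(R, tol=1e-6, max_factors=None):
    """Extract one factor at a time until no off-diagonal correlation remains."""
    resid = off_diagonal(R)
    loadings = []
    max_factors = max_factors or R.shape[0]
    while np.abs(resid).max() > tol and len(loadings) < max_factors:
        lam = fit_one_factor(resid)
        loadings.append(lam)
        # Set aside the variance this factor explains; keep only what is left.
        resid = off_diagonal(resid - np.outer(lam, lam))
    if not loadings:
        return np.empty((R.shape[0], 0)), resid
    return np.column_stack(loadings), resid
loadings, resid = extract_factors(R)
print(loadings.shape[1], "factors extracted")
print(np.round(loadings, 2))
```
With these assumed loadings, the first pass picks up the A variance and the second pass the B variance, after which the residual correlations are zero and the loop stops.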

00:05:13.860 --> 00:05:19.440
So that's how factor analysis works. We pick 
up some variation then we continue with the  

00:05:19.440 --> 00:05:24.330
remaining, we pick up some more variation, until all
the remaining indicators are uncorrelated in  

00:05:24.330 --> 00:05:29.670
which case the factor analysis has
discovered two factors. These two  

00:05:29.670 --> 00:05:36.150
factors explain the intercorrelations of the
variables completely. The remaining variance  

00:05:36.150 --> 00:05:40.890
in the data is simply unique features
of these indicators or unreliability.

00:05:40.890 --> 00:05:45.780
So that's the conceptual idea. 
We extract variation then we do  

00:05:45.780 --> 00:05:49.770
it over and over until there are
no more covariances to extract.