WEBVTT
Kind: captions
Language: en

00:00:00.030 --> 00:00:04.290
Factor analysis extracts underlying
dimensions from the data and answers

00:00:04.290 --> 00:00:06.960
the question of what the indicators have in common.

00:00:06.960 --> 00:00:11.790
Sometimes, however, factor analysis
doesn't give you the solution that you

00:00:11.790 --> 00:00:17.940
expect, and then you have to understand
why that happens. To do so, you have

00:00:17.940 --> 00:00:21.450
to understand what exactly
factor analysis is doing. And

00:00:21.450 --> 00:00:26.610
in this video, I will provide a conceptual
explanation of exploratory factor analysis.

00:00:26.610 --> 00:00:31.020
The idea of factor analysis is that
there are different variance components

00:00:31.020 --> 00:00:37.350
in the data. Typically, when we have
a measurement occasion, there is variance caused

00:00:37.350 --> 00:00:45.000
by the construct. So we have indicators a1,
a2, and a3 that are supposedly valid measures

00:00:45.000 --> 00:00:53.190
of construct A, and indicators b1, b2, and b3 that
are supposedly valid measures of construct B.

00:00:53.190 --> 00:01:00.180
Each indicator also has some
random noise, or unreliability, and some

00:01:00.180 --> 00:01:04.350
unique aspects. So if these are survey
questions, then the survey questions

00:01:04.350 --> 00:01:09.090
measure the construct, they could also measure
something else, and then there is unreliability.

00:01:09.090 --> 00:01:16.890
In factor analysis, we add one
or more latent variables to this model.

00:01:16.890 --> 00:01:20.670
So these are observed variables, and we
try to explain the correlations between

00:01:20.670 --> 00:01:24.480
the observed variables by using a
smaller number of latent variables.

00:01:24.480 --> 00:01:31.740
For example, we add one factor here that we think
explains the intercorrelations between these

00:01:31.740 --> 00:01:41.250
items. There are two strategies: exploratory
analysis, where we let the computer determine

00:01:41.250 --> 00:01:45.630
the factors, and confirmatory analysis, where
we specify the factor structure ourselves.

00:01:45.630 --> 00:01:51.300
The factor analysis model is also
a statistical model, so it is a set

00:01:51.300 --> 00:01:58.590
of equations, and here is the model. We
are saying that all these indicators

00:01:58.590 --> 00:02:05.880
a1, a2, a3, b1, b2, and b3 are functions of the
factor times a factor loading, for which

00:02:05.880 --> 00:02:09.870
we use the Greek letter lambda, plus
some error that we do not observe.
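
NOTE
Written out for the two-construct example, the model could look like the equations below. This is a sketch assuming, as in the example, that each indicator loads only on its own construct (in an exploratory analysis every indicator is allowed to load on every factor) and that the errors are uncorrelated with each other and with the factors. Under those assumptions, two indicators of the same construct covary only through the factor, which is exactly the covariation the factor is meant to explain.

```latex
\begin{aligned}
a_j &= \lambda_{a_j} A + e_{a_j}, \qquad j = 1, 2, 3,\\
b_j &= \lambda_{b_j} B + e_{b_j}, \qquad j = 1, 2, 3,\\
\operatorname{Cov}(a_j, a_k) &= \lambda_{a_j}\lambda_{a_k}\operatorname{Var}(A), \qquad j \neq k.
\end{aligned}
```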

00:02:09.870 --> 00:02:15.900
So it is basically a regression equation. The
only difference is that we observe only the

00:02:15.900 --> 00:02:19.650
dependent variable. We do not observe
the key independent variable, so it

00:02:19.650 --> 00:02:25.200
is a latent variable. If it were an observed
variable, we could just regress all

00:02:25.200 --> 00:02:29.100
indicators on the factor, but we cannot,
because the factor is not observed.
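
NOTE
In a simulation we can generate the factor ourselves, which shows what the regression would look like if the factor were observed. A minimal Python/NumPy sketch; the loading of 0.8, the sample size, and the seed are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
lam_true = 0.8  # hypothetical loading of a1 on factor A

# Generate data from a1 = lambda * A + e, with A and a1 both of unit variance.
A = rng.standard_normal(n)  # factor scores: observable only in a simulation
e = rng.normal(0.0, np.sqrt(1 - lam_true**2), n)  # unique part (error)
a1 = lam_true * A + e

# With A observed, the loading is simply the OLS slope of a1 on A.
X = np.column_stack([np.ones(n), A])  # intercept column plus the factor
coef, *_ = np.linalg.lstsq(X, a1, rcond=None)
intercept, lam_hat = coef
print(f"estimated loading: {lam_hat:.3f}")
```

With real data A is not available, so this shortcut does not exist; factor analysis has to infer the loadings from the correlations among the indicators instead.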

00:02:29.100 --> 00:02:34.440
So these are the factor loadings, and
these are the item uniquenesses. It is

00:02:34.440 --> 00:02:41.130
important to note that factor analysis cannot
separate unreliability from some other unique

00:02:41.130 --> 00:02:47.670
variance. So if the a1 indicator has some
unique aspect Q, then you cannot separate

00:02:47.670 --> 00:02:55.200
it from unreliability. This applies to basically
any reliability statistic.

00:02:55.200 --> 00:03:01.530
So if your indicators have variation that is
unique to each indicator but still reliable,

00:03:01.530 --> 00:03:09.180
so it is not random noise but systematic variation,
then it cannot be distinguished from unreliability.

00:03:09.180 --> 00:03:15.090
We assume that all variance that cannot be
explained by the other items, that is, unique

00:03:15.090 --> 00:03:19.830
variance, is unreliability. That is
the workaround for this limitation.
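
NOTE
A small simulation can illustrate why. This Python/NumPy sketch uses hypothetical values (loading 0.7, and a1's unique variance split into 0.20 reliable item-specific variance Q and 0.31 noise) and generates a1 twice: once with Q, once with random noise of the same total size. The correlations are the same in both cases, so the communality recovered for a1, computed here with the three-indicator one-factor identity lambda_1^2 = r12 * r13 / r23, comes out near 0.49 either way, even though in the first case 0.69 of a1's variance is not noise.

```python
import numpy as np

def simulate_corr(with_q, n=200_000, seed=0):
    """Correlation matrix of three unit-variance indicators of construct A.

    with_q=True:  a1 = 0.7*A + Q + e, with reliable item-specific variance Q (0.20)
                  and random noise e (0.31)
    with_q=False: a1 = 0.7*A + e, with all unique variance (0.51) as random noise
    In both cases a2 and a3 are 0.7*A plus noise with variance 0.51.
    """
    rng = np.random.default_rng(seed)
    lam = 0.7
    A = rng.standard_normal(n)
    if with_q:
        Q = rng.normal(0.0, np.sqrt(0.20), n)
        a1 = lam * A + Q + rng.normal(0.0, np.sqrt(0.31), n)
    else:
        a1 = lam * A + rng.normal(0.0, np.sqrt(0.51), n)
    a2 = lam * A + rng.normal(0.0, np.sqrt(0.51), n)
    a3 = lam * A + rng.normal(0.0, np.sqrt(0.51), n)
    return np.corrcoef([a1, a2, a3])

def communality_a1(R):
    """One-factor identity for three indicators: lambda_1^2 = r12 * r13 / r23."""
    return R[0, 1] * R[0, 2] / R[1, 2]

R_with_q = simulate_corr(with_q=True)
R_noise_only = simulate_corr(with_q=False)
print(np.round(R_with_q, 2))
print(np.round(R_noise_only, 2))
print(round(communality_a1(R_with_q), 2), round(communality_a1(R_noise_only), 2))
```

Because only the correlations enter the analysis, Q is counted as unreliability in both worlds.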

00:03:19.830 --> 00:03:24.420
So we have exploratory analysis and
confirmatory analysis. The idea of

00:03:24.420 --> 00:03:30.150
exploratory factor analysis is that the
computer first tries to explain

00:03:30.150 --> 00:03:35.250
the data with one factor. It estimates a
one-factor model, in which one factor explains all

00:03:35.250 --> 00:03:42.840
the correlations between the indicators. Then we
remove the variance explained by that factor

00:03:42.840 --> 00:03:49.320
from the data, and then we fit the same
single-factor model again to the residual variance,

00:03:49.320 --> 00:03:54.960
and we repeat this until there is no more
covariance between the indicators to explain.
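
NOTE
The extraction loop described above can be sketched in Python with NumPy. This is a conceptual sketch under stated assumptions, not how any particular program implements exploratory factor analysis: each single-factor fit uses iterated principal-axis updates (a swapped-in technique; maximum likelihood is another common choice), and the stopping rule, a hypothetical tolerance `tol` on the largest remaining off-diagonal residual, stands in for the formal criteria real software uses. The example matrix is a hypothetical population correlation matrix for the A/B example, with A and B uncorrelated and A's indicators loading more strongly.

```python
import numpy as np

def fit_one_factor(S, n_iter=200):
    """Fit a single factor to (approximately) the off-diagonal part of S, a
    correlation or residual matrix, using iterated principal-axis updates."""
    M = S.copy()
    off = np.abs(S - np.diag(np.diag(S)))
    h2 = off.max(axis=1)                      # starting communality guesses
    for _ in range(n_iter):
        np.fill_diagonal(M, h2)               # communalities on the diagonal
        vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
        loadings = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        h2 = loadings**2
    return loadings

def sequential_efa(R, tol=1e-3, max_factors=None):
    """Extract one factor at a time from the residual matrix until no
    off-diagonal covariance larger than `tol` is left to explain."""
    p = R.shape[0]
    max_factors = max_factors or p - 1
    residual = R.copy()
    factors = []
    for _ in range(max_factors):
        off_diag = residual - np.diag(np.diag(residual))
        if np.abs(off_diag).max() < tol:
            break                             # remaining parts are uncorrelated
        lam = fit_one_factor(residual)
        factors.append(lam)
        residual = residual - np.outer(lam, lam)  # remove what the factor explains
    loadings = np.column_stack(factors) if factors else np.empty((p, 0))
    return loadings, residual

# Hypothetical population correlations: a1..a3 load 0.8 on A, b1..b3 load 0.6 on B.
L = np.zeros((6, 2))
L[:3, 0] = 0.8
L[3:, 1] = 0.6
R = L @ L.T
np.fill_diagonal(R, 1.0)

loadings, residual = sequential_efa(R)
print(np.round(np.abs(loadings), 2))   # one column per extracted factor
print(np.round(np.diag(residual), 2))  # leftover (unique) variance per indicator
```

The signs of extracted loadings are arbitrary, hence `np.abs`. With sample data the residual correlations never reach exactly zero, so `tol` would have to be much larger.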

00:03:54.960 --> 00:04:00.570
So what does the process look like? We
have the data here. We have the A

00:04:00.570 --> 00:04:05.790
variance here, due to the A construct, and the B variance,
due to the B construct. We want to know how

00:04:05.790 --> 00:04:09.960
much of the variation in these indicators is
due to the A construct and how much to the B construct.

00:04:09.960 --> 00:04:17.580
We first fit a single-factor model, and let's
say that the single factor picks up all

00:04:17.580 --> 00:04:24.420
the A variance. So all the A variance goes to the
factor, and the remaining variance goes to the

00:04:24.420 --> 00:04:30.120
error terms. So we take apart the variation
in the observed variables: we assign some to

00:04:30.120 --> 00:04:35.640
the factor and some to the error terms. But
this model does not fit very well, because

00:04:35.640 --> 00:04:39.840
these errors are assumed to be uncorrelated,
but we can see here that because these error

00:04:39.840 --> 00:04:44.460
terms all contain B variance, they
are actually correlated. So the factor analysis

00:04:44.460 --> 00:04:50.970
would not stop here, because there is evidence that
there are still correlations left after this factor.

00:04:50.970 --> 00:05:00.300
So we take the A variance, put it aside here,
and then, from the remaining data, we fit

00:05:00.300 --> 00:05:07.830
another factor. It picks up the B variance
here and here, and then the

00:05:07.830 --> 00:05:13.860
remaining parts of the indicators are uncorrelated,
at which point the factor analysis stops.
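
NOTE
The walk-through above can be reproduced with known loadings instead of estimated ones. This Python/NumPy sketch assumes hypothetical population values (a1 to a3 load 0.8 on A, b1 to b3 load 0.6 on B, A and B uncorrelated, all indicators standardized) and simply subtracts what each factor explains, to show where the correlated error terms come from.

```python
import numpy as np

# Hypothetical population values for the A/B example.
lam_A = np.array([0.8, 0.8, 0.8, 0.0, 0.0, 0.0])
lam_B = np.array([0.0, 0.0, 0.0, 0.6, 0.6, 0.6])
R = np.outer(lam_A, lam_A) + np.outer(lam_B, lam_B)
np.fill_diagonal(R, 1.0)  # standardized indicators

def off_diagonal(M):
    """Zero the diagonal so only covariances between indicators remain."""
    return M - np.diag(np.diag(M))

# Step 1: the first factor takes the A variance; the rest stays in the error terms.
after_A = R - np.outer(lam_A, lam_A)
# Step 2: the second factor takes the B variance from what remains.
after_AB = after_A - np.outer(lam_B, lam_B)

print(np.round(off_diagonal(after_A), 2))   # b1..b3 errors still correlate (0.36)
print(np.round(off_diagonal(after_AB), 2))  # nothing left to explain
print(np.round(np.diag(after_AB), 2))       # unique variance per indicator
```

The 0.36 left among b1 to b3 after the first step is 0.6 x 0.6, the shared B variance sitting in the error terms; after the second step only the unique variances (0.36 for the A indicators, 0.64 for the B indicators) remain.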

00:05:13.860 --> 00:05:19.440
So that's how factor analysis works. We pick
up some variation, then we continue with the

00:05:19.440 --> 00:05:24.330
remaining data and pick up more variation until all
the remaining indicators are uncorrelated, in

00:05:24.330 --> 00:05:29.670
which case the factor analysis has
discovered two factors. These two

00:05:29.670 --> 00:05:36.150
factors completely explain the intercorrelations of the
variables. The remaining variance

00:05:36.150 --> 00:05:40.890
in the data reflects only the unique features
of these indicators, or unreliability.

00:05:40.890 --> 00:05:45.780
So that's the conceptual idea.
We extract variation, and then we do

00:05:45.780 --> 00:05:49.770
it over and over until there is
no more covariance to extract.