WEBVTT
Kind: captions
Language: en
00:00:00.030 --> 00:00:04.290
Factor analysis extracts underlying
dimensions from the data and answers
00:00:04.290 --> 00:00:06.960
the question of what the indicators have in common.
00:00:06.960 --> 00:00:11.790
Sometimes factor analysis nevertheless
doesn't give you the solution that you
00:00:11.790 --> 00:00:17.940
expect and then you have to understand
why that would happen. To do so you have
00:00:17.940 --> 00:00:21.450
to understand what exactly the
factor analysis is doing. And
00:00:21.450 --> 00:00:26.610
in this video I will provide a conceptual
explanation of exploratory factor analysis.
00:00:26.610 --> 00:00:31.020
The idea of a factor analysis is that
there are different variance components
00:00:31.020 --> 00:00:37.350
in the data. Typically, on a given
measurement occasion there is variance caused
00:00:37.350 --> 00:00:45.000
by the construct. So we have indicators a1,
a2, and a3 that are supposedly valid measures
00:00:45.000 --> 00:00:53.190
of construct A. And we have b1, b2, and b3 that
are supposedly valid measures of construct B.
00:00:53.190 --> 00:01:00.180
Then each indicator also has
random noise, or unreliability, and some
00:01:00.180 --> 00:01:04.350
unique aspects. So if these are survey
questions, then the survey questions
00:01:04.350 --> 00:01:09.090
measure the construct, they could also measure
something else, and then there's unreliability.
00:01:09.090 --> 00:01:16.890
In factor analysis we add one
or more latent variables to this model.
00:01:16.890 --> 00:01:20.670
So these are observed variables and we
try to explain the correlation between
00:01:20.670 --> 00:01:24.480
the observed variables by using a
smaller number of latent variables.
00:01:24.480 --> 00:01:31.740
For example we add one factor here that we think
explains the intercorrelations between these
00:01:31.740 --> 00:01:41.250
items. And there are two strategies: exploratory
analysis where we allow the computer to specify
00:01:41.250 --> 00:01:45.630
the factors and confirmatory analysis where
we specify the factor structure ourselves.
00:01:45.630 --> 00:01:51.300
The factor analysis model is
a statistical model, so it's a set
00:01:51.300 --> 00:01:58.590
of equations and here is the model. So
we are saying that all these indicators
00:01:58.590 --> 00:02:05.880
a1, a2, a3, b1, b2, b3 are a function of the
factor times a factor loading, for which
00:02:05.880 --> 00:02:09.870
we use the Greek letter lambda plus
some error that we don't observe.
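
NOTE
One compact way to write the model just described, as a sketch rather than the slide's exact notation.
Only lambda comes from the talk; x_j (an indicator), F (the factor) and epsilon_j (the unobserved error) are assumed symbols.
```latex
x_j = \lambda_j F + \varepsilon_j, \qquad j \in \{a_1, a_2, a_3, b_1, b_2, b_3\}
```
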
00:02:09.870 --> 00:02:15.900
So it's a regression equation basically. The
only difference is that we only observe the
00:02:15.900 --> 00:02:19.650
dependent variable. We don't observe
the key independent variable. So this
00:02:19.650 --> 00:02:25.200
is a latent variable. If it were an observed
variable then we could just regress all
00:02:25.200 --> 00:02:29.100
indicators on the factor but we can't
because the factor is not observed.
00:02:29.100 --> 00:02:34.440
So these are the factor loadings and
these are the item uniquenesses. It's
00:02:34.440 --> 00:02:41.130
important to note that factor analysis cannot
separate unreliability from some other unique
00:02:41.130 --> 00:02:47.670
variance. So if the a1 indicator has some
unique aspects Q then you cannot separate
00:02:47.670 --> 00:02:55.200
it from unreliability. And basically
this applies to any reliability statistic.
00:02:55.200 --> 00:03:01.530
So if your indicators have variation that is
unique to each indicator but still reliable,
00:03:01.530 --> 00:03:09.180
so it's not random noise but systematic variation,
then it cannot be distinguished from unreliability.
00:03:09.180 --> 00:03:15.090
We assume that all variance that can't be
explained by the other items, the unique
00:03:15.090 --> 00:03:19.830
variance, is unreliability. So that's
the workaround for that limitation.
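
NOTE
The limitation can be sketched as a variance decomposition, under assumptions that are not stated in the talk:
a standardized factor with variance 1, and an error made of a reliable specific part s_j and random noise u_j, all mutually uncorrelated.
Only the total psi_j is estimated; its two parts cannot be told apart. The symbols psi_j, s_j and u_j are assumed notation.
```latex
\operatorname{Var}(x_j) = \lambda_j^2 + \psi_j, \qquad \psi_j = \operatorname{Var}(s_j) + \operatorname{Var}(u_j)
```
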
00:03:19.830 --> 00:03:24.420
So we had exploratory analysis and
confirmatory analysis. The idea of
00:03:24.420 --> 00:03:30.150
exploratory factor analysis is that the
computer first tries to explain
00:03:30.150 --> 00:03:35.250
the data with one factor. So it estimates a
one-factor model, where one factor explains all
00:03:35.250 --> 00:03:42.840
correlations between the indicators. Then we
eliminate the variance explained by that factor
00:03:42.840 --> 00:03:49.320
from the data and then we fit the same single
factor model again on the residual variance
00:03:49.320 --> 00:03:54.960
and we repeat this until there is no more
covariance between indicators to explain.
00:03:54.960 --> 00:04:00.570
So what does the process look like? We
have the data here. So we have this A
00:04:00.570 --> 00:04:05.790
variance here, due to the A construct, and B variance,
due to the B construct. And we want to know how
00:04:05.790 --> 00:04:09.960
much of the variation of these indicators is
due to the A construct and the B construct.
00:04:09.960 --> 00:04:17.580
We first fit a single factor model. And let's
say that the single factor now picks up all
00:04:17.580 --> 00:04:24.420
the A variance. So all the A variance goes to the
factor and the remaining variance will go to the
00:04:24.420 --> 00:04:30.120
error term. So we take apart the variation
in the observed variables: we assign some to
00:04:30.120 --> 00:04:35.640
the factors and some to the error terms. And
this model doesn't fit really well because
00:04:35.640 --> 00:04:39.840
these errors are assumed to be uncorrelated
but we can see here that because this error
00:04:39.840 --> 00:04:44.460
term takes the B variance, and this one as well, they
are actually correlated. So the factor analysis
00:04:44.460 --> 00:04:50.970
wouldn't stop here because there is evidence that
there are still correlations left after this factor.
00:04:50.970 --> 00:05:00.300
So we take the A variance, we put it aside here,
and then from the remaining data we fit
00:05:00.300 --> 00:05:07.830
another factor. It picks up the B variance
here and the B variance here and then the
00:05:07.830 --> 00:05:13.860
remaining indicators here are uncorrelated
in which case the factor analysis stops.
00:05:13.860 --> 00:05:19.440
So that's how factor analysis works. We pick
up some variation, then we continue with the
00:05:19.440 --> 00:05:24.330
remainder, we pick up some more variation, until all
the remaining indicators are uncorrelated, in
00:05:24.330 --> 00:05:29.670
which case the factor analysis has
discovered two factors. These two
00:05:29.670 --> 00:05:36.150
factors explain the intercorrelations between the
variables completely. The remaining variance
00:05:36.150 --> 00:05:40.890
in the data is simply unique features
of these indicators or unreliability.
00:05:40.890 --> 00:05:45.780
So that's the conceptual idea.
We extract variation then we do
00:05:45.780 --> 00:05:49.770
it over and over until there are
no more covariances to extract.
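
NOTE
The extract-and-repeat idea described above can be sketched in Python with NumPy. This is a conceptual illustration only, not how a statistics package estimates factors; real software typically estimates all factors jointly.
The loading values (0.8 and 0.7), the uncorrelated factors, and the helper names one_factor_loadings and extract_factors are all hypothetical choices made for this example.
```python
import numpy as np
# Hypothetical population loadings: a1-a3 load on A, b1-b3 load on B (made-up values).
loadings_true = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                          [0.0, 0.7], [0.0, 0.7], [0.0, 0.7]])
# Correlation matrix implied by two uncorrelated factors plus unique variance.
R = loadings_true @ loadings_true.T
np.fill_diagonal(R, 1.0)
def one_factor_loadings(C, n_iter=25):
    """Fit a single factor to the covariances in C by iterated principal axis."""
    C = C.copy()
    off = C - np.diag(np.diag(C))
    # Starting communalities: each row's largest absolute off-diagonal value.
    h2 = np.abs(off).max(axis=1)
    for _ in range(n_iter):
        np.fill_diagonal(C, h2)
        vals, vecs = np.linalg.eigh(C)  # eigenvalues come back in ascending order
        lam = np.sqrt(max(vals[-1], 0.0)) * vecs[:, -1]
        h2 = lam ** 2
    # Eigenvector sign is arbitrary, so orient the loadings positively.
    return lam if lam.sum() >= 0 else -lam
def extract_factors(R, tol=1e-6, max_factors=None):
    """Pull out one factor at a time until no off-diagonal covariance is left."""
    residual = R.copy()
    extracted = []
    max_factors = max_factors or R.shape[0]
    while len(extracted) < max_factors:
        off = residual - np.diag(np.diag(residual))
        if np.abs(off).max() < tol:
            break  # remaining indicators are uncorrelated, so stop
        lam = one_factor_loadings(residual)
        extracted.append(lam)
        # Set aside the variance explained by this factor.
        residual = residual - np.outer(lam, lam)
    return np.column_stack(extracted), residual
L, residual = extract_factors(R)
print(L.shape[1])       # number of factors extracted
print(np.round(L, 2))   # estimated loadings, one column per factor
```
With these made-up loadings the first pass should pick up the A variance, the second the B variance, and the loop should then stop because the residual covariances are essentially zero.
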