WEBVTT
Kind: captions
Language: en

00:00:00.030 --> 00:00:02.700
The biggest problem in the
formative measurement idea is

00:00:02.700 --> 00:00:05.190
the idea that the indicators cause the construct.

00:00:05.190 --> 00:00:10.710
There are also statistical issues in
how these models are specified and, in

00:00:10.710 --> 00:00:14.940
particular, how the models are identified. I
will explain a couple of these issues in

00:00:14.940 --> 00:00:19.290
this video. There are a couple more, but they
are not as important as these two issues.

00:00:19.290 --> 00:00:27.540
The root of the problem is that a formative model,
in which we specify this latent variable as a

00:00:27.540 --> 00:00:33.390
function of these observed variables (three in
this example) and this unobserved error term, is

00:00:33.390 --> 00:00:39.030
not identified in itself. It's basically like a regression
analysis without the dependent variable.

00:00:39.030 --> 00:00:45.270
It's not identified because the correlations
among these three indicators are free, and that

00:00:45.270 --> 00:00:50.820
consumes all degrees of freedom, so we don't
have any more information for estimating

00:00:50.820 --> 00:00:55.230
these paths or the variance of the error
term. So the degrees of freedom are negative.

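NOTE A rough parameter count for the two models described in this video, as a Python sketch. It assumes a common counting convention: all variances and covariances of the formative indicators are free, every formative path and the error variance are free, and in the model with two added reflective indicators one loading is fixed to 1 to set the scale. The function names are hypothetical. Even if one more parameter were fixed in the formative-only model to set its scale, the count would stay negative.
```python
def n_moments(p):
    # Unique variances and covariances among p observed variables.
    return p * (p + 1) // 2
def formative_only_df(q):
    # q formative indicators and nothing else observed.
    exog = n_moments(q)            # free variances/covariances of the indicators
    params = exog + q + 1          # plus q formative paths and one error variance
    return n_moments(q) - params
def mimic_df(q, p):
    # q formative indicators plus p reflective indicators (one loading fixed to 1).
    exog = n_moments(q)
    params = exog + q + 1 + (p - 1) + p  # paths, error variance, loadings, indicator error variances
    return n_moments(q + p) - params
print(formative_only_df(3))  # -4
print(mimic_df(3, 2))        # 2
```
With three formative indicators alone, the indicator covariances use up all six moments; adding two reflective indicators brings the count to 15 moments against 13 free parameters under these assumptions.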
00:00:55.230 --> 00:01:02.910
There are a couple of ways around this problem.
The most commonly recommended way is to add

00:01:02.910 --> 00:01:09.180
two normal indicators. The literature on formative
measurement calls these reflective indicators.

00:01:09.180 --> 00:01:16.470
So we specify that this latent variable
here is actually a common factor of these

00:01:16.470 --> 00:01:22.050
two measures, and these measures are
added there to identify the model.

00:01:22.050 --> 00:01:29.310
This leads to an interesting problem,
and the problem is that this

00:01:29.310 --> 00:01:37.320
latent variable here is now defined by these
two normal indicators instead of these three

00:01:37.320 --> 00:01:43.890
formative, or causal, indicators. So these
indicators, measure

00:01:43.890 --> 00:01:49.320
one and measure two, actually give this
latent variable its identity and meaning.

00:01:49.320 --> 00:01:55.440
I've written a couple of papers about this
topic, but the problem essentially is that

00:01:55.440 --> 00:02:03.210
if these causal or formative indicators
are not valid measures of this latent

00:02:03.210 --> 00:02:09.990
variable, but these two reflective indicators are,
then these weights, or regression

00:02:09.990 --> 00:02:15.570
coefficients, here will simply be estimated
as 0. So we have a normal latent variable

00:02:15.570 --> 00:02:20.730
measured with two indicators, and then we
have three unrelated indicators that don't

00:02:20.730 --> 00:02:26.250
really have any relationship with the latent
variable defined by these two indicators here.

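NOTE A simplified simulation of the case where the formative indicators are unrelated to whatever the two reflective indicators measure. Instead of fitting a full SEM, it regresses the mean of the two reflective indicators on the three formative indicators with ordinary least squares; that substitution, the sample size, the noise levels, and the variable names are assumptions made only for illustration.
```python
import numpy as np
rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal((n, 3))          # three formative indicators
eta = rng.standard_normal(n)             # common factor, unrelated to x by construction
m1 = eta + 0.5 * rng.standard_normal(n)  # reflective indicator "measure one"
m2 = eta + 0.5 * rng.standard_normal(n)  # reflective indicator "measure two"
# Simplification: OLS of the reflective mean on x instead of a full SEM fit.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, (m1 + m2) / 2, rcond=None)
weights = coef[1:]                       # drop the intercept
print(np.round(weights, 3))              # all three weights end up near 0
```
The latent variable keeps its meaning from the two reflective indicators, and the formative weights simply collapse toward zero.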
00:02:26.250 --> 00:02:32.340
So that's one problem. Another way of
thinking about this is that if we have

00:02:32.340 --> 00:02:39.030
these two indicators here that measure the
latent variable, then these three indicators

00:02:39.030 --> 00:02:45.660
here at the bottom are unnecessary; you don't need them.
You can just define the model and measure

00:02:45.660 --> 00:02:49.410
it normally with these two indicators,
and there are no problems with that.

00:02:49.410 --> 00:02:57.690
That, of course, doesn't fit the idea
that some concepts must be measured with these

00:02:57.690 --> 00:03:05.460
formative indicators. So that's one problem. And
what causes this phenomenon, that the

00:03:05.460 --> 00:03:12.180
meaning of this latent variable comes from these
two measures instead of these three measures? It is

00:03:12.180 --> 00:03:19.620
that we have this error term here, and the
error term guarantees that whatever these

00:03:19.620 --> 00:03:26.940
indicators represent, this error term,
because it's uncorrelated with these three

00:03:26.940 --> 00:03:32.580
indicators here, makes the latent variable
a common factor of these two indicators.

00:03:32.580 --> 00:03:38.730
So if these three indicators are
conceptually unrelated to whatever

00:03:38.730 --> 00:03:44.460
these two indicators represent, then the
error term here will compensate for that,

00:03:44.460 --> 00:03:49.860
and we are basically just modeling the
error term with these three indicators

00:03:50.460 --> 00:03:54.780
instead of whatever we think
these causal indicators here cause.

00:03:54.780 --> 00:04:02.280
So that's one problem, and how do we deal with
that problem? We can, of course, eliminate

00:04:02.280 --> 00:04:06.660
that problem by eliminating the error term
from the model. But that leads

00:04:06.660 --> 00:04:11.280
to another problem. So let's consider this
kind of model. Here, this is not a latent

00:04:11.280 --> 00:04:17.730
variable anymore, because this formative latent
variable is actually just a weighted sum of these

00:04:17.730 --> 00:04:21.870
indicators. There's no error, and this is like
a regression analysis without an error term.

00:04:21.870 --> 00:04:27.150
Then how do we set these different
weights? We create an index based

00:04:27.150 --> 00:04:31.980
on three different indicators, and we
set these weights. The normal way

00:04:31.980 --> 00:04:38.100
of defining or specifying this
kind of model is that we have this latent

00:04:38.100 --> 00:04:43.170
variable here, with or without an error term, and then
we have another latent variable that we want

00:04:43.170 --> 00:04:47.700
to explain with this latent variable,
and we have a regression relationship.

00:04:47.700 --> 00:04:55.110
Specifying a model like that defines these
weights so that they maximize this path.

00:04:55.110 --> 00:05:03.000
Is that problematic or not? Well, it
is problematic, because if we want to test,

00:05:03.000 --> 00:05:10.110
for example, whether this beta here is zero or
not, that is, whether this

00:05:10.110 --> 00:05:16.350
formative LV has an effect on this other latent
variable, then setting these weights so that the

00:05:16.350 --> 00:05:21.570
beta is as large as possible is probably the
worst possible way that you can create an index.

00:05:21.570 --> 00:05:28.080
If you want to test whether something exists,
then leveraging any correlations in

00:05:28.080 --> 00:05:33.060
your data to make your estimate as large as
possible is not a good estimation principle.

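NOTE A small Monte Carlo sketch of the positive bias. The outcome is generated independently of all three indicators, so the true effect is zero. Least-squares weights stand in for the SEM estimation that maximizes the path (an assumption: the least-squares composite is the linear combination with the largest correlation with the outcome), and they are compared with weights fixed in advance. The sample size, number of replications, and seed are arbitrary.
```python
import numpy as np
rng = np.random.default_rng(1)
n, reps = 50, 2000
r_opt, r_fixed = [], []
for _ in range(reps):
    x = rng.standard_normal((n, 3))
    y = rng.standard_normal(n)                       # unrelated to every indicator: true effect is 0
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    w, *_ = np.linalg.lstsq(xc, yc, rcond=None)      # weights that maximize corr(index, y)
    r_opt.append(np.corrcoef(xc @ w, y)[0, 1])
    r_fixed.append(np.corrcoef(x.sum(axis=1), y)[0, 1])  # weights fixed in advance (all 1)
mean_opt = float(np.mean(r_opt))
mean_fixed = float(np.mean(r_fixed))
print(round(mean_opt, 2), round(mean_fixed, 2))
```
The optimized correlation can never be negative and averages well above zero even though nothing is there, while the fixed-weight correlation averages close to zero.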
00:05:33.060 --> 00:05:41.010
So there is a potential positive bias. There
is also another problem: if we

00:05:41.010 --> 00:05:46.740
set these weights so that this beta is
as large as possible, then the weights

00:05:46.740 --> 00:05:54.810
actually depend on whatever this other
latent variable is, and this leads to a

00:05:54.810 --> 00:05:58.230
problem called interpretational
confounding in this literature.

00:05:58.230 --> 00:06:03.180
So the meaning of this latent variable
here, which is supposed to be caused

00:06:03.180 --> 00:06:08.370
by these three formative indicators,
actually depends on the other

00:06:08.370 --> 00:06:13.110
latent variables or other variables we
have in the model. And that's undesirable.

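NOTE A sketch of interpretational confounding with made-up data: the same three indicators are paired with two hypothetical outcomes from two studies, and the weights are chosen by least squares (standing in for path maximization) and rescaled to sum to 1. The helper name optimal_weights and the data-generating choices are assumptions for illustration.
```python
import numpy as np
rng = np.random.default_rng(2)
n = 10_000
x = rng.standard_normal((n, 3))             # the same three indicators in both studies
y_a = x[:, 0] + rng.standard_normal(n)      # outcome in hypothetical study A
y_b = x[:, 2] + rng.standard_normal(n)      # outcome in hypothetical study B
def optimal_weights(x, y):
    # Least-squares weights maximize corr(index, y); rescale them to sum to 1.
    w, *_ = np.linalg.lstsq(x - x.mean(axis=0), y - y.mean(), rcond=None)
    return w / w.sum()
w_a = optimal_weights(x, y_a)
w_b = optimal_weights(x, y_b)
print(np.round(w_a, 2), np.round(w_b, 2))
```
The "same" index ends up weighting mostly the first indicator in one study and mostly the third in the other, so what it means changes with the outcome it is paired with.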
00:06:13.110 --> 00:06:20.520
Think about a stock index.
Would it make sense for the stock index

00:06:20.520 --> 00:06:25.260
to be different depending on who is
using the index? I don't think so. It

00:06:25.260 --> 00:06:30.780
should be the same. The meaning of the
index should be the same across studies, which

00:06:30.780 --> 00:06:35.670
means that the weights of these indicators
must also stay the same.

00:06:35.670 --> 00:06:42.580
There's also the assumption that if these
indicators here have any effect on this other

00:06:42.580 --> 00:06:48.970
latent variable, then that effect must be fully
mediated by this formative latent variable.

00:06:48.970 --> 00:06:54.520
Let's consider socioeconomic
status. That's our formative

00:06:54.520 --> 00:07:02.440
latent variable. One of the indicators
measures the parents' education, and then we

00:07:02.440 --> 00:07:08.560
want to explain the child's education
with the parents' socioeconomic status.

00:07:08.560 --> 00:07:15.250
Is it reasonable to assume that the
parents' education has no causal

00:07:15.250 --> 00:07:21.070
effect on the child's education other than through
full mediation by socioeconomic status?

00:07:21.070 --> 00:07:27.460
That is clearly unreasonable. So the full
mediation assumption here is also unreasonable.

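NOTE A sketch of what the full mediation assumption implies, using the socioeconomic status example with made-up data. If the child's education depends on the indicators only through an index with weights w, each indicator's implied effect is beta times its weight. Here the weights are fixed to be equal (an assumption, so that the restriction is visible), while the simulated data contain only a direct effect of the parents' education. The indicator names and effect sizes are hypothetical.
```python
import numpy as np
rng = np.random.default_rng(3)
n = 20_000
# Hypothetical SES indicators: parents' education, income, occupational status.
x = rng.standard_normal((n, 3))
# Assumed data-generating process: only parents' education affects the child's education.
child_edu = 0.5 * x[:, 0] + rng.standard_normal(n)
# Composite model: equal-weighted SES index, child's education depends on SES only.
w = np.array([1 / 3, 1 / 3, 1 / 3])
ses = x @ w
beta = np.cov(ses, child_edu)[0, 1] / np.var(ses, ddof=1)
implied = beta * w                       # per-indicator effects implied by full mediation
# Unrestricted model: each indicator gets its own effect (intercept first).
design = np.column_stack([np.ones(n), x])
direct, *_ = np.linalg.lstsq(design, child_edu, rcond=None)
print(np.round(implied, 2), np.round(direct[1:], 2))
```
The composite model forces the three implied effects to be equal, whereas the unrestricted regression recovers an effect for the parents' education only.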
00:07:27.460 --> 00:07:37.030
So what's the alternative? The solution is to
define these weights based on theory. You

00:07:37.030 --> 00:07:43.180
set the weights based on your understanding of
the phenomenon instead of trying to estimate

00:07:43.180 --> 00:07:50.110
the weights empirically, and that leads to
index construction. Instead of fitting

00:07:50.110 --> 00:07:56.680
this complicated latent variable model that
possibly has an error term, we just take the

00:07:56.680 --> 00:08:03.310
indicators and take a mean, a sum,
or a weighted sum, and we do

00:08:03.310 --> 00:08:09.100
that before our estimation. We define
the weights for the index construction

00:08:09.100 --> 00:08:12.370
based on existing understanding of
the phenomenon and on theory.

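NOTE A minimal Python sketch of index construction done before estimation. The helper name build_index, the example scores, and the theory-based weights are hypothetical; in practice the weights would come from existing understanding of the phenomenon. Setting all weights to 1 gives the plain sum.
```python
import numpy as np
def build_index(x, weights=None):
    # Index fixed before estimation: no weights gives the row mean,
    # otherwise a weighted sum with weights chosen from theory.
    x = np.asarray(x, dtype=float)
    if weights is None:
        return x.mean(axis=1)
    return x @ np.asarray(weights, dtype=float)
# Hypothetical standardized indicator scores for three respondents.
scores = np.array([[1.0, 0.0, -1.0],
                   [0.5, 0.5, 0.5],
                   [2.0, 1.0, 0.0]])
mean_index = build_index(scores)                      # [0.0, 0.5, 1.0]
theory_weights = [0.5, 0.25, 0.25]                    # hypothetical theory-based weights
weighted_index = build_index(scores, theory_weights)  # [0.25, 0.5, 1.25]
```
The index is then used as an ordinary observed variable, so it stays the same regardless of what it is later used to explain.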
00:08:12.370 --> 00:08:17.470
I have another video on how
you can actually do that and how

00:08:17.470 --> 00:08:23.320
you justify index construction. So
that's clearly a good approach, a much

00:08:23.320 --> 00:08:28.180
better approach than trying to specify
these formative latent variable models.