WEBVTT
00:00:00.150 --> 00:00:05.280
The simultaneous equation approach is
another way of calculating a mediation model.
00:00:05.280 --> 00:00:11.040
How this approach works is that we take
the mediation model as one large model.
00:00:11.040 --> 00:00:14.610
And instead of estimating the
regressions of y and m separately,
00:00:14.610 --> 00:00:18.390
we derive the model implied covariance matrix.
00:00:18.390 --> 00:00:21.690
I will be using the correlation matrix here
00:00:21.690 --> 00:00:25.770
for simplicity, but in practice we
nearly always work with covariances.
00:00:25.770 --> 00:00:30.420
So we look at for example the
correlation between x and y.
00:00:30.420 --> 00:00:34.470
We find that we can go from x
to y using two different paths.
00:00:34.470 --> 00:00:37.740
We go from x through m to y.
00:00:37.740 --> 00:00:41.160
And we go from x to y directly.
00:00:41.160 --> 00:00:44.430
So that gives us two ways or two elements.
00:00:44.430 --> 00:00:51.600
We have the mediation effect, beta m1 times
beta y2, plus the direct effect, beta y1.
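This tracing-rule decomposition can be checked numerically. The following is an illustrative sketch with made-up path coefficients and simulated data (not the example from the video); x and m are generated with unit variance so the covariance equals the tracing-rule sum.

```python
import numpy as np

# Illustrative sketch (made-up coefficients, simulated data): verify the
# tracing rule cov(x, y) = beta_m1 * beta_y2 + beta_y1 for a mediation
# model in which x and m have unit variance.
b_m1, b_y1, b_y2 = 0.5, 0.3, 0.4   # x -> m, x -> y (direct), m -> y

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)                                     # var(x) = 1
m = b_m1 * x + np.sqrt(1 - b_m1**2) * rng.standard_normal(n)   # var(m) = 1
y = b_y1 * x + b_y2 * m + rng.standard_normal(n)

implied = b_m1 * b_y2 + b_y1     # indirect path plus direct path
observed = np.cov(x, y)[0, 1]    # sample cov(x, y)
print(implied, observed)         # the two agree up to sampling error
```

The two printed numbers match up to sampling error, which is the content of the tracing rule for this covariance.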
00:00:51.600 --> 00:00:55.980
Similarly we can calculate the
correlation between Y and M.
00:00:55.980 --> 00:01:01.320
It is the direct path plus this spurious
00:01:01.320 --> 00:01:04.740
correlation due to X, which
is a common cause of both.
00:01:04.740 --> 00:01:10.890
And that gives us the correlation between M and Y.
00:01:10.890 --> 00:01:18.420
To estimate this model, we find the betas
00:01:18.420 --> 00:01:22.650
so that the data correlation matrix
00:01:22.650 --> 00:01:28.020
and the model-implied correlation matrix
are as close to one another as possible.
00:01:28.020 --> 00:01:31.860
To understand the calculation we have to
first take a look at the degrees of freedom,
00:01:31.860 --> 00:01:35.220
because that's important
for this particular problem.
00:01:35.220 --> 00:01:41.640
The degrees of freedom for this model are
calculated based on these correlations.
00:01:41.640 --> 00:01:44.250
Because we only use information from the
00:01:44.250 --> 00:01:47.400
correlations we don't look at
the individual observations.
00:01:47.400 --> 00:01:54.330
As units of information in the data, we have 5
elements that depend on the model parameters.
00:01:54.330 --> 00:01:58.980
Importantly the variance of
X doesn't count because that
00:01:58.980 --> 00:02:00.810
doesn't depend on any of the model parameters.
00:02:00.810 --> 00:02:04.230
So we have these five elements,
variance of M,
00:02:04.230 --> 00:02:08.340
variance of Y,
and all the correlations that depend on the model.
00:02:08.340 --> 00:02:10.680
So we have five units of data.
00:02:10.680 --> 00:02:14.520
Then we have five things that we estimate.
00:02:14.520 --> 00:02:15.870
We have five free parameters.
00:02:15.870 --> 00:02:19.050
We have three regression coefficients here.
00:02:19.050 --> 00:02:20.910
And then we have these two variances,
00:02:20.910 --> 00:02:23.370
the variance of this error term
and variance of that error term.
00:02:23.370 --> 00:02:25.320
So we estimate five different things.
00:02:25.320 --> 00:02:31.080
And the degrees of freedom for this model is then
zero because it's a difference between these two.
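The degrees-of-freedom bookkeeping described above can be sketched in a few lines; the counts are the ones given in the video.

```python
# Degrees-of-freedom bookkeeping for the partial mediation model.
# var(x) is excluded: it does not depend on any model parameter.
data_units = 5        # var(m), var(y), cor(x,m), cor(x,y), cor(m,y)
free_parameters = 5   # b_m1, b_y1, b_y2, var(e_m), var(e_y)
df = data_units - free_parameters
print(df)  # 0: the model is just identified
```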
00:02:31.080 --> 00:02:34.590
And we say that this is a just-identified model.
00:02:34.590 --> 00:02:39.510
Just identified means that we
can estimate the model but we are
00:02:39.510 --> 00:02:42.750
using all the information from
the data to estimate the model.
00:02:42.750 --> 00:02:45.300
And we could not add anything more to the model.
00:02:45.300 --> 00:02:49.140
It also means that the model
will fit perfectly and what
00:02:49.140 --> 00:02:51.900
that means I will explain a bit later in the video.
00:02:51.900 --> 00:02:54.570
So we have a just identified model.
00:02:54.570 --> 00:02:59.850
It means that we can find the values
of these variances and the betas,
00:02:59.850 --> 00:03:07.230
so that the model implied correlation matrix
matches exactly the data correlation matrix.
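One way to see this exact match is a small numerical sketch, assuming simulated, standardized data with made-up coefficients: fitting the two regressions (m on x; y on x and m) by least squares makes the tracing-rule correlations reproduce the sample correlations exactly.

```python
import numpy as np

# Sketch: in the just-identified model, the regression estimates make the
# model-implied correlations equal the sample correlations exactly.
rng = np.random.default_rng(1)
n = 5000
x = rng.standard_normal(n)
m = 0.5 * x + rng.standard_normal(n)           # made-up coefficients
y = 0.3 * x + 0.4 * m + rng.standard_normal(n)

# standardize so correlations and covariances coincide
Z = np.column_stack([x, m, y])
Z = (Z - Z.mean(0)) / Z.std(0)
x, m, y = Z.T
R = np.corrcoef(Z, rowvar=False)               # data correlation matrix

b_m1 = np.linalg.lstsq(x[:, None], m, rcond=None)[0][0]
b_y1, b_y2 = np.linalg.lstsq(np.column_stack([x, m]), y, rcond=None)[0]

# model-implied correlations from the tracing rules
r_xm = b_m1
r_xy = b_y1 + b_m1 * b_y2
r_my = b_y2 + b_m1 * b_y1
print(np.allclose([R[0, 1], R[0, 2], R[1, 2]], [r_xm, r_xy, r_my]))  # True
```

The match is exact (up to floating point) because the least-squares normal equations are precisely the tracing-rule equations for these correlations.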
00:03:07.230 --> 00:03:11.430
We can do that for example
using the lavaan package in R.
00:03:11.430 --> 00:03:17.760
So lavaan gives us output,
you can do the same with the SEM command in Stata,
00:03:17.760 --> 00:03:22.440
and the output contains
two important sections.
00:03:22.440 --> 00:03:25.860
So we have estimation information.
00:03:25.860 --> 00:03:30.900
And this is not particularly useful
because our degrees of freedom are zero,
00:03:30.900 --> 00:03:32.580
so we can't do model testing.
00:03:32.580 --> 00:03:35.490
If we had positive degrees of
freedom, we could test the model
00:03:35.490 --> 00:03:37.770
and I'll talk more about that later in the video.
00:03:37.770 --> 00:03:40.980
And then we have these coefficients.
00:03:40.980 --> 00:03:42.540
So we have regressions,
00:03:42.540 --> 00:03:49.500
we have regressions of Y on M and
X so that's beta y 1 and beta y 2.
00:03:49.500 --> 00:03:51.990
Then we have regression of M on X,
00:03:51.990 --> 00:04:03.090
that's beta M1, and then we have the two
estimated error variances, one for each error term.
00:04:03.090 --> 00:04:08.490
So we get the estimates here,
we get the standard errors here,
00:04:12.270 --> 00:04:16.920
these are Z values, not T values,
because this is based on large-sample theory.
00:04:16.920 --> 00:04:19.950
And then we get p values for these estimates.
00:04:19.950 --> 00:04:25.110
Then we can also calculate using
this package the mediation effect.
00:04:25.110 --> 00:04:30.510
So we define that in the model, and that's something
that the software calculates for us automatically.
00:04:30.510 --> 00:04:34.740
We get the standard error, the Z value, and
the p value for the mediation effect,
00:04:34.740 --> 00:04:38.220
and then we have the total effect which is
00:04:38.220 --> 00:04:43.470
the effect of X on Y that goes
directly, plus the effect of X on Y through M.
00:04:43.470 --> 00:04:44.640
So that's a total effect.
00:04:44.640 --> 00:04:51.270
Total effect is influence of x on y regardless
of whether it goes directly or through m.
00:04:51.270 --> 00:04:54.150
And then the direct effect is just beta y1.
00:04:54.150 --> 00:04:57.210
So that gives us the estimates.
00:04:57.210 --> 00:05:04.530
So how does it actually work, then, if we
want to test a partial mediation model?
00:05:04.530 --> 00:05:09.210
So importantly these estimates will be the exact
same that you get from regression analysis.
00:05:09.210 --> 00:05:11.490
If you estimate this model separately using
00:05:11.490 --> 00:05:15.390
regressions then you will
get the exact same results.
00:05:15.390 --> 00:05:20.670
There will be differences once we start to
estimate models that are over identified.
00:05:20.670 --> 00:05:24.930
For example if we estimate
directly a full mediation model,
00:05:24.930 --> 00:05:27.990
so we're saying that there is no path from x to y.
00:05:27.990 --> 00:05:36.030
We estimate the model where we assume that
all effects of x on y go through m.
00:05:36.030 --> 00:05:42.600
When we apply the tracing rules again, we can see
that the equations are a bit simpler here.
00:05:42.600 --> 00:05:49.200
Because we only go from X to Y using
this one path, beta m1 times beta y2.
00:05:49.200 --> 00:05:53.370
So there's no direct path anymore from X to Y.
00:05:53.370 --> 00:05:55.260
So it's only this product.
00:05:55.260 --> 00:05:59.670
And this has a positive degree of freedom.
00:05:59.670 --> 00:06:03.210
The data are the same, so we have 5 units of
00:06:03.210 --> 00:06:06.390
data, but we now estimate
only 4 parameters.
00:06:06.390 --> 00:06:10.200
So we have two regression
coefficients and two error variances.
00:06:10.200 --> 00:06:12.750
Then the degrees of freedom is the difference.
00:06:12.750 --> 00:06:17.400
So we have one degree of freedom, and
we call this an over-identified model.
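The same bookkeeping as before, sketched for the full mediation model:

```python
# Degrees-of-freedom bookkeeping for the full mediation model
# (the direct path b_y1 is fixed to zero, so it is no longer estimated).
data_units = 5        # var(m), var(y), cor(x,m), cor(x,y), cor(m,y)
free_parameters = 4   # b_m1, b_y2, var(e_m), var(e_y)
df = data_units - free_parameters
print(df)  # 1: the model is over identified
```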
00:06:17.400 --> 00:06:23.550
The problem, or a feature if you want to call it that,
00:06:23.550 --> 00:06:29.730
of these over-identified models,
is that generally we cannot make
00:06:29.730 --> 00:06:34.470
the model-implied correlation matrix
exactly equal the data correlation matrix.
00:06:34.470 --> 00:06:41.310
Instead of making those the same and
solving, we have to make the model
00:06:41.310 --> 00:06:46.110
implied correlation matrix as close as
possible to the data correlation matrix.
00:06:46.110 --> 00:06:54.600
So to make that model implied correlation matrix
as close as possible to the data correlation matrix,
00:06:54.600 --> 00:06:56.820
we have to define what we mean by close.
00:06:56.820 --> 00:07:01.050
So we have to define how we
quantify the distance between them,
00:07:01.050 --> 00:07:05.340
how different the model implied correlation
matrix is from the data correlation matrix.
00:07:05.340 --> 00:07:09.300
This problem of quantifying the
difference between these two
00:07:09.300 --> 00:07:13.800
matrices is comparable to regression analysis.
00:07:13.800 --> 00:07:18.450
So in regression analysis we
use a discrepancy function.
00:07:18.450 --> 00:07:23.040
So we calculate the difference between a
regression line and the actual observations.
00:07:23.040 --> 00:07:27.480
And to do that we calculate the residual,
so the difference between a
00:07:27.480 --> 00:07:31.980
line and the observations,
and take squares of the residuals.
00:07:31.980 --> 00:07:39.540
The idea of taking squares is that we
want to avoid having
00:07:39.540 --> 00:07:41.430
large prediction errors.
00:07:41.430 --> 00:07:45.780
So we are ok with small prediction errors but
we want to avoid having large prediction errors.
00:07:45.780 --> 00:07:51.600
Then we take a sum of these squares and that
gives us the ordinary least squares estimator.
00:07:51.600 --> 00:07:54.540
We minimize that, and it gives us
the regression coefficients.
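As a sketch of that OLS criterion, here is a tiny made-up data set; the fitted coefficients minimize the sum of squared residuals, so any perturbed coefficients give a larger sum.

```python
import numpy as np

# Sketch: ordinary least squares minimizes the sum of squared residuals.
# Tiny made-up data set for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8])

A = np.column_stack([np.ones_like(x), x])     # intercept + slope
a, b = np.linalg.lstsq(A, y, rcond=None)[0]
sse = np.sum((y - (a + b * x)) ** 2)          # the minimized criterion

# any perturbed coefficients give a larger (or equal) sum of squares
sse_perturbed = np.sum((y - (a + 0.1 + b * x)) ** 2)
print(sse, sse_perturbed)
```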
00:07:54.540 --> 00:08:00.450
In path analysis, we calculate the
difference between each unique cell,
00:08:00.450 --> 00:08:02.940
in the observed correlation or covariance matrix,
00:08:02.940 --> 00:08:05.970
and the model implied
correlation or covariance matrix.
00:08:05.970 --> 00:08:08.910
We raise those differences to the second power.
00:08:08.910 --> 00:08:13.590
The idea again is that we
want to avoid having models
00:08:13.590 --> 00:08:16.680
that explain some parts of the data really badly.
00:08:16.680 --> 00:08:23.220
And we are kind of ok with models that
are slightly off compared to the data.
00:08:23.220 --> 00:08:27.690
Then we sum these squared
00:08:27.690 --> 00:08:30.990
differences, and that provides the
unweighted least squares estimator.
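A minimal sketch of that unweighted least squares discrepancy, using two made-up correlation matrices that differ in a single cell:

```python
import numpy as np

# Sketch: the ULS discrepancy is the sum of squared differences between
# the unique elements of the observed and model-implied matrices.
def uls_discrepancy(observed, implied):
    i, j = np.tril_indices_from(observed)   # each unique cell once
    d = observed[i, j] - implied[i, j]
    return float(np.sum(d ** 2))

observed = np.array([[1.00, 0.50, 0.50],
                     [0.50, 1.00, 0.55],
                     [0.50, 0.55, 1.00]])
implied = np.array([[1.00, 0.50, 0.50],
                    [0.50, 1.00, 0.60],
                    [0.50, 0.60, 1.00]])
# only one unique cell differs: (0.55 - 0.60)^2 = 0.0025
print(uls_discrepancy(observed, implied))
```

Minimizing this quantity over the model parameters is what "as close as possible" means under the ULS criterion.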
00:08:30.990 --> 00:08:39.510
There's another parallel between path
analysis and regression analysis.
00:08:39.510 --> 00:08:42.810
So besides minimizing the discrepancy function
00:08:42.810 --> 00:08:46.110
to obtain estimates
that are in some sense optimal,
00:08:46.110 --> 00:08:53.940
the discrepancy can also be used to
quantify the goodness of fit of the model.
00:08:54.480 --> 00:08:56.640
One definition of the R-squared in
00:08:56.640 --> 00:09:00.810
regression analysis is
based on this sum of squares.
00:09:00.810 --> 00:09:04.950
So we calculate the sum of squares regression,
and then we compare that to the
00:09:04.950 --> 00:09:06.870
total sum of squares,
and that gives us R square.
00:09:06.870 --> 00:09:14.460
Then here we have the sum of squares
of these covariance errors and that
00:09:14.460 --> 00:09:17.220
can be used to quantify the model fit as well.
00:09:17.220 --> 00:09:19.110
Let's take a look.
00:09:19.110 --> 00:09:23.640
So I estimated the model; here is the
estimation information again,
00:09:23.640 --> 00:09:27.900
we have one degree of freedom
for this full mediation model.
00:09:27.900 --> 00:09:31.110
And we have a p-value
00:09:31.110 --> 00:09:35.610
that is nonsignificant;
I'll go through the p-value shortly.
00:09:35.610 --> 00:09:38.820
And then we have the estimates here.
00:09:38.820 --> 00:09:48.270
The idea of the p-value is that it quantifies
how different the actual observed
00:09:48.270 --> 00:09:50.700
correlation matrix is from the
implied correlation matrix.
00:09:50.700 --> 00:09:55.020
So the difference between this
observed correlation matrix,
00:09:55.020 --> 00:09:59.490
and this model implied correlation matrix
is called the residual correlation matrix.
00:09:59.490 --> 00:10:03.496
So it's again,
there's a parallel to regression analysis,
00:10:03.496 --> 00:10:08.100
where we have residuals,
when we work with raw observations,
00:10:08.100 --> 00:10:10.830
like in regression analysis,
the residual is the difference
00:10:10.830 --> 00:10:13.200
between actual observations
and the predicted value.
00:10:13.200 --> 00:10:15.660
Here where we work with correlations the residual
00:10:15.660 --> 00:10:19.560
is the difference between a predicted
correlation and observed correlation.
00:10:19.560 --> 00:10:23.730
So this residual correlation
matrix here is basically the
00:10:23.730 --> 00:10:26.250
observed correlations minus the implied correlations.
00:10:26.250 --> 00:10:30.060
You can verify that it actually is the case.
00:10:30.060 --> 00:10:36.900
So the question that the p-value here answers is
00:10:36.900 --> 00:10:41.760
whether this small residual correlation
could be due to chance alone.
00:10:41.760 --> 00:10:47.100
So is it possible that the sampling error
00:10:47.100 --> 00:10:51.120
in the observed correlation matrix
produces that kind of discrepancy?
00:10:51.120 --> 00:10:55.500
Here it is close to zero, so we can say
that it's probably due to chance,
00:10:55.500 --> 00:10:59.580
but if it was far from zero
then we would know that this
00:10:59.580 --> 00:11:04.470
model doesn't adequately explain
the correlation between x and y.
00:11:04.470 --> 00:11:11.190
And we would probably conclude that
x also has a direct effect on y.
00:11:11.190 --> 00:11:16.470
So it would be a partial mediation instead of
a full mediation model that is specified here.
00:11:16.470 --> 00:11:19.200
So that's the test here.
00:11:19.200 --> 00:11:28.590
The p-value of about 0.7 indicates that getting
this kind of effect by chance only is plausible.
00:11:28.590 --> 00:11:30.570
This is
00:11:30.570 --> 00:11:35.520
called an over-identification test:
because we have one degree of freedom,
00:11:35.520 --> 00:11:40.740
we are testing whether that over-identifying
restriction is consistent with the data.
00:11:40.740 --> 00:11:48.570
With this chi-square test,
we want to accept the null hypothesis here.
00:11:48.570 --> 00:11:51.960
The reason is that normally
in the regression analysis,
00:11:51.960 --> 00:11:56.520
we are interested in showing
that the null hypothesis,
00:11:56.520 --> 00:11:59.910
that a coefficient is zero, is not supported,
00:11:59.910 --> 00:12:02.040
because we usually want to
say that there is an effect.
00:12:02.040 --> 00:12:05.460
Now we want to say that there
is no difference between the model
00:12:05.460 --> 00:12:06.990
implied matrix and the actual matrix.
00:12:06.990 --> 00:12:14.220
So we are saying that the model
implied matrix fits well to the data,
00:12:14.220 --> 00:12:19.380
and therefore we can conclude that the model
implied matrix is in some sense correct.
00:12:19.380 --> 00:12:21.090
And the model is in some sense correct.
00:12:21.090 --> 00:12:23.760
So we want to accept the null hypothesis.
00:12:23.760 --> 00:12:28.860
If we reject the null hypothesis,
then we conclude that this model
00:12:28.860 --> 00:12:33.630
is inadequate for the data,
and we shouldn't draw strong
00:12:33.630 --> 00:12:38.370
inferences based on the model estimates;
instead we should be looking at why the
00:12:38.370 --> 00:12:42.390
model doesn't explain the data well,
and perhaps adjust the model.
00:12:42.390 --> 00:12:45.360
For example add the direct path from X to Y.
00:12:45.360 --> 00:12:51.180
Now here we have just
one statistic, so we could be
00:12:51.180 --> 00:12:56.070
comparing this statistic against an
appropriately chosen normal distribution.
00:12:56.070 --> 00:13:00.480
We don't do that,
instead we use the chi-square test,
00:13:00.480 --> 00:13:04.530
the reason is that for more complicated models,
or more complex models,
00:13:04.530 --> 00:13:09.660
there is typically more than one element of
the residual correlation matrix that is nonzero.
00:13:09.660 --> 00:13:16.170
So when we ask the question of can this
small difference be by chance only,
00:13:16.170 --> 00:13:20.490
we can take a look at the normal distribution
and how far from zero that estimate is.
00:13:20.490 --> 00:13:21.980
And that gives us the p-value.
00:13:21.980 --> 00:13:27.110
To do that, we use the
Z value: the estimate divided by
00:13:27.110 --> 00:13:30.080
the standard deviation, or the estimate
divided by the standard error.
00:13:30.080 --> 00:13:34.550
If we have two cells here
that are different from zero,
00:13:34.550 --> 00:13:41.630
then we have to test that they
are both zero at the same time.
00:13:41.630 --> 00:13:46.970
So we are looking at a plane:
instead of looking at one variable,
00:13:46.970 --> 00:13:51.230
we look at two variables and
how far they are from zero.
00:13:51.230 --> 00:13:57.470
And you may remember, from our earlier video
or from your
00:13:57.470 --> 00:14:00.950
math class in high school,
that this distance is calculated
00:14:00.950 --> 00:14:06.890
by taking a square of this coordinate,
and a square of this coordinate,
00:14:06.890 --> 00:14:09.710
taking a sum, and then taking a square root.
00:14:09.710 --> 00:14:12.980
In practice we don't take the square root,
00:14:12.980 --> 00:14:18.650
because we can just use a reference distribution
that takes the square root into account.
00:14:18.650 --> 00:14:25.250
So we have the square of this estimate,
and square of this estimate.
00:14:25.250 --> 00:14:31.790
We take a sum, and that gives us
the chi-square statistic.
00:14:31.790 --> 00:14:37.040
So the chi-square is
00:14:37.040 --> 00:14:40.880
the sum of squares of two
00:14:40.880 --> 00:14:44.510
normally distributed random variables,
when both have a mean of zero.
00:14:44.510 --> 00:14:48.110
So the null hypothesis is
that both of these are 0.
00:14:48.110 --> 00:14:54.350
Then the distribution is chi-square,
so we take one random variable,
00:14:54.350 --> 00:14:57.950
normally distributed and centered at zero,
we square that,
00:14:57.950 --> 00:14:59.630
we take another one,
we square that,
00:14:59.630 --> 00:15:02.540
we take a sum,
and that gives us the reference distribution.
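This reference distribution can be sketched by simulation: the sum of squares of two independent standard normals follows a chi-square distribution with two degrees of freedom, whose mean is 2.

```python
import numpy as np

# Sketch: under the null, z1^2 + z2^2 for independent standard normal
# z1, z2 follows a chi-square distribution with 2 degrees of freedom.
rng = np.random.default_rng(2)
z1 = rng.standard_normal(100_000)
z2 = rng.standard_normal(100_000)
stat = z1**2 + z2**2

# a chi-square(2) variable has mean 2
print(stat.mean())
```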
00:15:02.540 --> 00:15:08.750
So basically there's a parallel again:
regression minimizes the sum of squared residuals.
00:15:08.750 --> 00:15:17.390
Here we minimize the sum of squares
of these differences, and we quantify
00:15:08.750 --> 00:15:17.390
the difference by looking at
the actual sum of squares.
00:15:20.450 --> 00:15:25.610
So we take the squares of these
Z values, the estimates divided by their standard errors,
00:15:25.610 --> 00:15:28.970
and that
gives us the chi-square statistic.
00:15:28.970 --> 00:15:34.640
So the logic is that instead of comparing
just one statistic against a normal distribution,
00:15:34.640 --> 00:15:40.970
we compare the
sum of squares of the two differences
00:15:40.970 --> 00:15:45.290
against the sum of squares of two
normally distributed variables.
00:15:45.290 --> 00:15:50.780
If it's plausible that a random
process of two normally distributed
00:15:50.780 --> 00:15:56.630
variables would have produced the same distance,
then we conclude that it could be by chance only.