WEBVTT

00:00:00.150 --> 00:00:05.280
The simultaneous equation approach is another way of estimating a mediation model.

00:00:05.280 --> 00:00:11.040
This approach works by treating the mediation model as one large model.

00:00:11.040 --> 00:00:14.610
Instead of estimating the regressions of y and m separately,

00:00:14.610 --> 00:00:18.390
we derive the model-implied covariance matrix.

00:00:18.390 --> 00:00:21.690
I will be using the correlation matrix here

00:00:21.690 --> 00:00:25.770
for simplicity, but in practice we nearly always work with covariances.

00:00:25.770 --> 00:00:30.420
So we look at, for example, the correlation between x and y.

00:00:30.420 --> 00:00:34.470
We find that we can go from x to y using two different paths.

00:00:34.470 --> 00:00:37.740
We go from x through m to y.

00:00:37.740 --> 00:00:41.160
And we go from x to y directly.

00:00:41.160 --> 00:00:44.430
So that gives us two paths, or two terms.

00:00:44.430 --> 00:00:51.600
We have the mediation effect, beta_m1 times beta_y2, plus the direct effect, beta_y1.

00:00:51.600 --> 00:00:55.980
Similarly, we can calculate the correlation between y and m.

00:00:55.980 --> 00:01:01.320
It is the direct path plus the spurious

00:01:01.320 --> 00:01:04.740
correlation due to x, which is a common cause of both.

00:01:04.740 --> 00:01:10.890
And that gives us the correlation between m and y.

00:01:10.890 --> 00:01:18.420
The way we estimate this model is that we

00:01:18.420 --> 00:01:22.650
find the betas so that the data correlation

00:01:22.650 --> 00:01:28.020
matrix and the model-implied correlation matrix are as close to one another as possible.

00:01:28.020 --> 00:01:31.860
To understand the calculation we first have to take a look at the degrees of freedom,

00:01:31.860 --> 00:01:35.220
because that's important for this particular problem.

00:01:35.220 --> 00:01:41.640
The degrees of freedom for this model are calculated based on these correlations.

00:01:41.640 --> 00:01:44.250
Because we only use information from the

00:01:44.250 --> 00:01:47.400
correlations, we don't look at the individual observations.

00:01:47.400 --> 00:01:54.330
As units of information in the data, we have five elements of the correlation matrix that depend on the model parameters.

00:01:54.330 --> 00:01:58.980
Importantly, the variance of x doesn't count, because it

00:01:58.980 --> 00:02:00.810
doesn't depend on any of the model parameters.

00:02:00.810 --> 00:02:04.230
So we have these five elements: the variance of m,

00:02:04.230 --> 00:02:08.340
the variance of y, and all the correlations that depend on the model.

00:02:08.340 --> 00:02:10.680
So we have five units of data.

00:02:10.680 --> 00:02:14.520
Then we have five things that we estimate.

00:02:14.520 --> 00:02:15.870
We have five free parameters.

00:02:15.870 --> 00:02:19.050
We have three regression coefficients here.

00:02:19.050 --> 00:02:20.910
And then we have these two variances:

00:02:20.910 --> 00:02:23.370
the variance of this error term and the variance of that error term.

00:02:23.370 --> 00:02:25.320
So we estimate five different things.

00:02:25.320 --> 00:02:31.080
The degrees of freedom for this model are then zero, because that's the difference between these two counts.

00:02:31.080 --> 00:02:34.590
And we say that this is a just-identified model.

00:02:34.590 --> 00:02:39.510
Just-identified means that we can estimate the model, but we are

00:02:39.510 --> 00:02:42.750
using all the information from the data to estimate the model.
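
NOTE
The tracing rules above can be written out as a short sketch. This is a minimal R illustration, assuming standardized x, m, and y; the beta values are made up, and the names follow the video's notation.
beta_m1 <- 0.4   # effect of x on m
beta_y1 <- 0.3   # direct effect of x on y
beta_y2 <- 0.5   # effect of m on y
cor_xm <- beta_m1                       # single path: x -> m
cor_xy <- beta_m1 * beta_y2 + beta_y1   # mediated path plus direct path
cor_my <- beta_y2 + beta_m1 * beta_y1   # direct path plus spurious path via x
# Data: five units (var(m), var(y), three correlations); parameters: five,
# so the degrees of freedom are 5 - 5 = 0 and the model is just identified.
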

00:02:42.750 --> 00:02:45.300
And we could not add anything more to the model.

00:02:45.300 --> 00:02:49.140
It also means that the model will fit perfectly, and what

00:02:49.140 --> 00:02:51.900
that means I will explain a bit later in the video.

00:02:51.900 --> 00:02:54.570
So we have a just-identified model.

00:02:54.570 --> 00:02:59.850
It means that we can find the values of these variances and the betas

00:02:59.850 --> 00:03:07.230
so that the model-implied correlation matrix matches the data correlation matrix exactly.

00:03:07.230 --> 00:03:11.430
We can do that, for example, using the lavaan package in R.

00:03:11.430 --> 00:03:17.760
So lavaan gives us output; you can do the same with the sem command in Stata,

00:03:17.760 --> 00:03:22.440
and the output contains two important sections.

00:03:22.440 --> 00:03:25.860
First we have the estimation information.

00:03:25.860 --> 00:03:30.900
This is not particularly useful here, because our degrees of freedom are zero,

00:03:30.900 --> 00:03:32.580
so we can't do model testing.

00:03:32.580 --> 00:03:35.490
If we had positive degrees of freedom we could test the model,

00:03:35.490 --> 00:03:37.770
and I'll talk more about that later in the video.

00:03:37.770 --> 00:03:40.980
And then we have the coefficients.

00:03:40.980 --> 00:03:42.540
So we have regressions:

00:03:42.540 --> 00:03:49.500
we have the regression of y on m and x, so those are beta_y1 and beta_y2.

00:03:49.500 --> 00:03:51.990
Then we have the regression of m on x,

00:03:51.990 --> 00:04:03.090
that's beta_m1, and then we have the two estimated error variances.

00:04:03.090 --> 00:04:08.490
So we get the estimates here and the standard errors here;

00:04:12.270 --> 00:04:16.920
these are z values, not t values, because this estimation is based on large-sample theory.

00:04:16.920 --> 00:04:19.950
And then we get p-values for these estimates.

00:04:19.950 --> 00:04:25.110
We can also calculate the mediation effect using this package.

00:04:25.110 --> 00:04:30.510
We define it in the model, and then it's something that the software calculates for us automatically.

00:04:30.510 --> 00:04:34.740
We get the standard error, the z value, and the p-value for the mediation effect,

00:04:34.740 --> 00:04:38.220
and then we have the total effect, which is

00:04:38.220 --> 00:04:43.470
the effect of x on y that goes directly, plus the effect of x on y through m.

00:04:43.470 --> 00:04:44.640
So that's the total effect.

00:04:44.640 --> 00:04:51.270
The total effect is the influence of x on y regardless of whether it goes directly or through m.

00:04:51.270 --> 00:04:54.150
And the direct effect is just beta_y1.

00:04:54.150 --> 00:04:57.210
So that gives us the estimates.

00:04:57.210 --> 00:05:04.530
And this is how it actually works when we estimate a partial mediation model.

00:05:04.530 --> 00:05:09.210
Importantly, these estimates will be exactly the same as the ones you get from regression analysis.

00:05:09.210 --> 00:05:11.490
If you estimate this model separately using

00:05:11.490 --> 00:05:15.390
regressions, then you will get exactly the same results.

00:05:15.390 --> 00:05:20.670
There will be differences once we start to estimate models that are over-identified.

00:05:20.670 --> 00:05:24.930
For example, if we directly estimate a full mediation model,

00:05:24.930 --> 00:05:27.990
we're saying that there is no path from x to y.

00:05:27.990 --> 00:05:36.030
We estimate the model where we assume that all effects of x on y go through m.
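
NOTE
Before moving to the full mediation model, here is a minimal lavaan sketch of the partial mediation model just described, including the defined mediation and total effects. The data frame mydata and the variable names x, m, and y are placeholders; the labels a1, b1, and b2 correspond to beta_m1, beta_y1, and beta_y2.
library(lavaan)
model <- "
  y ~ b1*x + b2*m       # direct effect (beta_y1) and m -> y (beta_y2)
  m ~ a1*x              # x -> m (beta_m1)
  indirect := a1*b2     # mediation effect
  total := b1 + a1*b2   # total effect of x on y
"
fit <- sem(model, data = mydata)
summary(fit)
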

00:05:36.030 --> 00:05:42.600
When we apply the tracing rules again, we can see that the equations are a bit simpler here,

00:05:42.600 --> 00:05:49.200
because we can only go from x to y using this one path: beta_m1 times beta_y2.

00:05:49.200 --> 00:05:53.370
There's no direct path anymore from x to y.

00:05:53.370 --> 00:05:55.260
So it's only this product.

00:05:55.260 --> 00:05:59.670
And this model has positive degrees of freedom.

00:05:59.670 --> 00:06:03.210
The data are the same, so we have five units of

00:06:03.210 --> 00:06:06.390
data, but we now only have four parameters to estimate.

00:06:06.390 --> 00:06:10.200
We have two regression coefficients and two error variances.

00:06:10.200 --> 00:06:12.750
The degrees of freedom are again the difference.

00:06:12.750 --> 00:06:17.400
So we have one degree of freedom, and we call this an over-identified model.

00:06:17.400 --> 00:06:23.550
The problem, or feature, whichever you want to call it,

00:06:23.550 --> 00:06:29.730
of these over-identified models is that generally we cannot make

00:06:29.730 --> 00:06:34.470
the model-implied correlation matrix exactly equal the data correlation matrix.

00:06:34.470 --> 00:06:41.310
Instead of setting those equal and solving, we have to make the model-

00:06:41.310 --> 00:06:46.110
implied correlation matrix as close as possible to the data correlation matrix.

00:06:46.110 --> 00:06:54.600
To make the model-implied correlation matrix as close as possible to the data correlation matrix,

00:06:54.600 --> 00:06:56.820
we have to define what we mean by close.

00:06:56.820 --> 00:07:01.050
So we have to define how we quantify the distance:

00:07:01.050 --> 00:07:05.340
how different the model-implied correlation matrix is from the data correlation matrix.

00:07:05.340 --> 00:07:09.300
This problem of quantifying the difference between these two

00:07:09.300 --> 00:07:13.800
matrices is comparable to regression analysis.

00:07:13.800 --> 00:07:18.450
In regression analysis, we also use a discrepancy function.

00:07:18.450 --> 00:07:23.040
We quantify the difference between the regression line and the actual observations.

00:07:23.040 --> 00:07:27.480
To do that, we calculate the residuals, the differences between the

00:07:27.480 --> 00:07:31.980
line and the observations, and we take the squares of the residuals.

00:07:31.980 --> 00:07:39.540
The idea of taking squares is that we want to avoid having large estimation errors,

00:07:39.540 --> 00:07:41.430
large prediction errors.

00:07:41.430 --> 00:07:45.780
We are OK with small prediction errors, but we want to avoid large ones.

00:07:45.780 --> 00:07:51.600
Then we take the sum of these squares, and that gives us the ordinary least squares estimator.

00:07:51.600 --> 00:07:54.540
Minimizing that gives us the regression coefficients.

00:07:54.540 --> 00:08:00.450
In path analysis, we calculate the difference between each unique cell

00:08:00.450 --> 00:08:02.940
in the observed correlation or covariance matrix

00:08:02.940 --> 00:08:05.970
and the model-implied correlation or covariance matrix.

00:08:05.970 --> 00:08:08.910
We raise those differences to the second power.

00:08:08.910 --> 00:08:13.590
The idea, again, is that we want to avoid having models

00:08:13.590 --> 00:08:16.680
that explain some parts of the data really badly,

00:08:16.680 --> 00:08:23.220
and we are more or less OK with models that are slightly off compared to the data.
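
NOTE
For reference, a minimal lavaan sketch of this full mediation specification, continuing the hypothetical example above. The estimator argument requests the unweighted least squares discrepancy being described here; lavaan's default is maximum likelihood.
full_model <- "
  y ~ b2*m     # m -> y (beta_y2); no direct x -> y path, so df = 5 - 4 = 1
  m ~ a1*x     # x -> m (beta_m1)
"
fit_full <- sem(full_model, data = mydata, estimator = "ULS")
summary(fit_full)
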

00:08:23.220 --> 00:08:27.690
Then we sum these squared

00:08:27.690 --> 00:08:30.990
differences, and that provides the unweighted least squares estimator.

00:08:30.990 --> 00:08:39.510
There's another parallel between path analysis and regression analysis.

00:08:39.510 --> 00:08:42.810
Besides minimizing the discrepancy function,

00:08:42.810 --> 00:08:46.110
which gives us estimates that are in some sense optimal,

00:08:46.110 --> 00:08:53.940
the discrepancy can also be used to quantify the goodness of fit of the model.

00:08:54.480 --> 00:08:56.640
One definition of R-squared

00:08:56.640 --> 00:09:00.810
in regression analysis is based on this sum of squares.

00:09:00.810 --> 00:09:04.950
We calculate the regression sum of squares, and then we compare that to the

00:09:04.950 --> 00:09:06.870
total sum of squares, and that gives us R-squared.

00:09:06.870 --> 00:09:14.460
Here we have the sum of squares of these covariance residuals, and that

00:09:14.460 --> 00:09:17.220
can be used to quantify the model fit as well.

00:09:17.220 --> 00:09:19.110
Let's take a look.

00:09:19.110 --> 00:09:23.640
Here is the estimation information again;

00:09:23.640 --> 00:09:27.900
we have one degree of freedom for this full mediation model.

00:09:27.900 --> 00:09:31.110
And we have a p-value

00:09:31.110 --> 00:09:35.610
that is nonsignificant; I'll go through the p-value shortly.

00:09:35.610 --> 00:09:38.820
And then we have the estimates here.

00:09:38.820 --> 00:09:48.270
The idea of the p-value is that it quantifies how different the actual observed

00:09:48.270 --> 00:09:50.700
correlation matrix is from the implied correlation matrix.

00:09:50.700 --> 00:09:55.020
The difference between this observed correlation matrix

00:09:55.020 --> 00:09:59.490
and this model-implied correlation matrix is called the residual correlation matrix.

00:09:59.490 --> 00:10:03.496
So again, there's a parallel to regression analysis,

00:10:03.496 --> 00:10:08.100
where we have residuals. When we work with raw observations,

00:10:08.100 --> 00:10:10.830
as in regression analysis, the residual is the difference

00:10:10.830 --> 00:10:13.200
between the actual observation and the predicted value.

00:10:13.200 --> 00:10:15.660
Here, where we work with correlations, the residual

00:10:15.660 --> 00:10:19.560
is the difference between a predicted correlation and an observed correlation.

00:10:19.560 --> 00:10:23.730
So this residual correlation matrix here is basically the

00:10:23.730 --> 00:10:26.250
observed correlations minus the implied correlations.

00:10:26.250 --> 00:10:30.060
You can verify that this actually is the case.

00:10:30.060 --> 00:10:36.900
So the question that the p-value here answers is

00:10:36.900 --> 00:10:41.760
whether this small residual correlation can be due to chance only.

00:10:41.760 --> 00:10:47.100
Is it possible that the sampling error

00:10:47.100 --> 00:10:51.120
in the observed correlation matrix produces that kind of discrepancy?

00:10:51.120 --> 00:10:55.500
Here it's close to zero, so we can say that it's probably due to chance,

00:10:55.500 --> 00:10:59.580
but if it were far from zero, then we would know that this

00:10:59.580 --> 00:11:04.470
model doesn't adequately explain the correlation between x and y.

00:11:04.470 --> 00:11:11.190
And we would probably conclude that x also has a direct effect on y.
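
NOTE
To inspect the quantities just described, lavaan can print the residual correlation matrix and the chi-square test. A brief sketch, reusing the hypothetical fit_full object from the earlier note:
residuals(fit_full, type = "cor")                  # observed minus implied correlations
fitted(fit_full)                                   # model-implied moments
fitMeasures(fit_full, c("chisq", "df", "pvalue"))  # over-identification test
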

00:11:11.190 --> 00:11:16.470
So it would be a partial mediation model instead of the full mediation model that is specified here.

00:11:16.470 --> 00:11:19.200
So that's the test here.

00:11:19.200 --> 00:11:28.590
The p-value of about 0.7 indicates that getting this kind of discrepancy by chance only is plausible.

00:11:28.590 --> 00:11:30.570
This is called

00:11:30.570 --> 00:11:35.520
an over-identification test: because we have one degree of freedom,

00:11:35.520 --> 00:11:40.740
we are testing whether that one restriction is consistent with what we have in the data.

00:11:40.740 --> 00:11:48.570
With this chi-square test, we want to accept the null hypothesis.

00:11:48.570 --> 00:11:51.960
The reason is that normally, in regression analysis,

00:11:51.960 --> 00:11:56.520
we are interested in showing that the null hypothesis,

00:11:56.520 --> 00:11:59.910
that a coefficient is zero, is not supported,

00:11:59.910 --> 00:12:02.040
because we usually want to say that there is an effect.

00:12:02.040 --> 00:12:05.460
Now we want to say that there is no difference between the model-

00:12:05.460 --> 00:12:06.990
implied matrix and the actual matrix.

00:12:06.990 --> 00:12:14.220
So we are saying that the model-implied matrix fits the data well,

00:12:14.220 --> 00:12:19.380
and therefore we can conclude that the model-implied matrix is in some sense correct,

00:12:19.380 --> 00:12:21.090
and the model is in some sense correct.

00:12:21.090 --> 00:12:23.760
So we want to accept the null hypothesis.

00:12:23.760 --> 00:12:28.860
If we reject the null hypothesis, then we conclude that this model

00:12:28.860 --> 00:12:33.630
is inadequate for the data, and we shouldn't make

00:12:33.630 --> 00:12:38.370
inferences based on the model estimates; instead we should look at why the

00:12:38.370 --> 00:12:42.390
model doesn't explain the data well, and perhaps adjust the model,

00:12:42.390 --> 00:12:45.360
for example by adding the direct path from x to y.

00:12:45.360 --> 00:12:51.180
Now, here we have just one statistic, so we could

00:12:51.180 --> 00:12:56.070
simply compare this statistic against an appropriately chosen normal distribution.

00:12:56.070 --> 00:13:00.480
We don't do that; instead we use the chi-square test.

00:13:00.480 --> 00:13:04.530
The reason is that for more complicated, or more complex, models,

00:13:04.530 --> 00:13:09.660
there is typically more than one element of the residual correlation matrix that is nonzero.

00:13:09.660 --> 00:13:16.170
When we ask the question of whether this small difference can be due to chance only,

00:13:16.170 --> 00:13:20.490
we can take a look at the normal distribution and how far from zero the estimate is.

00:13:20.490 --> 00:13:21.980
And that gives us the p-value.

00:13:21.980 --> 00:13:27.110
To do that, we use the z value: the estimate divided by

00:13:27.110 --> 00:13:30.080
its standard deviation, that is, the estimate divided by the standard error.

00:13:30.080 --> 00:13:34.550
But if we have two cells here that are different from zero,

00:13:34.550 --> 00:13:41.630
then we have to test that both of these are zero at the same time.

00:13:41.630 --> 00:13:46.970
So we are looking at a plane: instead of looking at one variable,

00:13:46.970 --> 00:13:51.230
we look at two variables and how far they are from zero.
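
NOTE
The arithmetic of this test is easy to reproduce. A minimal sketch in base R, assuming an illustrative chi-square statistic of 0.15 with one degree of freedom (the statistic is made up to match a p-value of about 0.7):
chisq <- 0.15
pchisq(chisq, df = 1, lower.tail = FALSE)   # about 0.70
# A large p-value means the residual discrepancy is plausibly
# sampling error, so we retain (accept) the full mediation model.
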

00:13:51.230 --> 00:13:57.470
You may remember from our earlier video, or from your

00:13:57.470 --> 00:14:00.950
math class in high school, that in this case the distance is calculated

00:14:00.950 --> 00:14:06.890
by taking the square of this coordinate and the square of that coordinate,

00:14:06.890 --> 00:14:09.710
taking the sum, and then taking a square root.

00:14:09.710 --> 00:14:12.980
In practice we don't take the square root,

00:14:12.980 --> 00:14:18.650
because we can just use a reference distribution that takes the square root into account.

00:14:18.650 --> 00:14:25.250
So we have the square of this estimate and the square of that estimate.

00:14:25.250 --> 00:14:31.790
We take the sum, and that gives us the chi-square statistic.

00:14:31.790 --> 00:14:37.040
So the chi-square statistic is the sum of squares of two

00:14:37.040 --> 00:14:40.880
normally distributed random variables,

00:14:40.880 --> 00:14:44.510
where both have a mean of zero.

00:14:44.510 --> 00:14:48.110
So the null hypothesis is that both of these are zero.

00:14:48.110 --> 00:14:54.350
Then the distribution is chi-square: we take one random variable,

00:14:54.350 --> 00:14:57.950
normally distributed and centered at zero, and we square it,

00:14:57.950 --> 00:14:59.630
we take another one, we square it,

00:14:59.630 --> 00:15:02.540
we take the sum, and that gives us the reference distribution.

00:15:02.540 --> 00:15:08.750
So basically there's a parallel again to minimizing the sum of squared residuals.

00:15:08.750 --> 00:15:17.390
We want to minimize the sum of squares of these differences, and we quantify the

00:15:17.390 --> 00:15:20.450
difference by looking at the actual sum of squares.

00:15:20.450 --> 00:15:25.610
So we take the squares of these estimates divided by their standard errors,

00:15:25.610 --> 00:15:28.970
and that gives us the chi-square statistic.

00:15:28.970 --> 00:15:34.640
So the logic is that instead of comparing just one statistic against a normal distribution,

00:15:34.640 --> 00:15:40.970
we compare the sum of squares of two differences

00:15:40.970 --> 00:15:45.290
against the sum of squares of two normally distributed variables.

00:15:45.290 --> 00:15:50.780
If it's plausible that a random process of two normally distributed

00:15:50.780 --> 00:15:56.630
variables would have produced the same distance, then we conclude that the discrepancy could be due to chance only.
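
NOTE
This reference distribution is easy to verify by simulation. A short sketch in base R showing that the sum of squares of two independent standard normal variables follows a chi-square distribution with two degrees of freedom:
set.seed(1)
z1 <- rnorm(1e5)    # first normally distributed variable, mean zero
z2 <- rnorm(1e5)    # second normally distributed variable, mean zero
d2 <- z1^2 + z2^2   # squared distance from the origin
mean(d2)            # close to 2, the mean of a chi-square with 2 df
quantile(d2, 0.95)  # close to qchisq(0.95, df = 2), about 5.99
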