WEBVTT
Kind: captions
Language: en

00:00:00.120 --> 00:00:03.087
It's fairly common that we have both

00:00:03.087 --> 00:00:06.240
interaction effects and nonlinear 
effects from log transformation

00:00:06.240 --> 00:00:07.350
in the same model.

00:00:07.780 --> 00:00:10.470
The interpretation of these effects is done

00:00:10.470 --> 00:00:13.650
by plotting as well, and you 
need to take that into account

00:00:13.650 --> 00:00:15.060
when you construct the plot.

00:00:15.665 --> 00:00:18.300
So let's first review the log transformation.

00:00:18.300 --> 00:00:22.020
The idea of log transformation 
is that we take a log of

00:00:22.020 --> 00:00:23.613
either our dependent variable,

00:00:23.613 --> 00:00:26.433
or any of the independent variables,

00:00:26.433 --> 00:00:30.698
and that changes the interpretation 
of that variable to relative units.

00:00:30.698 --> 00:00:35.970
For example, if we're saying that the 
prestige depends on a log of income,

00:00:35.970 --> 00:00:40.080
then the interpretation of beta2 here would be,

00:00:40.080 --> 00:00:45.390
how much prestige will 
increase if income increases by 1%

00:00:45.390 --> 00:00:46.770
relative to the current level.

00:00:47.227 --> 00:00:53.370
So we're talking about relative 
effects of income changes on prestige.
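
NOTE
A minimal sketch of the interpretation described above, where prestige depends on log(income). The coefficient value beta2 = 20.0 is a hypothetical number chosen for illustration, not an estimate from the video. A 1% increase in income changes prestige by beta2 * ln(1.01), which is approximately beta2 / 100.
```python
import math
beta2 = 20.0  # hypothetical coefficient on log(income); an assumption
def prestige_change(pct_increase, beta=beta2):
    """Change in prestige when income rises by pct_increase percent."""
    return beta * math.log(1 + pct_increase / 100)
# Exact change for a 1% raise next to the usual beta / 100 approximation
print(prestige_change(1), beta2 / 100)
```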

00:00:53.639 --> 00:00:55.709
We can also do it the other way around,

00:00:55.722 --> 00:00:58.812
so we have a log of income 
as the dependent variable,

00:00:58.812 --> 00:01:01.530
and we have prestige as the independent variable.

00:01:01.826 --> 00:01:03.806
Then the interpretation would be,

00:01:03.806 --> 00:01:07.000
how much income increases 
relative to the current level,

00:01:07.000 --> 00:01:10.950
when prestige increases by one point.
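
NOTE
A minimal sketch of the reverse case described above, where log(income) is the dependent variable. The coefficient value beta1 = 0.02 is a hypothetical number for illustration only. A one-point increase in prestige multiplies income by exp(beta1), so the relative change in percent is 100 * (exp(beta1) - 1).
```python
import math
beta1 = 0.02  # hypothetical coefficient on prestige; an assumption
def relative_income_change(points, beta=beta1):
    """Percent change in income when prestige rises by the given number of points."""
    return 100 * (math.exp(beta * points) - 1)
# For small coefficients this is close to 100 * beta1
print(relative_income_change(1), 100 * beta1)
```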

00:01:11.703 --> 00:01:16.530
So log transformation makes a lot of 
sense for certain kinds of variables,

00:01:16.530 --> 00:01:17.940
for example income.

00:01:18.720 --> 00:01:23.132
The raises that you get are 
usually in relative terms.

00:01:23.132 --> 00:01:26.790
And if you think, what's the 
utility of each additional euro,

00:01:26.790 --> 00:01:28.957
it diminishes as your salary goes up,

00:01:28.957 --> 00:01:31.003
so you need bigger raises.

00:01:31.003 --> 00:01:33.136
If you have a thousand euros per month,

00:01:33.136 --> 00:01:36.690
then adding a thousand more has a huge effect.

00:01:36.690 --> 00:01:39.717
If you have a salary of 5,000 euros per month,

00:01:39.717 --> 00:01:42.810
then increasing that by a thousand euros,

00:01:42.810 --> 00:01:43.689
it's a lot,

00:01:43.689 --> 00:01:47.340
but it's not as huge a difference as for somebody

00:01:47.340 --> 00:01:49.260
who makes just a thousand euros per month.

00:01:49.260 --> 00:01:52.590
So relative effects are modeled 
with log transformation.

00:01:52.590 --> 00:01:55.140
So how do you combine these 
with interaction effects?

00:01:55.759 --> 00:02:02.102
This is the model estimated with Stata.

00:02:02.102 --> 00:02:06.822
So we have the interaction of prestige and women,

00:02:06.822 --> 00:02:10.050
we have education, prestige and percentage of women,

00:02:10.050 --> 00:02:13.350
we have income and log of income 
as the dependent variables.

00:02:13.726 --> 00:02:16.426
So we know by now that

00:02:16.695 --> 00:02:19.499
interpreting this model requires that you plot.

00:02:19.539 --> 00:02:21.279
So you calculate,

00:02:21.279 --> 00:02:24.480
what is the fitted value

00:02:24.601 --> 00:02:27.421
for income as a function of prestige,

00:02:27.529 --> 00:02:29.089
holding education at the mean,

00:02:29.089 --> 00:02:32.640
and comparing different 
levels of percentage of women.

00:02:32.640 --> 00:02:36.690
So we could calculate the marginal 
predictions when the percentage of women is 0,

00:02:36.690 --> 00:02:40.230
50 and 100, holding education at the mean,

00:02:40.230 --> 00:02:43.365
and varying the prestige.

00:02:44.858 --> 00:02:48.690
The log transformation here 
complicates things a bit,

00:02:48.690 --> 00:02:49.980
but not by much.

00:02:50.101 --> 00:02:53.461
So instead of calculating 
the predictions directly,

00:02:53.636 --> 00:02:57.510
we calculate predictions using 
the exact same procedure,

00:02:57.510 --> 00:02:59.970
and then we just take the 
exponential of those predictions.

00:03:00.373 --> 00:03:02.413
So instead of predicting lines,

00:03:02.413 --> 00:03:05.699
we predict a line and then we 
take the exponential of that line.
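
NOTE
A minimal sketch of the procedure described above: compute the linear prediction for log(income), including the prestige by women interaction, then take the exponential to return to income units. All coefficient values and the education mean are hypothetical assumptions for illustration, not the Stata estimates shown in the video.
```python
import math
# Hypothetical coefficients for a model of log(income); assumptions only
coef = {"const": 7.0, "education": 0.02, "prestige": 0.03,
        "women": -0.002, "prestige_x_women": -0.0002}
def predict_income(prestige, women, education):
    """Linear prediction on the log scale, then exponentiated to income units."""
    log_income = (coef["const"]
                  + coef["education"] * education
                  + coef["prestige"] * prestige
                  + coef["women"] * women
                  + coef["prestige_x_women"] * prestige * women)
    return math.exp(log_income)
education_mean = 10.0  # hypothetical sample mean of education
# One curve per level of percentage of women, varying prestige
for women in (0, 50, 100):
    curve = [predict_income(p, women, education_mean) for p in range(20, 81, 20)]
    print(women, [round(v) for v in curve])
```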

00:03:05.860 --> 00:03:07.690
How does it look? It looks like this.

00:03:07.920 --> 00:03:09.505
This is from Stata again,

00:03:09.505 --> 00:03:11.275
the marginsplot command.

00:03:11.275 --> 00:03:13.860
And we have linear effects here,

00:03:14.680 --> 00:03:17.760
and we have curvilinear effects here.

00:03:17.760 --> 00:03:19.470
So these are relative effects.

00:03:19.470 --> 00:03:24.270
We have the effect of increasing 
prestige on income for

00:03:24.270 --> 00:03:29.276
male-dominated professions and 
women-dominated professions,

00:03:29.652 --> 00:03:32.640
and here we have the same effects with lines.

00:03:32.640 --> 00:03:35.310
As you can see the interpretations 
are quite different.

00:03:35.310 --> 00:03:39.630
So here women get no income at 
all as a function of prestige,

00:03:39.630 --> 00:03:40.712
or no increase at all.

00:03:40.927 --> 00:03:43.447
Here they get a relative increase,

00:03:43.447 --> 00:03:47.130
but the absolute increase is less 
than for male-dominated professions.

00:03:47.130 --> 00:03:48.510
How do we know

00:03:48.510 --> 00:03:49.560
which one of these lines,

00:03:49.560 --> 00:03:52.500
set of three lines, fits the data best?

00:03:52.917 --> 00:03:59.610
We can do that by simply adding 
observations to this plot.

00:03:59.610 --> 00:04:01.470
So we can have plots like that.

00:04:01.900 --> 00:04:07.710
And each circle here is one profession.

00:04:07.710 --> 00:04:10.650
So we have prestige for that profession,

00:04:10.650 --> 00:04:12.750
and we have income for that profession.

00:04:13.500 --> 00:04:16.050
The size of the circle 
represents the percentage of women.

00:04:16.050 --> 00:04:20.010
The smallest circles are no 
women in that profession,

00:04:20.010 --> 00:04:22.860
the largest circles are all 
women in that profession.

00:04:22.860 --> 00:04:26.262
And here we can see that

00:04:26.262 --> 00:04:29.112
this set of lines explains the data a lot better.

00:04:29.112 --> 00:04:30.794
Because here, for example,

00:04:30.794 --> 00:04:32.165
there are no observations here.

00:04:32.179 --> 00:04:33.679
So we are extrapolating here,

00:04:33.679 --> 00:04:34.972
so it doesn't really fit.

00:04:34.972 --> 00:04:38.490
And these are way too high for this line,

00:04:38.490 --> 00:04:40.500
and particularly if you look 
at the confidence intervals,

00:04:40.500 --> 00:04:41.760
or prediction intervals.

00:04:41.841 --> 00:04:43.011
Then we have here,

00:04:43.011 --> 00:04:45.611
we can see the prediction 
intervals here are large,

00:04:45.611 --> 00:04:49.379
which means that some of the 
observations can be up here,

00:04:49.379 --> 00:04:52.830
and also we have no observations here.

00:04:53.556 --> 00:04:57.073
So one way of ruling out

00:04:57.289 --> 00:05:02.430
outliers as an explanation for the lines, or of assessing

00:05:02.430 --> 00:05:05.451
which set of lines explains the data better,

00:05:05.451 --> 00:05:09.030
is to just plot the data and 
the lines in the same plot.

00:05:09.030 --> 00:05:10.350
And that allows you to compare.