WEBVTT Kind: captions Language: en 00:00:00.120 --> 00:00:03.087 It's fairly common that we have both 00:00:03.087 --> 00:00:06.240 interaction effects and nonlinear effects from log transformation 00:00:06.240 --> 00:00:07.350 in the same model. 00:00:07.780 --> 00:00:10.470 The interpretation of these effects is done 00:00:10.470 --> 00:00:13.650 by plotting as well, and you need to take that into account 00:00:13.650 --> 00:00:15.060 when you construct the plot. 00:00:15.665 --> 00:00:18.300 So let's first review the log transformation. 00:00:18.300 --> 00:00:22.020 The idea of log transformation is that we take a log of 00:00:22.020 --> 00:00:23.613 either our dependent variable, 00:00:23.613 --> 00:00:26.433 or any of the independent variables, 00:00:26.433 --> 00:00:30.698 and that changes the interpretation of that variable to relative units. 00:00:30.698 --> 00:00:35.970 For example, if we're saying that prestige depends on the log of income, 00:00:35.970 --> 00:00:40.080 then the interpretation of beta2 here would be 00:00:40.080 --> 00:00:45.390 how much prestige will increase if income increases by 1% 00:00:45.390 --> 00:00:46.770 relative to the current level. 00:00:47.227 --> 00:00:53.370 So we're talking about relative effects of income changes on prestige. 00:00:53.639 --> 00:00:55.709 We can also do it the other way around, 00:00:55.722 --> 00:00:58.812 so we have the log of income as the dependent variable, 00:00:58.812 --> 00:01:01.530 and we have prestige as the independent variable. 00:01:01.826 --> 00:01:03.806 Then the interpretation would be 00:01:03.806 --> 00:01:07.000 how much income increases relative to the current level 00:01:07.000 --> 00:01:10.950 when prestige increases by one point. 00:01:11.703 --> 00:01:16.530 So log transformation makes a lot of sense for certain kinds of variables, 00:01:16.530 --> 00:01:17.940 for example income. 00:01:18.720 --> 00:01:23.132 The raises that you get are usually in relative terms.
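The two readings of a log-transformed variable described above can be checked with a little arithmetic. The following is a minimal sketch in Python, not the lecture's own Stata output: the function names and both coefficient values are hypothetical and chosen only for illustration.

```python
import math


def level_log_change(beta, pct_increase):
    """Change in y when x rises by pct_increase percent, for y = a + beta * log(x)."""
    return beta * math.log(1 + pct_increase / 100)


def log_level_pct_change(beta, delta_x=1.0):
    """Percent change in y when x rises by delta_x, for log(y) = a + beta * x."""
    return (math.exp(beta * delta_x) - 1) * 100


# Hypothetical coefficient of log(income) in a model of prestige.
beta_log_income = 20.0
# A 1% rise in income changes prestige by about beta / 100 points.
print(level_log_change(beta_log_income, 1))

# Hypothetical coefficient of prestige in a model of log(income).
beta_prestige = 0.02
# One more prestige point changes income by about 100 * beta percent.
print(log_level_pct_change(beta_prestige))
```

For small coefficients the shortcuts "beta divided by 100" and "100 times beta percent" are close to the exact values; for larger coefficients the exact formulas in the functions are the safer choice.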
00:01:23.132 --> 00:01:26.790 And if you think, what's the utility of each additional euro, 00:01:26.790 --> 00:01:28.957 it diminishes as your salary goes up, 00:01:28.957 --> 00:01:31.003 so you need larger raises. 00:01:31.003 --> 00:01:33.136 If you have a thousand euros per month, 00:01:33.136 --> 00:01:36.690 then adding a thousand more is a huge effect. 00:01:36.690 --> 00:01:39.717 If you have a 5,000 euro per month salary, 00:01:39.717 --> 00:01:42.810 then increasing that by a thousand euros, 00:01:42.810 --> 00:01:43.689 it's a lot, 00:01:43.689 --> 00:01:47.340 but it's not as huge a difference as for somebody 00:01:47.340 --> 00:01:49.260 who makes just a thousand euros per month. 00:01:49.260 --> 00:01:52.590 So relative effects are modeled with log transformation. 00:01:52.590 --> 00:01:55.140 So how do you combine these with interaction effects? 00:01:55.759 --> 00:02:02.102 This is the model estimated with Stata. 00:02:02.102 --> 00:02:06.822 So we have prestige and women, 00:02:06.822 --> 00:02:10.050 we have education, prestige and percentage women, 00:02:10.050 --> 00:02:13.350 we have income and log of income as the dependent variables. 00:02:13.726 --> 00:02:16.426 So we know so far that 00:02:16.695 --> 00:02:19.499 interpreting this model requires that you plot. 00:02:19.539 --> 00:02:21.279 So you calculate 00:02:21.279 --> 00:02:24.480 what is the fitted value 00:02:24.601 --> 00:02:27.421 for income as a function of prestige, 00:02:27.529 --> 00:02:29.089 holding education at the mean, 00:02:29.089 --> 00:02:32.640 and comparing different levels of percentage women. 00:02:32.640 --> 00:02:36.690 So we could calculate the marginal predictions when percentage women is 0, 00:02:36.690 --> 00:02:40.230 50 and 100, holding education at the mean, 00:02:40.230 --> 00:02:43.365 and varying the prestige. 00:02:44.858 --> 00:02:48.690 The log transformation here complicates things a bit, 00:02:48.690 --> 00:02:49.980 but not by much.
00:02:50.101 --> 00:02:53.461 So instead of calculating the predictions directly, 00:02:53.636 --> 00:02:57.510 we calculate predictions using the exact same procedure, 00:02:57.510 --> 00:02:59.970 and then we just take the exponential of those predictions. 00:03:00.373 --> 00:03:02.413 So instead of predicting lines, 00:03:02.413 --> 00:03:05.699 we predict a line and then we take the exponential of that line. 00:03:05.860 --> 00:03:07.690 How does it look? It looks like that. 00:03:07.920 --> 00:03:09.505 This is from Stata again, 00:03:09.505 --> 00:03:11.275 the marginsplot command. 00:03:11.275 --> 00:03:13.860 And we have linear effects here, 00:03:14.680 --> 00:03:17.760 and we have curvilinear effects here. 00:03:17.760 --> 00:03:19.470 So these are relative effects. 00:03:19.470 --> 00:03:24.270 We have the effect of increasing prestige on income for 00:03:24.270 --> 00:03:29.276 male-dominated professions and women-dominated professions, 00:03:29.652 --> 00:03:32.640 and here we have the same effects with lines. 00:03:32.640 --> 00:03:35.310 As you can see, the interpretations are quite different. 00:03:35.310 --> 00:03:39.630 So here women get no income at all as a function of prestige, 00:03:39.630 --> 00:03:40.712 or rather, no increase at all. 00:03:40.927 --> 00:03:43.447 Here they get a relative increase, 00:03:43.447 --> 00:03:47.130 but the absolute increase is less than for male-dominated professions. 00:03:47.130 --> 00:03:48.510 How do we know 00:03:48.510 --> 00:03:49.560 which one of these 00:03:49.560 --> 00:03:52.500 sets of three lines fits the data best? 00:03:52.917 --> 00:03:59.610 We can do that by simply adding observations to this plot. 00:03:59.610 --> 00:04:01.470 So we can have plots like that. 00:04:01.900 --> 00:04:07.710 And each circle here is one profession. 00:04:07.710 --> 00:04:10.650 So we have prestige for that profession, 00:04:10.650 --> 00:04:12.750 and we have income for that profession.
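The back-transformation described earlier (predict on the log scale with the usual procedure, then take the exponential) can be sketched as follows. This is an illustration in Python rather than the Stata workflow used in the lecture: every coefficient, the education mean and the function names are hypothetical, and the model form with a prestige by women interaction is an assumption based on the variables the lecture lists.

```python
import math

# Hypothetical coefficients for the assumed model
# log(income) = const + prestige + women + prestige * women + education
COEF = {
    "const": 7.5,
    "prestige": 0.025,
    "women": -0.004,
    "prestige_x_women": -0.0001,
    "education": 0.01,
}
EDUCATION_MEAN = 10.7  # hypothetical sample mean of education


def predict_log_income(prestige, women, education=EDUCATION_MEAN):
    """Linear prediction on the log scale, education held at its mean by default."""
    return (
        COEF["const"]
        + COEF["prestige"] * prestige
        + COEF["women"] * women
        + COEF["prestige_x_women"] * prestige * women
        + COEF["education"] * education
    )


def predict_income(prestige, women, education=EDUCATION_MEAN):
    """Same prediction, exponentiated back to the income scale."""
    return math.exp(predict_log_income(prestige, women, education))


# Vary prestige, compare three levels of percentage women.
prestige_grid = list(range(20, 81, 20))
curves = {w: [predict_income(p, w) for p in prestige_grid] for w in (0, 50, 100)}

for women, incomes in curves.items():
    print(f"women={women:3d}%:", [round(v) for v in incomes])
```

With these made-up numbers, each curve rises by a constant percentage per prestige step, so the curve for 100% women still increases in relative terms while its absolute increase is much smaller than the curve for 0% women.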
00:04:13.500 --> 00:04:16.050 The size of the circle represents the percentage of women. 00:04:16.050 --> 00:04:20.010 The smallest circles mean no women in that profession, 00:04:20.010 --> 00:04:22.860 the largest circles mean all women in that profession. 00:04:22.860 --> 00:04:26.262 And here we can see that 00:04:26.262 --> 00:04:29.112 this set of lines explains the data a lot better. 00:04:29.112 --> 00:04:30.794 Because here, for example, 00:04:30.794 --> 00:04:32.165 there are no observations here. 00:04:32.179 --> 00:04:33.679 So we are extrapolating here, 00:04:33.679 --> 00:04:34.972 so it doesn't really fit. 00:04:34.972 --> 00:04:38.490 And these are way too high for this line, 00:04:38.490 --> 00:04:40.500 particularly if you look at the confidence intervals, 00:04:40.500 --> 00:04:41.760 or prediction intervals. 00:04:41.841 --> 00:04:43.011 Then here 00:04:43.011 --> 00:04:45.611 we can see that the prediction intervals are large, 00:04:45.611 --> 00:04:49.379 which means that some of the observations can be up here, 00:04:49.379 --> 00:04:52.830 and also we have no observations here. 00:04:53.556 --> 00:04:57.073 So one way of ruling out 00:04:57.289 --> 00:05:02.430 outliers as an explanation for the lines, or of assessing 00:05:02.430 --> 00:05:05.451 which set of lines explains the data better, 00:05:05.451 --> 00:05:09.030 is to just plot the data and the lines in the same plot. 00:05:09.030 --> 00:05:10.350 And that allows you to compare.
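The remark that a fitted line is an extrapolation where there are no observations can also be checked numerically, as a companion to the overlay plot. The sketch below is only an illustration: the observations are made-up numbers, not the lecture's data, and the `support` helper and its window widths are hypothetical choices.

```python
# Made-up observations, one per profession: (prestige, percentage women).
observations = [
    (25, 90), (30, 75), (35, 95), (40, 60), (45, 10),
    (55, 5), (60, 30), (70, 8), (80, 2),
]


def support(prestige, women, obs, prestige_window=10, women_window=25):
    """Count observations that lie close to one point on a fitted line."""
    return sum(
        1
        for p, w in obs
        if abs(p - prestige) <= prestige_window and abs(w - women) <= women_window
    )


# Check the same grid that the fitted lines are drawn on.
for women in (0, 50, 100):
    for prestige in (20, 50, 80):
        n = support(prestige, women, observations)
        note = "extrapolating" if n == 0 else f"{n} nearby observation(s)"
        print(f"women={women:3d}% prestige={prestige}: {note}")
```

In this made-up sample there are no high-prestige professions with a high percentage of women, so the upper end of the line for 100% women rests on extrapolation, which is the pattern the plot with circles makes visible.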