WEBVTT 00:00:00.030 --> 00:00:04.890 After we have established that there is a statistical association in the population, 00:00:04.890 --> 00:00:08.628 the next step in research is typically causal inference. 00:00:08.628 --> 00:00:18.618 So we want to say that there is actually a variable x that causes a variable y, instead of a mere statistical association. 00:00:18.618 --> 00:00:22.475 Let's go back to our example of the Talouselämä 500 list 00:00:22.475 --> 00:00:25.953 and the difference between men- and women-led companies. 00:00:25.953 --> 00:00:35.255 And let's assume now that we want to make the claim that naming a woman as CEO causes profitability to increase. 00:00:35.255 --> 00:00:40.783 So we can attribute this profitability difference to the women CEOs. 00:00:40.783 --> 00:00:44.519 Now, why would these kinds of causal claims be important? 00:00:44.519 --> 00:00:50.593 There are two reasons. First of all, causal claims allow us to make policy recommendations. For example, 00:00:50.593 --> 00:01:02.017 if we can make a valid causal claim, then we can claim that we should have more women as CEOs. 00:01:02.017 --> 00:01:05.721 So we can make that kind of recommendation: more women CEOs. 00:01:05.721 --> 00:01:11.444 Another important reason for making causal claims is that, if we don't make 00:01:11.444 --> 00:01:16.853 a causal claim, then someone else will interpret our results causally. 00:01:16.853 --> 00:01:21.148 So when this difference was originally published in 2005, 00:01:21.148 --> 00:01:26.433 there were many discussions online and in various newspapers 00:01:26.433 --> 00:01:31.744 about whether we should, based on this result, nominate more women as CEOs. 00:01:31.744 --> 00:01:40.800 Let's take another example. The finding that women-led companies are more profitable is not unique to this particular study.
00:01:40.800 --> 00:01:47.340 Here's a report by McKinsey; they showed that there's a difference between men- and women-led companies, 00:01:47.340 --> 00:01:50.298 so women-led companies are more profitable. 00:01:50.298 --> 00:01:59.513 And then they say that while this profitability difference doesn't allow us to make a causal claim, 00:01:59.513 --> 00:02:07.113 they nevertheless think that there should be a policy recommendation, that we have more women on boards or as CEOs. 00:02:07.113 --> 00:02:08.607 So what do you make of that? 00:02:08.607 --> 00:02:19.556 When someone reads that kind of claim, that we can't make a causal claim but nevertheless think it could be a good thing to have more women leading companies, 00:02:19.556 --> 00:02:26.490 of course people will interpret it as: in order to improve your financial performance, nominate women to CEO positions. 00:02:26.490 --> 00:02:30.223 So people will make causal interpretations of your data. 00:02:30.223 --> 00:02:38.698 So you have to either make the interpretations yourself, to guarantee that they are valid, or you have to explicitly caution 00:02:38.698 --> 00:02:46.383 that it's not a causal relationship, and then you should refrain from making the kind of policy recommendations that the McKinsey people made. 00:02:46.383 --> 00:02:49.779 So how do we make a causal claim then? 00:02:49.779 --> 00:02:59.010 We have identified that there is a difference of 4.7 percentage points, and let's say that we have some way of establishing that it can't be due to chance only. 00:02:59.010 --> 00:03:05.550 So there is a consistent association: women-led companies are more profitable than men-led companies. 00:03:05.550 --> 00:03:09.692 How do we know that it's a causal effect? 00:03:09.692 --> 00:03:13.946 We have to ask the question: why is there a difference? 00:03:13.946 --> 00:03:21.000 We need to consider different explanations and different theories, and rule out the alternative explanations.
00:03:21.000 --> 00:03:28.334 There's a reason for the correlation in the data; we just have to discover what the reason is. 00:03:28.334 --> 00:03:34.795 So are women-led companies more profitable because of the CEO gender, 00:03:34.795 --> 00:03:43.324 or is there some other reason why certain companies tend to be led by women and certain companies tend to be more profitable? 00:03:43.324 --> 00:03:45.913 To do that, we need a theory. 00:03:45.913 --> 00:03:55.451 A theory is a set of connected propositions or claims that explain what happens, how, when, and why. 00:03:56.293 --> 00:04:04.825 So the important part in this example is: why is the return on assets different between men- and women-led companies? 00:04:04.825 --> 00:04:06.925 We need to ask the why question. 00:04:06.925 --> 00:04:16.448 And a big part of doing quantitative research is to think about what kind of rival or alternative explanations we have for our data. 00:04:16.448 --> 00:04:19.645 We have, for example, these explanations. 00:04:19.645 --> 00:04:29.137 The first explanation: we could say that having a woman as CEO causes firm performance, but it's not a direct effect; 00:04:29.137 --> 00:04:33.452 rather, women facilitate top management teamwork, 00:04:33.452 --> 00:04:38.174 and better top management teamwork leads to firm performance. 00:04:38.174 --> 00:04:45.512 Or it could be that smaller companies are more profitable, and smaller companies are more likely to hire women. 00:04:45.512 --> 00:04:48.263 That would be an example of a spurious relationship. 00:04:48.263 --> 00:04:54.000 Or certain industries are more profitable, and certain industries are more likely to hire women. 00:04:54.000 --> 00:04:59.656 For example, if we look at return on assets, the mining industry has large assets. 00:04:59.656 --> 00:05:05.883 So the return on assets in that industry is pretty low compared to the mean of all industries.
00:05:05.883 --> 00:05:11.280 And mining companies are more likely to be run by men than women. 00:05:11.280 --> 00:05:15.921 So that would be a reason to suspect that there's a spurious correlation. 00:05:15.921 --> 00:05:23.789 Or it could be reverse causation: we could say that because the company is profitable, they can afford to hire a woman. 00:05:23.789 --> 00:05:33.867 Or it could be that the women are better CEOs, and that influences company performance. 00:05:33.867 --> 00:05:45.134 Why this kind of argument would make sense is that women are still discriminated against in CEO selection decisions. 00:05:45.134 --> 00:05:49.970 Only 22 out of 500 companies were led by a woman. 00:05:49.970 --> 00:05:57.810 That means that the last woman, or the worst woman, who gets to be a CEO in that sample is likely 00:05:57.810 --> 00:06:04.920 to be a much better CEO than the last man, because there are so many more men in the sample. 00:06:04.920 --> 00:06:09.332 So it would not actually be that women are better, or that there's 00:06:09.332 --> 00:06:14.753 something about being a woman that causes the company to be better, 00:06:14.753 --> 00:06:19.337 but that it's a selection factor. That's also a plausible alternative explanation. 00:06:19.337 --> 00:06:23.670 Then we need to consider which of these are the most relevant, 00:06:23.670 --> 00:06:28.950 because we need to collect additional data. We need a variable for the CEO gender, 00:06:28.950 --> 00:06:33.418 we need a variable for the profitability, and then what else? 00:06:33.418 --> 00:06:37.065 We need to collect data to rule out these alternative explanations, and 00:06:37.065 --> 00:06:40.940 we would at least need the industries, because that's easy to get. 00:06:40.940 --> 00:06:44.233 We would need the company sizes; those we need to get as well. 00:06:44.233 --> 00:06:48.977 Then there's top management team performance; that's more difficult to get.
00:06:48.977 --> 00:06:50.703 Skills are also more difficult to get. 00:06:50.703 --> 00:06:56.209 So it's a trade-off between what is easily available and what we actually need. 00:06:56.209 --> 00:06:59.620 Then we start ruling out these alternative explanations. 00:06:59.620 --> 00:07:03.546 So we now have to consider three conditions for causality. 00:07:03.546 --> 00:07:13.216 We can make a causal claim by first showing that there's a statistical association between the cause X and the effect Y. 00:07:13.521 --> 00:07:15.218 That's the first step. 00:07:15.218 --> 00:07:20.617 The association need not be a correlation, it could be some other kind of association, but there must be an association. 00:07:20.617 --> 00:07:26.042 If cause and effect don't depend on one another, we can't make a causal claim. 00:07:26.042 --> 00:07:29.023 Then we have to show the direction of influence, 00:07:29.023 --> 00:07:35.917 so that X, the cause, always comes before Y, the effect, and not the other way around. 00:07:35.917 --> 00:07:39.369 And then we have elimination of rival explanations. 00:07:39.369 --> 00:07:45.162 So, how do we rule out the possibility of this correlation being an industry effect, 00:07:45.162 --> 00:07:50.326 where the industry influences the CEO selection decisions and also influences profitability? 00:07:50.326 --> 00:07:53.481 How do we know that it's not a firm size effect? 00:07:54.370 --> 00:07:57.189 There is a very simple strategy for ruling out the reverse direction. 00:07:57.189 --> 00:08:01.946 And that is, we just measure the cause before the effect. 00:08:01.946 --> 00:08:07.709 If we measure the CEO gender now and profitability the next year, it's impossible to 00:08:07.709 --> 00:08:15.778 say that profitability in the future causes the company to choose a woman CEO now. 00:08:15.778 --> 00:08:22.215 Of course there could be some profitability expectations that influence that, but that's a different thing.
00:08:22.215 --> 00:08:24.880 So we measure the cause before the effect. 00:08:24.880 --> 00:08:28.355 Elimination of rival explanations is the hard part. 00:08:28.355 --> 00:08:36.302 We have two empirical strategies for that, and one is to use randomized assignment and do an experiment. 00:08:36.302 --> 00:08:46.166 In this case, we would take the 500 companies, randomly assign randomly chosen Finnish men to half of the companies, 00:08:46.166 --> 00:08:51.388 randomly assign randomly chosen Finnish women to the other half of the companies, 00:08:51.388 --> 00:08:55.234 and see which half is more profitable two years from now. 00:08:55.234 --> 00:09:01.239 That's of course impractical to do, but that's the experimental way. 00:09:01.239 --> 00:09:08.192 We manipulate the independent variable and then we observe the dependent variable after a delay. 00:09:08.192 --> 00:09:17.252 In practice, in business research, we do statistical modelling, or controlling for alternative explanations. 00:09:17.252 --> 00:09:22.860 We say that the company profitability is a function of CEO gender plus 00:09:22.860 --> 00:09:26.683 some other things that could correlate with CEO gender. 00:09:26.683 --> 00:09:31.558 And then we test which one of those is the strongest predictor of performance, 00:09:31.558 --> 00:09:36.950 taking the other plausible explanations into consideration.
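The idea of controlling for an alternative explanation can be illustrated with a small simulation. The sketch below is not based on the Talouselämä 500 data: all numbers, the variable names (`asset_heavy`, `woman_ceo`, `roa`), and the use of plain least squares via NumPy are assumptions made for illustration. The simulated industry influences both CEO gender and return on assets, while the true effect of CEO gender is set to zero. The naive comparison still shows a gap, and the regression with an industry control pulls the gender coefficient toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # hypothetical number of firms (more than 500, to keep the noise small)

# Assumption: asset-heavy industries have lower ROA and fewer women CEOs.
asset_heavy = rng.random(n) < 0.5
woman_ceo = rng.random(n) < np.where(asset_heavy, 0.02, 0.10)

# ROA depends on industry only; the true effect of CEO gender is zero here.
roa = 10.0 - 5.0 * asset_heavy + rng.normal(0.0, 3.0, n)

# Naive comparison: difference in mean ROA, women-led minus men-led firms.
naive_gap = roa[woman_ceo].mean() - roa[~woman_ceo].mean()

# Regression with a control:
# ROA = b0 + b1 * woman_ceo + b2 * asset_heavy + error
X = np.column_stack([np.ones(n), woman_ceo, asset_heavy]).astype(float)
coef, *_ = np.linalg.lstsq(X, roa, rcond=None)

print(f"naive gap in ROA: {naive_gap:.2f}")
print(f"CEO gender coefficient with industry control: {coef[1]:.2f}")
print(f"industry coefficient: {coef[2]:.2f}")
```

In this made-up setting the naive gap is spurious by construction. With real data we would not know the true effect, so the control variables have to come from theory about which rival explanations are plausible.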