WEBVTT

00:00:00.280 --> 00:00:04.690
Statistical analysis can be used for multiple
different purposes.

00:00:04.690 --> 00:00:09.030
Let's take a look at this example that I'm
going to be using in multiple videos.

00:00:09.030 --> 00:00:12.660
There is this Finnish business magazine called
Talouselämä.

00:00:12.660 --> 00:00:19.540
And every year they publish the Talouselämä
500 list, which lists the 500 largest Finnish companies,

00:00:19.540 --> 00:00:24.720
and presents all kinds of analysis of those
companies and how they did for the previous

00:00:24.720 --> 00:00:25.720
year.

00:00:25.720 --> 00:00:31.390
So it's followed by many reporters and
many people who generally follow the Finnish

00:00:31.390 --> 00:00:33.180
business environment.

00:00:33.180 --> 00:00:40.199
In 2005, there was a big headline in one of
the most prestigious Finnish newspapers.

00:00:40.199 --> 00:00:48.159
It said that on this list, the women-led companies
had a 4.7 percentage point higher return on

00:00:48.159 --> 00:00:53.149
assets than those companies whose CEO was
a man.

00:00:53.149 --> 00:00:55.809
So what can we say based on this fact?

00:00:55.809 --> 00:01:04.129
We have a 4.7 percentage point difference,
which is pretty substantial, in one variable

00:01:04.129 --> 00:01:06.990
between two groups.

00:01:06.990 --> 00:01:15.640
So the most obvious claim that people want
to make with this kind of number is that naming

00:01:15.640 --> 00:01:19.450
a woman as CEO causes profitability
to increase.

00:01:19.450 --> 00:01:23.130
So we have a claim with all kinds of policy
implications.

00:01:23.130 --> 00:01:24.509
But that's not the only claim.

00:01:24.509 --> 00:01:29.219
And it may not be a valid claim that we can
make from this fact, this number.

00:01:29.219 --> 00:01:34.310
So to understand what kind of claims we can
make, generally, let's take a look at three

00:01:34.310 --> 00:01:37.079
purposes of statistics.

00:01:37.079 --> 00:01:40.240
The first purpose, the simplest one, is
description.

00:01:40.240 --> 00:01:46.289
So we can just say that women-led companies
are more profitable now, or were in 2005.

00:01:46.289 --> 00:01:48.250
And we don't even try to generalize beyond that.

00:01:48.250 --> 00:01:50.969
So we just state a fact.

00:01:50.969 --> 00:01:54.329
And that kind of description could be useful.

00:01:54.329 --> 00:02:03.509
For example, if one third of the students taking
a research methods course fail, then that provides

00:02:03.509 --> 00:02:07.010
an indication that there's either
something wrong with the students

00:02:07.010 --> 00:02:13.700
or something wrong with the course, even if
we don't try to make any stronger claims.

00:02:13.700 --> 00:02:21.459
Then the second level of sophistication
in statistical analysis is prediction.

00:02:21.459 --> 00:02:26.430
So the predictive claim would be that if a
company is led by a woman, then it will be

00:02:26.430 --> 00:02:27.430
more profitable.

00:02:27.430 --> 00:02:29.040
So that's not a causal claim.

00:02:29.040 --> 00:02:34.810
So it's not a claim that the woman is actually
the cause of the profitability difference.

00:02:34.810 --> 00:02:41.959
It is a claim that, if we observe a woman-led
company, then for some reason, it is likely

00:02:41.959 --> 00:02:43.500
to be more profitable.

00:02:43.500 --> 00:02:44.500
And prediction is useful.

00:02:44.500 --> 00:02:50.290
For example, suppose we know that a company
led by a woman will be more profitable.

00:02:50.290 --> 00:02:54.670
If we know that and others don't, we could
make investment decisions that are better

00:02:54.670 --> 00:02:57.019
than those of other investors, for example.

00:02:57.019 --> 00:02:59.200
Predictive analytics is very useful.

00:02:59.200 --> 00:03:06.590
We do forecasting and prediction all the
time: you watch weather forecasts, banks forecast

00:03:06.590 --> 00:03:11.209
or predict who is going to pay their mortgage
on time and who's going to be late.

00:03:11.209 --> 00:03:16.209
And investors try to forecast
where the stock market goes, and so on.

00:03:16.209 --> 00:03:21.030
So prediction, without any claims about
causality, is very useful.

00:03:21.030 --> 00:03:24.730
But that's not very common in quantitative
research.

00:03:24.730 --> 00:03:29.920
Then we have the third step, which is causal
inference.

00:03:29.920 --> 00:03:34.640
So naming a woman as a CEO causes the company
to be more profitable.

00:03:34.640 --> 00:03:37.079
So here, we attribute the difference.

00:03:37.079 --> 00:03:42.370
We're not saying that this is merely a correlational
relationship; we attribute the difference

00:03:42.370 --> 00:03:48.330
in the return on assets to women being CEOs
of some companies and not others.

00:03:48.330 --> 00:03:51.010
And this has clear policy implications.

00:03:51.010 --> 00:03:56.530
If a company has a male CEO, then it could increase
its profitability by naming a woman as CEO,

00:03:56.530 --> 00:03:58.520
if this claim is true.

00:03:58.520 --> 00:04:02.310
Then we still have a fourth level of claims
that we can make,

00:04:02.310 --> 00:04:07.870
which goes beyond statistics, and that is
causal explanation.

00:04:07.870 --> 00:04:11.140
So causal explanation differs from causal
inference

00:04:11.140 --> 00:04:20.019
in that we don't only make a claim that it's
a woman that causes the company to be more

00:04:20.019 --> 00:04:22.840
profitable, but we also explain why that
is the case.

00:04:22.840 --> 00:04:25.190
So that's why it's causal explanation.

00:04:25.190 --> 00:04:32.630
Typically, quantitative analysis can get
us to the causal inference part, but the explanation

00:04:32.630 --> 00:04:34.330
needs to come from somewhere else.

00:04:34.330 --> 00:04:38.780
So we don't generally get to make theory from
numbers.

00:04:38.780 --> 00:04:40.469
We can only test claims.