Arguing that your data are reliable and valid measures of the constructs in your theory is a challenging task. In this video I will look at the link between theory and data, and how that link is built in empirical papers.

The idea of the link between theory and data is that data is something that we observe, and the theoretical concept is quite far from the data. We have to somehow argue that the data are related to the theory. If the data are unrelated to the theory, then we cannot claim that the data allow us to test the theory. So the exact nature of this link is something that your study needs to address. One way to think about this issue is to introduce an empirical concept between the theoretical concept and the actual measurement result, which is your data. The idea of an empirical concept is that it is a lower-level concept than your theoretical concept, and it allows you to actually collect some data. So let's take a look at how that approach works in practice.

We need an example, and I'm going to use one that I have used in the past. In 2005, among the 500 largest Finnish companies, there is a finding that women-led companies were 4.7 percentage points more profitable than male-led companies, and we want to make the claim that naming a woman as CEO causes profitability to increase.
So our first theoretical concept here is CEO gender, and the second theoretical concept is profitability, or performance. Then we have to figure out how exactly we link those two theoretical concepts to the data.

How it works is that we introduce the empirical concept. We have used this diagram before, when we discussed inductive and deductive logic. The idea was that we start with a theoretical proposition. From the theoretical proposition we derive a testable hypothesis, which is on a lower level of abstraction. Then we collect some data and test for a statistical association, which allows us to make claims about the correctness of the hypothesis, and then about the correctness of the proposition. The idea was that we apply deductive logic: if the proposition is correct, then the hypothesis should be observed, and we check whether we actually do observe it by calculating something based on our measurement results.

Our focus so far has been on the proposition, the hypothesis, and the statistical association, and we haven't really discussed the arrows in the diagram. So now we're going to look specifically at what those two arrows mean.

Let's go back to our example. The first concept was CEO gender, and we need an empirical concept that we can actually collect data for.
For example, if the gender of the CEO is the theoretical concept, we could have the result of a medical examination as the empirical concept. That is something that we can observe data for, but it's not a practical solution. In practice, we can define our empirical concept as whether the CEO's first name is a man's name or a woman's name. That, of course, could have some reliability or validity problems, because we may not be able to know for sure that a name indicates a woman; some names are used for both genders. Then, as our data, we have specific names for specific CEOs.

The same thing applies here: we need an empirical concept for performance, the dependent theoretical variable. ROA is the empirical concept in the example, and then we have ROA data for specific firms.

Now the question is: how do we justify these relationships? How do we justify that whether the CEO's name is a man's name is a reliable and valid measure of the theoretical concept? How do we justify that ROA is a valid performance measure, and how do we justify that our data are reliable?

Let's take a look at ROA. Why would ROA be a valid and reliable measure? We have to first understand what reliability and validity mean here. Reliability in this figure sits between return on assets, the conceptual definition of the empirical concept, and the actual data.
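The name-based gender proxy can be sketched in code. This is only an illustration, not the actual procedure used in the study: the name lists here are made up, and a real study would need a comprehensive, country-specific register of first names.

```python
# Hypothetical sketch of the name-based gender proxy discussed above.
# The name sets are invented for illustration only.
WOMENS_NAMES = {"Anna", "Maria", "Laura"}
MENS_NAMES = {"Juha", "Mikko", "Antti"}

def ceo_gender_proxy(first_name):
    """Classify CEO gender from the first name; None flags the
    unisex or unknown names that create the validity problem
    mentioned above."""
    if first_name in WOMENS_NAMES and first_name not in MENS_NAMES:
        return "woman"
    if first_name in MENS_NAMES and first_name not in WOMENS_NAMES:
        return "man"
    return None  # ambiguous or unknown: cannot classify reliably

print(ceo_gender_proxy("Anna"))   # woman
print(ceo_gender_proxy("Kaino"))  # None (not in either list)
```

Note that the proxy deliberately refuses to classify names it cannot resolve, which is exactly where the reliability and validity concerns in the lecture arise.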
So, do we get the same data again if we collect data for the same sample again? With ROA, because it's an accounting figure that comes from a database, we concluded it is probably highly reliable. So that is reliability; validity, on the other hand, is a much more challenging question. Can we claim that return on assets is actually a valid measure of performance, and how do we do that?

Reliability is fairly simple to argue. The simplest way would be just to measure the same thing again: demonstrate that you get the same result, and then it's reliable. So reliability is not about whether the variable actually measures what it is supposed to measure. It is simply about whether, if we did the study again, we would get the same result. Doing the measurement again is a simple way of assessing it.

Validity, on the other hand: we have to argue that return on assets is a valid performance measure. So how exactly do we do that? There are a couple of different strategies, but this is a non-statistical argument; it's an argument based on theory and on our understanding of the phenomenon. For example, we could argue that ROA, return on assets, is a valid measure of performance because it is a performance measure that investors and managers care about. So if it's a relevant measure for the investors and managers whom we hope to inform with our study, then it's a valid measure. That's one way.
Another way of thinking about it is that the purpose of a company is to generate profits and earn money for the owners; that is the purpose of a business organization. Return on assets is the money generated divided by the money invested, in terms of assets. So it's a way of standardizing, taking into account that companies of different sizes produce different amounts of results: it scales the ultimate output, the profits, by company size. That would be an argument for ROA as well.

But this is not a statistical argument. It's an argument that this is a relevant metric, and it's based either on a theoretical understanding of the purpose of the organization, in which case we say that the measure reflects that purpose, or on arguing that it is a relevant variable for practitioners. Either way, it's a substantive rather than a methodological argument. So reliability is a statistical problem, whereas validity is a theoretical and philosophical problem: it relates to whether the measure is really relevant for your readers, your audience, and your theory.

Most researchers, when we do research, apply the empirical concept as a proxy, and in practice that means that we simply assume that the empirical concept is equal to the theoretical concept.
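The size-standardization argument above can be made concrete with a small sketch. The figures are invented; the point is only that dividing profit by assets puts firms of very different sizes on the same scale:

```python
def return_on_assets(net_income, total_assets):
    """ROA: profit generated per unit of assets invested."""
    return net_income / total_assets

# Two hypothetical firms of very different size:
small = return_on_assets(net_income=1_000_000, total_assets=10_000_000)
large = return_on_assets(net_income=50_000_000, total_assets=500_000_000)

print(small, large)  # 0.1 0.1 -> same performance despite different size
```

Although the large firm earns fifty times more in absolute terms, both firms show the same ROA, which is exactly what the scaling argument claims.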
So once we have argued that this empirical concept has some relevance for the theory, we use it as a substitute, or proxy, for the theoretical concept. The reason is that we cannot really measure a theoretical concept directly, so using the empirical concept as a proxy is the best thing that we can actually do.

Let's take a look at how the Deephouse paper does this kind of thinking. They had a proposition about strategic similarity and performance. They use relative ROA as their performance measure, the empirical concept, and strategic deviation as the empirical concept measuring strategic similarity, and then they had data that they used to calculate the result.

How do we argue that strategic deviation is a valid measure of strategic similarity? Simply the fact that it's labeled similarly to strategic similarity doesn't really mean anything. The fact that we decide to label something a certain way doesn't give it a meaning; that is called the nominalist fallacy. Claiming that a variable must be a measure of similarity just because we decided to name it strategic similarity is not a valid argument.

So how do we justify it? We talked about ROA on the last slide, so that's simple. For strategic similarity, their argument is basically that which asset categories a bank holds is one of the most important strategic decisions of commercial banks. That's the argument for why they take these different asset categories into consideration.
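The general idea of a deviation-style measure can be sketched as follows. Note that Deephouse's exact formula is not reproduced here; this is only a hypothetical illustration of measuring how far one bank's asset-category allocation lies from the industry average, with made-up weights:

```python
# Hypothetical sketch of a deviation-style similarity measure.
# Not Deephouse's actual formula; illustrative only.

def strategic_deviation(bank_weights, industry_mean_weights):
    """Sum of absolute differences between a bank's asset-category
    weights (shares of total assets) and the industry mean weights."""
    return sum(abs(b - m) for b, m in zip(bank_weights, industry_mean_weights))

industry_mean = [0.40, 0.30, 0.20, 0.10]
conformist    = [0.40, 0.30, 0.20, 0.10]   # matches the industry exactly
deviant       = [0.10, 0.20, 0.30, 0.40]

print(strategic_deviation(conformist, industry_mean))           # 0.0
print(round(strategic_deviation(deviant, industry_mean), 2))    # 0.8
```

A bank that allocates assets exactly like the industry average scores zero deviation (maximal similarity); larger scores mean a more deviant strategy.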
Then they note that previous research has summarized the different asset categories they use for calculating deviation in a certain way, and they use the same approach, citing that other study as justification.

So there are a couple of different ways to argue for validity. You first have to explain the relevance of the variables, or the data, for your theory; in this case, asset categories are relevant for banks. Then, for the actual measurement approach, you either have to justify it yourself, or you can say that others have used this approach and have provided justification. If you do that, you must be careful to actually check that the paper you cite provides a justification, because sometimes researchers use completely unjustified measures, and the mere fact that something has been published with a given measurement approach doesn't necessarily make that approach valid. So you have to look at the actual validity claims and validity evidence in published studies when you decide which measurement approach to use.