WEBVTT
Kind: captions
Language: en

00:00:00.030 --> 00:00:02.280
Arguing that your data are reliable and  

00:00:02.280 --> 00:00:05.700
valid measures of the constructs in 
your theory is a challenging task.

00:00:05.700 --> 00:00:10.470
In this video I will look at the 
link between theory and data and  

00:00:10.470 --> 00:00:13.140
how that link is built in empirical papers.

00:00:13.140 --> 00:00:20.070
The idea of the link between theory and 
data is that data is something that we  

00:00:20.070 --> 00:00:24.960
observe and then quite far from the data 
is the theoretical concept. And we have  

00:00:24.960 --> 00:00:31.530
to somehow argue that the data are related 
to the theory. If the data are unrelated  

00:00:31.530 --> 00:00:37.170
to the theory, then we cannot claim that the 
data would allow us to test the theory.

00:00:37.170 --> 00:00:44.250
So what exactly is the nature of this link is 
something that your study needs to address. One  

00:00:44.250 --> 00:00:49.920
way to think about this issue is to introduce 
the concept of an empirical concept between  

00:00:49.920 --> 00:00:54.960
the theoretical concept and the actual 
measurement result which is your data.

00:00:54.960 --> 00:01:00.120
The idea of an empirical concept 
is that it is a lower-level concept  

00:01:00.120 --> 00:01:04.440
than your theoretical concept and 
it allows you to actually collect  

00:01:04.440 --> 00:01:08.190
some data. So let's take a look at 
how that approach works in practice.

00:01:08.190 --> 00:01:16.230
We need an example and I'm going to use this 
example that I have used in the past. In 2005  

00:01:16.230 --> 00:01:23.040
among the 500 largest Finnish companies, 
there was a finding that the women-led companies  

00:01:23.040 --> 00:01:28.740
were 4.7 percentage points more 
profitable than men-led companies and we  

00:01:28.740 --> 00:01:33.960
want to make a claim that naming a woman as 
a CEO causes the profitability to increase.

00:01:33.960 --> 00:01:39.330
So our theoretical concept here 
is the CEO gender and the second  

00:01:39.330 --> 00:01:44.490
theoretical concept is profitability 
or performance. Then we have to figure  

00:01:44.490 --> 00:01:49.470
out how exactly we link those two 
theoretical concepts to the data.

00:01:49.470 --> 00:01:54.390
How it works is that we introduce the 
empirical concept and we have been using  

00:01:54.390 --> 00:02:01.710
this diagram before when we were discussing 
inductive and deductive logic. The idea was that  

00:02:01.710 --> 00:02:07.890
we start with a theoretical proposition. 
Then from the theoretical proposition we  

00:02:07.890 --> 00:02:12.630
derive a testable hypothesis that is on 
a lower level of abstraction. Then we  

00:02:12.630 --> 00:02:19.810
collect some data and we test for statistical 
association which allows us to make claims  

00:02:19.810 --> 00:02:23.830
about the correctness of the hypothesis 
and then correctness of the proposition.

00:02:23.830 --> 00:02:29.500
The idea was that we apply deductive logic 
so that if the proposition is correct then  

00:02:29.500 --> 00:02:33.310
the hypothesis should be observed 
and then we check if we actually  

00:02:33.310 --> 00:02:37.000
do observe it by calculating something 
based on our measurement results.

00:02:37.000 --> 00:02:42.220
Our focus thus far has been on 
the proposition, hypothesis, and  

00:02:42.220 --> 00:02:47.350
statistical association and we haven't 
really discussed much about these arrows  

00:02:47.350 --> 00:02:54.760
here. So now we're going to be looking at 
specifically what these two arrows here mean.

00:02:54.760 --> 00:03:01.540
And let's go back to our example. So the first 
concept was CEO gender and we need to have an  

00:03:01.540 --> 00:03:08.530
empirical concept that we can actually collect 
data for. For example, if the gender of the CEO  

00:03:08.530 --> 00:03:15.010
is the theoretical concept, we could have the result 
of a medical examination as an empirical concept  

00:03:15.010 --> 00:03:19.420
that is something that we can observe data 
for but that's not a practical solution. In  

00:03:19.420 --> 00:03:25.990
practice we can just use our empirical concept 
or we can define it as whether the CEO's first  

00:03:25.990 --> 00:03:32.140
name is a man's name or a woman's name. That of 
course could have some reliability or validity  

00:03:32.140 --> 00:03:39.700
problems because we may not be able to know 
for sure that a name indicates a woman because  

00:03:39.700 --> 00:03:45.580
some names are used for both genders. Then 
we have specific names for specific CEOs.

00:03:45.580 --> 00:03:50.530
The same thing here we need to have a concept. We  

00:03:50.530 --> 00:03:55.660
have the performance, that's the dependent 
variable, the theoretical variable. ROA is the  

00:03:55.660 --> 00:04:00.490
empirical concept here in the example and 
then we have ROA data for specific firms.

00:04:00.490 --> 00:04:06.010
Now the question is how do we justify 
these relationships? How do we justify  

00:04:06.010 --> 00:04:11.890
that whether the CEO's name is a 
man's name is a reliable and valid  

00:04:11.890 --> 00:04:17.920
measure of the theoretical concept? 
How do we justify here that ROA is  

00:04:17.920 --> 00:04:23.590
a valid performance measure and how do 
we justify that our data is reliable?

00:04:23.590 --> 00:04:32.080
Let's take a look at ROA. So why would ROA 
be a valid and reliable measure? We have  

00:04:32.080 --> 00:04:37.270
to first understand what is reliability and 
what is validity here. Reliability here in  

00:04:37.270 --> 00:04:45.070
this figure is here between return on assets, 
the conceptual definition of the empirical  

00:04:45.070 --> 00:04:50.410
concept and the actual data. So do we get the 
same data again if we collect the same data  

00:04:50.410 --> 00:04:55.270
for the same sample? With ROA, because it's an 
accounting figure that comes from a database,  

00:04:55.270 --> 00:05:01.060
we concluded it is probably highly 
reliable. So reliability is here  

00:05:01.060 --> 00:05:06.070
and then validity on the other hand 
is a much more challenging question.

00:05:06.070 --> 00:05:11.740
Can we claim that return on assets is 
actually a valid measure of performance  

00:05:11.740 --> 00:05:21.520
and how do we do that? Reliability is fairly 
simple to argue. So the simplest way would  

00:05:21.520 --> 00:05:25.810
be just to measure the same thing again. 
Demonstrate that you get the same result  

00:05:25.810 --> 00:05:31.780
then it's reliable. So reliability is not about 
whether the variable actually measures what it  

00:05:31.780 --> 00:05:35.770
is supposed to measure. It's simply that 
if we do the study again would we get the  

00:05:35.770 --> 00:05:39.670
same result. Doing the study again doing the 
measurement again is a simple way of doing it.

00:05:39.670 --> 00:05:46.180
Validity on the other hand - we have to argue 
that return on assets is a valid performance  

00:05:46.180 --> 00:05:51.100
measure. So how exactly do we do that? There 
are a couple of different strategies but this  

00:05:51.100 --> 00:05:56.140
is a non-statistical argument so it's an argument 
based on theory and based on our understanding  

00:05:56.140 --> 00:06:02.560
of the phenomenon. For example we could argue 
that ROA return on assets is a valid measure  

00:06:02.560 --> 00:06:09.220
of performance because that is a performance 
measure that investors and managers care about.

00:06:09.220 --> 00:06:15.190
So if it's a relevant measure for 
investors and managers who we hope  

00:06:15.190 --> 00:06:20.860
to inform with our study then it's a 
valid measure. That's one way. Another  

00:06:20.860 --> 00:06:25.300
way of thinking about it is that the purpose 
of the company is to generate profits and  

00:06:25.300 --> 00:06:28.210
earn money for the owner so that's the 
purpose of a business organization.

00:06:28.210 --> 00:06:37.060
And then our return on assets is a function 
of that money generated divided by the money  

00:06:37.060 --> 00:06:43.840
invested in terms of assets. So it's kind 
of like a way of standardizing taking into  

00:06:43.840 --> 00:06:49.630
account that companies of different sizes 
produce different amounts of results. So  

00:06:49.630 --> 00:06:54.820
it scales the ultimate output which is the 
profits based on the company size. So that  

00:06:54.820 --> 00:07:01.090
would be an argument for ROA as well. But 
this is not a statistical argument. It's an  

00:07:01.090 --> 00:07:08.920
argument that this is a relevant metric 
and it's based on either that we have a  

00:07:08.920 --> 00:07:15.160
theoretical understanding of what the purpose of 
the organization is and then we say that this reflects  

00:07:15.160 --> 00:07:21.970
that purpose or it could be made by arguing that 
that's a relevant variable for practitioners.

00:07:21.970 --> 00:07:28.720
Either way it's a substantive instead 
of methodological argument. So this is  

00:07:28.720 --> 00:07:34.420
a statistical problem, reliability, and this is 
a theoretical and philosophical problem.  

00:07:34.420 --> 00:07:39.790
So it relates to whether this is really relevant for 
the readers of your audience and your theory.

00:07:39.790 --> 00:07:47.230
So most researchers when we do research 
we apply the empirical concept as a proxy  

00:07:47.230 --> 00:07:53.620
and in practice that means that we simply 
assume that the empirical concept is equal  

00:07:53.620 --> 00:07:57.910
to the theoretical concept. So once we have 
argued that this empirical concept has some  

00:07:57.910 --> 00:08:04.420
relevance for the theory then we use it as 
a substitute or a proxy for the theoretical  

00:08:04.420 --> 00:08:09.880
concept. The reason for that is that we really 
cannot measure a theoretical concept directly  

00:08:09.880 --> 00:08:14.290
so using this empirical concept as a proxy 
it's the best thing that we can actually do.

00:08:14.290 --> 00:08:23.590
Let's take a look at how the Deephouse paper does this 
kind of thinking. So they had a proposition  

00:08:23.590 --> 00:08:30.580
about strategic similarity and performance. Then 
they are using relative ROA as their performance  

00:08:30.580 --> 00:08:39.160
measure, the empirical concept, and strategic 
deviation as the empirical concept measuring  

00:08:39.160 --> 00:08:45.640
strategy similarity and then they had some data 
that they used to calculate this result.

00:08:45.640 --> 00:08:54.250
How do we argue that strategic deviation is a 
valid measure of strategy similarity? Simply  

00:08:54.250 --> 00:08:59.470
the fact that it's labeled similarly to strategic 
similarity doesn't really mean anything.

00:08:59.470 --> 00:09:05.470
The fact that we decide to label something 
doesn't give it a meaning. So that is  

00:09:05.470 --> 00:09:11.230
called the nominalist fallacy. Claiming 
that just because we decided to name this  

00:09:11.230 --> 00:09:15.310
strategic similarity it must be a measure 
of the similarity is not a valid argument.

00:09:15.310 --> 00:09:23.530
So how do we justify it? We talked about ROA 
in the last slide so that's simple. Strategic  

00:09:23.530 --> 00:09:31.870
similarity: their argument is basically that 
which asset categories they hold is one of the most  

00:09:31.870 --> 00:09:39.700
important strategic decisions of commercial banks. 
So that's the argument for why they take these  

00:09:39.700 --> 00:09:45.280
different asset categories into consideration. 
Then they claim that previous research has  

00:09:45.280 --> 00:09:52.270
summarized these different asset categories 
that they use for calculating deviation in  

00:09:52.270 --> 00:09:57.340
a certain way and they use the same approach 
and they use another study for justification.

00:09:57.340 --> 00:10:03.670
So the way you argue for validity there are 
a couple of different ways. You have to first  

00:10:03.670 --> 00:10:11.200
explain the relevance of the variables or the data 
for your theory. In this case asset categories  

00:10:11.200 --> 00:10:16.210
are relevant for banks and then the actual 
measurement approach you either have to justify  

00:10:16.210 --> 00:10:22.840
yourself or you can say that others have used this 
approach and others have provided justification.

00:10:22.840 --> 00:10:28.060
If you do that you must be careful that you 
actually check that the paper that you cite  

00:10:28.060 --> 00:10:33.250
provides a justification because sometimes 
researchers use completely unjustified  

00:10:33.250 --> 00:10:39.430
measures and just the fact that something has been 
published with the measurement approach doesn't  

00:10:39.430 --> 00:10:43.570
make that measurement approach necessarily valid. 
So you have to look at the actual validity of  

00:10:43.570 --> 00:10:49.930
claims and validity evidence in published studies 
when you decide which measurement approach to use.