WEBVTT
Kind: captions
Language: en

Arguing that your data are reliable and valid measures of the constructs in your theory is a challenging task. In this video I will look at the link between theory and data and how that link is built in empirical papers.

The idea of the link between theory and data is that data are something that we observe, and quite far from the data is the theoretical concept. We have to somehow argue that the data are related to the theory. If the data are unrelated to the theory, then we cannot claim that the data allow us to test the theory.

What exactly the nature of this link is, is something that your study needs to address. One way to think about this issue is to introduce an empirical concept between the theoretical concept and the actual measurement result, which is your data. The idea of an empirical concept is that it is a lower-level concept than your theoretical concept, and it allows you to actually collect some data. So let's take a look at how that approach works in practice.

We need an example, and I'm going to use one that I have used before. Among the 500 largest Finnish companies in 2005, there is a finding that women-led companies were 4.7 percentage points more profitable than men-led companies, and we want to make the claim that naming a woman as CEO causes profitability to increase. So our first theoretical concept here is CEO gender, and the second theoretical concept is profitability, or performance. Then we have to figure out how exactly we link those two theoretical concepts to the data.

How it works is that we introduce the empirical concept. We have used this diagram before, when we discussed inductive and deductive logic. The idea was that we start with a theoretical proposition. From the theoretical proposition we derive a testable hypothesis that is at a lower level of abstraction. Then we collect some data and test for a statistical association, which allows us to make claims about the correctness of the hypothesis and, in turn, the correctness of the proposition. The idea was that we apply deductive logic: if the proposition is correct, then the hypothesis should be observed, and we check whether we actually do observe it by calculating something based on our measurement results.
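To make that chain from theoretical concept to empirical concept to data concrete, here is a minimal sketch in Python. The firm records, the name list, and the figures are hypothetical illustrations rather than the actual 2005 data; the point is only how a testable quantity, the difference in mean ROA between women-led and men-led firms, is calculated from measurement results.

```python
# Hypothetical illustration: the theoretical concept (CEO gender) is proxied by an
# empirical concept (is the CEO's first name a woman's name?), which is applied to
# observable data (names and ROA figures for specific firms).

WOMENS_NAMES = {"Anna", "Maria", "Laura"}  # tiny illustrative list, not a real mapping

firms = [  # (CEO first name, ROA in percent) -- made-up numbers
    ("Anna", 9.1), ("Mikko", 4.2), ("Maria", 7.8),
    ("Juha", 5.0), ("Laura", 8.4), ("Antti", 3.9),
]

def ceo_is_woman(first_name: str) -> bool:
    """Empirical concept: classify gender from the first name (an imperfect proxy)."""
    return first_name in WOMENS_NAMES

women_led = [roa for name, roa in firms if ceo_is_woman(name)]
men_led = [roa for name, roa in firms if not ceo_is_woman(name)]

# "Calculating something based on our measurement results": the observed
# difference in mean profitability between the two groups.
diff = sum(women_led) / len(women_led) - sum(men_led) / len(men_led)
print(f"Women-led minus men-led mean ROA: {diff:.1f} percentage points")
```

A real study would of course add a significance test and control variables; the sketch only shows where the empirical concept sits between the theory and the data.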
Our focus so far has been on the proposition, the hypothesis, and the statistical association, and we haven't really discussed these arrows between them. So now we are going to look specifically at what these two arrows mean.

Let's go back to our example. The first concept was CEO gender, and we need an empirical concept that we can actually collect data for. For example, if the gender of the CEO is the theoretical concept, we could use the result of a medical examination as the empirical concept; that is something we can observe data for, but it is not a practical solution. In practice, we can define our empirical concept as whether the CEO's first name is a man's name or a woman's name. That of course could have some reliability or validity problems, because we may not be able to know for sure that a name indicates a woman, since some names are used for both genders. Then, as the data, we have the specific names of specific CEOs.

The same thing applies here: we need an empirical concept. We have performance, the theoretical concept for the dependent variable; ROA is the empirical concept in this example; and then we have ROA data for specific firms.

Now the question is, how do we justify these relationships? How do we justify that whether the CEO's name is a man's name is a reliable and valid measure of the theoretical concept? How do we justify that ROA is a valid performance measure, and how do we justify that our data are reliable?

Let's take a look at ROA. Why would ROA be a valid and reliable measure? We first have to understand what reliability and validity mean here. Reliability, in this figure, is the link between return on assets, the conceptual definition of the empirical concept, and the actual data: do we get the same data again if we collect the same data for the same sample? With ROA, because it is an accounting figure that comes from a database, we concluded that it is probably highly reliable. Validity, on the other hand, is a much more challenging question: can we claim that return on assets is actually a valid measure of performance, and how do we do that?

Reliability is fairly simple to argue. The simplest way is just to measure the same thing again and demonstrate that you get the same result; then it is reliable. So reliability is not about whether the variable actually measures what it is supposed to measure. It is simply about whether, if we did the study again, we would get the same result. Repeating the study, or repeating the measurement, is a simple way of checking this.
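As a concrete picture of this test-retest idea, here is a minimal sketch, assuming we can pull the same ROA figures for the same sample twice, for example from a database on two occasions, and simply check that the two collections agree. The firms, figures, and tolerance are hypothetical.

```python
# Hypothetical illustration of a test-retest reliability check: collect the measure
# twice for the same sample and see whether the repeated results agree.

first_collection = {"Firm A": 6.2, "Firm B": 3.1, "Firm C": 8.0}
second_collection = {"Firm A": 6.2, "Firm B": 3.1, "Firm C": 8.0}

def same_result(a: dict, b: dict, tolerance: float = 1e-9) -> bool:
    """Reliable in the test-retest sense if repeated collection gives the same values."""
    return a.keys() == b.keys() and all(
        abs(a[firm] - b[firm]) <= tolerance for firm in a
    )

print("Same result on repeated collection:", same_result(first_collection, second_collection))
```

Note that this check says nothing about validity: the repeated figures could agree perfectly and still measure the wrong concept.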
Validity, on the other hand, requires us to argue that return on assets is a valid performance measure. So how exactly do we do that? There are a couple of different strategies, but this is a non-statistical argument: it is an argument based on theory and on our understanding of the phenomenon. For example, we could argue that ROA, return on assets, is a valid measure of performance because it is a performance measure that investors and managers care about. If it is a relevant measure for the investors and managers whom we hope to inform with our study, then it is a valid measure. That is one way.

Another way of thinking about it is that the purpose of a company is to generate profits and earn money for its owners; that is the purpose of a business organization. Return on assets is then the money generated divided by the money invested in the form of assets. So it is a way of standardizing, taking into account that companies of different sizes produce different amounts of results: it scales the ultimate output, the profits, by company size. That would be an argument for ROA as well. But this is not a statistical argument. It is an argument that this is a relevant metric, and it is based either on a theoretical understanding of what the purpose of the organization is, so that we can say the measure reflects that purpose, or on arguing that it is a relevant variable for practitioners. Either way, it is a substantive rather than a methodological argument. Reliability is a statistical problem, whereas validity is a theoretical and philosophical problem: it concerns whether the measure is really relevant for your readers, your audience, and your theory.

When most of us do research, we apply the empirical concept as a proxy. In practice, that means we simply assume that the empirical concept is equal to the theoretical concept. Once we have argued that the empirical concept has some relevance for the theory, we use it as a substitute, or proxy, for the theoretical concept. The reason is that we cannot really measure a theoretical concept directly, so using the empirical concept as a proxy is the best thing that we can actually do.
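To make the scaling argument for ROA above concrete, here is a minimal sketch. It assumes ROA is computed as profit divided by total assets, the "money generated divided by the money invested" described above; the two firms and their figures are hypothetical.

```python
# Hypothetical illustration: ROA scales profit by firm size, so a small and a large
# firm that use their assets equally well get the same score even though their
# absolute profits differ.

def roa(profit: float, total_assets: float) -> float:
    """Return on assets: profit generated per unit of assets invested."""
    return profit / total_assets

small_firm = roa(profit=1.0, total_assets=10.0)       # 1 of profit on 10 of assets
large_firm = roa(profit=100.0, total_assets=1000.0)   # 100 of profit on 1000 of assets

print(small_firm, large_firm)  # both 0.1, i.e. 10 percent
```

The validity argument then rests on whether this standardized profit figure reflects what owners, investors, and managers mean by performance, not on the arithmetic itself.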
Let's take a look at how the Deephouse paper does this kind of thinking. They had a proposition about strategic similarity and performance. They use relative ROA as the empirical concept for performance, and strategic deviation as the empirical concept measuring strategic similarity, and then they had data that they used to calculate the results.

How do we argue that strategic deviation is a valid measure of strategic similarity? Simply the fact that it is labeled similarly to strategic similarity does not really mean anything. The fact that we decide to label something a certain way does not give it that meaning; that is called the nominalist fallacy. Claiming that a variable must be a measure of similarity just because we decided to call it strategic similarity is not a valid argument.

So how do we justify it? We talked about ROA on the last slide, so that part is simple. For strategic similarity, their argument is basically that which asset categories a bank holds is one of the most important strategic decisions of commercial banks. That is the argument for why they take these different asset categories into consideration. Then they note that previous research has summarized these asset categories, which they use for calculating the deviation, in a certain way; they apply the same approach and cite that other study as justification. (A rough sketch of what such a deviation calculation could look like is given at the end of this transcript.)

So there are a couple of different ways to argue for validity. You first have to explain the relevance of the variables, or the data, for your theory; in this case, asset categories are relevant for banks. Then, for the actual measurement approach, you either have to justify it yourself or you can say that others have used this approach and have provided the justification. If you do that, you must be careful to check that the paper you cite actually provides a justification, because sometimes researchers use completely unjustified measures, and the mere fact that something has been published using a measurement approach does not necessarily make that approach valid. So you have to look at the actual validity claims and validity evidence in published studies when you decide which measurement approach to use.
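Finally, here is the rough sketch referred to above of what a deviation-type measure could look like. It assumes strategic deviation is computed as the distance between a bank's asset-category mix and the industry-average mix; the categories, numbers, and exact distance formula are hypothetical and are not taken from the Deephouse paper.

```python
# Hypothetical illustration of a deviation-from-the-industry measure: the further a
# bank's asset mix is from the industry-average mix, the less similar its strategy.

industry_average_mix = {"loans": 0.60, "securities": 0.25, "cash": 0.15}

def strategic_deviation(bank_mix: dict, average_mix: dict) -> float:
    """Sum of absolute differences between a bank's asset proportions and the average."""
    return sum(abs(bank_mix[category] - average_mix[category]) for category in average_mix)

typical_bank = {"loans": 0.58, "securities": 0.27, "cash": 0.15}
atypical_bank = {"loans": 0.30, "securities": 0.55, "cash": 0.15}

print(strategic_deviation(typical_bank, industry_average_mix))   # small value: similar strategy
print(strategic_deviation(atypical_bank, industry_average_mix))  # large value: deviant strategy
```

Whatever the exact formula, the validity argument still has to come from the substantive claim that asset categories capture the strategic choices of commercial banks, and from the prior work that justifies how those categories are summarized.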