WEBVTT 00:00:00.060 --> 00:00:04.200 The randomized experiment is the gold standard of empirical research. 00:00:04.200 --> 00:00:09.330 However, there are ways that randomized experiments can go wrong. 00:00:09.330 --> 00:00:15.510 So just the fact that you randomized your study sample into treatment and control groups, 00:00:15.510 --> 00:00:21.480 put those two groups through different procedures and then measured a difference 00:00:21.480 --> 00:00:25.680 in the outcome variable of interest does not necessarily imply that 00:00:25.680 --> 00:00:29.940 you have a valid causal claim. In this video I will explain a couple 00:00:29.940 --> 00:00:35.520 of problems that experiments may face. When we talk about experiments we 00:00:35.520 --> 00:00:39.330 need to remember that there are two important properties of an experiment, 00:00:39.330 --> 00:00:43.920 two things that make an experiment. First we have the randomization, and then 00:00:43.920 --> 00:00:51.300 we have a treatment group and a control group. This provides valid causal evidence if the 00:00:51.300 --> 00:00:53.910 randomization works, our sample size is large enough, 00:00:53.910 --> 00:01:02.010 and there are no problems with the procedures. When we talk about the validity of conclusions 00:01:02.010 --> 00:01:06.060 from experiments we need to consider internal validity and external validity. 00:01:06.060 --> 00:01:14.610 External validity is basically whether the results from our sample or our population 00:01:14.610 --> 00:01:19.560 generalize to other populations of interest. The typical problem in external validity 00:01:19.560 --> 00:01:23.850 is the use of student samples. So for example, if we want to study 00:01:23.850 --> 00:01:33.060 how directors make decisions in companies and we study that through students who do 00:01:33.060 --> 00:01:37.980 a business simulation in a classroom, the external validity is pretty questionable.
00:01:37.980 --> 00:01:42.720 Student samples are not always bad but we have to consider the context. 00:01:42.720 --> 00:01:47.580 For example, if you want to study personal use of IT then at least 00:01:47.580 --> 00:01:52.770 I wouldn't have any problems with using students, because students and the general population are more 00:01:52.770 --> 00:02:01.140 similar in that respect than students in a classroom are similar to boardrooms, for example. 00:02:01.140 --> 00:02:04.620 But there are also issues related to internal validity, 00:02:04.620 --> 00:02:10.170 issues that make your causal claim questionable even for the population of interest. 00:02:11.190 --> 00:02:14.700 For example, we can study students in a class 00:02:14.700 --> 00:02:22.530 and only try to generalize to what those students would do outside the class, and not be 00:02:22.530 --> 00:02:28.140 able to generalize because of these issues. There is a nice review of these issues in 00:02:28.140 --> 00:02:34.290 experimental design by S. Lonati and co-authors in the Journal of Operations Management. 00:02:34.290 --> 00:02:40.320 They have this summary table that lists certain issues that they explain in detail 00:02:40.320 --> 00:02:47.730 in the article, and they divide these issues into statistical issues and internal validity threats. 00:02:47.730 --> 00:02:53.610 I'll focus on the internal validity threats because the statistical issues, perhaps except for 00:02:53.610 --> 00:03:01.590 excluding non-compliers, are more general. So if you don't take into consideration 00:03:01.590 --> 00:03:08.610 non-independence of observations, then any inference with any research design is possibly invalid. 00:03:09.240 --> 00:03:12.990 Let's take a look at these issues here, these internal validity issues. 00:03:12.990 --> 00:03:17.640 I'll go through an example that demonstrates them on the next slide. 00:03:18.540 --> 00:03:22.770 The first on the list is unfair comparison and demand effects.
00:03:22.770 --> 00:03:28.560 Those are basically two different problems. The unfair comparison can be understood 00:03:28.560 --> 00:03:36.420 with the poison and medication example. If your treatment group receives 00:03:36.420 --> 00:03:41.040 the medication and your control group receives the poison, the fact that the outcomes are different 00:03:41.040 --> 00:03:48.360 does not mean that the medication worked. It could also mean that the 00:03:48.360 --> 00:03:51.810 medication didn't work but the poison just made people feel a lot worse. 00:03:51.810 --> 00:03:59.700 Or in an extreme scenario it might mean that the medication actually is harmful 00:03:59.700 --> 00:04:05.730 but the poison is more harmful for people. The important thing here is that your 00:04:05.730 --> 00:04:11.580 control group should be really neutral, not this kind of good and bad comparison. 00:04:11.580 --> 00:04:18.780 The other problem is the demand effect. 00:04:18.780 --> 00:04:27.960 The demand effect relates to the subjects in the experiment trying to infer what the 00:04:27.960 --> 00:04:32.400 experimenter is trying to study and how the experimenter 00:04:32.400 --> 00:04:39.180 would like them to respond. This is something that has been studied and there's 00:04:39.180 --> 00:04:45.600 evidence that this phenomenon actually exists, even if people are not consciously trying to 00:04:45.600 --> 00:04:50.670 satisfy the demands of the experiment. That's the first group of issues. 00:04:50.670 --> 00:04:54.870 The second group is non-consequential decision environments. 00:04:54.870 --> 00:05:01.260 This is particularly relevant for experimental vignette studies where we send surveys, 00:05:01.260 --> 00:05:07.320 two versions of a survey, that describe the same scenario with small variations 00:05:07.320 --> 00:05:14.070 and then ask people questions about that scenario.
If you just fill in a survey where there are no 00:05:14.070 --> 00:05:21.870 consequences for you from your actions, it's not clear if you would respond 00:05:21.870 --> 00:05:26.670 the same way if there actually were consequences. I'll show you an example on the next slide. 00:05:26.670 --> 00:05:33.030 Then there is deception. Deception does not 00:05:33.030 --> 00:05:40.290 necessarily invalidate the study, but there are two arguments against deception. 00:05:40.290 --> 00:05:45.480 One is the ethical argument that researchers should not lie to their subjects. 00:05:45.480 --> 00:05:50.340 So if you deceive, that is, intentionally mislead, your subjects then you are being unethical. 00:05:50.340 --> 00:05:57.900 There is some debate on whether being unethical this way is acceptable in some scenarios where 00:05:57.900 --> 00:06:04.050 the results would be very important to get. There are some important studies in history 00:06:04.050 --> 00:06:10.110 that have been done using deception, and some of those studies, like Milgram's experiment, 00:06:10.110 --> 00:06:17.220 would be considered really unethical now. Then there's another issue with deception. 00:06:17.220 --> 00:06:23.280 If you have a lab where you invite people, particularly if you invite students there 00:06:23.280 --> 00:06:29.160 and you know that the students will be subjects in a couple of experiments during their studies: 00:06:29.160 --> 00:06:35.160 if you deceive them and they find out that they were lied to in the first experiment, 00:06:35.160 --> 00:06:38.700 how are they gonna take you seriously in your second experiment? 00:06:38.700 --> 00:06:43.680 So the arguments against deception are the ethical argument and also the argument 00:06:43.680 --> 00:06:49.740 that we are kind of spoiling our subject pool by lying to them. 00:06:49.740 --> 00:06:55.890 Then the fourth on the list is manipulation checks before the dependent variable.
00:06:55.890 --> 00:07:02.250 The idea here is the manipulation check. What it means is 00:07:02.250 --> 00:07:07.980 that if we, for example, give people medication, and that is the kind of 00:07:07.980 --> 00:07:13.410 medication that people take at home, and then they come back for measurement 00:07:13.410 --> 00:07:19.710 a week later, we ask them: did you actually take the medication? Because some of our subjects might 00:07:19.710 --> 00:07:23.820 have forgotten to take the medication. And that needs to be taken into account 00:07:23.820 --> 00:07:27.450 in the statistical analysis. In practice that will be a case 00:07:27.450 --> 00:07:33.510 for using instrumental variables. Problems arise, however, if we do 00:07:33.510 --> 00:07:40.830 a manipulation check before the measurement of the dependent variable. 00:07:40.830 --> 00:07:48.510 It is then possible, particularly if we do survey-based 00:07:48.510 --> 00:07:52.260 measurement or some other kind of measurement where we measure people's attitudes, 00:07:52.260 --> 00:07:58.020 that the subjects infer, based on our manipulation check, what we 00:07:58.020 --> 00:08:03.060 are actually studying and then try to adjust their responses accordingly. 00:08:03.960 --> 00:08:11.550 Let's take a look at an example and how these effects might manifest in a study. 00:08:13.710 --> 00:08:20.730 This is a completely made-up study. This is an experimental vignette study; 00:08:20.730 --> 00:08:26.820 the idea is that we present two scenarios. One individual receives one of these scenarios 00:08:26.820 --> 00:08:30.750 in a survey but not the other, and this is randomized. 00:08:30.750 --> 00:08:37.440 So half of our informants receive scenario 1 and half of our informants receive scenario 2 here. 00:08:37.440 --> 00:08:42.900 Then we ask, based on these two scenarios, two things. 00:08:42.900 --> 00:08:46.260 Is the company performing ethically? That is our manipulation check.
00:08:46.260 --> 00:08:53.070 Would you buy the shoes? So we have shoes that are 00:08:53.070 --> 00:08:58.070 less expensive than major brand shoes. You really want to have the shoes. 00:08:58.070 --> 00:09:02.810 In one scenario you hear that this company uses child labor, and in the other you hear that 00:09:02.810 --> 00:09:09.140 this company is behaving very ethically: they have a corporate social responsibility 00:09:09.140 --> 00:09:14.090 program that they just announced. How are the issues listed in the 00:09:14.090 --> 00:09:19.550 Lonati article manifested in this example? First of all we have an unfair comparison. 00:09:20.360 --> 00:09:25.310 We are not comparing a bad company against a neutral company. 00:09:25.310 --> 00:09:31.730 Instead we are comparing a very unethical company against a very ethical company. 00:09:32.330 --> 00:09:36.680 We cannot say that doing unethical things would be bad, 00:09:36.680 --> 00:09:45.140 because the baseline is not "not doing unethical things"; the baseline is doing good for society. 00:09:45.140 --> 00:09:54.020 Also we cannot say that CSR programs will be helpful, because the baseline is not "no CSR" but 00:09:54.020 --> 00:09:57.620 very unethical behavior. That's an unfair comparison. 00:09:57.620 --> 00:10:02.450 It's a poison and medication comparison. If there's a difference we don't 00:10:02.450 --> 00:10:05.900 know which one causes it. Then there's the demand effect. 00:10:05.900 --> 00:10:12.170 If you read this short vignette you see that it is basically just facts, and then there is 00:10:12.170 --> 00:10:17.180 this statement that stands out, even if it wasn't bolded, that this company is using child labor. 00:10:17.750 --> 00:10:23.510 That is not something that you would perhaps know if you were to buy athletic shoes.
00:10:23.510 --> 00:10:27.500 And then there's the other thing here, that this company 00:10:27.500 --> 00:10:33.110 is implementing a CSR program. That is also information that you probably wouldn't know, 00:10:33.110 --> 00:10:37.820 or wouldn't notice even if it was given to you in a broader context. 00:10:37.820 --> 00:10:44.090 But in isolation this stands out and it is clear that the experiment here 00:10:44.090 --> 00:10:48.710 is about ethics or corporate social responsibility or something like that. 00:10:48.710 --> 00:10:54.350 And that guides our responses. If we see this kind of vignette 00:10:54.350 --> 00:11:01.040 here then we pretty much know that the researcher wants us to answer no here: 00:11:01.040 --> 00:11:09.920 we would not buy these shoes. And the same here: the CSR program would imply to us that 00:11:09.920 --> 00:11:15.890 the researcher is studying social responsibility. We are supposed to say that we would buy the shoes even 00:11:15.890 --> 00:11:21.050 if they are less expensive for some reason. That's the demand effect. 00:11:21.050 --> 00:11:27.170 This is also a non-consequential decision. Why it's non-consequential is that 00:11:27.170 --> 00:11:32.420 this is just imaginary money. Let's say that the brand name shoes cost 00:11:32.420 --> 00:11:40.160 100 euros and these cheaper shoes cost 70 euros. If you really are short on cash 00:11:40.160 --> 00:11:44.390 and you need new shoes, you might think: well, 00:11:44.390 --> 00:11:51.530 this time maybe, the company will be better in the future, it's just this time that I buy these 00:11:51.530 --> 00:11:57.260 shoes from this slightly unethical company. If there's real money on the line people may 00:11:57.260 --> 00:12:03.380 behave differently than when it's just a question of what you would do in this imaginary scenario. 00:12:03.380 --> 00:12:09.290 Then the final thing in this example is the manipulation check.
00:12:09.290 --> 00:12:13.190 And this is clearly demonstrated; the manipulation check question is here: 00:12:13.190 --> 00:12:17.840 is the company, in either scenario 1 or scenario 2, behaving ethically? 00:12:18.470 --> 00:12:22.910 That really gives away the purpose of the experiment. 00:12:22.910 --> 00:12:32.930 We read this manipulation check, whose purpose is basically to ensure 00:12:32.930 --> 00:12:37.070 that we have received the manipulation, that we have noticed that one of these 00:12:37.070 --> 00:12:42.500 is more ethical than the other one, and this underlines that this is a study about ethics. 00:12:42.500 --> 00:12:50.840 Then people will respond accordingly, saying yes to the ethical case and no to the unethical case, 00:12:50.840 --> 00:12:54.710 because that is what they think the experimenter wants to see.
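
As a companion to the medication example earlier in the video (some subjects forget to take the medication, which was described as a case for instrumental variables), here is a minimal Python sketch. Everything in it is an illustrative assumption rather than something from the lecture: the function names `simulate`, `wald_estimate` and `per_protocol_estimate`, the simulated effect of 2.0, and the unobserved trait that drives both compliance and the outcome. It treats random assignment as the instrument and uses the simple Wald ratio, one basic form of instrumental variable estimation, and contrasts that with simply dropping the non-compliers.

```python
import random


def simulate(n=40000, true_effect=2.0, seed=0):
    """Simulate a randomized trial where some treated subjects skip the medication.

    All numbers are made up for illustration.
    Returns a list of (assigned, took, outcome) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        assigned = rng.random() < 0.5   # random assignment: the instrument
        trait = rng.gauss(0, 1)         # unobserved; affects compliance and outcome
        # Only assigned subjects can take the medication, and those with a
        # higher trait value are more likely to remember to take it.
        took = assigned and (trait + rng.gauss(0, 1) > -0.5)
        outcome = true_effect * took + 1.5 * trait + rng.gauss(0, 1)
        rows.append((assigned, took, outcome))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_estimate(rows):
    """Wald (instrumental variable) estimate: the effect of assignment on the
    outcome divided by the effect of assignment on taking the medication."""
    y1 = mean(y for z, d, y in rows if z)
    y0 = mean(y for z, d, y in rows if not z)
    d1 = mean(d for z, d, y in rows if z)
    d0 = mean(d for z, d, y in rows if not z)
    return (y1 - y0) / (d1 - d0)


def per_protocol_estimate(rows):
    """Drop non-compliers from the treatment group, then compare with control."""
    treated_takers = mean(y for z, d, y in rows if z and d)
    control = mean(y for z, d, y in rows if not z)
    return treated_takers - control


if __name__ == "__main__":
    rows = simulate()
    print("Wald (IV) estimate:   ", round(wald_estimate(rows), 2))
    print("Per-protocol estimate:", round(per_protocol_estimate(rows), 2))
```

Under these assumptions the Wald estimate should land close to the simulated effect of 2.0, while the comparison that excludes non-compliers should come out too high, because the subjects who remember their medication differ from the control group on the unobserved trait.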