WEBVTT 00:00:00.060 --> 00:00:04.200 The randomized experiment is the gold standard of empirical research. 00:00:04.200 --> 00:00:09.330 However, there are ways that randomized experiments can go wrong. 00:00:09.330 --> 00:00:15.510 So just the fact that you randomized your study sample into treatment and control groups, 00:00:15.510 --> 00:00:21.480 put those two groups through different procedures, and then measured a difference 00:00:21.480 --> 00:00:25.680 in the outcome variable of interest does not necessarily imply that 00:00:25.680 --> 00:00:29.940 you have a valid causal claim. In this video I will explain a couple 00:00:29.940 --> 00:00:35.520 of problems that experiments may face. When we talk about experiments we 00:00:35.520 --> 00:00:39.330 need to remember that there are two important properties of an experiment, 00:00:39.330 --> 00:00:43.920 two things that make an experiment. First we have the randomization here and then 00:00:43.920 --> 00:00:51.300 we have the treatment group and a control group. And this provides valid causal evidence if the 00:00:51.300 --> 00:00:53.910 randomization works, our sample size is large enough, 00:00:53.910 --> 00:01:02.010 and there are no problems with the procedures. When we talk about the validity of conclusions 00:01:02.010 --> 00:01:06.060 from experiments we need to consider internal validity and external validity. 00:01:06.060 --> 00:01:14.610 External validity is basically whether the results from our sample or our population 00:01:14.610 --> 00:01:19.560 generalize to other populations of interest. The typical problem in external validity 00:01:19.560 --> 00:01:23.850 is the use of student samples. So for example if we want to study 00:01:23.850 --> 00:01:33.060 how directors make decisions in companies and we study that through students who do 00:01:33.060 --> 00:01:37.980 a business simulation in a classroom, the external validity is pretty questionable. 
00:01:37.980 --> 00:01:42.720 Student samples are not always bad but we have to consider the context. 00:01:42.720 --> 00:01:47.580 For example if you want to study personal use of IT then at least 00:01:47.580 --> 00:01:52.770 I wouldn't have any problems in using students. Because students and the general population are more 00:01:52.770 --> 00:02:01.140 similar in that respect than how students work in a classroom is similar to how boardrooms work, for example. 00:02:01.140 --> 00:02:04.620 But there are also issues related to internal validity. 00:02:04.620 --> 00:02:10.170 Issues that make your causal claim questionable even for the population of interest. 00:02:11.190 --> 00:02:14.700 For example we can study students in a class 00:02:14.700 --> 00:02:22.530 and only try to generalize to what those students would do outside the class, and still not be 00:02:22.530 --> 00:02:28.140 able to generalize because of these issues. There is a nice review of these issues in 00:02:28.140 --> 00:02:34.290 experimental design by S. Lonati and co-authors from the Journal of Operations Management. 00:02:34.290 --> 00:02:40.320 They have this summary table that lists certain issues that they explain in detail 00:02:40.320 --> 00:02:47.730 in the article and they divide these issues into statistical issues and internal validity threats. 00:02:47.730 --> 00:02:53.610 I'll focus on the first set of issues because the statistical issues, perhaps except for 00:02:53.610 --> 00:03:01.590 excluding non-compliers, are more general. So if you don't take into consideration 00:03:01.590 --> 00:03:08.610 non-independence of observations then any inference with any research design is possibly invalid. 00:03:09.240 --> 00:03:12.990 Let's take a look at these issues here, these internal validity issues. 00:03:12.990 --> 00:03:17.640 I'll go through an example that demonstrates this on the next slide. 
00:03:18.540 --> 00:03:22.770 The first on the list is unfair comparison and demand effects. 00:03:22.770 --> 00:03:28.560 That's basically two different problems. The unfair comparison can be understood 00:03:28.560 --> 00:03:36.420 with the poison and medication example. If your treatment group receives 00:03:36.420 --> 00:03:41.040 the medication and your control group receives the poison, the fact that the outcomes are different 00:03:41.040 --> 00:03:48.360 does not mean that the medication worked. It could also mean that the 00:03:48.360 --> 00:03:51.810 medication didn't work but the poison just made people feel a lot worse. 00:03:51.810 --> 00:03:59.700 Or in an extreme scenario it might mean that the medication actually is harmful 00:03:59.700 --> 00:04:05.730 but the poison is more harmful for people. The important thing here is that your 00:04:05.730 --> 00:04:11.580 control group should be really neutral, and not like this good and bad comparison. 00:04:11.580 --> 00:04:18.780 The other problem is the demand effect. 00:04:18.780 --> 00:04:27.960 The demand effect relates to the subjects in the experiment trying to infer what the 00:04:27.960 --> 00:04:32.400 experimenter is trying to study and how the experimenter 00:04:32.400 --> 00:04:39.180 would like them to respond. This is something that has been studied and there's 00:04:39.180 --> 00:04:45.600 evidence that this phenomenon actually exists, even if people are not consciously trying to 00:04:45.600 --> 00:04:50.670 satisfy the demands of the experiment. That's the first group of issues. 00:04:50.670 --> 00:04:54.870 The second group is non-consequential decision environments. 00:04:54.870 --> 00:05:01.260 This is particularly relevant for experimental vignette studies where we send surveys, 00:05:01.260 --> 00:05:07.320 two versions of surveys, that describe the same scenario with small variations. 
00:05:07.320 --> 00:05:14.070 And then we ask people questions about that scenario. If you just fill in a survey where there are no 00:05:14.070 --> 00:05:21.870 consequences for you from your actions, it's not clear if you would respond 00:05:21.870 --> 00:05:26.670 the same way if there were consequences. I'll show you an example on the next slide. 00:05:26.670 --> 00:05:33.030 Then there is deception. Deception does not 00:05:33.030 --> 00:05:40.290 necessarily invalidate the study. But there are two arguments against deception. 00:05:40.290 --> 00:05:45.480 One is the ethical argument that researchers should not lie to their subjects. 00:05:45.480 --> 00:05:50.340 So if you deceive, that is intentionally mislead, your subjects then you are being unethical. 00:05:50.340 --> 00:05:57.900 There is some debate on whether being unethical this way is acceptable in some scenarios where 00:05:57.900 --> 00:06:04.050 the results would be very important to get. So there are some important studies in history 00:06:04.050 --> 00:06:10.110 that have been done using deception and some of those studies, like the Milgram experiment, 00:06:10.110 --> 00:06:17.220 would be considered really unethical now. Then there's another issue about deception. 00:06:17.220 --> 00:06:23.280 So if you have a lab where you invite people, particularly if you invite students there, 00:06:23.280 --> 00:06:29.160 and you know that the students will be subjects in a couple of experiments during their studies, 00:06:29.160 --> 00:06:35.160 if you deceive them and they find out that they were lied to in the first experiment, 00:06:35.160 --> 00:06:38.700 how are they gonna take you seriously in your second experiment? 00:06:38.700 --> 00:06:43.680 So the arguments against deception are the ethical argument and also the argument 00:06:43.680 --> 00:06:49.740 that we are kind of spoiling our subject pool by lying to them. 
00:06:49.740 --> 00:06:55.890 Then the fourth on the list is manipulation checks before the dependent variable. 00:06:55.890 --> 00:07:02.250 The idea here is the manipulation check. What it means is 00:07:02.250 --> 00:07:07.980 that if we for example give people medication and that is the kind of 00:07:07.980 --> 00:07:13.410 medication that people take at home, and then they come back for measurement 00:07:13.410 --> 00:07:19.710 a week later, we ask them: did you actually take the medication? Because some of our subjects might 00:07:19.710 --> 00:07:23.820 have forgotten to take the medication. And that needs to be taken into account 00:07:23.820 --> 00:07:27.450 in the statistical analysis. In practice that will be a case 00:07:27.450 --> 00:07:33.510 for using instrumental variables. Problems arise however if we do 00:07:33.510 --> 00:07:40.830 a manipulation check before the measurement of the dependent variable. 00:07:40.830 --> 00:07:48.510 It is then possible, particularly if we do survey 00:07:48.510 --> 00:07:52.260 based measurement or some other kind of measurement where we measure people's attitudes, 00:07:52.260 --> 00:07:58.020 that the subjects may infer, based on our manipulation check, what we 00:07:58.020 --> 00:08:03.060 are actually studying and then try to adjust their responses accordingly. 00:08:03.960 --> 00:08:11.550 Let's take a look at an example and how these effects might manifest in a study. 00:08:13.710 --> 00:08:20.730 This is a completely made up study. This is an experimental vignette study; 00:08:20.730 --> 00:08:26.820 the idea is that we present two scenarios. One individual receives one of these scenarios 00:08:26.820 --> 00:08:30.750 in a survey but not the other. And this is randomized. 00:08:30.750 --> 00:08:37.440 So half of our informants receive scenario 1 and half of our informants receive scenario 2 here. 
00:08:37.440 --> 00:08:42.900 Then we ask two things based on these two scenarios. 00:08:42.900 --> 00:08:46.260 Is the company performing ethically? That is our manipulation check. 00:08:46.260 --> 00:08:53.070 Would you buy the shoes? So we have shoes that are 00:08:53.070 --> 00:08:58.070 less expensive than major brand shoes. You really want to have the shoes. 00:08:58.070 --> 00:09:02.810 You hear that this company uses child labor, or you hear that 00:09:02.810 --> 00:09:09.140 this company is behaving very ethically. They have a corporate social responsibility 00:09:09.140 --> 00:09:14.090 program that they just announced. How are these issues, listed in the 00:09:14.090 --> 00:09:19.550 Lonati article, manifested in this example? First of all we have an unfair comparison. 00:09:20.360 --> 00:09:25.310 We are not comparing a bad company against a neutral company. 00:09:25.310 --> 00:09:31.730 But instead we are comparing a very unethical company against a very ethical company. 00:09:32.330 --> 00:09:36.680 We cannot say that doing unethical things would be bad, 00:09:36.680 --> 00:09:45.140 because the baseline is not merely not doing unethical things; the baseline is doing good for society. 00:09:45.140 --> 00:09:54.020 Also we cannot say that CSR programs will be helpful because the baseline is not no CSR but 00:09:54.020 --> 00:09:57.620 very unethical behavior. That's an unfair comparison. 00:09:57.620 --> 00:10:02.450 It's a poison and medication comparison. If there's a difference we don't 00:10:02.450 --> 00:10:05.900 know which one causes it. Then there's a demand effect. 00:10:05.900 --> 00:10:12.170 So if you read this short vignette you see that this is just basically facts and then there is 00:10:12.170 --> 00:10:17.180 this statement that stands out, even if it wasn't bolded, that this company is using child labor. 
00:10:17.750 --> 00:10:23.510 That is not something that you would perhaps know if you were to buy athletic shoes. 00:10:23.510 --> 00:10:27.500 And then there's the other thing here, that this company 00:10:27.500 --> 00:10:33.110 is implementing a CSR program. That is also information that you probably wouldn't know, 00:10:33.110 --> 00:10:37.820 or wouldn't notice even if it was given to you in a broader context. 00:10:37.820 --> 00:10:44.090 But in isolation this stands out and it is clear that the experiment here 00:10:44.090 --> 00:10:48.710 is about ethics or corporate social responsibility or something like that. 00:10:48.710 --> 00:10:54.350 And that guides our responses. If we see this kind of vignette 00:10:54.350 --> 00:11:01.040 here then we pretty much know that the researcher wants us to answer no here. 00:11:01.040 --> 00:11:09.920 We would not buy these shoes. And same here: the CSR would imply to us that 00:11:09.920 --> 00:11:15.890 the researcher is studying social responsibility. We are supposed to say that we would buy the shoes even 00:11:15.890 --> 00:11:21.050 if they are less expensive for some reason. That's the demand effect. 00:11:21.050 --> 00:11:27.170 This is also a non-consequential decision. Why it's non-consequential is that 00:11:27.170 --> 00:11:32.420 this is just imaginary money. Let's say that the brand name shoes cost 00:11:32.420 --> 00:11:40.160 100 euros and these cheaper shoes cost 70 euros. If you really are short on cash 00:11:40.160 --> 00:11:44.390 and you need new shoes, you might think that, well, 00:11:44.390 --> 00:11:51.530 this time maybe, the company will be better in the future, it's just this time that I buy these 00:11:51.530 --> 00:11:57.260 shoes from this slightly unethical company. If there's real money on the line people may 00:11:57.260 --> 00:12:03.380 behave differently than when it's just a question of what would you do in this imaginary scenario. 
00:12:03.380 --> 00:12:09.290 Then the final thing in this example is the manipulation check. 00:12:09.290 --> 00:12:13.190 And this is clearly demonstrated; the manipulation check question is here. 00:12:13.190 --> 00:12:17.840 Is the company, either in scenario 1 or scenario 2, behaving ethically? 00:12:18.470 --> 00:12:22.910 That really gives away the purpose of the experiment. 00:12:22.910 --> 00:12:32.930 If we read this manipulation check, the purpose of which is basically to ensure 00:12:32.930 --> 00:12:37.070 that we have received the manipulation, that we have noticed that one of these 00:12:37.070 --> 00:12:42.500 is more ethical than the other one, this underlines that this is a study about ethics. 00:12:42.500 --> 00:12:50.840 Then people will respond accordingly, saying yes to the ethical case and no to the unethical case, 00:12:50.840 --> 00:12:54.710 because that is what they think that the experimenter wants to see.