WEBVTT

00:00:00.060 --> 00:00:04.200
The randomized experiment is the gold
standard of empirical research.

00:00:04.200 --> 00:00:09.330
However there are ways that 
randomized experiments can go wrong. 

00:00:09.330 --> 00:00:15.510
So just the fact that you randomized your study
sample into treatment and control groups,

00:00:15.510 --> 00:00:21.480
put those two groups through different
procedures, and then measured a difference

00:00:21.480 --> 00:00:25.680
in the outcome variable of interest,
does not necessarily imply that  

00:00:25.680 --> 00:00:29.940
you have a valid causal claim.
In this video I will explain a couple  

00:00:29.940 --> 00:00:35.520
of problems that experiments may face.
When we talk about experiments, we

00:00:35.520 --> 00:00:39.330
need to remember that there are two 
important properties of an experiment. 

00:00:39.330 --> 00:00:43.920
Two things that make an experiment.
First we have the randomization here and then  

00:00:43.920 --> 00:00:51.300
we have the treatment group and a control group.
And this provides valid causal evidence if the  

00:00:51.300 --> 00:00:53.910
randomization works, our
sample size is large enough,

00:00:53.910 --> 00:01:02.010
and there are no problems with the procedures.
When we talk about validity of conclusions  

00:01:02.010 --> 00:01:06.060
from experiments we need to consider 
internal validity and external validity. 

00:01:06.060 --> 00:01:14.610
External validity is basically whether the
results from our sample or our population

00:01:14.610 --> 00:01:19.560
generalize to other populations of interest.
The typical problem in external validity  

00:01:19.560 --> 00:01:23.850
is the use of student samples.
So for example if we want to study  

00:01:23.850 --> 00:01:33.060
how directors make decisions in companies 
and we study that through students who do  

00:01:33.060 --> 00:01:37.980
a business simulation in a classroom the 
external validity is pretty questionable. 

00:01:37.980 --> 00:01:42.720
Student samples are not always bad 
but we have to consider the context. 

00:01:42.720 --> 00:01:47.580
For example if you want to study 
personal use of IT then at least  

00:01:47.580 --> 00:01:52.770
I wouldn't have any problems in using students.
Because students and the general population are more

00:01:52.770 --> 00:02:01.140
similar in that than the way students work in a classroom
is similar to the way boardrooms work, for example.

00:02:01.140 --> 00:02:04.620
But there are also issues 
related to internal validity. 

00:02:04.620 --> 00:02:10.170
Issues that make your causal claim questionable 
even for the population of interest. 

00:02:11.190 --> 00:02:14.700
For example, we can
study students in a class.

00:02:14.700 --> 00:02:22.530
And only try to generalize to what those 
students would do outside the class, and still not be

00:02:22.530 --> 00:02:28.140
able to generalize because of these issues.
There is a nice review of these issues in  

00:02:28.140 --> 00:02:34.290
experimental design by S. Lonati and co-authors
from the Journal of Operations Management. 

00:02:34.290 --> 00:02:40.320
They have this summary table that lists 
certain issues that they explain in detail  

00:02:40.320 --> 00:02:47.730
in the article, and they divide these issues into
statistical issues and internal validity threats.

00:02:47.730 --> 00:02:53.610
I'll focus on the first set of issues because 
the statistical issues, perhaps except for

00:02:53.610 --> 00:03:01.590
excluding non-compliers, are more general.
So if you don't take into consideration  

00:03:01.590 --> 00:03:08.610
non-independence of observations, then any inference
with any research design is possibly invalid. 

00:03:09.240 --> 00:03:12.990
Let's take a look at these issues
here, these internal validity issues.

00:03:12.990 --> 00:03:17.640
I'll go through an example that 
demonstrates this on the next slide. 

00:03:18.540 --> 00:03:22.770
The first on the list is unfair 
comparison and demand effects. 

00:03:22.770 --> 00:03:28.560
That's basically two different problems.
The unfair comparison can be understood  

00:03:28.560 --> 00:03:36.420
with the poison and medication example.
If your treatment group receives

00:03:36.420 --> 00:03:41.040
the medication and your control group receives the
poison, the fact that the outcomes are different

00:03:41.040 --> 00:03:48.360
does not mean that the medication worked.
It could also mean that the

00:03:48.360 --> 00:03:51.810
medication didn't work but the poison
just made people feel a lot worse.
just made people feel a lot worse. 

00:03:51.810 --> 00:03:59.700
Or in an extreme scenario it might mean 
that the medication actually is harmful  

00:03:59.700 --> 00:04:05.730
but the poison is more harmful for people.
The important thing here is that your  

00:04:05.730 --> 00:04:11.580
control group should be really neutral,
not like this good-versus-bad comparison.

00:04:11.580 --> 00:04:18.780
The other problem is
the demand effect.

00:04:18.780 --> 00:04:27.960
And demand effect relates to all the subjects 
in the experiment trying to infer what the  

00:04:27.960 --> 00:04:32.400
experimenter is trying to study.
And how the experimenter  

00:04:32.400 --> 00:04:39.180
would like them to respond.
This is something that has been studied and there's

00:04:39.180 --> 00:04:45.600
evidence that this phenomenon actually exists.
Even if people are not consciously trying to  

00:04:45.600 --> 00:04:50.670
satisfy the demands of the experiment.
That's the first group of issues. 

00:04:50.670 --> 00:04:54.870
The second group is on 
non-consequential decision environments. 

00:04:54.870 --> 00:05:01.260
This is particularly relevant for experimental 
vignette studies where we send surveys,  

00:05:01.260 --> 00:05:07.320
two versions of surveys, that describe 
the same scenario with small variations. 

00:05:07.320 --> 00:05:14.070
And then ask people questions about that scenario.
If you just fill in a survey where there are no

00:05:14.070 --> 00:05:21.870
consequences for you from your actions,
it's not clear if you would respond

00:05:21.870 --> 00:05:26.670
the same way if there were consequences.
I'll show you an example on the next slide. 

00:05:26.670 --> 00:05:33.030
Then there is deception.
Deception does not  

00:05:33.030 --> 00:05:40.290
necessarily invalidate the study.
But there are two arguments against deception. 

00:05:40.290 --> 00:05:45.480
One is the ethical argument that researchers 
should not lie to their subjects. 

00:05:45.480 --> 00:05:50.340
So if you deceive, intentionally mislead,
your subjects, then you are being unethical.

00:05:50.340 --> 00:05:57.900
There is some debate on whether being unethical 
this way is acceptable in some scenarios where  

00:05:57.900 --> 00:06:04.050
the results would be very important to get.
So there are some important studies in history

00:06:04.050 --> 00:06:10.110
that have been done using deception, and some
of those studies, like the Milgram experiment,

00:06:10.110 --> 00:06:17.220
would be considered really unethical now.
Then there's another issue about deception.

00:06:17.220 --> 00:06:23.280
So if you have a lab where you invite people,
particularly if you invite students there,

00:06:23.280 --> 00:06:29.160
and you know that the students will be subjects
in a couple of experiments during their studies. 

00:06:29.160 --> 00:06:35.160
If you deceive them and they find out that 
they were lied to in the first experiment,  

00:06:35.160 --> 00:06:38.700
how are they going to take you
seriously in your second experiment?

00:06:38.700 --> 00:06:43.680
So the arguments against deception are the
ethical argument and also the argument

00:06:43.680 --> 00:06:49.740
that we are kind of like spoiling 
our subject pool by lying to them. 

00:06:49.740 --> 00:06:55.890
Then the fourth on the list is manipulation
checks before the dependent variable. 

00:06:55.890 --> 00:07:02.250
The idea here is the manipulation check.
What it means is

00:07:02.250 --> 00:07:07.980
that if we, for example, give people
medication, and that is the kind of

00:07:07.980 --> 00:07:13.410
medication that people take at home,
and then they come back for measurement

00:07:13.410 --> 00:07:19.710
a week later, we ask them: did you actually take
the medication because some of our subjects might  

00:07:19.710 --> 00:07:23.820
have forgotten to take the medication.
And that needs to be taken into account  

00:07:23.820 --> 00:07:27.450
in the statistical analysis.
In practice that will be a case  

00:07:27.450 --> 00:07:33.510
for using instrumental variables.
Problems arise however if we do  

00:07:33.510 --> 00:07:40.830
a manipulation check before the
measurement of the dependent variable.

00:07:40.830 --> 00:07:48.510
It is then possible that the respondents,
particularly if we do survey-based

00:07:48.510 --> 00:07:52.260
measurement or some other kind of
measurement where we measure people's attitudes,

00:07:52.260 --> 00:07:58.020
may infer, based
on our manipulation check, what we

00:07:58.020 --> 00:08:03.060
are actually studying and then try
to adjust their responses accordingly.

00:08:03.960 --> 00:08:11.550
Let's take a look at an example and how 
these effects might manifest in a study. 

00:08:13.710 --> 00:08:20.730
This is a completely made up study.
This is an experimental vignette study;

00:08:20.730 --> 00:08:26.820
the idea is that we present two scenarios.
Each individual receives one of these scenarios

00:08:26.820 --> 00:08:30.750
in a survey but not the other.
And this is randomized.

00:08:30.750 --> 00:08:37.440
So half of our informants receive scenario 1 and
half of our informants receive scenario 2 here.

00:08:37.440 --> 00:08:42.900
Then we ask, based on these 
two scenarios two things. 

00:08:42.900 --> 00:08:46.260
Is the company performing ethically?
That is our manipulation check. 

00:08:46.260 --> 00:08:53.070
Would you buy the shoes?
So we have shoes that are  

00:08:53.070 --> 00:08:58.070
less expensive than major brand shoes.
You really want to have the shoes. 

00:08:58.070 --> 00:09:02.810
In one scenario you hear that this company uses
child labor, and in the other you hear that

00:09:02.810 --> 00:09:09.140
this company is behaving very ethically.
They have a corporate social responsibility  

00:09:09.140 --> 00:09:14.090
program that they just announced.
How are these issues, listed in the

00:09:14.090 --> 00:09:19.550
Lonati article, manifested in this example?
First of all we have an unfair comparison. 

00:09:20.360 --> 00:09:25.310
We are not comparing a bad company 
against a neutral company.

00:09:25.310 --> 00:09:31.730
But instead we are comparing a very unethical
company against a very ethical company. 

00:09:32.330 --> 00:09:36.680
We cannot say that doing
unethical things would be bad,

00:09:36.680 --> 00:09:45.140
because the baseline is not the absence of unethical things;
instead, the baseline is doing good for society.

00:09:45.140 --> 00:09:54.020
Also we cannot say that CSR programs will be 
helpful because the baseline is not no CSR but  

00:09:54.020 --> 00:09:57.620
it's very unethical behavior.
That's an unfair comparison. 

00:09:57.620 --> 00:10:02.450
It's a poison and medication comparison.
If there's a difference we don't  

00:10:02.450 --> 00:10:05.900
know which one causes it.
Then there's a demand effect. 

00:10:05.900 --> 00:10:12.170
So if you read this short vignette you see that 
this is just basically facts and then there is  

00:10:12.170 --> 00:10:17.180
this statement that stands out, even if it wasn't
bolded, that this company is using child labor.

00:10:17.750 --> 00:10:23.510
That is not something that you would perhaps 
know if you were to buy athletic shoes. 

00:10:23.510 --> 00:10:27.500
And then there's the other thing
here, that this company

00:10:27.500 --> 00:10:33.110
is implementing a CSR program. That is also
information that you probably wouldn't know.

00:10:33.110 --> 00:10:37.820
Or wouldn't notice even if it was 
given to you in a broader context. 

00:10:37.820 --> 00:10:44.090
But in isolation this stands out and 
it is clear that the experiment here  

00:10:44.090 --> 00:10:48.710
is about ethics or corporate social 
responsibility or something like that. 

00:10:48.710 --> 00:10:54.350
And that guides our responses.
If we see this kind of vignette

00:10:54.350 --> 00:11:01.040
here then we pretty much know that the 
researcher wants us to answer no here. 

00:11:01.040 --> 00:11:09.920
We would not buy these shoes.
And the same here: the CSR would imply to us that

00:11:09.920 --> 00:11:15.890
the researcher is studying social responsibility.
We are supposed to say that we buy the shoes even

00:11:15.890 --> 00:11:21.050
if they are less expensive for some reason.
That's the demand effect. 

00:11:21.050 --> 00:11:27.170
This is also a non-consequential decision.
Why it's non-consequential is that

00:11:27.170 --> 00:11:32.420
this is just imaginary money.
Let's say that the brand name shoes cost

00:11:32.420 --> 00:11:40.160
100 euros and these cheaper shoes cost 70 euros.
If you really are short on cash  

00:11:40.160 --> 00:11:44.390
and you need new shoes,
you might think that, well,

00:11:44.390 --> 00:11:51.530
this time maybe, the company will be better in 
the future, it's just this time that I buy these  

00:11:51.530 --> 00:11:57.260
shoes from this slightly unethical company.
If there's real money on the line people may  

00:11:57.260 --> 00:12:03.380
behave differently than when it's just a question 
of what would you do in this imaginary scenario. 

00:12:03.380 --> 00:12:09.290
Then the final thing in this 
example is the manipulation check. 

00:12:09.290 --> 00:12:13.190
And this is clearly demonstrated: the
manipulation check question is here.

00:12:13.190 --> 00:12:17.840
Is the company, either in scenario 
1 or scenario 2, behaving ethically?

00:12:18.470 --> 00:12:22.910
That really gives away the
purpose of the experiment. 

00:12:22.910 --> 00:12:32.930
If we read this manipulation check, the
purpose of which is basically to ensure

00:12:32.930 --> 00:12:37.070
that we have received the manipulation,
that we have noticed that one of these

00:12:37.070 --> 00:12:42.500
is more ethical than the other one,
this underlines that this is a study about ethics.

00:12:42.500 --> 00:12:50.840
Then people will respond accordingly saying yes 
to the ethical case, no to the unethical case. 

00:12:50.840 --> 00:12:54.710
Because that is what they think 
that the experimenter wants to see.