WEBVTT

00:00:00.060 --> 00:00:04.200
The randomized experiment is the gold
standard of empirical research.

00:00:04.200 --> 00:00:09.330
However, there are ways that
randomized experiments can go wrong.

00:00:09.330 --> 00:00:15.510
So just the fact that you randomized your study
sample into treatment and control groups,

00:00:15.510 --> 00:00:21.480
put those two groups through different
procedures, and then measured a difference

00:00:21.480 --> 00:00:25.680
in the outcome variable of interest,
does not necessarily imply that&nbsp;&nbsp;

00:00:25.680 --> 00:00:29.940
you have a valid causal claim.
In this video I will explain a couple&nbsp;&nbsp;

00:00:29.940 --> 00:00:35.520
of problems that experiments may face.
When we talk about experiments, we

00:00:35.520 --> 00:00:39.330
need to remember that there are two&nbsp;
important properties of an experiment.&nbsp;

00:00:39.330 --> 00:00:43.920
Two things that make an experiment.
First we have the randomization here and then&nbsp;&nbsp;

00:00:43.920 --> 00:00:51.300
we have the treatment group and a control group.
And this provides valid causal evidence if the&nbsp;&nbsp;

00:00:51.300 --> 00:00:53.910
randomization works, our
sample size is large enough,

00:00:53.910 --> 00:01:02.010
and there are no problems with the procedures.
When we talk about validity of conclusions&nbsp;&nbsp;

00:01:02.010 --> 00:01:06.060
from experiments we need to consider&nbsp;
internal validity and external validity.&nbsp;

00:01:06.060 --> 00:01:14.610
External validity is basically whether
the results from our sample or our population

00:01:14.610 --> 00:01:19.560
generalize to other populations of interest.
The typical problem in external validity&nbsp;&nbsp;

00:01:19.560 --> 00:01:23.850
is the use of student samples.
So for example if we want to study&nbsp;&nbsp;

00:01:23.850 --> 00:01:33.060
how directors make decisions in companies&nbsp;
and we study that through students who do&nbsp;&nbsp;

00:01:33.060 --> 00:01:37.980
a business simulation in a classroom, the
external validity is pretty questionable.

00:01:37.980 --> 00:01:42.720
Student samples are not always bad&nbsp;
but we have to consider the context.&nbsp;

00:01:42.720 --> 00:01:47.580
For example if you want to study&nbsp;
personal use of IT then at least&nbsp;&nbsp;

00:01:47.580 --> 00:01:52.770
I wouldn't have any problems in using students.
Because students and the general population are more

00:01:52.770 --> 00:02:01.140
similar in that respect than students in a classroom
are to directors in a boardroom, for example.

00:02:01.140 --> 00:02:04.620
But there are also issues&nbsp;
related to internal validity.&nbsp;

00:02:04.620 --> 00:02:10.170
Issues that make your causal claim questionable&nbsp;
even for the population of interest.&nbsp;

00:02:11.190 --> 00:02:14.700
For example, we can
study students in a class

00:02:14.700 --> 00:02:22.530
and only try to generalize to what those
students would do outside the class, and still not be

00:02:22.530 --> 00:02:28.140
able to generalize because of these issues.
There is a nice review of these issues in&nbsp;&nbsp;

00:02:28.140 --> 00:02:34.290
experimental design by Lonati and co-authors
from the Journal of Operations Management.&nbsp;

00:02:34.290 --> 00:02:40.320
They have this summary table that lists&nbsp;
certain issues that they explain in detail&nbsp;&nbsp;

00:02:40.320 --> 00:02:47.730
in the article, and they divide these issues into
statistical issues and internal validity threats.

00:02:47.730 --> 00:02:53.610
I'll focus on the first set of issues because
the statistical issues, perhaps except for

00:02:53.610 --> 00:03:01.590
excluding non-compliers, are more general.
So if you don't take into consideration&nbsp;&nbsp;

00:03:01.590 --> 00:03:08.610
non-independence of observations, then any inference
with any research design is possibly invalid.

00:03:09.240 --> 00:03:12.990
Let's take a look at these issues
here, these internal validity issues.

00:03:12.990 --> 00:03:17.640
I'll go through an example that&nbsp;
demonstrates this on the next slide.&nbsp;

00:03:18.540 --> 00:03:22.770
The first on the list is unfair&nbsp;
comparison and demand effects.&nbsp;

00:03:22.770 --> 00:03:28.560
That's basically two different problems.
The unfair comparison can be understood&nbsp;&nbsp;

00:03:28.560 --> 00:03:36.420
with the poison and medication example.
If your treatment group receives

00:03:36.420 --> 00:03:41.040
the medication and your control group receives the
poison, the fact that the outcomes are different

00:03:41.040 --> 00:03:48.360
does not mean that the medication worked.
It could also mean that the

00:03:48.360 --> 00:03:51.810
medication didn't work but the poison
just made people feel a lot worse.

00:03:51.810 --> 00:03:59.700
Or in an extreme scenario it might mean&nbsp;
that the medication actually is harmful&nbsp;&nbsp;

00:03:59.700 --> 00:04:05.730
but the poison is more harmful for people.
The important thing here is that your&nbsp;&nbsp;

00:04:05.730 --> 00:04:11.580
control group should be really neutral,
and not like this good-versus-bad comparison.

00:04:11.580 --> 00:04:18.780
The other problem is the
demand effect.

00:04:18.780 --> 00:04:27.960
The demand effect relates to the subjects
in the experiment trying to infer what the&nbsp;&nbsp;

00:04:27.960 --> 00:04:32.400
experimenter is trying to study.
And how the experimenter&nbsp;&nbsp;

00:04:32.400 --> 00:04:39.180
would like them to respond.
This is something that has been studied and there's

00:04:39.180 --> 00:04:45.600
evidence that this phenomenon actually exists.
Even if people are not consciously trying to&nbsp;&nbsp;

00:04:45.600 --> 00:04:50.670
satisfy the demands of the experiment.
That's the first group of issues.&nbsp;

00:04:50.670 --> 00:04:54.870
The second group is on&nbsp;
non-consequential decision environments.&nbsp;

00:04:54.870 --> 00:05:01.260
This is particularly relevant for experimental&nbsp;
vignette studies where we send surveys,&nbsp;&nbsp;

00:05:01.260 --> 00:05:07.320
two versions of surveys, that describe&nbsp;
the same scenario with small variations.&nbsp;

00:05:07.320 --> 00:05:14.070
And then ask people questions about that scenario.
If you just fill in a survey where there are no

00:05:14.070 --> 00:05:21.870
consequences for you from your actions,
it's not clear if you would respond

00:05:21.870 --> 00:05:26.670
the same way if there were actual consequences.
I'll show you an example on the next slide.&nbsp;

00:05:26.670 --> 00:05:33.030
Then there is deception.
Deception does not&nbsp;&nbsp;

00:05:33.030 --> 00:05:40.290
necessarily invalidate the study.
But there are two arguments against deception.&nbsp;

00:05:40.290 --> 00:05:45.480
One is the ethical argument that researchers&nbsp;
should not lie to their subjects.&nbsp;

00:05:45.480 --> 00:05:50.340
So if you deceive, intentionally mislead,
your subjects, then you are being unethical.

00:05:50.340 --> 00:05:57.900
There is some debate on whether being unethical&nbsp;
this way is acceptable in some scenarios where&nbsp;&nbsp;

00:05:57.900 --> 00:06:04.050
the results would be very important to get.
So there are some important studies in history

00:06:04.050 --> 00:06:10.110
that have been done using deception, and some
of those studies, like the Milgram experiment,

00:06:10.110 --> 00:06:17.220
would be considered really unethical now.
Then there's another issue with deception.

00:06:17.220 --> 00:06:23.280
So if you have a lab where you invite people,
particularly if you invite students there,

00:06:23.280 --> 00:06:29.160
and you know that the students will be subjects
in a couple of experiments during their studies,

00:06:29.160 --> 00:06:35.160
then if you deceive them and they find out that
they were lied to in the first experiment,

00:06:35.160 --> 00:06:38.700
how are they gonna take you
seriously in your second experiment?

00:06:38.700 --> 00:06:43.680
So the arguments against deception are the
ethical argument and also the argument

00:06:43.680 --> 00:06:49.740
that we are kind of spoiling
our subject pool by lying to them.

00:06:49.740 --> 00:06:55.890
Then the fourth on the list is manipulation
checks before the dependent variable.

00:06:55.890 --> 00:07:02.250
The idea here is the manipulation check.
What it means is

00:07:02.250 --> 00:07:07.980
that if we, for example, give people
medication, and that is the kind of

00:07:07.980 --> 00:07:13.410
medication that people take at home,
and then they come back for measurement

00:07:13.410 --> 00:07:19.710
a week later, we ask them: did you actually take
the medication? Because some of our subjects might

00:07:19.710 --> 00:07:23.820
have forgotten to take the medication.
And that needs to be taken into account&nbsp;&nbsp;

00:07:23.820 --> 00:07:27.450
in the statistical analysis.
In practice that will be a case&nbsp;&nbsp;

00:07:27.450 --> 00:07:33.510
for using instrumental variables.
Problems arise, however, if we do

00:07:33.510 --> 00:07:40.830
a manipulation check before the
measurement of the dependent variable.

00:07:40.830 --> 00:07:48.510
It is then possible that the respondents,
particularly if we do survey-based

00:07:48.510 --> 00:07:52.260
measurement or some other kind of
measurement where we measure people's attitudes,

00:07:52.260 --> 00:07:58.020
may infer, based
on our manipulation check, what we

00:07:58.020 --> 00:08:03.060
are actually studying and then try
to adjust their responses accordingly.

00:08:03.960 --> 00:08:11.550
Let's take a look at an example and how&nbsp;
these effects might manifest in a study.&nbsp;

00:08:13.710 --> 00:08:20.730
This is a completely made-up study.
This is an experimental vignette study;

00:08:20.730 --> 00:08:26.820
the idea is that we present two scenarios.
Each individual receives one of these scenarios

00:08:26.820 --> 00:08:30.750
in a survey but not the other.
And this is randomized.

00:08:30.750 --> 00:08:37.440
So half of our informants receive scenario 1 and
half of our informants receive scenario 2 here.

00:08:37.440 --> 00:08:42.900
Then we ask, based on these&nbsp;
two scenarios two things.&nbsp;

00:08:42.900 --> 00:08:46.260
Is the company performing ethically?
That is our manipulation check.&nbsp;

00:08:46.260 --> 00:08:53.070
Would you buy the shoes?
So we have shoes that are&nbsp;&nbsp;

00:08:53.070 --> 00:08:58.070
less expensive than major brand shoes.
You really want to have the shoes.&nbsp;

00:08:58.070 --> 00:09:02.810
In one scenario you hear that this company uses
child labor, and in the other you hear that

00:09:02.810 --> 00:09:09.140
this company is behaving very ethically.
They have a corporate social responsibility&nbsp;&nbsp;

00:09:09.140 --> 00:09:14.090
program that they just announced.
How are these issues, listed in the

00:09:14.090 --> 00:09:19.550
Lonati article, manifested in this example?
First of all we have an unfair comparison.&nbsp;

00:09:20.360 --> 00:09:25.310
We are not comparing a bad company
against a neutral company.

00:09:25.310 --> 00:09:31.730
But instead we are comparing a very unethical
company against a very ethical company.

00:09:32.330 --> 00:09:36.680
We cannot say that doing&nbsp;
unethical things would be bad.&nbsp;

00:09:36.680 --> 00:09:45.140
Because the baseline is not the absence of unethical behavior;
the baseline is doing good for society.

00:09:45.140 --> 00:09:54.020
Also we cannot say that CSR programs will be&nbsp;
helpful because the baseline is not no CSR but&nbsp;&nbsp;

00:09:54.020 --> 00:09:57.620
it's very unethical behavior.
That's an unfair comparison.&nbsp;

00:09:57.620 --> 00:10:02.450
It's a poison and medication comparison.
If there's a difference we don't&nbsp;&nbsp;

00:10:02.450 --> 00:10:05.900
know which one causes it.
Then there's a demand effect.&nbsp;

00:10:05.900 --> 00:10:12.170
So if you read this short vignette you see that&nbsp;
this is just basically facts and then there is&nbsp;&nbsp;

00:10:12.170 --> 00:10:17.180
this statement that stands out, even if it wasn't
bolded, that this company is using child labor.

00:10:17.750 --> 00:10:23.510
That is not something that you would perhaps&nbsp;
know if you were to buy athletic shoes.&nbsp;

00:10:23.510 --> 00:10:27.500
And then there's the other thing
here, that this company

00:10:27.500 --> 00:10:33.110
is implementing a CSR program. That is also
information that you probably wouldn't know.

00:10:33.110 --> 00:10:37.820
Or wouldn't notice even if it was&nbsp;
given to you in a broader context.&nbsp;

00:10:37.820 --> 00:10:44.090
But in isolation this stands out and&nbsp;
it is clear that the experiment here&nbsp;&nbsp;

00:10:44.090 --> 00:10:48.710
is about ethics or corporate social&nbsp;
responsibility or something like that.&nbsp;

00:10:48.710 --> 00:10:54.350
And that guides our responses.
If we see this kind of vignette

00:10:54.350 --> 00:11:01.040
here then we pretty much know that the&nbsp;
researcher wants us to answer no here.&nbsp;

00:11:01.040 --> 00:11:09.920
We would not buy these shoes.
And same here: the CSR would imply to us that

00:11:09.920 --> 00:11:15.890
the researcher is studying social responsibility.
We are supposed to say that we buy the shoes even

00:11:15.890 --> 00:11:21.050
if they are less expensive for some reason.
That's the demand effect.&nbsp;

00:11:21.050 --> 00:11:27.170
This is also a non-consequential decision.
Why it's non-consequential is that

00:11:27.170 --> 00:11:32.420
this is just imaginary money.
Let's say that the brand name shoes cost

00:11:32.420 --> 00:11:40.160
100 euros and these cheaper shoes cost 70 euros.
If you really are short on cash&nbsp;&nbsp;

00:11:40.160 --> 00:11:44.390
and you need new shoes,
you might think that, well,

00:11:44.390 --> 00:11:51.530
this time maybe, the company will be better in&nbsp;
the future, it's just this time that I buy these&nbsp;&nbsp;

00:11:51.530 --> 00:11:57.260
shoes from this slightly unethical company.
If there's real money on the line people may&nbsp;&nbsp;

00:11:57.260 --> 00:12:03.380
behave differently than when it's just a question&nbsp;
of what would you do in this imaginary scenario.&nbsp;

00:12:03.380 --> 00:12:09.290
Then the final thing in this&nbsp;
example is the manipulation check.&nbsp;

00:12:09.290 --> 00:12:13.190
And this is clearly demonstrated: the
manipulation check question is here.

00:12:13.190 --> 00:12:17.840
Is the company, either in scenario
1 or scenario 2, behaving ethically?

00:12:18.470 --> 00:12:22.910
That really gives away the
purpose of the experiment.

00:12:22.910 --> 00:12:32.930
If we read this manipulation check, the
purpose of which is basically to ensure

00:12:32.930 --> 00:12:37.070
that we have received the manipulation,
that we have noticed that one of these

00:12:37.070 --> 00:12:42.500
is more ethical than the other one,
this underlines that this is a study about ethics.

00:12:42.500 --> 00:12:50.840
Then people will respond accordingly saying yes&nbsp;
to the ethical case, no to the unethical case.&nbsp;

00:12:50.840 --> 00:12:54.710
Because that is what they think&nbsp;
that the experimenter wants to see.