WEBVTT
Kind: captions
Language: en

00:00:00.060 --> 00:00:04.860
Reliability, validity, and reliability statistics are related in important ways

00:00:04.860 --> 00:00:08.130
that you need to understand when you evaluate measurement scales.

00:00:08.130 --> 00:00:15.630
Let's take a look at this target practice example again. This is the example of valid but

00:00:15.630 --> 00:00:24.000
unreliable measurement. Reliability quantifies how much the hits on the target are spread around.

00:00:24.000 --> 00:00:32.850
So this is quite unreliable because there is a lot of spread in these hits on the target.

00:00:32.850 --> 00:00:38.460
So in your statistical software, when you calculate coefficient alpha, the

00:00:38.460 --> 00:00:45.960
statistical software has an option that can show you how alpha would be different if

00:00:45.960 --> 00:00:51.780
you omit some indicators from your scale. So for example in this case we have five

00:00:51.780 --> 00:00:58.830
indicators - five shots on the target. What will happen if we omit two indicators?

00:00:58.830 --> 00:01:03.870
Let's say that our statistical software indicates that the alpha statistic will

00:01:03.870 --> 00:01:10.230
go up if we take these two away. So we take away these two shots, and what will

00:01:10.230 --> 00:01:15.690
happen? Well, alpha will go up, because alpha quantifies how much the shots

00:01:15.690 --> 00:01:20.550
that are on the target are dispersed. So that's now our new estimate of alpha.

00:01:20.550 --> 00:01:28.800
Did reliability actually increase? The answer to this question is no, not necessarily. It is

00:01:28.800 --> 00:01:35.010
possible that you actually had indicators that differ in their reliability levels, and

00:01:35.010 --> 00:01:40.770
if that is the case, then dropping an indicator with very bad reliability

00:01:40.770 --> 00:01:44.700
could actually improve the reliability of the scale. But it doesn't necessarily do so.

00:01:44.700 --> 00:01:53.460
Here, if we assume that these indicators were all equally reliable, then omitting data from the model

00:01:53.460 --> 00:01:59.700
actually reduces reliability, because scale reliability is a function of the individual reliabilities of

00:01:59.700 --> 00:02:04.320
the indicators and how many indicators you have. So reliability actually went down.

00:02:04.320 --> 00:02:11.580
But the reliability index went up, because if we omit indicators here, we are

00:02:11.580 --> 00:02:17.850
basically ignoring evidence that indicates that this scale is unreliable.

00:02:17.850 --> 00:02:24.330
So when you take an indicator away from the scale, it is possible that

00:02:24.330 --> 00:02:29.430
the scale reliability increases. It is also possible that you're just throwing

00:02:29.430 --> 00:02:36.390
away information that constitutes valid evidence against reliability.

00:02:36.390 --> 00:02:43.050
So is ignoring evidence against the result that you want a good

00:02:43.050 --> 00:02:49.170
thing to do? Obviously not. So this is the other thing that Cho is discussing in his

00:02:49.170 --> 00:02:54.840
paper when he says that there's a common misconception that reliability is increased

00:02:54.840 --> 00:03:00.900
when you follow this alpha-if-item-deleted advice given by your statistical software.
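
NOTE
The claims above can be made concrete in code. This is a minimal sketch, assuming
Python with NumPy; the function names are illustrative, not from the lecture or any
specific statistical package. It shows coefficient alpha, the alpha-if-item-deleted
diagnostic, and the Spearman-Brown prophecy formula, which is why dropping equally
reliable indicators lowers true reliability even when the printed alpha goes up.
import numpy as np
def cronbach_alpha(items):
    # items: (n_respondents, k_indicators) array of indicator scores
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each indicator
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
def alpha_if_item_deleted(items):
    # Alpha recomputed with each indicator left out in turn -
    # the diagnostic table that statistical software prints.
    items = np.asarray(items, dtype=float)
    return [cronbach_alpha(np.delete(items, j, axis=1)) for j in range(items.shape[1])]
def spearman_brown(item_reliability, k):
    # Reliability of a k-item scale built from equally reliable parallel items.
    return k * item_reliability / (1 + (k - 1) * item_reliability)
# With equally reliable items, fewer items always means lower true reliability:
print(spearman_brown(0.4, 5))  # about 0.77 with five indicators
print(spearman_brown(0.4, 3))  # about 0.67 after dropping two
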
00:03:00.900 --> 00:03:05.910
So it's possible that reliability goes up if you eliminate indicators, if those are

00:03:05.910 --> 00:03:11.580
really highly unreliable, but it is also possible that you're just causing

00:03:11.580 --> 00:03:17.340
a positive bias in your reliability statistic by eliminating information

00:03:17.340 --> 00:03:23.700
that indicates lower reliability. So that's one thing to keep in mind if you omit indicators.

00:03:23.700 --> 00:03:31.380
T. D. Little has written a nice paper about whether you should or should not include

00:03:31.380 --> 00:03:38.850
indicators, and if someone is saying that you should drop indicators and you doubt whether

00:03:38.850 --> 00:03:46.920
that is a good thing to do, then this article by Little provides a nice counter-argument against

00:03:46.920 --> 00:03:51.240
the practice of dropping indicators from a scale to improve the reliability statistics.

00:03:51.240 --> 00:03:54.180
Another important thing that

00:03:54.180 --> 00:03:57.000
you need to understand is how the items are worded.

00:03:57.000 --> 00:04:02.280
So let's say that we have a scale that is supposed to measure innovativeness and we

00:04:02.280 --> 00:04:07.560
have three items in the scale. The first item is "we have released a lot of new

00:04:07.560 --> 00:04:12.720
products in the last three months", the second item is "we have released a

00:04:12.720 --> 00:04:18.180
lot of new products in the first quarter of the year", and then we ask the same question again.

00:04:18.180 --> 00:04:25.110
So these three questions are highly correlated. Is that evidence of reliability of our innovativeness

00:04:25.110 --> 00:04:34.260
scale? The answer is no. It's not valid evidence of reliability, because these items just ask the

00:04:34.260 --> 00:04:41.130
exact same thing in slightly different words, and they violate the parallel tests assumption

00:04:41.130 --> 00:04:49.020
that classical test theory makes and on which our reliability indices like alpha are based.

00:04:49.020 --> 00:04:54.960
So the high correlation between these three items is not an indication of

00:04:54.960 --> 00:04:59.340
reliability. Instead it is an indication that these are not parallel tests. You're

00:04:59.340 --> 00:05:04.710
just asking the same question again and again, and that doesn't qualify.

00:05:04.710 --> 00:05:10.080
So if these three indicators are highly correlated, it doesn't really tell us

00:05:10.080 --> 00:05:17.010
anything, and we can also ask whether this actually measures the innovation level of

00:05:17.010 --> 00:05:21.810
the company or whether it just measures how many new products the company has introduced.

00:05:21.810 --> 00:05:29.730
So the problem with developing scale items is that if the items are too similar, then

00:05:29.730 --> 00:05:35.460
they are not parallel. If the items are too similar, they can also just measure

00:05:35.460 --> 00:05:40.530
one specific fact instead of measuring the construct. So we need to develop

00:05:40.530 --> 00:05:46.050
our items in a way that - if we measure innovativeness - they are different

00:05:46.050 --> 00:05:50.730
observed consequences of innovativeness. So they are distinct tests, different from one

00:05:50.730 --> 00:05:54.870
another, instead of just repeating the same question with slightly different wording.
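
NOTE
A small simulation can illustrate the parallel-tests point above. This is a hedged
sketch, not the lecturer's example: the loadings and the "new products" factor are
invented for illustration, and NumPy is assumed. Three near-duplicate items that all
reflect one specific fact produce a very high alpha while the sum score barely
tracks the construct we actually want to measure.
import numpy as np
def cronbach_alpha(x):
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))
rng = np.random.default_rng(0)
n = 1000
innovativeness = rng.normal(size=n)   # the construct we want to measure
new_products = rng.normal(size=n)     # one specific fact about the company
# Three rewordings of the same question: dominated by the specific fact,
# with only a weak link to the construct itself.
items = np.column_stack([
    0.2 * innovativeness + 0.9 * new_products + 0.1 * rng.normal(size=n)
    for _ in range(3)
])
print(cronbach_alpha(items))  # close to 1: "reliable" according to the index
print(np.corrcoef(items.sum(axis=1), innovativeness)[0, 1])  # about 0.2: weak validity
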
00:05:54.870 --> 00:05:57.990
So these are the two most common problems

00:05:57.990 --> 00:06:01.740
that I see with indicators: how you develop them and how you choose them.

00:06:01.740 --> 00:06:08.400
One is that people drop indicators based on their reliability statistics. Sometimes it

00:06:08.400 --> 00:06:15.450
makes sense, oftentimes it doesn't. The other is in writing items: many

00:06:15.450 --> 00:06:19.920
researchers ignore the parallel tests assumption and what it actually means.

00:06:19.920 --> 00:06:25.410
Your items really need to be distinct instead of being the same item repeated three times.
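
NOTE
As a closing illustration of the first problem, the sketch below (again assuming
NumPy; the sample size, loadings, and trial count are arbitrary) simulates five
equally reliable parallel indicators and checks how often the alpha-if-item-deleted
table suggests, purely through sampling noise, that dropping an indicator would
raise alpha - even though with parallel items the Spearman-Brown formula says true
reliability must fall when going from five items to four.
import numpy as np
def cronbach_alpha(x):
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))
def alpha_if_item_deleted(x):
    return [cronbach_alpha(np.delete(x, j, axis=1)) for j in range(x.shape[1])]
rng = np.random.default_rng(1)
n, k, loading = 200, 5, 0.6   # arbitrary simulation settings
trials, false_gains = 500, 0
for _ in range(trials):
    true_score = rng.normal(size=n)
    # Five parallel indicators: same loading, independent noise.
    items = loading * true_score[:, None] + rng.normal(size=(n, k))
    if max(alpha_if_item_deleted(items)) > cronbach_alpha(items):
        false_gains += 1
print(false_gains / trials)  # fraction of samples where deletion "helps" by chance
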