WEBVTT
Kind: captions
Language: en
00:00:00.381 --> 00:00:02.209
In this video, I will explain
00:00:02.209 --> 00:00:05.197
how the maximum likelihood
estimation principle can be applied
00:00:05.197 --> 00:00:07.200
to estimate a logistic regression model.
00:00:07.590 --> 00:00:09.690
Our data are girls and
00:00:09.690 --> 00:00:11.100
the dependent variable is
00:00:11.100 --> 00:00:13.380
whether a girl has had her menarche or not.
00:00:13.380 --> 00:00:16.170
And the independent variable is Age.
00:00:16.546 --> 00:00:18.826
And we're fitting the logistic curve.
00:00:18.826 --> 00:00:22.380
So we can see here that when
the girl's age is close to 10,
00:00:22.380 --> 00:00:24.030
which is the minimum of the sample,
00:00:24.030 --> 00:00:28.259
the predicted probability
of menarche is about zero,
00:00:28.538 --> 00:00:30.248
and when the girl is 18,
00:00:30.248 --> 00:00:32.100
which is about the maximum of the sample,
00:00:32.100 --> 00:00:35.580
the predicted probability of menarche is about 1.
00:00:36.138 --> 00:00:39.600
And we want to estimate this logistic curve,
00:00:39.600 --> 00:00:41.100
that is, its shape,
00:00:41.100 --> 00:00:45.120
because it tells us the relationship
between age and menarche.
00:00:45.817 --> 00:00:51.060
We apply probability calculations to values
00:00:51.060 --> 00:00:52.380
that are 1's and 0's,
00:00:52.380 --> 00:00:53.790
that's the dependent variable.
00:00:53.790 --> 00:00:54.990
And to do that,
00:00:54.990 --> 00:00:56.760
we use the Bernoulli distribution.
00:00:57.527 --> 00:01:02.256
The idea of a Bernoulli distribution
is that we only have 1's and 0's.
00:01:02.256 --> 00:01:08.032
And in this example, the 0's
are twice as prevalent as 1's,
00:01:08.032 --> 00:01:12.900
and the population is always very
large in maximum likelihood estimation,
00:01:12.900 --> 00:01:17.250
because when we take a single value,
00:01:17.250 --> 00:01:18.600
a 0 or a 1, away from the population,
00:01:18.600 --> 00:01:22.530
the ratio of 1's to 0's should stay the same,
00:01:22.530 --> 00:01:24.900
even if we take a sample away from the population.
00:01:25.918 --> 00:01:30.960
The probability of getting
0 is 67% from this sample,
00:01:30.960 --> 00:01:33.453
and the probability of getting 1 is 33%.
00:01:33.815 --> 00:01:37.200
So when we have this set of
observed values, which is our sample,
00:01:37.200 --> 00:01:39.263
we have seven 0's and two 1's.
00:01:39.570 --> 00:01:41.460
They happen to be in this order at random,
00:01:41.460 --> 00:01:43.440
it doesn't have any significance,
00:01:43.579 --> 00:01:45.109
or any meaning.
00:01:45.109 --> 00:01:47.323
And we calculate the probabilities,
00:01:47.546 --> 00:01:50.070
then we calculate the total probability by
00:01:50.070 --> 00:01:52.830
multiplying all these individual
probabilities together.
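The total-probability calculation described above can be sketched in a few lines of Python; the particular ordering of the sample below is hypothetical, and the 1/3 probability matches the two-to-one ratio of 0's to 1's mentioned earlier.

```python
import math

# Hypothetical ordering of the sample: seven 0's and two 1's.
# The order carries no meaning; only the counts matter.
sample = [0, 0, 1, 0, 0, 0, 1, 0, 0]

p_one = 1 / 3  # probability of drawing a 1 from the population

# Bernoulli probability of each observation, multiplied together
total = math.prod(p_one if y == 1 else 1 - p_one for y in sample)
```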
00:01:53.290 --> 00:01:55.000
So when we know
00:01:55.000 --> 00:01:58.986
what the population is,
00:01:58.986 --> 00:02:04.140
then we know the probabilities of getting
particular values from that population.
00:02:04.642 --> 00:02:07.282
In maximum likelihood estimation,
00:02:07.290 --> 00:02:10.140
the population is not known,
00:02:10.140 --> 00:02:11.757
but we have to estimate
00:02:11.757 --> 00:02:14.940
what the effect of age on
menarche is in the population,
00:02:14.940 --> 00:02:16.223
and what's the base level.
00:02:16.502 --> 00:02:20.310
And so we don't talk about probabilities,
00:02:20.310 --> 00:02:21.810
we talk about likelihoods.
00:02:22.340 --> 00:02:25.110
So the idea of maximum
likelihood estimation is that
00:02:25.110 --> 00:02:32.670
we try to find a population that
has the maximum likelihood of having
00:02:32.670 --> 00:02:34.020
produced these values here.
00:02:34.299 --> 00:02:35.409
So we don't know
00:02:35.409 --> 00:02:38.219
what the mean is or what's
the ratio of 1's and 0's,
00:02:38.303 --> 00:02:39.623
we only know the data.
00:02:39.874 --> 00:02:44.801
And we assume that the model
exists for the population.
00:02:45.000 --> 00:02:47.156
Then, to do the calculation,
00:02:47.880 --> 00:02:50.640
we have some guesses for this ratio,
00:02:50.640 --> 00:02:53.280
and then we calculate likelihoods,
00:02:53.280 --> 00:02:55.050
we calculate the cumulative likelihood,
00:02:55.050 --> 00:02:59.490
and we maximize the cumulative likelihood
to find the maximum likelihood estimate
00:02:59.490 --> 00:03:02.910
by changing our model parameters.
00:03:03.440 --> 00:03:06.230
So for example, we could guess that,
00:03:06.540 --> 00:03:10.032
the ratio is 2 to 7,
00:03:10.032 --> 00:03:15.900
that gives us probabilities
of 78% and 22% for 0's and 1's.
00:03:15.900 --> 00:03:18.810
We calculate the cumulative probabilities,
00:03:18.810 --> 00:03:20.850
that is, we multiply everything together.
00:03:20.850 --> 00:03:25.140
And this is the likelihood of the
sample given our estimated population.
00:03:26.437 --> 00:03:29.476
The maximum likelihood estimate is simply
00:03:29.476 --> 00:03:34.590
found by changing our guess
of the ratio of 1's and 0's,
00:03:34.590 --> 00:03:38.253
so that this value here
becomes as large as possible.
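That search over candidate ratios can be sketched as a simple grid search; the sample ordering and the grid resolution below are illustrative.

```python
import math

sample = [0, 0, 1, 0, 0, 0, 1, 0, 0]  # seven 0's, two 1's (hypothetical ordering)

def likelihood(p):
    # Probability of the whole sample if P(1) = p in the population
    return math.prod(p if y == 1 else 1 - p for y in sample)

# Try many candidate values of p and keep the one whose likelihood is largest
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=likelihood)
# best_p lands near 2/9, the sample proportion of 1's
```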
00:03:39.131 --> 00:03:43.150
This principle is applied to the
logistic regression analysis.
00:03:43.582 --> 00:03:47.315
The idea is that we calculate
using this logistic curve
00:03:47.315 --> 00:03:48.817
and this age here,
00:03:48.817 --> 00:03:53.160
and the known ages and the known
menarche status of these girls.
00:03:53.160 --> 00:03:57.090
We calculate the individual
likelihood for the observations,
00:03:57.090 --> 00:03:59.550
and then we use those individual likelihoods
00:03:59.550 --> 00:04:03.480
to find the best possible
logistic curve for the data.
00:04:03.926 --> 00:04:05.936
How it works in practice is that
00:04:05.936 --> 00:04:07.410
we have some kind of guess.
00:04:07.410 --> 00:04:13.710
So we guess that menarche
is a linear function of age,
00:04:13.710 --> 00:04:17.760
and an intercept, transformed
using the logistic function.
00:04:17.760 --> 00:04:21.000
So let's say that the intercept is -20,
00:04:21.000 --> 00:04:23.940
and the effect of age is 1.54,
00:04:23.940 --> 00:04:27.540
we apply the logistic function to the linear prediction,
00:04:27.540 --> 00:04:32.100
then we calculate, and that gives
us the expected probabilities.
00:04:32.421 --> 00:04:33.501
Then we check,
00:04:33.501 --> 00:04:36.990
how likely that particular observation is,
00:04:36.990 --> 00:04:38.640
given the fitted probability.
00:04:38.640 --> 00:04:43.129
So for example, the first girl here is 13.6 years old
00:04:43.297 --> 00:04:45.007
and she has had menarche.
00:04:45.383 --> 00:04:50.089
The linear prediction for that girl
using this equation here is 0.94.
00:04:50.716 --> 00:04:55.410
Then the fitted probability,
applying the logistic function
00:04:55.410 --> 00:04:58.493
to this linear prediction, is 73.6%.
00:04:59.035 --> 00:05:03.600
So if the probability is 73.6%,
00:05:03.600 --> 00:05:05.515
and the girl has had menarche,
00:05:05.515 --> 00:05:09.859
then the likelihood for that observation is 73.6%.
00:05:10.659 --> 00:05:12.807
Then we move on to the next girl.
00:05:12.807 --> 00:05:16.579
So she's 11.4 years old and she has not had menarche.
00:05:16.579 --> 00:05:19.110
The linear prediction is -2.44,
00:05:19.110 --> 00:05:21.630
so it's calculated using this equation here,
00:05:22.104 --> 00:05:27.244
and we apply the logistic function, which gives
us an 8% predicted probability.
00:05:27.662 --> 00:05:30.810
There is only an 8% probability
00:05:30.810 --> 00:05:34.165
that this girl would have
had menarche given her age
00:05:34.290 --> 00:05:35.550
and she didn't.
00:05:35.550 --> 00:05:39.870
Then the likelihood for
this observation is 1 - 8%,
00:05:39.870 --> 00:05:42.210
which is 92% here.
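The two per-observation likelihoods above can be reproduced with a short sketch, assuming the guessed parameters from the video (intercept -20, age coefficient 1.54); the exact percentages depend on how those coefficients were rounded.

```python
import math

def logistic(x):
    # Transforms a linear prediction into a probability between 0 and 1
    return 1 / (1 + math.exp(-x))

intercept, beta_age = -20.0, 1.54  # the guessed parameters

def observation_likelihood(age, had_menarche):
    p = logistic(intercept + beta_age * age)  # fitted probability of menarche
    return p if had_menarche else 1 - p       # likelihood of what was observed

lik_first = observation_likelihood(13.6, True)    # girl 1: has had menarche
lik_second = observation_likelihood(11.4, False)  # girl 2: has not, so use 1 - p
```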
00:05:42.517 --> 00:05:44.377
We do that calculation,
00:05:44.377 --> 00:05:47.850
we calculate the likelihood for all the girls
00:05:47.850 --> 00:05:51.572
and that gives us the product 6.4%.
00:05:51.724 --> 00:05:53.746
For computational reasons,
00:05:53.746 --> 00:05:57.960
we typically don't work
with these raw likelihoods
00:05:57.960 --> 00:05:59.716
and multiply them together.
00:05:59.716 --> 00:06:02.404
Instead, we work with logarithms.
00:06:02.864 --> 00:06:05.509
So we calculate the logarithm of the likelihood,
00:06:05.509 --> 00:06:09.000
called the log-likelihood for
each individual observation,
00:06:09.000 --> 00:06:11.940
and we take a sum of these log-likelihoods
00:06:11.940 --> 00:06:15.750
and that gives us the full
log-likelihood of the sample.
00:06:16.071 --> 00:06:20.176
And we adjust the value of the intercept
00:06:20.176 --> 00:06:24.516
and the value of the coefficient for age
00:06:24.516 --> 00:06:30.660
to make this full sample
log-likelihood as large as possible.
00:06:30.660 --> 00:06:34.200
In practice, this is almost
always a negative number.
00:06:34.200 --> 00:06:39.090
So we try to make it closer to
zero, that is, less negative.
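Putting the pieces together, the full sample log-likelihood can be sketched as below; the (age, menarche) pairs are hypothetical, and a real fit would let an optimizer keep adjusting the two parameters.

```python
import math

# Hypothetical (age, had_menarche) observations
data = [(13.6, 1), (11.4, 0), (15.2, 1), (10.3, 0)]

def log_likelihood(intercept, beta_age):
    total = 0.0
    for age, y in data:
        p = 1 / (1 + math.exp(-(intercept + beta_age * age)))
        total += math.log(p if y == 1 else 1 - p)  # log of this observation's likelihood
    return total  # almost always negative; closer to zero is better

# A guess near the video's parameters should beat a flat 50% guess
good = log_likelihood(-20.0, 1.54)
naive = log_likelihood(0.0, 0.0)  # predicts 50% for every girl
```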