WEBVTT
Kind: captions
Language: en
00:00:00.381 --> 00:00:02.209
In this video, I will explain
00:00:02.209 --> 00:00:05.197
how the maximum likelihood
estimation principle can be applied
00:00:05.197 --> 00:00:07.200
to estimate a logistic regression model.
00:00:07.590 --> 00:00:09.690
Our data are girls and
00:00:09.690 --> 00:00:11.100
the dependent variable is
00:00:11.100 --> 00:00:13.380
whether a girl has had her menarche or not.
00:00:13.380 --> 00:00:16.170
And the independent variable is Age.
00:00:16.546 --> 00:00:18.826
And we're fitting the logistic curve.
00:00:18.826 --> 00:00:22.380
So we can see here that when
the girl's age is close to 10,
00:00:22.380 --> 00:00:24.030
which is the minimum of the sample,
00:00:24.030 --> 00:00:28.259
the predicted probability
of menarche is about zero,
00:00:28.538 --> 00:00:30.248
and when the girl is 18,
00:00:30.248 --> 00:00:32.100
which is about the maximum of the sample,
00:00:32.100 --> 00:00:35.580
the predicted probability of menarche is about 1.
00:00:36.138 --> 00:00:39.600
And we want to estimate this logistic curve,
00:00:39.600 --> 00:00:41.100
that is, its shape,
00:00:41.100 --> 00:00:45.120
because it tells us the relationship
between age and menarche.
00:00:45.817 --> 00:00:51.060
We apply probability calculations to values
00:00:51.060 --> 00:00:52.380
that are 1's and 0's,
00:00:52.380 --> 00:00:53.790
that's the dependent variable.
00:00:53.790 --> 00:00:54.990
And to do that,
00:00:54.990 --> 00:00:56.760
we use the Bernoulli distribution.
00:00:57.527 --> 00:01:02.256
The idea of a Bernoulli distribution
is that we only have 1's and 0's.
00:01:02.256 --> 00:01:08.032
And in this example, the 0's
are twice as prevalent as 1's,
00:01:08.032 --> 00:01:12.900
and the population is always very
large in maximum likelihood estimation,
00:01:12.900 --> 00:01:17.250
because when we take a single value,
00:01:17.250 --> 00:01:18.600
a 0 or a 1, away from the population,
00:01:18.600 --> 00:01:22.530
the ratio of 1's to 0's should stay the same,
00:01:22.530 --> 00:01:24.900
even if we take a sample away from the population.
00:01:25.918 --> 00:01:30.960
The probability of getting
0 is 67% from this sample,
00:01:30.960 --> 00:01:33.453
and the probability of getting 1 is 33%.
00:01:33.815 --> 00:01:37.200
So when we have this set of
observed values, which is our sample,
00:01:37.200 --> 00:01:39.263
we have seven 0's and two 1's.
00:01:39.570 --> 00:01:41.460
They happen to be in this order at random,
00:01:41.460 --> 00:01:43.440
it doesn't have any significance,
00:01:43.579 --> 00:01:45.109
or any meaning.
00:01:45.109 --> 00:01:47.323
And we calculate the probabilities,
00:01:47.546 --> 00:01:50.070
then we calculate the total probability by
00:01:50.070 --> 00:01:52.830
multiplying all these individual
probabilities together.
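The total-probability calculation described above can be sketched in a few lines of Python; the particular ordering of the sample below is hypothetical, and the 1/3 probability matches the two-to-one ratio of 0's to 1's mentioned earlier.

```python
import math

# Hypothetical ordering of the sample: seven 0's and two 1's.
# The order carries no meaning; only the counts matter.
sample = [0, 0, 1, 0, 0, 0, 1, 0, 0]

p_one = 1 / 3  # probability of drawing a 1 from the population

# Bernoulli probability of each observation, multiplied together
total = math.prod(p_one if y == 1 else 1 - p_one for y in sample)
```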
00:01:53.290 --> 00:01:55.000
So when we know
00:01:55.000 --> 00:01:58.986
what the population is,
00:01:58.986 --> 00:02:04.140
then we know the probabilities of getting
particular values from that population.
00:02:04.642 --> 00:02:07.282
In maximum likelihood estimation,
00:02:07.290 --> 00:02:10.140
the population is not known,
00:02:10.140 --> 00:02:11.757
but we have to estimate
00:02:11.757 --> 00:02:14.940
what the effect of age on
menarche is in the population,
00:02:14.940 --> 00:02:16.223
and what's the base level.
00:02:16.502 --> 00:02:20.310
And so we don't talk about probabilities,
00:02:20.310 --> 00:02:21.810
we talk about likelihoods.
00:02:22.340 --> 00:02:25.110
So the idea of maximum
likelihood estimation is that
00:02:25.110 --> 00:02:32.670
we try to find a population that
has the maximum likelihood of having
00:02:32.670 --> 00:02:34.020
produced these values here.
00:02:34.299 --> 00:02:35.409
So we don't know
00:02:35.409 --> 00:02:38.219
what the mean is or what's
the ratio of 1's and 0's,
00:02:38.303 --> 00:02:39.623
we only know the data.
00:02:39.874 --> 00:02:44.801
And we assume that the model
exists for the population.
00:02:45.000 --> 00:02:47.156
Then, to do the calculation,
00:02:47.880 --> 00:02:50.640
we have some guesses for this ratio,
00:02:50.640 --> 00:02:53.280
and then we calculate likelihoods,
00:02:53.280 --> 00:02:55.050
we calculate the cumulative likelihood,
00:02:55.050 --> 00:02:59.490
and we maximize the cumulative likelihood
to find the maximum likelihood estimate
00:02:59.490 --> 00:03:02.910
by changing our model parameters.
00:03:03.440 --> 00:03:06.230
So for example, we could guess that,
00:03:06.540 --> 00:03:10.032
the ratio is 2 to 7,
00:03:10.032 --> 00:03:15.900
that gives us probabilities
of 78% and 22% for 0's and 1's.
00:03:15.900 --> 00:03:18.810
We calculate the cumulative probabilities,
00:03:18.810 --> 00:03:20.850
that is, we multiply everything together.
00:03:20.850 --> 00:03:25.140
And this is the likelihood of the
sample given our estimated population.
00:03:26.437 --> 00:03:29.476
The maximum likelihood estimate is simply
00:03:29.476 --> 00:03:34.590
found by changing our guess
of the ratio of 1's and 0's,
00:03:34.590 --> 00:03:38.253
so that this value here
becomes as large as possible.
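That search over candidate ratios can be sketched as a simple grid search; the sample ordering and the grid resolution below are illustrative.

```python
import math

sample = [0, 0, 1, 0, 0, 0, 1, 0, 0]  # seven 0's, two 1's (hypothetical ordering)

def likelihood(p):
    # Probability of the whole sample if P(1) = p in the population
    return math.prod(p if y == 1 else 1 - p for y in sample)

# Try many candidate values of p and keep the one whose likelihood is largest
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=likelihood)
# best_p lands near 2/9, the sample proportion of 1's
```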
00:03:39.131 --> 00:03:43.150
This principle is applied to the
logistic regression analysis.
00:03:43.582 --> 00:03:47.315
The idea is that we calculate
using this logistic curve
00:03:47.315 --> 00:03:48.817
and this age here,
00:03:48.817 --> 00:03:53.160
and the known ages and the known
menarche status of these girls.
00:03:53.160 --> 00:03:57.090
We calculate the individual
likelihood for the observations,
00:03:57.090 --> 00:03:59.550
and then we use those individual likelihoods
00:03:59.550 --> 00:04:03.480
to find the best possible
logistic curve for the data.
00:04:03.926 --> 00:04:05.936
How it works in practice is that
00:04:05.936 --> 00:04:07.410
we have some kind of guess.
00:04:07.410 --> 00:04:13.710
So we guess that menarche
is a linear function of age,
00:04:13.710 --> 00:04:17.760
and an intercept, transformed
using the logistic function.
00:04:17.760 --> 00:04:21.000
So let's say that the intercept is -20,
00:04:21.000 --> 00:04:23.940
and the effect of age is 1.54,
00:04:23.940 --> 00:04:27.540
we apply the logistic function to the linear prediction,
00:04:27.540 --> 00:04:32.100
then we calculate, and that gives
us the expected probabilities.
00:04:32.421 --> 00:04:33.501
Then we check,
00:04:33.501 --> 00:04:36.990
how likely that particular observation is,
00:04:36.990 --> 00:04:38.640
given the fitted probability.
00:04:38.640 --> 00:04:43.129
So for example, the first girl here is 13.6 years old
00:04:43.297 --> 00:04:45.007
and she has had menarche.
00:04:45.383 --> 00:04:50.089
The linear prediction for that girl
using this equation here is 0.94.
00:04:50.716 --> 00:04:55.410
Then the fitted probability,
applying the logistic function
00:04:55.410 --> 00:04:58.493
to this linear prediction, is 73.6%.
00:04:59.035 --> 00:05:03.600
So if the probability is 73.6%,
00:05:03.600 --> 00:05:05.515
and the girl has had menarche,
00:05:05.515 --> 00:05:09.859
then the likelihood for that observation is 73.6%.
00:05:10.659 --> 00:05:12.807
Then we move on to the next girl.
00:05:12.807 --> 00:05:16.579
So she's 11.4 years old and she has not had menarche.
00:05:16.579 --> 00:05:19.110
The linear prediction is -2.44,
00:05:19.110 --> 00:05:21.630
so it's calculated using this equation here,
00:05:22.104 --> 00:05:27.244
and we apply the logistic function, which gives
us an 8% predicted probability.
00:05:27.662 --> 00:05:30.810
There is only an 8% probability
00:05:30.810 --> 00:05:34.165
that this girl would have
had menarche given her age
00:05:34.290 --> 00:05:35.550
and she didn't.
00:05:35.550 --> 00:05:39.870
Then the likelihood for
this observation is 1 - 8%,
00:05:39.870 --> 00:05:42.210
which is 92% here.
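The two per-observation likelihoods above can be reproduced with a short sketch, assuming the guessed parameters from the video (intercept -20, age coefficient 1.54); the exact percentages depend on how those coefficients were rounded.

```python
import math

def logistic(x):
    # Transforms a linear prediction into a probability between 0 and 1
    return 1 / (1 + math.exp(-x))

intercept, beta_age = -20.0, 1.54  # the guessed parameters

def observation_likelihood(age, had_menarche):
    p = logistic(intercept + beta_age * age)  # fitted probability of menarche
    return p if had_menarche else 1 - p       # likelihood of what was observed

lik_first = observation_likelihood(13.6, True)    # girl 1: has had menarche
lik_second = observation_likelihood(11.4, False)  # girl 2: has not, so use 1 - p
```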
00:05:42.517 --> 00:05:44.377
We do that calculation,
00:05:44.377 --> 00:05:47.850
we calculate the likelihood for all the girls
00:05:47.850 --> 00:05:51.572
and that gives us the product 6.4%.
00:05:51.724 --> 00:05:53.746
For computational reasons,
00:05:53.746 --> 00:05:57.960
we typically don't work
with these raw likelihoods
00:05:57.960 --> 00:05:59.716
and multiply them together.
00:05:59.716 --> 00:06:02.404
Instead, we work with logarithms.
00:06:02.864 --> 00:06:05.509
So we calculate the logarithm of the likelihood,
00:06:05.509 --> 00:06:09.000
called the log-likelihood for
each individual observation,
00:06:09.000 --> 00:06:11.940
and we take a sum of these log-likelihoods
00:06:11.940 --> 00:06:15.750
and that gives us the full
log-likelihood of the sample.
00:06:16.071 --> 00:06:20.176
And we adjust the value of the intercept
00:06:20.176 --> 00:06:24.516
and the value of the coefficient for age
00:06:24.516 --> 00:06:30.660
to make this full sample
log-likelihood as large as possible.
00:06:30.660 --> 00:06:34.200
In practice, this is almost
always a negative number.
00:06:34.200 --> 00:06:39.090
So we try to make it closer to
zero, that is, less negative.
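Putting the pieces together, the full sample log-likelihood can be sketched as below; the (age, menarche) pairs are hypothetical, and a real fit would let an optimizer keep adjusting the two parameters.

```python
import math

# Hypothetical (age, had_menarche) observations
data = [(13.6, 1), (11.4, 0), (15.2, 1), (10.3, 0)]

def log_likelihood(intercept, beta_age):
    total = 0.0
    for age, y in data:
        p = 1 / (1 + math.exp(-(intercept + beta_age * age)))
        total += math.log(p if y == 1 else 1 - p)  # log of this observation's likelihood
    return total  # almost always negative; closer to zero is better

# A guess near the video's parameters should beat a flat 50% guess
good = log_likelihood(-20.0, 1.54)
naive = log_likelihood(0.0, 0.0)  # predicts 50% for every girl
```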