WEBVTT
Kind: captions
Language: en
00:00:00.381 --> 00:00:02.209
In this video, I will explain
00:00:02.209 --> 00:00:05.197
how the maximum likelihood
estimation principle can be applied
00:00:05.197 --> 00:00:07.200
to estimate a logistic regression model.
00:00:07.590 --> 00:00:09.690
Our data are girls and
00:00:09.690 --> 00:00:11.100
the dependent variable is
00:00:11.100 --> 00:00:13.380
whether a girl has had her menarche or not.
00:00:13.380 --> 00:00:16.170
And the independent variable is Age.
00:00:16.546 --> 00:00:18.826
And we're fitting the logistic curve.
00:00:18.826 --> 00:00:22.380
So we can see here that when
the girl's age is close to 10,
00:00:22.380 --> 00:00:24.030
which is the minimum of the sample,
00:00:24.030 --> 00:00:28.259
the predicted probability
of menarche is about zero,
00:00:28.538 --> 00:00:30.248
and when the girl is 18,
00:00:30.248 --> 00:00:32.100
which is about the maximum of the sample,
00:00:32.100 --> 00:00:35.580
the predicted probability of menarche is about 1.
00:00:36.138 --> 00:00:39.600
And we want to estimate this logistic curve,
00:00:39.600 --> 00:00:41.100
that is, its shape,
00:00:41.100 --> 00:00:45.120
because it tells us the relationship
between age and menarche.
00:00:45.817 --> 00:00:51.060
We apply probability calculations to values
00:00:51.060 --> 00:00:52.380
that are 1's and 0's,
00:00:52.380 --> 00:00:53.790
that's the dependent variable.
00:00:53.790 --> 00:00:54.990
And to do that,
00:00:54.990 --> 00:00:56.760
we use the Bernoulli distribution.
00:00:57.527 --> 00:01:02.256
The idea of a Bernoulli distribution
is that we only have 1's and 0's.
00:01:02.256 --> 00:01:08.032
And in this example, the 0's
are twice as prevalent as 1's,
00:01:08.032 --> 00:01:12.900
and the population is always very
large in maximum likelihood estimation,
00:01:12.900 --> 00:01:17.250
because when we take a single value,
00:01:17.250 --> 00:01:18.600
a 0 or a 1, away from the population,
00:01:18.600 --> 00:01:22.530
the ratio of 1's to 0's should stay the same,
00:01:22.530 --> 00:01:24.900
even if we take a sample away from the population.
00:01:25.918 --> 00:01:30.960
The probability of getting
0 is 67% from this sample,
00:01:30.960 --> 00:01:33.453
and the probability of getting 1 is 33%.
00:01:33.815 --> 00:01:37.200
So when we have this set of
observed values, which is our sample,
00:01:37.200 --> 00:01:39.263
we have seven 0's and two 1's.
00:01:39.570 --> 00:01:41.460
They happen to be in this order at random,
00:01:41.460 --> 00:01:43.440
it doesn't have any significance,
00:01:43.579 --> 00:01:45.109
or any meaning.
00:01:45.109 --> 00:01:47.323
And we calculate the probabilities,
00:01:47.546 --> 00:01:50.070
then we calculate the total probability by
00:01:50.070 --> 00:01:52.830
multiplying all these individual
probabilities together.
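The total-probability calculation described above can be sketched in a few lines of Python; the particular ordering of the sample below is hypothetical, and the 1/3 probability matches the two-to-one ratio of 0's to 1's mentioned earlier.

```python
import math

# Hypothetical ordering of the sample: seven 0's and two 1's.
# The order carries no meaning; only the counts matter.
sample = [0, 0, 1, 0, 0, 0, 1, 0, 0]

p_one = 1 / 3  # probability of drawing a 1 from the population

# Bernoulli probability of each observation, multiplied together
total = math.prod(p_one if y == 1 else 1 - p_one for y in sample)
```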
00:01:53.290 --> 00:01:55.000
So when we know
00:01:55.000 --> 00:01:58.986
what the population is,
00:01:58.986 --> 00:02:04.140
then we know the probabilities of getting
particular values from that population.
00:02:04.642 --> 00:02:07.282
In maximum likelihood estimation,
00:02:07.290 --> 00:02:10.140
the population is not known,
00:02:10.140 --> 00:02:11.757
but we have to estimate
00:02:11.757 --> 00:02:14.940
what the effect of age on
menarche is in the population,
00:02:14.940 --> 00:02:16.223
and what's the base level.
00:02:16.502 --> 00:02:20.310
And so we don't talk about probabilities,
00:02:20.310 --> 00:02:21.810
we talk about likelihoods.
00:02:22.340 --> 00:02:25.110
So the idea of maximum
likelihood estimation is that
00:02:25.110 --> 00:02:32.670
we try to find a population that
has the maximum likelihood of having
00:02:32.670 --> 00:02:34.020
produced these values here.
00:02:34.299 --> 00:02:35.409
So we don't know
00:02:35.409 --> 00:02:38.219
what the mean is or what's
the ratio of 1's and 0's,
00:02:38.303 --> 00:02:39.623
we only know the data.
00:02:39.874 --> 00:02:44.801
And we assume that the model
exists for the population.
00:02:45.000 --> 00:02:47.156
Then, to do the calculation,
00:02:47.880 --> 00:02:50.640
we have some guesses for this ratio,
00:02:50.640 --> 00:02:53.280
and then we calculate likelihoods,
00:02:53.280 --> 00:02:55.050
we calculate the cumulative likelihood,
00:02:55.050 --> 00:02:59.490
and we maximize the cumulative likelihood
to find the maximum likelihood estimate
00:02:59.490 --> 00:03:02.910
by changing our model parameters.
00:03:03.440 --> 00:03:06.230
So for example, we could guess that,
00:03:06.540 --> 00:03:10.032
the ratio is 2 to 7,
00:03:10.032 --> 00:03:15.900
that gives us probabilities
of 78% and 22% for 0's and 1's.
00:03:15.900 --> 00:03:18.810
We calculate the cumulative probabilities,
00:03:18.810 --> 00:03:20.850
that is, we multiply everything together.
00:03:20.850 --> 00:03:25.140
And this is the likelihood of the
sample given our estimated population.
00:03:26.437 --> 00:03:29.476
The maximum likelihood estimate is simply
00:03:29.476 --> 00:03:34.590
found by changing our guess
of the ratio of 1's and 0's,
00:03:34.590 --> 00:03:38.253
so that this value here
becomes as large as possible.
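That search over candidate ratios can be sketched as a simple grid search; the sample ordering and the grid resolution below are illustrative.

```python
import math

sample = [0, 0, 1, 0, 0, 0, 1, 0, 0]  # seven 0's, two 1's (hypothetical ordering)

def likelihood(p):
    # Probability of the whole sample if P(1) = p in the population
    return math.prod(p if y == 1 else 1 - p for y in sample)

# Try many candidate values of p and keep the one whose likelihood is largest
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=likelihood)
# best_p lands near 2/9, the sample proportion of 1's
```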
00:03:39.131 --> 00:03:43.150
This principle is applied to the
logistic regression analysis.
00:03:43.582 --> 00:03:47.315
The idea is that we calculate
using this logistic curve
00:03:47.315 --> 00:03:48.817
and this age here,
00:03:48.817 --> 00:03:53.160
and the known ages and the known
menarche status of these girls.
00:03:53.160 --> 00:03:57.090
We calculate the individual
likelihood for the observations,
00:03:57.090 --> 00:03:59.550
and then we use those individual likelihoods
00:03:59.550 --> 00:04:03.480
to find the best possible
logistic curve for the data.
00:04:03.926 --> 00:04:05.936
How it works in practice is that
00:04:05.936 --> 00:04:07.410
we have some kind of guess.
00:04:07.410 --> 00:04:13.710
So we guess that menarche
is a linear function of age,
00:04:13.710 --> 00:04:17.760
and an intercept, transformed
using the logistic function.
00:04:17.760 --> 00:04:21.000
So let's say that the intercept is -20,
00:04:21.000 --> 00:04:23.940
and the effect of age is 1.54,
00:04:23.940 --> 00:04:27.540
we apply the logistic function to the linear prediction,
00:04:27.540 --> 00:04:32.100
then we calculate, and that gives
us the expected probabilities.
00:04:32.421 --> 00:04:33.501
Then we check,
00:04:33.501 --> 00:04:36.990
how likely that particular observation is,
00:04:36.990 --> 00:04:38.640
given the fitted probability.
00:04:38.640 --> 00:04:43.129
So for example, the first girl here is 13.6 years old
00:04:43.297 --> 00:04:45.007
and she has had menarche.
00:04:45.383 --> 00:04:50.089
The linear prediction for that girl
using this equation here is 0.94.
00:04:50.716 --> 00:04:55.410
Then the fitted probability,
applying the logistic function
00:04:55.410 --> 00:04:58.493
to this linear prediction, is 73.6%.
00:04:59.035 --> 00:05:03.600
So if the probability is 73.6%,
00:05:03.600 --> 00:05:05.515
and the girl has had menarche,
00:05:05.515 --> 00:05:09.859
then the likelihood for that observation is 73.6%.
00:05:10.659 --> 00:05:12.807
Then we move on to the next girl.
00:05:12.807 --> 00:05:16.579
So she's 11.4 years old and she has not had menarche.
00:05:16.579 --> 00:05:19.110
The linear prediction is -2.44,
00:05:19.110 --> 00:05:21.630
so it's calculated using this equation here,
00:05:22.104 --> 00:05:27.244
and we apply the logistic function, which gives
us an 8% predicted probability.
00:05:27.662 --> 00:05:30.810
There is only an 8% probability
00:05:30.810 --> 00:05:34.165
that this girl would have
had menarche given her age
00:05:34.290 --> 00:05:35.550
and she didn't.
00:05:35.550 --> 00:05:39.870
Then the likelihood for
this observation is 1 - 8%,
00:05:39.870 --> 00:05:42.210
which is 92% here.
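The two per-observation likelihoods above can be reproduced with a short sketch, assuming the guessed parameters from the video (intercept -20, age coefficient 1.54); the exact percentages depend on how those coefficients were rounded.

```python
import math

def logistic(x):
    # Transforms a linear prediction into a probability between 0 and 1
    return 1 / (1 + math.exp(-x))

intercept, beta_age = -20.0, 1.54  # the guessed parameters

def observation_likelihood(age, had_menarche):
    p = logistic(intercept + beta_age * age)  # fitted probability of menarche
    return p if had_menarche else 1 - p       # likelihood of what was observed

lik_first = observation_likelihood(13.6, True)    # girl 1: has had menarche
lik_second = observation_likelihood(11.4, False)  # girl 2: has not, so use 1 - p
```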
00:05:42.517 --> 00:05:44.377
We do that calculation,
00:05:44.377 --> 00:05:47.850
we calculate the likelihood for all the girls
00:05:47.850 --> 00:05:51.572
and that gives us the product 6.4%.
00:05:51.724 --> 00:05:53.746
For computational reasons,
00:05:53.746 --> 00:05:57.960
we typically don't work
with these raw likelihoods
00:05:57.960 --> 00:05:59.716
and multiply them together.
00:05:59.716 --> 00:06:02.404
Instead, we work with logarithms.
00:06:02.864 --> 00:06:05.509
So we calculate the logarithm of the likelihood,
00:06:05.509 --> 00:06:09.000
called the log-likelihood for
each individual observation,
00:06:09.000 --> 00:06:11.940
and we take a sum of these log-likelihoods
00:06:11.940 --> 00:06:15.750
and that gives us the full
log-likelihood of the sample.
00:06:16.071 --> 00:06:20.176
And we adjust the value of the intercept
00:06:20.176 --> 00:06:24.516
and the value of the coefficient for age
00:06:24.516 --> 00:06:30.660
to make this full sample
log-likelihood as large as possible.
00:06:30.660 --> 00:06:34.200
In practice, this is almost
always a negative number.
00:06:34.200 --> 00:06:39.090
So we try to make it closer to
zero, that is, less negative.
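Putting the pieces together, the full sample log-likelihood can be sketched as below; the (age, menarche) pairs are hypothetical, and a real fit would let an optimizer keep adjusting the two parameters.

```python
import math

# Hypothetical (age, had_menarche) observations
data = [(13.6, 1), (11.4, 0), (15.2, 1), (10.3, 0)]

def log_likelihood(intercept, beta_age):
    total = 0.0
    for age, y in data:
        p = 1 / (1 + math.exp(-(intercept + beta_age * age)))
        total += math.log(p if y == 1 else 1 - p)  # log of this observation's likelihood
    return total  # almost always negative; closer to zero is better

# A guess near the video's parameters should beat a flat 50% guess
good = log_likelihood(-20.0, 1.54)
naive = log_likelihood(0.0, 0.0)  # predicts 50% for every girl
```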