Learn about prior, likelihood, posterior; Bayes' theorem; simple inference with PyMC3; and Bayesian model fitting (regression) with PyMC3.
- [Instructor] The last topic in this course is Bayesian inference, a type of statistical inference that has been gaining more and more interest and adoption over the last few decades. Of course, I won't be able to do it justice in a few minutes, but I wanted to at least introduce it, because it's the kind of statistics that I do every day in my job. I hope I can at least make you curious to learn more about it elsewhere. Bayesian inference takes a very different viewpoint from anything else we've seen so far. Instead of estimating a single value for the population parameters from the data, we characterize them with entire probability distributions, which represent our knowledge and our uncertainty about them.
So we start with the prior probability, which represents what we already know about the parameters, if anything. We make observations, and we use the observations to update the prior into the posterior probability. Here's a silly example. Suppose my cat hid behind one of two doors, and I don't know which, so my priors are a probability of 50% for each door. If, however, I observe her tail behind door one, I'm able to update these probabilities to 100% for door one, and 0% for door two.
For more general distributions, the update rule is called Bayes' theorem, and it involves multiplying the prior by the likelihood. The likelihood describes the observations: how likely they are, given a certain value of the parameters. This simple rule is sufficient to generate all of Bayesian inference, but the computation may be hard. In fact, before computers, the computation was hard enough that many statisticians did not take Bayesian inference seriously. So what can we do with Python? I'm going to introduce a very powerful package for Bayesian inference, PyMC3.
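In symbols, for parameters $\theta$ and data $D$, Bayes' theorem reads:

$$P(\theta \mid D) \;=\; \frac{P(D \mid \theta)\,P(\theta)}{P(D)} \;\propto\; P(D \mid \theta)\,P(\theta),$$

where $P(\theta)$ is the prior, $P(D \mid \theta)$ is the likelihood, and $P(\theta \mid D)$ is the posterior.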
Let me load a few packages. I'm going to do a classic Bayesian problem. I have a single coin, and I'm suspicious that it's not fair; that is, throwing it repeatedly may return heads between 40% and 80% of the time, instead of 50%. So I have thrown the coin 100 times, and I found 61 heads. Now I create my PyMC3 model. This is done by setting up a Python context,
and then running PyMC3 instructions inside the context. p will be the actual fraction of the time that heads comes up for this coin. My prior for it is that it's uniformly distributed between 40% and 80%. Next come my actual observations. The binomial distribution describes the probability of observing k events over n trials, if each event has probability p. Here we have n = 100 trials, probability p, and our observation is k = 61 heads.
And finally, we sample the posterior; that is, we ask PyMC3 to generate a large set of population parameters, which will be approximately distributed according to the posterior. This set of parameters is called a trace. PyMC3 may take a little while to do the sampling, especially the first time. Now I can ask it to summarize the posterior for me.
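A minimal sketch of the whole coin model, following the steps just described (the variable names and number of samples are my choices):

```python
import pymc3 as pm

with pm.Model() as coin_model:
    # Prior: the heads probability lies somewhere between 40% and 80%
    p = pm.Uniform('p', 0.40, 0.80)
    # Likelihood: 61 heads observed over 100 throws, given p
    heads = pm.Binomial('heads', n=100, p=p, observed=61)
    # Sample the posterior; the resulting set of draws is the trace
    trace = pm.sample(2000)

pm.summary(trace)    # posterior mean, sd, and HPD credible interval
pm.traceplot(trace)  # one curve per sampling chain
```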
The mean is 60%; that's the most probable value for the bias of the coin. And the standard deviation is about 5%. The HPD values delimit a credible interval, the analog of a confidence interval, containing 95% of the probability. PyMC3 can also draw a nice traceplot. There are two curves here, because PyMC3 ran two separate sampling chains.
The plot on the right shows all the values taken on by the parameter p. Now we move on to Bayesian statistical modeling. And once again, we consider the gapminder data for child survival and babies per woman. So I read the dataset, and I down-select to the year 1985. To plot it better, I will sort the data by age5_surviving value.
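A minimal sketch of that preparation, assuming the same gapminder.csv file and column names used earlier in the course:

```python
import pandas as pd

# Load the gapminder data, keep only 1985, and sort by under-5 survival
gapminder = pd.read_csv('gapminder.csv')
gdata = gapminder[gapminder.year == 1985].sort_values('age5_surviving')
```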
And let's do a very quick, very simple-minded plot; we've done much better earlier in the course. This downward trend is what we'll need to reproduce. So I will put together a very simple model: just an intercept, and a main term, a slope, for age5_surviving. So I open a context and write the priors.
For the intercept, I can figure out from the plot that it should be about 10 where survival is 65%, so I will take a uniform prior between 5 and 15. For the slope, I can again figure out from the plot that it should be somewhere within a few tenths, so let's do minus one to one. Next, I define the observations. These will be normally distributed around the model; that is, I allow each data value to have a random measurement error with a normal distribution.
This is the simplest, most likely hypothesis. Errors are very important in Bayesian inference. The center of the normal distribution, called mu, would then be the intercept, plus the slope times my explanatory variable, which I reference to 65%.
I'll take the standard deviation of the measurement error to be one. And finally, my observations. All done, and I can sample. Don't worry about these messages, or go look them up in the PyMC3 documentation. So here's the summary and the traceplot.
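Putting the whole model together, a minimal sketch might look like this, assuming the gdata DataFrame from above with a babies_per_woman column (the variable names are mine):

```python
import pymc3 as pm

with pm.Model() as survival_model:
    # Priors, eyeballed from the plot
    intercept = pm.Uniform('intercept', 5, 15)
    slope = pm.Uniform('slope', -1, 1)

    # Center of the normal distribution: the linear model,
    # with the explanatory variable referenced to 65%
    mu = intercept + slope * (gdata.age5_surviving.values - 65)

    # Observations: normal measurement error with standard deviation 1
    babies = pm.Normal('babies', mu=mu, sd=1,
                       observed=gdata.babies_per_woman.values)

    trace = pm.sample(2000)

pm.summary(trace)
pm.traceplot(trace)
```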
The posteriors are centered a little above 10 for the intercept, and close to -0.22 for the slope. We have very good control over the uncertainty of the parameters. Evaluated at the mean of the posterior, the model would look like this: I take the mean values from the posterior, and multiply by the explanatory variable, as sketched below.
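A minimal sketch of that computation, reusing the gdata and trace objects from above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot the data, then the model evaluated at the posterior means;
# x is referenced to 65%, matching the model definition above
plt.scatter(gdata.age5_surviving, gdata.babies_per_woman)

x = np.linspace(gdata.age5_surviving.min(), gdata.age5_surviving.max(), 100)
y = trace['intercept'].mean() + trace['slope'].mean() * (x - 65)
plt.plot(x, y, 'r')
```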
And that's our Bayesian fit, plotted over the data. In fact, every point in the posterior corresponds to a slightly different model. We can plot them all together to visualize uncertainty: I plot the data again, and then I loop over points in the posterior.
Let me do one more thing: take every 50th point, and also stop the iteration after 100 points. I can do that by slicing the trace and zipping in another iterator, range(100), with an index i to catch it, as sketched below. Then I repeat the calculation of the linear model and plot, giving each line some transparency.
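A sketch of that loop, reusing x and the scatter plot setup from the previous snippet:

```python
# Plot the data, then every 50th posterior sample (at most 100 of them),
# each as a faint line, to visualize the spread of plausible models
plt.scatter(gdata.age5_surviving, gdata.babies_per_woman)

for i, point in zip(range(100), trace[::50]):
    y = point['intercept'] + point['slope'] * (x - 65)
    plt.plot(x, y, 'r', alpha=0.1)
```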
Here is my bundle of models from the posterior. And that's all, folks.