Commit b6359ee2 authored by Michiel Cottaar's avatar Michiel Cottaar
Browse files

add MCMC practical

parent efa90b68
This source diff could not be displayed because it is too large. You can view the blob instead.
## Markov chain Monte Carlo (MCMC) - Practical
# import some useful libaries
import numpy as np # numeric library
import matplotlib.pyplot as plt # main plotting library
import seaborn as sns # additional plotting library
### Bayesian model inversion
In this practical we will do Bayesian model inversion using MCMC for some simple models.
Bayesian model inversion assumes we have some forward model mapping parameters to data.
We then want to invert this to map some observed data to the parameters.
This inversion is done through Bayes equation (derived in the lecture):
$$P(\vec{\theta}|\vec{x}) = P(\vec{x}|\vec{\theta}) P(\vec{\theta}) / P(\vec{x}),$$
- $P(\vec{\theta}|\vec{x})$ is the probability distribution of the parameters $\vec{\theta}$ conditional on the data $\vec{x}$. This is called the posterior and is what we want to derive in Bayesian model inversion.
- $P(\vec{x}|\vec{\theta})$ is the probability distribution of the data $\vec{x}$ conditional on the parameters $\vec{\theta}$. This is called the likelihood. It is defined by our forward model.
- $P(\vec{\theta})$ is the probability distribution of the parameters independent of the data. This is called the prior and should reflect our prior belief of what the parameters were before seeing the data. In order to not bias the analysis, one often uses uninformative (i.e., very broad) prior distribution. A common choice is $P(\vec{\theta}) = 1$ or $P(\vec{\theta}) = 1/\vec{\theta}$.
- $P(\vec{x})$ is the probability of getting the data integrated over all possible parameters. It is called the marginal likelihood or evidence and reflects how likely one is to get the data given the forward model.
As we will see below, the likelihood and prior are usually given by the user based on their forward model for the data.
The marginal likelihood can then be computed as:
$$P(\vec{x}) = \int P(\vec{x}, \vec{\theta}) d\vec{\theta} = \int P(\vec{x}|\vec{\theta}) P(\vec{\theta}) d\vec{\theta}$$
So, we can compute the marginal likelihood as the integral of the likelihood multiplied by the prior over all possible parameters.
Once we have computed this marginal likelihood (which is just a scalar given the observed data) we can divide the likelihood and prior by this number to get the posterior distribution.
Unfortunately, this requires integrating over all possible parameters, which will be impractical for all but the most simple cases.
So, we need an algorithm that allows us to estimate properties of the posterior without having to compute the marginal likelihood, namely MCMC
### Markov chain Monte Carlo (MCMC)
#### Metropolis-Hastings MCMC algorithm
In MCMC we generate samples from the posterior distribution without explicitly having to know what the posterior distribution looks like.
We do this by following a random walk through parameter space using the following algorithm:
- given the location of the random walk in parameter space $\vec{\theta}_n$ in step $n$, generate some candidate paraemters from a proposal distribution $\vec{\theta}_c \sim P_{\rm prop}(\vec{\theta}_c|\vec{\theta_n})$. A common choice for proposal distribution is a Gaussian centered on $\vec{\theta_n}$.
- Draw a random number between 0 and 1. If this number is larger than the posterior ratio ($P(\vec{\theta}_c|\vec{x})/P(\vec{\theta}_n|\vec{x})$) accept the new point. This in effect gives an acceptance probability of:
$$P_{\rm accept} = {\rm min}(P(\vec{\theta}_c|\vec{x})/P(\vec{\theta}_n|\vec{x}),1)$$
- If accepted the next point is the candidate point: $\vec{\theta}_{n+1} = \vec{\theta}_{c}$
- If rejected the next point is the original point: $\vec{\theta}_{n+1} = \vec{\theta}_{n}$
Crucially, we can compute the posterior ratio ($P(\vec{\theta}_c|\vec{x})/P(\vec{\theta}_n|\vec{x})$) without computing the marginal likelihood.
This is because when we enter Bayes equation, the marginal likelihood drops out.
Instead the posterior ratio is given by the multiplication of the ratio of the likelihoods and the priors:
$$\frac{P(\vec{\theta}_c|\vec{x})}{P(\vec{\theta}_n|\vec{x})} = \frac{P(\vec{x}|\vec{\theta}_c)}{P(\vec{x}|\vec{\theta}_n)} \frac{P(\vec{\theta}_c)}{P(\vec{\theta}_n)}$$
Let's start with a simple example, where are data is drawn from a Gaussian distribution with unknown mean, but known standard devation:
np.random.seed(1) # set seed to get consistent results
true_mu = 1.3 # unknown
sigma = 1 # known
x = true_mu + np.random.randn(5) * sigma # draw 5 samples
Our likelihood function in this case is:
$$P(\vec{x} | \mu) = N(\mu, \sigma^2=1)$$
We will assume an uninformative prior of $P(\mu) = 1$.
So the posterior ratio is given by:
$$\frac{P(\mu_c|\vec{x})}{P(\mu_n|\vec{x})} = \frac{P(\vec{x}|\mu_c)}{P(\vec{x}|\mu_n)} \frac{P(\mu_c)}{P(\mu_n)} = \frac{N(\mu_c, \sigma^2=1)}{N(\mu_n, \sigma^2=1)}$$
In code this looks like:
np.random.seed(2) # set seed to get consistent results
all_samples = []
current_mu = 0. # we have to start our random walk somewhere...
for _ in range(10000): # let's run the MCMC for 10000 steps
# Draw from Gaussian proposal function
candidate = np.random.randn() + current_mu
# Compute posterior ratio
ratio = - candidate) ** 2 / 2 * sigma ** 2)) / - current_mu) ** 2 / 2 * sigma ** 2))
if np.random.rand() < ratio: # accept if random number < P_accept
current_mu = candidate
else: # otherwise reject
current_mu = current_mu
# always add new point to the chain, even if candidate got rejected!
Let's look at the results
plt.hist(all_samples, bins=51, label='MCMC samples', density=True)
plt.axvline(true_mu, label='true mean', color='black')
sample_mean = np.mean(x)
sample_mean_std = sigma / np.sqrt(len(x))
xval = np.linspace(-4, 4, 101) * sample_mean_std + sample_mean
plt.plot(xval, 1/np.sqrt(2 * np.pi * sample_mean_std ** 2) * np.exp(-(xval - sample_mean) ** 2 / (2 * sample_mean_std ** 2)), label='analytical solution')
Some caveats:
- In practice we will nearly always work with log-probability rather than probality functions.
So rather than computing the posterior ratio as `ratio = - candidate) ** 2 / 2)) / - current_mu) ** 2 / 2))`
we would compute the posterior ratio as `ratio = np.exp(0.5 * (np.sum(-(x - candidate) ** 2)) - np.sum(-(x - current_mu) ** 2)))`.
This has the advantage that the product in the first case can become extremely small, which can lead to numerical instabilities in the calculations. We will see this in action below.
- The acceptance rate is given by the posterior distribution only for a *symmetric* proposal function.
For asymmetric proposal functions (i.e., $P_{\rm prop}(\vec{\theta}_c|\vec{\theta_n}) \ne P_{\rm prop}(\vec{\theta}_n|\vec{\theta_c}))$), the acceptance probability will also depend on the ratio of the acceptance probabilities (see Here we will only investigate symmetric proposal distributions.
#### Multi-parameter model example
First we define a generic MCMC algorithm. We assume the user provides some function $f$ which gives the sum
of the log-likelihood and log-prior given some set of parameters.
def metropolis_hastings(f, x0, nsteps=10000, step_size=1.):
"""MCMC using Metropolis-Hastings algorithm
:param f: function mapping vector of parameters to sum of log-likelihood and log-prior
:param x0: starting set of parameters
:param nsteps: how many steps to run the MCMC for
:param step_size: size of the Gaussian proposal function
params = []
current = x0
current_f = f(x0)
for _ in range(nsteps):
candidate = np.random.randn(len(current)) * step_size + current
candidate_f = f(candidate)
ratio = np.exp(candidate_f - current_f)
if np.random.rand() < ratio:
current = candidate
current_f = candidate_f
return np.array(params)
This is the full MCMC algorithm. Make sure you understand what is going on!
Let's see this in action for another simple Gaussian example.
This time we assume the variance is also unknown:
$$P(\vec{x} | \mu, \sigma^2) = N(\mu, \sigma^2)$$
We will assume uninformative Jeffrey's priors of $P(\mu) = 1$ and $P(\sigma^2) = 1/\sigma^2$.
np.random.seed(1) # set seed to get consistent results
true_mu = 1.3 # unknown
true_std = 1 # unknown
x = true_mu + np.random.randn(5) * true_std # draw 5 samples
def log_likelihood_prior(params):
mu, variance = params
if variance < 0:
return -np.inf # will cause any candidated with negative variance to be rejected. This allows one to set constraints on parameters in MCMC
log_likelihood = 0.5 * np.sum(-(x - mu) ** 2 / variance) - 0.5 * len(x) * np.log(variance) # ignoring constant term
log_prior = -np.log(variance)
return log_likelihood + log_prior
samples = metropolis_hastings(log_likelihood_prior, [0, 1])
Seaborn allows us to plot the samples from the posterior distribution in 2D (compare with lecture)
as_dict = {
'mean': samples[:, 0],
'variance': samples[:, 1],
jp = sns.jointplot(x='mean', y='variance', data=as_dict)
# plot sample mean and variance as black lines
jp.ax_joint.axvline(np.mean(x), color='black')
jp.ax_joint.axhline(np.var(x, ddof=1), color='black')
We can characterise the posterior distribution by looking at summary measures of our samples:
print(f'P(mu > 0) = {np.sum(samples[:, 0] > 0) / samples.shape[0]}')
print(f'Expectation value of mu (E(mu) = {np.mean(samples[:, 0])}) matches the sample mean ({np.mean(x)})')
Here we see that we can estimate the probability of the mean being bigger than zero by the fraction of samples bigger than zero. Similarly, we can estimate the mean of the distribution by computing the mean of the samples.
Note that the precision of these estimates is limited by the number of *independent* samples.
### Automatically adjusting MCMC step size
The efficiency of the MCMC algorithm will depend greatly on how closely the proposal function matches the target posterior function.
The closer they match, the more efficient the algorithm.
This can be best visualised using the trace:
# use the same model/data as previous section
np.random.seed(1) # set seed to get consistent results
fig, axes = plt.subplots(3, 1, figsize=(4, 10), sharex=True)
for ax, step_size, label in zip(
axes, (0.05, 1, 10), ('too small', 'decent', 'too large')
# run MCMC three times with three different step sizes
samples = metropolis_hastings(log_likelihood_prior, [0, 1], step_size=step_size)
ax.plot(samples[:, 0]) # the trace is simply a line plot showing the evolution of some parameters during the MCMC
ax.set_title(f'step size is {label}')
axes[-1].set_xlabel('step time (t)')
We want subsequent samples to look independent from each other (i.e., middle panel) rather than varying very slowly (indicative of too small step size; top panel) or getting stuck (indicative of too large step size; bottom panel). Note that irrespective of step size, the distribution of MCMC will converge to the posterior distribution. With the wrong step size it just might take much, *much* longer...
A nice way to ensure the step size is correct (and deal with correlated parameters) is to adjust the step size automatically.
To get correlated parameters we will fit a straight line to data that has not been demeaned. So the model is:
$$y = a x + b + \epsilon$$
or in terms of probabilities
- Likelihood: $P(y| a, x, b, \sigma^2) = N(a x + b, \sigma^2)$
- Prior: $P(a, b, \sigma^2) = 1/\sigma^2$
x = np.random.randn(10) + 6
true_a = 1
true_b = 0
true_variance = 10
y = true_a * x + true_b + np.random.randn(10) * np.sqrt(true_variance)
def logp(params):
a, b, var = params
if var < 0:
return -np.inf
offset = y - (a * x + b)
log_likelihood = -0.5 * np.sum(offset ** 2 / var) - 0.5 * len(x) * np.log(var)
log_prior = -np.log(var)
return log_likelihood + log_prior
plt.plot(x, y, '.')
# extend x- and y-axes to include origin
plt.xlim(0, plt.xlim()[1])
plt.ylim(0, plt.ylim()[1])
First we update the Metropolis-Hastings function to accept
a variance matrix.
The proposal function will be drawn from a Gaussian with this variance.
def metropolis_hastings_cov(f, x0, variance_matrix, nsteps=10000):
"""MCMC using Metropolis-Hastings algorithm
:param f: function mapping vector of parameters to sum of log-likelihood and log-prior
:param x0: starting set of parameters
:param variance_matrix: variance to use for Gaussian proposal function
:param nsteps: how many steps to run the MCMC for
params = []
# take square root of matrix for sampling
sample_mat = np.linalg.cholesky(variance_matrix)
current = x0
current_f = f(x0)
for _ in range(nsteps):
candidate = sample_mat @ np.random.randn(len(current)) + current
candidate_f = f(candidate)
ratio = np.exp(candidate_f - current_f)
if np.random.rand() < ratio:
current = candidate
current_f = candidate_f
return np.array(params)
Then we define a new function, which estimates the best variance for the Gaussian proposal function in an iterative manner:
def iterative_metropolis_hastings(f, x0, nsteps=10000, niter=3):
"""MCMC using Metropolis-Hastings algorithm
Proposal variance is determined iteratively
:param f: function mapping vector of parameters to sum of log-likelihood and log-prior
:param x0: starting set of parameters
:param nsteps: how many steps to run the MCMC for
# start with small, isotropic step size
current_variance_mat = np.eye(len(x0)) * 1e-2
for _ in range(niter - 1):
# For each iteration we start by running a short (i.e., with reduced number of steps) MCMC
samples = metropolis_hastings_cov(f, x0, current_variance_mat, nsteps=nsteps//10)
# set the new variance matrix to the covariance of the samples of this MCMC (divided by the number of parameters)
current_variance_mat = np.cov(samples.T) / len(x0)
x0 = samples[-1] # also update starting position
# Finally, run the full MCMC with the covariance of the proposal functions set by the last iteration
return metropolis_hastings_cov(f, x0, current_variance_mat, nsteps=nsteps) # run final MCMC with all steps
Without any iterations we can see from the trace that the initial step size is clearly too small:
samples_no_iteration = iterative_metropolis_hastings(logp, [0, 0, 1.], niter=1)
plt.plot(samples_no_iteration[:, 0])
However, after several iterations of updating the proposal function covariance matrix, we get a nice trace:
iter_samples = iterative_metropolis_hastings(logp, [0, 0, 1.], niter=5)
plt.plot(iter_samples[:, 0])
The resulting distribution of a and b looks like:
as_dict = {
'a': iter_samples[:, 0],
'b': iter_samples[:, 1],
'variance': iter_samples[:, 2],
sns.jointplot(x='a', y='b', data=as_dict)
# add true value as red star
jp.ax_joint.scatter(true_a, true_b, color='r', marker='*')
Ideas for further exploration:
- Figure out how many iterations are needed to get an efficient sampling of the posterior
- Define your own forward model (i.e., likelihood & priors). Note that to run MCMC, you would only have to write a function that computes the sum of the log_likelihood and log_prior given some parameters. You can then use the MCMC algorithms above to sample the posterior.
- Test MCMC for a model with (many) more parameters.
\ No newline at end of file
# This is the MCMC (Markov chain Monte Carlo) practical
The practical is in Python. There is a Notebook versio (.ipynb file) and a Markdown version (.md).
To use the Notebook version, you need to start a jupyter notebook and open the file within the notebook.
You may also want to just view the Markdown and copy-paste the code into your favourite Python shell.
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment