Model Comparison

Clustering

Autoregressive Models

Planning as Inference

State Space Models

Artificial Neural Networks

Independent Component Analysis

*
The ideas behind Bayesian inference can be illustrated using the game of tennis.
In this example the best estimate of where a tennis ball will land is made by combining
information about where the opponent has served before (the prior) with information from the visual system about
the current trajectory of the ball (the likelihood).
The prior is shown in blue, the likelihood distribution in red, and
the posterior distribution as the white ellipse. The maximum
posterior estimate is shown by the magenta ball. This estimate can
be updated in light of new information from the ball's trajectory.
*
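The prior-likelihood combination described above can be sketched in one dimension. The numbers below are purely illustrative (a hypothetical position along one court axis); for Gaussian distributions the posterior has a simple closed form.

```python
# Bayesian fusion of a prior and a likelihood, as in the tennis example,
# using hypothetical 1-D numbers (position along one court axis).
prior_mean, prior_var = 2.0, 1.0   # where the opponent has served before
like_mean, like_var = 3.0, 0.5     # noisy visual estimate of the trajectory

# For Gaussians, the posterior precision is the sum of the two precisions,
# and the posterior mean is the precision-weighted average of the means.
post_var = 1.0 / (1.0 / prior_var + 1.0 / like_var)
post_mean = post_var * (prior_mean / prior_var + like_mean / like_var)

# The posterior sits between prior and likelihood, pulled towards the
# sharper (lower-variance) source of information.
print(post_mean, post_var)
```

Because the likelihood here is less uncertain than the prior, the posterior estimate lands nearer the visual estimate, and its variance is smaller than either input's.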

* Exact Bayesian inference is not possible for nonlinear models. Instead, one must use an approximate inference framework, and Variational Inference is one such approach. It factorises the posterior density and optimises the parameters of the factors so as to minimise the KL divergence between the true and approximate posteriors. It also provides a lower bound on the model evidence. This bound is the "negative free energy".
*
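A minimal sketch of the idea, under assumptions not taken from the text: the "posterior" is a toy unnormalised density (a Gaussian with a small quartic perturbation, standing in for a nonlinear model's intractable posterior), the approximate posterior is a single Gaussian, and the negative free energy is estimated by Monte Carlo and maximised by a crude grid search rather than a proper optimiser.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unnormalised log "posterior": a Gaussian with a quartic term,
# standing in for the intractable posterior of a nonlinear model.
def log_p_tilde(x):
    return -0.5 * x ** 2 - 0.02 * x ** 4

# Fixed base samples; common random numbers keep the grid search stable.
z = rng.standard_normal(50000)

# Negative free energy (ELBO) of a Gaussian q(x) = N(m, s^2):
# F = E_q[log p_tilde(x)] + entropy(q), estimated by Monte Carlo.
def neg_free_energy(m, s):
    x = m + s * z
    entropy = 0.5 * np.log(2 * np.pi * np.e * s ** 2)
    return log_p_tilde(x).mean() + entropy

# Grid search over the variational parameters; maximising the negative
# free energy is equivalent to minimising KL[q || true posterior].
best = max((neg_free_energy(m, s), m, s)
           for m in np.linspace(-1, 1, 9)
           for s in np.linspace(0.5, 1.5, 11))
print(best[1], best[2])  # q's mean and std
```

The optimal q centres on the mode and adopts a width slightly below 1, since the quartic term narrows the target relative to a standard Gaussian.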

*
Model comparison is crucial to the scientific endeavour. How do we decide that one scientific model is better than another? Which model has the strongest empirical support? Bayesian inference in this context allows us to optimally update our beliefs about which models are best in light of new experimental evidence. This is a continually evolving process. A key quantity here is the evidence for each model, but this can be difficult to compute, especially for e.g. nonlinear models. One of our recent approaches is to approximate the evidence using a Savage-Dickey method: we decide whether model parameters are necessary by comparing how probable they are to be zero a posteriori versus a priori.
*
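The Savage-Dickey idea can be sketched with a conjugate Gaussian example (the data and prior below are hypothetical, not from the text): for a nested comparison, the Bayes factor in favour of fixing a parameter at zero is the ratio of its posterior density to its prior density at zero.

```python
import math

# Savage-Dickey sketch: compare a model with a free parameter theta
# against the nested model that fixes theta = 0.
# Prior: theta ~ N(0, 1).  Likelihood: each y_i ~ N(theta, sigma^2).
y = [0.3, -0.1, 0.2, 0.4, 0.0]   # hypothetical data
sigma2 = 1.0
prior_var = 1.0

n = len(y)
ybar = sum(y) / n
# Conjugate Gaussian update for the posterior over theta.
post_var = 1.0 / (1.0 / prior_var + n / sigma2)
post_mean = post_var * (n * ybar / sigma2)

def normal_pdf(x, m, v):
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

# Savage-Dickey density ratio: Bayes factor for the reduced (theta = 0)
# model is posterior density at zero over prior density at zero.
bf_reduced = normal_pdf(0.0, post_mean, post_var) / normal_pdf(0.0, 0.0, prior_var)
print(bf_reduced)  # > 1 here, so these data favour dropping theta
```

With data scattered near zero, the posterior piles up mass at zero relative to the prior, so the ratio exceeds one and the simpler model is preferred.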

*
We can estimate the number of clusters in a data set using Bayesian inference applied to mixture models. For example, the figure below shows two neural systems (two clusters of responsive voxels, one shown in red, the other in green) underlying the pattern of brain activations measured in a multiple-subject fMRI study (6 subjects in the red cluster, 11 in the green). These subjects performed exactly the same task, but used different parts of the brain.
*
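A simplified sketch of selecting the number of clusters, not the method used for the fMRI study itself: synthetic one-dimensional data from two groups, a basic EM fit of a Gaussian mixture, and BIC as a crude stand-in for the model evidence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D data: two well-separated groups, standing in for
# two clusters of subjects as in the fMRI example.
x = np.concatenate([rng.normal(-3.0, 1.0, 60), rng.normal(3.0, 1.0, 110)])
n = len(x)

def fit_gmm(x, k, iters=200):
    """Fit a 1-D Gaussian mixture with EM; return the log-likelihood."""
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread-out initial means
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and (floored) variances
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = np.maximum((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-3)
    logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)
    m = logp.max(axis=1)
    return (m + np.log(np.exp(logp - m[:, None]).sum(axis=1))).sum()

# BIC approximates -2 * log evidence: penalise fit by parameter count
# (k means + k variances + k-1 weights) times log(n).
def bic(k):
    return -2.0 * fit_gmm(x, k) + (3 * k - 1) * np.log(n)

best_k = min(range(1, 4), key=bic)
print(best_k)  # the two-cluster model wins for this data
```

The one-component model pays a large fit penalty and the three-component model a complexity penalty, so the two-cluster model minimises BIC, mirroring how evidence-based comparison selects the number of clusters.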

*
Linear autoregressive models predict e.g. the future value of a time series using a weighted combination of its previous values. Their multivariate equivalents also use previous values of other time series as predictors. Usefully, one can transform the parameters to make a parametric estimate of e.g. the power spectral density matrix (power, coherence, etc.).
*
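A univariate sketch of both steps, with an illustrative AR(2) process chosen to have a spectral peak: the coefficients are estimated by ordinary least squares (one of several possible estimators), and the fitted parameters are transformed into a parametric power spectral density.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a stable, oscillatory AR(2) process (coefficients illustrative).
a_true = np.array([1.3, -0.75])
n = 2000
x = np.zeros(n)
for t in range(2, n):
    x[t] = a_true[0] * x[t - 1] + a_true[1] * x[t - 2] + rng.standard_normal()

# Least-squares AR fit: regress x[t] on its p previous values.
p = 2
X = np.column_stack([x[p - k - 1:n - k - 1] for k in range(p)])
a_hat, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
sigma2 = np.mean((x[p:] - X @ a_hat) ** 2)  # innovation variance

# Parametric power spectral density from the AR parameters:
# S(f) = sigma^2 / |1 - sum_k a_k exp(-2*pi*i*f*k)|^2
f = np.linspace(0.0, 0.5, 256)
denom = 1 - sum(a_hat[k] * np.exp(-2j * np.pi * f * (k + 1)) for k in range(p))
S = sigma2 / np.abs(denom) ** 2

print(a_hat, f[np.argmax(S)])  # estimated coefficients and peak frequency
```

The complex pole pair of this AR(2) model produces a clear spectral peak, recovered here directly from the fitted coefficients rather than from a nonparametric periodogram; the same transformation applied to a multivariate AR fit yields the full spectral density matrix.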

*
The Multilayer Perceptron is a classic Artificial Neural Network that was popular in the 1990s. It uses a layer of hidden units, each of which bipartitions 'input space' into two regions (see the straight lines in the figure below). An output layer, comprising a sigmoidal function, can then combine these partitions into an arbitrarily complex response function (see the dark versus light regions below). The number of hidden units can be selected using Bayesian model comparison.
*
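The partition-and-combine geometry can be sketched with a hand-weighted 2-2-1 perceptron computing XOR (the weights below are illustrative, not fitted): each hidden unit cuts the plane with one straight line, and the sigmoidal output unit combines the two half-planes into a response no single line could produce.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-chosen weights (illustrative): each hidden unit defines a line.
W1 = np.array([[20.0, 20.0],     # hidden unit 1: fires when x1 + x2 > 0.5
               [-20.0, -20.0]])  # hidden unit 2: fires when x1 + x2 < 1.5
b1 = np.array([-10.0, 30.0])
W2 = np.array([20.0, 20.0])      # output: roughly an AND of the two half-planes
b2 = -30.0

def mlp(x):
    h = sigmoid(W1 @ x + b1)     # each hidden unit bipartitions input space
    return sigmoid(W2 @ h + b2)  # the sigmoidal output combines the regions

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(float(mlp(np.array(x, dtype=float)))))  # XOR truth table
```

The region between the two lines (0.5 < x1 + x2 < 1.5) maps to a high output and everything outside to a low one, which is exactly the kind of composite response function that a single linear unit cannot represent; in the Bayesian treatment, model comparison over the number of such hidden units trades this flexibility against complexity.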