Text Box: CodeText Box: Selected
bibliography 
Text Box: LinksText Box: CVText Box: PublicationsText Box: Welcome

 

 

This page describes a variational Bayesian toolbox for the identification of stochastic nonlinear dynamic causal models (sDCM).

 

 

 

 

 

 

Lorenz stochastic system (left: sample-path in state space, middle: predictive density over time, right: equivalent DCM)

 

 

 

HOW TO USE IT

 

Download and install

 

The toolbox can be downloaded here.

It requires at least MATLAB 7.1 to run.

Add the folders ‘/DAVB’ and ‘/DAVB/subfunctions’ to your MATLAB path.

 

NB: A function-by-function description of the toolbox (along with dependency graphs) can be found here.

 

 

 

Getting started

 

The root folder ‘/DAVB’ contains the functions that implement the variational Bayesian inversion scheme. The ‘/DAVB/subfunctions’ subfolder contains routines for simulating (and identifying) a number of particular stochastic dynamical systems (double well system, Lorenz attractor, Van Der Pol and Rossler oscillators, Henon map, hierarchical stable heteroclinic channels, fMRI HRF and DCM models, etc…). A number of demonstration scripts have also been written. These are all .m files contained in the subfolder /DAVB/subfunctions’, whose name, somewhat surprisingly, begin with ‘demo_’ (e.g., ‘demo_Lorenz.m’). When called, these scripts simulate stochastic system time series, invert them and then evaluate the inversion in terms of estimation efficiency. Editing these scripts might then prove useful to understand the I/O of the code.

 

Note that the main call function is ‘VBA_NLStateSpaceModel.m’.

Its input arguments are (for a more complete description, see the section stochastic dynamic causal models below):

·         the data y

·         the input u (can be left empty)

·         the name/handle of the observation function g

·         the name/handle of the evolution function f (can be left empty)

·         the dimensions of the model variables

·         optional arguments for specifying, e.g., priors (can be left empty)

Its output arguments are:

·         the first- and second-order moments of the posterior densities of all unknown variables in the model

·         some diagnostics of the variational Bayesian approximate inference algorithm

 

You can construct your own generative model by simply creating evolution and observation functions. These functions must comply to a standard I/O, which is described in the header of the main call function (see also examples in the subfolder /DAVB/subfunctions’). Note that the variational Bayesian inversion is quicker if these functions output gradients with respect to states and/or parameters, but an automatic numerical derivation scheme is used if not.

 

Typically, priors, delays, microtime resolution, etc… (see below) are passed through the optional arguments of the main call function. These are described in the header of the file (see also the code description).

 

 

 

Simulated Lorenz system under the sDCM generative assumptions

(saturated sigmoidal observation function -- demo_Lorenz.m)

 

 

 

Graphical output

 

By default, the model inversion terminates by displaying the results in a MATLAB window, using tabs, which are described below:

·         summary: general information about the model inversion. These include the log-model evidence and its decomposition into marginal entropies and prior-posterior Kullback-Leibler divergences.

·         VB inversion: sufficient statistics of the posterior densities are displayed here. These include the posterior predictive densities, which are plotted against data. This can be useful to eyeball the quality of the model fit. For example, any structure in the residuals is indicative of poor model fit. Note that errorbars depict one posterior standard deviation. Also, evolution/observation parameters and initial conditions are plotted relative to their prior expectation (which is not the case for observed data and hidden states).

·         diagnostics: these include microtime resolution hidden-states, estimated state- and measurement noise time-series and histograms, absolute posterior correlation matrix of the evolution/observation parameters and initial conditions… Note that when using delay embedding (see above), the extra (embedding) dimensions are not displayed in the ‘VB inversion’ tab. However, they can be eyeballed in this tab (at micro-time resolution).

·         kernels: impulse response functions to each specified input (in observation and in system’s state spaces). These are evaluated using the posterior expectation of the evolution/observation parameters. Note that the initial conditions of the system are used as initial conditions when the inputs are applied.

·         conv: history of free energy optimization over the course of the variational Bayesian iterative algorithm. This tab also recalls some of the options that have been used during the inversion, and might have an influence onto the optimization.

·         priors: this tab creates a similar display than ‘VB inversion’, but with the sufficient statistics of the prior density (including the prior predictive density, which is also plotted against the data).

·         deterministic: this tab creates a similar display than ‘VB inversion’, but with the sufficient statistics of the initialized posterior density (which relied upon the deterministic variant of the dynamical model). Note that the ‘diagnostics’ tab also displays the absolute posterior correlation matrix of the evolution/observation parameters under the deterministic model.

Right clicking on the axes allows you to send the graph to a new window for printing purposes. Closing this window replaces the axes where they were. Note that this graphical output can be recreated by calling the function ‘VBA_ReDisplay.m’, whose inputs are the output of the main call function.

 

 

 

VB inversion of the above simulated Lorenz system (left: tab ‘VB inversion’, right: tab ‘diagnostics’)

 

 

 

 

WHAT CLASS OF GENERATIVE MODELS DOES IT DEAL WITH?

 

Any Bayesian data analysis relies upon a generative model, i.e. a probabilistic description of the mechanisms by which observed data are generated. This toolbox does not invert any generative model: it has been developed to deal with a certain class of generative models, which is described below.

 

 

 

Stochastic nonlinear dynamic causal models

 

The most general class of generative models this toolbox can invert is called stochastic nonlinear dynamic causal models (sDCM). It is defined by a joint probability distribution over the following set of variables:

·         y: the pXnt data time-series

·         x: the nXnt hidden states time-series

·         x0: the nX1 initial conditions of the system

·         u: the nuXnt inputs time-series

·         θ: the nθX1 evolution parameters

·         φ: the nφX1 observation parameters

·         α: the state noise precision

·         σ: the measurement noise precision

These variables are assumed to obey the above nonlinear state-space model:

     xt+1 = f(xt,θ,ut) + ηt

     yt = g(xt) + εt

where g and f are the so-called observation and evolution functions, respectively.

In this formulation, ε and η are random variables, namely measurement and state noise, respectively. They are assumed to follow normal distributions of the form: ε~N(0,σ-1Qy) and η~N(0,α-1Qx), where Qx and Qy are known (time dependent) covariance matrices. Note that toolbox actually uses inverse covariance  (i.e. precision) matrices.

Additional prior assumptions (maybe noninformative) about the statistical behaviour of x0, θ, φ, σ and α are necessary to then finalize the specification of the generative model. Without loss of generality, we assume:

-          Gaussian priors for x0, θ and φ.

-          Gamma priors for σ and α.

Fixing any of these parameters simply means using infinitely precise priors. This is done by zeroing the adequate prior variances.

 

 

 

 

Graphical representation of the sDCM class of generative models

 

A variational Bayesian approach is then used to invert this generative model, under two approximations:

·         a mean-field separability assumption between the hidden states, the initial conditions, the evolution/observation parameters and the precision hyperparameters.

·         A Laplace approximation on all variables except precision hyperparameters (which have Gamma posterior densities).

Thus, the variational Bayesian updates boil down to an iterated regularized Gauss-Newton optimization scheme. The second-order moments of the approximate posterior densities are then simply related to the curvature of the local cost functions. Note that the variational Bayesian update of the hidden states is very similar in form to a Kalman filter/smoother. This critically reduces the computational complexity of the scheme. The so-called free energy is then used for approximate Bayesian model comparison.

 

 

 

What about deterministic systems?

 

Deterministic DCM is simply a particular case of the sDCM class of generative models above. Deterministic systems can be modelled by a priori assuming an infinite state noise precision (α→∞). The states time-series xt then become an implicit function of the evolution parameters θ. The data is thus determined solely by the initial conditions x0 and the evolution/observation parameters (θ and φ). This means that the identification of the such deterministic systems is treated like a static model (see appropriate section above) of the form:

     y = h(θ,φ,x0) + ε

where the effective observation function h is such that:

     dht/dφ  = dgt/dφ

     dht/dθ  = (dgt/dxt) (dxt/dθ) = (dgt/dxt) ( dft/dθ + (dft/dxt-1) (dxt-1/dθ) )

     dht/dx0 = (dgt/dxt) (dft/dxt-1) (dft-1/dxt-2) … (df1/dx0)

where the two last lines express the recursive dependency of the effective observation function h (at time t), with respect to the evolution parameters θ and initial conditions x0.

Given the evolution parameters posterior density p(θ|y), an estimate of the states time series can also be obtained:

     E[xt|y] = xt(E[θ|y])

     V[xt|y] = (dxt/dθ)T V[θ|y] (dxt/dθ)

where the Laplace approximation has been used.

Importantly, the VB treatment of deterministic DCMs does not rely upon a mean-field factorization of the joint posterior density p(θ,φ|y) of evolution and observation parameters. This can be useful to detect non-identifiability issues between these two sets of parameters.

 

NB: by default, the toolbox’s inversion of the above sDCM generative model is initialized by inverting the deterministic variant of the dynamical model.

 

 

 

Dealing with delays

 

Delayed dynamical systems are generally non-Markovian, in their native form. The toolbox can deal with certain forms of delays by embedding the state space x into an augmented state-space X, where Xt = (xt,xt-1,xt-2,…,xt-D)T. Here, D is the maximum delay considered. This means that the dimension of the embedded state space is now Dn, which can be considerably high.

 

Alternatively, one may construct a ‘corrected’ evolution function, using the following first-order Taylor expansion around zero-delays. Let us assume that the system obeys the following ODE (in continuous time):

     dx/dt = f(x(t-D))

Taylor-expand the evolution function around D=0:

     f(x(t-D)) = f(x(t)) – D (df/dx) (dx/dt) + O(D2)

Thus:

     dx/dt = ( I+D (df/dx) )-1 f(x(t)) + O(D2)

which can then be used to derive a ‘delay-corrected’ evolution function in discrete time. For example, an Euler discretization scheme yields:

     xt+1 = xt + Δt ( I+D (df/dx) )-1 f(xt,θ,ut) + ηt

where Δt is the time discretization step and ηt lumps Δt O(D2) error terms and any other perturbations.

Note that in this formulation, delays can now be estimated as part of the evolution parameters (D є θ).

 

 

Extending the sDCM generative model to serially correlated state noise

 

The toolbox can deal with autoregressive state noise, using another form of embedding, whereby the AR(1) state noise now acts as hidden states, passed through an effective observation function that accounts for the original evolution function. In essence, this scheme directly estimates the (stochastic) input to the system, treating the hidden states as dummy variables, whose response to noise is deterministic but hysteretic.

 

Alternatively, one can construct an augmented state space X, where Xt = (xt,zt)T. Here, zt is the autoregressive component of the state noise:

     Xt+1 = [ f(xt,θ,ut) + zt  ,  A zt ]T + ηt

     yt    = g(xt) + εt

where the A matrix is now part of an augmented evolution parameter vector (A є θ). The toolbox’s default solution of the AR(1) state noise is basically identical to the simple case where A = I, without additional (non autoregressive) state noise ηt on the original system’s states dynamics xt (the corresponding block of the augmented state noise covariance matrix Qx is effectively zero).

Note that an autoregressive process of any order can be obtained by augmenting the state space with retarded state noise (zt,zt-1,…,zt-D)T, where D is the order of the autoregressive process. This is basically using the above delay embedding trick.

 

 

 

Hierarchical sDCM generative models

 

One may be willing to model a hierarchical sDCM process, for which the generative model could be:

     xt+1(k)   = f(k) (xt(k),θ,ut) + ηt(k)

     ut(k)      = g(k)(xt(k),φ) + εt(k)

     xt+1(k-1) = f(k-1) (xt(k-1),θ,ut(k)) + ηt(k-1)

     ut(k-1)    = g(k-1)(xt(k-1),φ) + εt(k-1)

    

     xt+1(1)   = f(1)(xt(1),θ,ut(2)) + ηt(1)

     ut(1)      = g(1) (xt(1),φ) + εt(1)

where there are k levels in the hierarchy and the original (and only) observations are yt = ut(1).

The toolbox does not deal with this sort of hierarchical sDCM generative model. However, one can invert a limiting case of this hierarchical sDCM generative model, for which the variance of the noise processes εt(k), εt(k-1), …, εt(2)  tends to zero (Qy(k), Qy(k-1), …, Qy(2)→0). In this case, the above hierarchical model collapses down to an augmented non-hierarchical sDCM model of the form:

     xt+1(k)   = f(k) (xt(k),θ,ut) + ηt(k)

     xt+1(k-1) = f(k-1) (xt(k-1),θ, g(k)(xt(k),φ)) + ηt(k-1)

    

     xt+1(1)   = f(1)(xt(1),θ, g(2)(xt(2),φ)) + ηt(1)

     yt         = g(1) (xt(1),φ) + εt(1)

Again, this is now formally equivalent to the dynamical embedding procedure that was referred above for dealing with delays and serially correlated state noise, where the augmented state space is now Xt = (xt(k),xt(k-1), …,xt(1))T .

 

 

 

Getting closer to continuous dynamical systems

 

The sDCM evolution equation is inherently discrete in time. If one uses this as an approximation to a continuous dynamical system, then the goodness of the approximation strongly depends on the sampling frequency. The toolbox allows one to specify a ‘microtime resolution’, which is used to recursively apply the evolution function between two time samples. This can be useful to minimize the time discretization errors introduced when approximating the original continuous dynamical system. For example, let us assume that the system obeys the following ODE:

     dx/dt = f(x)

We wish to derive a prediction from x(t) to the next time sample, x(t+Δt).

One simple solution to this is to use any time discretization and apply recursively it on a smaller time grid. For example, if one uses an Euler scheme:

     x(t+δ) = F(x(t))

where F(x(t)) = x(t) + δ f(x(t)) and δ is a small integration step (δ<<Δt). Then the recursive application of the Euler evolution function F yields:

     x(t+Δt)           = F • F • ... • F(x(t))

     dx(t+Δt)/dx(t) = (dF/dx) (dF/dx) ... (dF/dx)

where the function F and its gradient is evaluated recursively (Δt/δ times) on the ‘microtime’ resolution grid defined by δ.

In the limit (Δt/δ>>1), this recursive Euler scheme tends to a matrix exponential flow, i.e.: F(x(t)) = x(t) + (dF/dx)-1 ( exp[ Δt (dF/dx) ] – I ) f(x(t)). Note that this procedure does not allow state noise to enter anywhere else than at the times where data are sampled. This means that the impact of state noise will be somehow underestimated.

 

 

 

Dealing with binary observations

 

The toolbox can deal with binary data, by treating them as binomial (Bernouilli) samples, whose sufficient statistics (expectation) is given by the observation function. This means that the observation equation is basically replaced by the following binomial log-likelihood function:

     log P(yt|xt) = yt log(g(xt)) + (1-yt) log(1-g(xt))

where g(xt) = P(yt=1|xt,φ) = E[yt|xt], by definition.

Note that here, the measurement noise precision σ and covariance components Qy are irrelevant.

It turns out that this does not affect much the variational Bayesian inversion scheme under the Laplace approximation. This is because the posterior density on evolution/observation parameters, hidden states and initial conditions is still asymptotically Gaussian.

 

Note also that when the dimension of the data is high enough (pnt>>1), the empirical distribution of the ‘residuals’ εt = yt - g(xt) will tend to a Normal density. This means that one can then approximate the above likelihood with a Gaussian density with mean g(xt) and (unknown) variance σ-1, which falls back into the sDCM class of generative models (this result is actually used during the initialization of the algorithm).

 

 

 

Inverting static (hierarchical) models

 

Static models are simplifications of the above sDCM class of generative models, where the dimension of the hidden states and the evolution parameters tend to zero (nt=1, n→0 and nθ→0). This reduces the model to a nonlinear observation equation:

     y = g(φ) + ε

with fixed priors on the observation parameters.

 

A simple (2 levels) hierarchical extension of this static model (whereby the priors are also learned) can easily be derived by actually removing the evolution/observation parameters, but retaining the initial conditions and the first hidden states, with, e.g., identity evolution function:

     x = x0 + η

     y = g(x) + ε

where x, x0, σ and α are estimated.

 

As many hierarchical levels as necessary (e.g., t>1 levels) can be obtained by collapsing the higher levels onto the next to last level. For example, the following hierarchical (linear) generative model

     x1  = A1 x0 + η1

    

     xt-1 = At-1 xt-2 + ηt-1

     xt   = At xt-1 + ηt

     yt   = g(xt) + εt

where At are known matrices, can be rewritten as:

     xt = At At-1 … A1 x0 + ηt + At ηt-1 + At At-1 ηt-1 + At At-1 … A1 η1

     yt = g(xt) + εt

which is just another particular case of the above generative model, with linear evolution function f(x0) = At At-1 … A1 x0 and appropriate state covariance matrix, e.g., Qx = (I + At + At At-1 + At At-1 … A1)2 if one originally assumed i.i.d. state noise.

 

Alternatively, one can also retain the ‘time-series’ structure of the sDCM generative model, and use infinite measurement noise covariance components (Qy→∞) up to the next to last ‘time’, to arrive at the same effective generative model (whereby only data at the last level will be considered for inference).

 

Note that probabilistic PCA, ICA and factor analyses obtain from retaining both the states and the observation parameters, which then encode the unknown so-called ‘mixing matrix’ Φ:

     x = f(x0) + η

     y = Φx + ε

where the observation function has been restricted to the bilinear mapping g(x,φ)= Φx, and Φ is simply the matrix whose elements are composed of the vector of (observation) parameters φ. These data decomposition methods then effectively correspond to different priors and evolution functions (e.g. diagonal prior covariance component for factor analysis, sigmoidal evolution function for non-Gaussian sources in ICA, etc…).

 

 

 

ACKNOWLEDGEMENTS

 

This code is provided ‘as is’. It is permanently maintained, but its scientific description can be considered unchanged and is exposed in length in:

 

Variational Bayesian identification and prediction of stochastic nonlinear dynamic causal models

J. Daunizeau, K.J. Friston, S.J. Kiebel

Physica D: nonlinear phenomena (2009), 238: 2089-2118. [PDF] [technical note] [erratum]

 

A recent review on Dynamic Causal Modelling (DCM) can also be found in:

 

Dynamic Causal Modelling: a critical review of the biophysical and statistical foundations

J. Daunizeau, O. David, K. E. Stephan

Neuroimage (2010), in press. [PDF] [erratum]

 

Questions? j.daunizeau@fil.ion.ucl.ac.uk