PLoS Comput Biol. 2008 Nov;4(11):e1000211.
doi: 10.1371/journal.pcbi.1000211. Epub 2008 Nov 7.

Hierarchical Models in the Brain

Free PMC article

Karl Friston. PLoS Comput Biol.


This paper describes a general model that subsumes many parametric models for continuous data. The model comprises hidden layers of state-space or dynamic causal models, arranged so that the output of one provides input to another. The ensuing hierarchy furnishes a model for many types of data, of arbitrary complexity. Special cases range from the general linear model for static data to generalised convolution models, with system noise, for nonlinear time-series analysis. Crucially, all of these models can be inverted using exactly the same scheme, namely dynamic expectation maximization. This means that a single model and optimisation scheme can be used to invert a wide range of models. We present the model and a brief review of its inversion to disclose the relationships among apparently diverse generative models of empirical data. We then show that this inversion can be formulated as a simple neural network and may provide a useful metaphor for inference and learning in the brain.
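The layered arrangement described in the abstract, in which the output of one level provides the input to the level below, can be written compactly. The following is a sketch in conventional state-space notation; the symbols g, f, z, w and η are assumptions introduced here, since the abstract itself fixes no notation.

```latex
\begin{aligned}
y &= g(x^{(1)}, v^{(1)}) + z^{(1)} \\
\dot{x}^{(1)} &= f(x^{(1)}, v^{(1)}) + w^{(1)} \\
&\;\;\vdots \\
v^{(i-1)} &= g(x^{(i)}, v^{(i)}) + z^{(i)} \\
\dot{x}^{(i)} &= f(x^{(i)}, v^{(i)}) + w^{(i)} \\
&\;\;\vdots \\
v^{(m)} &= \eta + z^{(m+1)}
\end{aligned}
```

Here y denotes the data, v the causes linking adjacent levels, x the hidden states, and z and w random fluctuations on the outputs and on the motion of the hidden states. Under these assumptions, a single static level with a linear g and no hidden states reduces to the general linear model mentioned in the abstract.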

Conflict of interest statement

The author has declared that no competing interests exist.


Figure 1. Conditional dependencies of dynamic (right) and hierarchical (left) models, shown as directed Bayesian graphs.
The nodes of these graphs correspond to quantities in the model and the responses they generate. The arrows or edges indicate conditional dependencies between these quantities. The form of the models is provided, both in terms of their state-space equations (above) and in terms of the prior and conditional probabilities (below). The hierarchical structure of these models induces empirical priors; dynamical priors are mediated by the equations of generalised motion and structural priors by the hierarchical form, under which states in higher levels provide constraints on the level below.
Figure 2. Image representations of the precision matrices encoding temporal dependencies among the generalised motion of random fluctuations.
The precision in generalised coordinates (left) and over discrete samples in time (right) are shown for a roughness of γ = 4 and seventeen observations (with an order of n = 16). This corresponds to an autocorrelation function whose width is half a time bin. With this degree of temporal correlation only a few (i.e., five or six) discrete local observations are specified with any precision.
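To make the idea of a precision matrix over generalised motion concrete, the sketch below builds the covariance among a stationary process and its time derivatives, then inverts it. It assumes a Gaussian autocorrelation function rho(t) = exp(-t^2 / (2 sigma^2)); the width parameter `sigma`, the function names and the small order used here are assumptions for illustration, and the mapping from `sigma` to the roughness γ in the caption is not given in this text.

```python
import numpy as np


def double_factorial(k: int) -> int:
    """Return k!! for odd k >= -1, with (-1)!! defined as 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def generalised_covariance(n_orders: int, sigma: float) -> np.ndarray:
    """Covariance among a process and its first n_orders - 1 derivatives.

    Assumes rho(t) = exp(-t**2 / (2 * sigma**2)) and uses
    cov(x^(i), x^(j)) = (-1)**i * rho^(i+j)(0), where the even derivatives are
    rho^(2k)(0) = (-1)**k * (2k - 1)!! / sigma**(2k) and odd ones vanish.
    """
    S = np.zeros((n_orders, n_orders))
    for i in range(n_orders):
        for j in range(n_orders):
            if (i + j) % 2 == 0:
                k = (i + j) // 2
                rho_deriv = (-1) ** k * double_factorial(2 * k - 1) / sigma ** (2 * k)
                S[i, j] = (-1) ** i * rho_deriv
    return S


# Low order chosen to keep the matrix well conditioned (an illustrative choice).
S = generalised_covariance(n_orders=4, sigma=1.0)
precision = np.linalg.inv(S)
```

The chequerboard pattern of zeros arises because odd derivatives of a symmetric autocorrelation function vanish at zero lag. At high orders the matrix becomes poorly conditioned, so a direct inverse like this is only suitable for small examples.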
Figure 3. Example of estimation under a mixed-effects or hierarchical linear model.
The inversion was cross-validated with expectation maximization (EM), where the M-step corresponds to restricted maximum likelihood (ReML). This example used a simple two-level model that embodies empirical shrinkage priors on the first-level parameters. These models are also known as parametric empirical Bayes (PEB) models (left). Causes were sampled from the unit normal density to generate a response, which was used to recover the causes, given the parameters. Slight differences in the hyperparameter estimates (upper right), due to a different hyperparameterisation, have little effect on the conditional means of the unknown causes (lower right), which are almost indistinguishable.
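The shrinkage effect of empirical priors in a two-level linear model can be illustrated with a small simulation. This is a minimal sketch, not the scheme used for the figure: the hyperparameters (noise and prior precisions) are fixed at assumed values rather than estimated by EM or ReML, and the dimensions and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_obs, n_causes = 16, 4
X = rng.standard_normal((n_obs, n_causes))  # known first-level parameters (design)
noise_precision = 4.0                       # assumed known, not estimated here
prior_precision = 1.0                       # unit-normal (shrinkage) prior on causes

# Causes sampled from the unit normal density generate the response.
v_true = rng.standard_normal(n_causes)
y = X @ v_true + rng.standard_normal(n_obs) / np.sqrt(noise_precision)

# Conditional covariance and mean of the causes under the linear-Gaussian model.
cond_cov = np.linalg.inv(
    noise_precision * X.T @ X + prior_precision * np.eye(n_causes)
)
cond_mean = cond_cov @ (noise_precision * X.T @ y)

# Ordinary least squares for comparison (no prior, hence no shrinkage).
ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With an isotropic prior, the conditional mean is never longer than the least-squares solution, which is the sense in which the prior shrinks the estimates towards zero.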
Figure 4. Example of Factor Analysis using a hierarchical model, in which the causes have deterministic and stochastic components.
Parameters and causes were sampled from the unit normal density to generate a response, which was then used for their estimation. The aim was to recover the causes without knowing the parameters, which is effected with reasonable accuracy (upper). The conditional estimates of the causes and parameters are shown in lower panels, along with the increase in free-energy or log-evidence with the number of DEM iterations (lower left). Note that there is an arbitrary affine mapping between the conditional means of the causes and their true values, which we estimated post hoc to show the correspondence in the upper panel.
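The post hoc affine mapping mentioned in the caption can be estimated by ordinary least squares. The sketch below is one plausible way to do this and is not taken from the paper; the helper `fit_affine` and the synthetic "estimates" are hypothetical.

```python
import numpy as np


def fit_affine(estimated, true):
    """Least-squares affine map (A, b) such that true ≈ estimated @ A + b.

    Both inputs have shape (n_samples, n_causes).
    """
    ones = np.ones((estimated.shape[0], 1))
    design = np.hstack([estimated, ones])       # append a constant column
    coeffs, *_ = np.linalg.lstsq(design, true, rcond=None)
    return coeffs[:-1], coeffs[-1]


rng = np.random.default_rng(1)
true_causes = rng.standard_normal((50, 2))

# Hypothetical estimates: the true causes under an arbitrary affine distortion.
mixing = np.array([[0.5, -1.0],
                   [2.0, 0.3]])
estimated_causes = true_causes @ mixing + np.array([0.7, -0.2])

A, b = fit_affine(estimated_causes, true_causes)
aligned = estimated_causes @ A + b
```

In this noise-free toy case the aligned estimates match the true causes exactly; with real conditional means the fit would only be approximate.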
Figure 5. This schematic shows the linear convolution model used in the subsequent figure in terms of a directed Bayesian graph.
In this model, a simple Gaussian ‘bump’ function acts as a cause to perturb two coupled hidden states. Their dynamics are then projected to four response variables, whose time-courses are cartooned on the left. This figure also summarises the architecture of the implicit inversion scheme (right), in which precision-weighted prediction errors drive the conditional modes to optimise variational action. Critically, the prediction errors propagate their effects up the hierarchy (cf. Bayesian belief propagation or message passing), whereas the predictions are passed down the hierarchy. This sort of scheme can be implemented easily in neural networks (see last section and for a neurobiological treatment). This generative model uses a single cause v(1), two dynamic states and four outputs y1,…,y4. The lines denote the dependencies of the variables on each other, summarised by the equations (in this example both the equations were simple linear mappings). This is effectively a linear convolution model, mapping one cause to four outputs, which form the inputs to the recognition model (solid arrow). The inputs to the four data or sensory channels are also shown as an image in the insert.
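The generative side of such a linear convolution model can be sketched in a few lines: one Gaussian bump perturbs two coupled hidden states, whose trajectories are mapped linearly to four outputs. All numerical values (coupling matrices, bump timing, step size) are assumptions chosen for illustration, random fluctuations are omitted, and forward Euler integration is used only for simplicity.

```python
import numpy as np

# Illustrative (assumed) parameters, not values taken from the paper.
A = np.array([[-0.25, 1.00],
              [-0.50, -0.25]])        # coupling between the two hidden states
B = np.array([1.0, 0.0])              # how the cause enters the state equation
C = 0.5 * np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0],
                    [1.0, -1.0]])     # linear map from 2 states to 4 outputs

dt = 0.1
n_steps = 320
t = np.arange(n_steps) * dt
cause = np.exp(-0.25 * (t - 12.0) ** 2)   # Gaussian 'bump' acting as the cause

x = np.zeros((n_steps, 2))                # hidden states start at rest
for k in range(n_steps - 1):
    # Forward Euler step of dx/dt = A x + B v (noise-free sketch)
    x[k + 1] = x[k] + dt * (A @ x[k] + B * cause[k])

y = x @ C.T                               # four response variables over time
```

Because both equations are linear, each output is a convolution of the cause with a fixed kernel, which is why the caption calls this a linear convolution model.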
Figure 6. The predictions and conditional densities on the states and parameters of the linear convolution model of the previous figure.
Each row corresponds to a level, with causes on the left and hidden states on the right. In this case, the model has just two levels. The first (upper left) panel shows the predicted response and the error on this response (their sum corresponds to the observed data). For the hidden states (upper right) and causes (lower left) the conditional mode is depicted by a coloured line and the 90% conditional confidence intervals by the grey area. These are sometimes referred to as “tubes”. Finally, the grey lines depict the true values used to generate the response. Here, we estimated the hyperparameters, parameters and the states. This is an example of triple estimation, where we are trying to infer the states of the system as well as the parameters governing its causal architecture. The hyperparameters correspond to the precision of random fluctuations in the response and the hidden states. The free parameters correspond to a single parameter from the state equation and one from the observer equation that govern the dynamics of the hidden states and response, respectively. It can be seen that the true value of the causal state lies within the 90% confidence interval and that we could infer with substantial confidence that the cause was non-zero, when it occurs. Similarly, the true parameter values lie within fairly tight confidence intervals (red bars in the lower right).
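The 90% "tubes" can be computed from a conditional mode and variance if the conditional density is taken to be Gaussian, which is the assumption made in this sketch. The function name and the example numbers are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def conditional_interval(mode, variance, level=0.90):
    """Symmetric interval around the mode, assuming a Gaussian conditional density."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.645 for a 90% interval
    mode = np.asarray(mode, dtype=float)
    half_width = z * np.sqrt(np.asarray(variance, dtype=float))
    return mode - half_width, mode + half_width


mode = np.array([0.0, 0.8, 0.1])          # hypothetical conditional modes over time
variance = np.array([0.04, 0.04, 0.04])   # hypothetical conditional variances

lower, upper = conditional_interval(mode, variance)

# The cause is inferred to be non-zero wherever the interval excludes zero.
cause_is_nonzero = (lower > 0) | (upper < 0)
```

In this toy example only the middle time point has an interval that excludes zero, mirroring the statement that the cause can be inferred to be non-zero when it occurs.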
Figure 7. Ontology of models starting with a simple general linear model with two levels (the PCA model).
This ontology is one of many that could be constructed and is based on the fact that hierarchical dynamic models have several attributes that can be combined to create an infinite number of models, some of which are shown in the figure. These attributes include: (i) the number of levels or depth; (ii) for each level, linear or nonlinear output functions; (iii) with or without random fluctuations; (iv) static or dynamic; (v) for dynamic levels, linear or nonlinear equations of motion; (vi) with or without state noise and, finally, (vii) with or without generalised coordinates.
Figure 8. Schematic detailing the neuronal architectures that encode an ensemble density on the states and parameters of one level in a hierarchical model.
This schematic shows the speculative cells of origin of forward driving connections that convey prediction error from a lower area to a higher area and the backward connections that are used to construct predictions. These predictions try to explain away input from lower areas by suppressing prediction error. In this scheme, the sources of forward connections are the superficial pyramidal cell population and the sources of backward connections are the deep pyramidal cell population. The differential equations relate to the optimisation scheme detailed in the main text and their constituent terms are placed alongside the corresponding connections. The state-units and their efferents are in black and the error-units in red, with causes on the left and hidden states on the right. For simplicity, we have assumed the output of each level is a function of, and only of, the hidden states. This induces a hierarchy over levels and, within each level, a hierarchical relationship between states, where hidden states predict causes.
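The interplay of state-units and error-units can be illustrated with a deliberately simplified sketch: a single static, linear level in which state-units descend the gradient of precision-weighted prediction error. This is not the set of differential equations referred to in the caption, which are not reproduced in this text; the matrices, precisions and step size are all assumed values.

```python
import numpy as np

G = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [1.0, -1.0]])      # hypothetical linear prediction: y ≈ G @ mu
Pi_y = 2.0 * np.eye(3)           # assumed precision of fluctuations in the input
Pi_v = 1.0 * np.eye(2)           # assumed precision of the prior on the causes
eta = np.zeros(2)                # prior expectation supplied by the level above
y = np.array([1.0, -0.5, 2.0])   # input from the level below

mu = np.zeros(2)                 # state-units (conditional expectation)
step = 0.05
for _ in range(2000):
    err_y = Pi_y @ (y - G @ mu)  # error-units: weighted error on the input
    err_v = Pi_v @ (mu - eta)    # error-units: weighted error on the prior
    # State-units are driven by forward error and suppressed by prior error.
    mu = mu + step * (G.T @ err_y - err_v)

# For this linear-Gaussian case the fixed point is available in closed form.
closed_form = np.linalg.solve(G.T @ Pi_y @ G + Pi_v,
                              G.T @ Pi_y @ y + Pi_v @ eta)
```

At convergence the predictions `G @ mu` explain away as much of the input as the prior allows, so the remaining prediction error is what would be passed forward to the next level.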
Figure 9. Schematic detailing the neuronal architectures that encode an ensemble density on the states and parameters of hierarchical models.
This schematic shows how the neuronal populations of the previous figure may be deployed hierarchically within three cortical areas (or macro-columns). Within each area the cells are shown in relation to the laminar structure of the cortex, which includes supra-granular (SG), granular (L4) and infra-granular (IG) layers.
Figure 10. The ensemble density and its mean-field partition.
q(ϑ) is the ensemble density and is encoded in terms of the sufficient statistics of its marginals. These statistics or variational parameters (e.g., mean or expectation) change to extremise free-energy to render the ensemble density an approximate conditional density on the causes of sensory input. The mean-field partition corresponds to a factorization over the sets comprising the partition. Here, we have used three sets (neural activity, modulation and connectivity). Critically, the optimisation of the parameters of any one set depends on the parameters of the other sets. In this figure, we have focused on means or expectations µi of the marginal densities, q(ϑi) = N(ϑi: µi,Ci).
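The mean-field partition and the quantity being extremised can be summarised in two lines. This is a sketch of the standard variational identity; the sign convention (free-energy as a lower bound on log-evidence) is an assumption, chosen to agree with the Figure 4 caption, which equates an increase in free-energy with an increase in log-evidence.

```latex
\begin{aligned}
q(\vartheta) &= \prod_i q(\vartheta_i), \qquad
q(\vartheta_i) = \mathcal{N}(\vartheta_i : \mu_i, C_i) \\
F &= \ln p(y) - D_{\mathrm{KL}}\big( q(\vartheta) \,\|\, p(\vartheta \mid y) \big) \;\le\; \ln p(y)
\end{aligned}
```

Because the divergence term is non-negative, maximising F with respect to the sufficient statistics µi and Ci makes q(ϑ) an approximate conditional density. The factors are coupled through F, which is why optimising one set depends on the current statistics of the others.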


