Modeling Language and Cognition With Deep Unsupervised Learning: A Tutorial Overview


Marco Zorzi et al. Front Psychol. 4:515.

Abstract

Deep unsupervised learning in stochastic recurrent neural networks with many layers of hidden units is a recent breakthrough in neural computation research. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. In this article we discuss the theoretical foundations of this approach and we review key issues related to training, testing and analysis of deep networks for modeling language and cognitive processing. The classic letter and word perception problem of McClelland and Rumelhart (1981) is used as a tutorial example to illustrate how structured and abstract representations may emerge from deep generative learning. We argue that the focus on deep architectures and generative (rather than discriminative) learning represents a crucial step forward for the connectionist modeling enterprise, because it offers a more plausible model of cortical learning as well as a way to bridge the gap between emergentist connectionist models and structured Bayesian models of cognition.

Keywords: connectionist modeling; deep learning; hierarchical generative models; neural networks; unsupervised learning; visual word recognition.
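
To make the generative-learning idea concrete, the block below is a minimal Python/NumPy sketch (not code from the article) of greedy layer-wise training of a stack of Restricted Boltzmann Machines with one-step contrastive divergence (CD-1), the standard recipe for building a Deep Belief Network. The layer sizes, learning rate, number of epochs, and the toy binary data are illustrative assumptions.

    # Minimal sketch: greedy layer-wise training of stacked binary RBMs
    # (a Deep Belief Network) with one-step contrastive divergence (CD-1).
    # Hyperparameters and the toy data are illustrative, not the paper's.
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_rbm(data, n_hidden, epochs=5, lr=0.05, batch=100):
        """Train one binary RBM with CD-1; return (weights, visible biases, hidden biases)."""
        n_visible = data.shape[1]
        W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        a = np.zeros(n_visible)   # visible biases
        b = np.zeros(n_hidden)    # hidden biases
        for _ in range(epochs):
            for i in range(0, len(data), batch):
                v0 = data[i:i + batch]
                ph0 = sigmoid(v0 @ W + b)                        # positive phase
                h0 = (rng.random(ph0.shape) < ph0).astype(float) # sample hidden states
                pv1 = sigmoid(h0 @ W.T + a)                      # one Gibbs step down
                ph1 = sigmoid(pv1 @ W + b)                       # and back up
                W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
                a += lr * (v0 - pv1).mean(axis=0)
                b += lr * (ph0 - ph1).mean(axis=0)
        return W, a, b

    def train_dbn(data, layer_sizes):
        """Greedy layer-wise training: each RBM models the activity of the layer below."""
        layers, x = [], data
        for n_hidden in layer_sizes:
            W, a, b = train_rbm(x, n_hidden)
            layers.append((W, a, b))
            x = sigmoid(x @ W + b)   # propagate mean activations upward
        return layers

    # Toy usage on random binary "images" (28x28 = 784 pixels, arbitrary content).
    toy_data = (rng.random((1000, 784)) < 0.2).astype(float)
    dbn = train_dbn(toy_data, layer_sizes=[200, 100])

The sketches accompanying the figure captions below reuse the `sigmoid`, `dbn`, `toy_data`, and `rng` names defined here.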

Figures

Figure 1
(A) A directed graphical model, also known as Bayesian network. (B) An undirected graphical model, also known as Markov network. In both graphs, the dashed line highlights the Markov blanket of the blue node.
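
For reference, the two kinds of graph factorize the joint probability distribution in different ways; the formulas below are the standard textbook definitions (not reproduced from the article) and make the notion of a Markov blanket precise.

    % Directed graphical model (Bayesian network): the joint distribution
    % factorizes into conditionals of each node given its parents.
    P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\big(x_i \mid \mathrm{pa}(x_i)\big)

    % Undirected graphical model (Markov network): the joint distribution
    % factorizes into potential functions over the cliques C of the graph.
    P(x_1, \dots, x_n) = \frac{1}{Z} \prod_{C} \psi_C(x_C),
    \qquad Z = \sum_{x} \prod_{C} \psi_C(x_C)

    % Markov blanket MB(x_i): the minimal set of nodes that renders x_i
    % conditionally independent of all remaining nodes.
    P\big(x_i \mid x_{\setminus i}\big) = P\big(x_i \mid \mathrm{MB}(x_i)\big)
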
Figure 2
Graphical representation of a Restricted Boltzmann Machine. The dashed line highlights the Markov blanket of the blue hidden unit, which corresponds to the whole layer of visible units.
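
In the standard energy-based formulation of an RBM (textbook notation, not the article's), the absence of within-layer connections makes each hidden unit conditionally independent of the others given the visible layer, which is exactly why its Markov blanket is the whole visible layer.

    % Energy of a joint configuration of binary visible units v and hidden units h,
    % with visible biases a, hidden biases b, and connection weights w.
    E(\mathbf{v}, \mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j

    % Joint distribution defined by the energy, with partition function Z.
    P(\mathbf{v}, \mathbf{h}) = \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z},
    \qquad Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}

    % The conditionals factorize over the units of one layer given the other,
    % where \sigma(x) = 1 / (1 + e^{-x}) is the logistic function.
    P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big),
    \qquad
    P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j h_j w_{ij}\Big)
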
Figure 3
(A) Architecture of the DBN with three hidden layers used in the MNIST handwritten digit recognition problem (Hinton and Salakhutdinov, 2006). (B) A typical transfer learning scenario, in which high-level, abstract representations are first extracted by deep unsupervised learning and then used to perform a variety of supervised tasks [adapted from Bengio et al. (2012)]. (C) Reconstructions of MNIST digit images made by the deep network.
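
Reconstructions like those in panel (C) can be produced, under the stacked-RBM sketch given after the abstract, by a deterministic up-down pass using mean activations; this is an illustrative simplification rather than the exact procedure behind the figure.

    # Sketch: reconstruct inputs by propagating mean activations up through the
    # stack of RBMs and then back down (deterministic up-down pass).
    # Reuses `sigmoid`, `dbn`, and `toy_data` from the sketch after the abstract.
    import numpy as np

    def reconstruct(x, layers):
        states = x
        for W, a, b in layers:                 # bottom-up recognition pass
            states = sigmoid(states @ W + b)
        for W, a, b in reversed(layers):       # top-down generative pass
            states = sigmoid(states @ W.T + a)
        return states

    recon = reconstruct(toy_data[:10], dbn)
    print(np.mean((recon - toy_data[:10]) ** 2))   # mean squared reconstruction error
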
Figure 4
Architecture of the DBN with two hidden layers used in the written word perception problem.
Figure 5
Mean accuracy of the linear classifier on the task of recognizing each letter of a word (left) and the whole word (right) as a function of the noise level applied to the raw images. Accuracy is averaged over 20 random noise injections and computed over the entire dataset of words. Error bars represent SEM. Results are shown for read-out from the two hidden layers of a deep network (DBN), from a shallow network (RBM), and from the raw images.
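
A minimal version of this read-out analysis, assuming the `sigmoid`, `dbn`, `toy_data`, and `rng` names from the sketch after the abstract, fits a least-squares linear classifier on each representation and tests it on noise-corrupted images. The toy class labels and the 30% binary noise are illustrative assumptions; the actual analysis uses the word images with letter and word labels.

    # Sketch of the linear read-out analysis: fit a least-squares linear classifier
    # on a chosen representation (raw pixels, H1, or H2) and evaluate it on
    # noise-corrupted images. Labels and noise level are illustrative.
    import numpy as np

    def features(x, layers, depth):
        """Mean activations of hidden layer `depth` (depth=0 returns the raw input)."""
        for W, a, b in layers[:depth]:
            x = sigmoid(x @ W + b)
        return x

    def fit_readout(F, Y):
        """Least-squares linear map from features (plus a bias term) to one-hot labels."""
        Fb = np.hstack([F, np.ones((len(F), 1))])
        return np.linalg.pinv(Fb) @ Y

    def readout_accuracy(Wout, F, y):
        Fb = np.hstack([F, np.ones((len(F), 1))])
        return np.mean(np.argmax(Fb @ Wout, axis=1) == y)

    labels = rng.integers(0, 10, size=len(toy_data))    # toy class labels
    Y = np.eye(10)[labels]                              # one-hot targets
    noisy = np.where(rng.random(toy_data.shape) < 0.3,  # 30% binary (flip) noise
                     1.0 - toy_data, toy_data)

    for depth, name in [(0, "raw images"), (1, "H1"), (2, "H2")]:
        Wout = fit_readout(features(toy_data, dbn, depth), Y)
        print(name, readout_accuracy(Wout, features(noisy, dbn, depth), labels))
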
Figure 6
Visualization of features learned at different hidden layers (Hi). Each square within a layer represents the receptive field of one hidden unit. Excitatory connections are shown in white, whereas inhibitory connections are shown in black. (A) H1 and H2 on single letters (pixelated “Siple font”). (B) H1, H2 and H3 on MNIST. (C) Sparse H1 and H2 on single letters. (D) Sparse H3 on MNIST.
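
Receptive fields like these can be visualized by reshaping each first-layer weight vector into an image and, for higher layers, projecting the weights down to pixel space through the intervening weight matrices; that linear projection is a common approximation assumed here for illustration, together with 28x28 inputs and the `dbn` stack from the sketch after the abstract.

    # Sketch: visualize receptive fields of H1 and H2 units.
    # H1 fields are the columns of W1; H2 fields are approximated by projecting
    # W2 to pixel space through W1 (a linear approximation, assumed for illustration).
    import matplotlib.pyplot as plt

    W1 = dbn[0][0]              # visible-to-H1 weights, shape (784, n_H1)
    W2 = dbn[1][0]              # H1-to-H2 weights, shape (n_H1, n_H2)
    h1_fields = W1.T            # one row per H1 unit
    h2_fields = (W1 @ W2).T     # one row per H2 unit, projected to pixel space

    fig, axes = plt.subplots(2, 8, figsize=(12, 3))
    for k in range(8):
        axes[0, k].imshow(h1_fields[k].reshape(28, 28), cmap="gray")
        axes[1, k].imshow(h2_fields[k].reshape(28, 28), cmap="gray")
        axes[0, k].axis("off")
        axes[1, k].axis("off")
    plt.show()
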
Figure 7
Inference in the word perception DBN when the word image “WORK” is presented as input under different types of noise. From top to bottom: Gaussian noise, binary noise (30%), binary noise (50%), and occlusion noise. The final state of the visible units, identical across the four noise conditions, is shown on the right.
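
One simple way to emulate this kind of inference, reusing the earlier sketches (in particular `reconstruct`, `dbn`, `toy_data`, and `rng`), is to apply the up-down reconstruction pass repeatedly so that the visible layer settles on the network's interpretation of the corrupted image; the 30% binary noise and the number of iterations are illustrative choices.

    # Sketch: iterative clean-up of a corrupted input by repeated up-down passes.
    import numpy as np

    def infer(noisy, layers, n_iter=10):
        v = noisy.copy()
        for _ in range(n_iter):
            v = reconstruct(v, layers)   # one full bottom-up / top-down pass
        return v

    corrupted = np.where(rng.random(toy_data[:1].shape) < 0.3,
                         1.0 - toy_data[:1], toy_data[:1])
    cleaned = infer(corrupted, dbn)
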
Figure 8
Illustration of the prototype generation methods in the handwritten digit recognition model. (A) The RBM involving the third hidden layer is jointly trained on the internal representation of the second hidden layer and an additional set of units representing the digit classes (Hinton et al., 2006). (B) Our linear projection method: class label units are added only after the complete DBN training and are associated with the third hidden layer representations by means of a linear mapping. (C) Digit prototypes generated using the linear projection method.
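
The linear projection idea in panel (B) can be sketched as follows, reusing `dbn`, `sigmoid`, `features`, `toy_data`, and the toy one-hot labels `Y` from the earlier sketches: top-layer activations are mapped to label units by a least-squares (pseudoinverse) fit, that mapping is inverted to obtain a label-conditioned top-layer state, and a top-down pass generates a visible prototype. With toy data the output is meaningless; the code only illustrates the computation, and the details of the procedure behind the figure may differ.

    # Sketch of the linear projection method: post hoc label units are linked to
    # the top hidden layer by a pseudoinverse fit, and prototypes are generated
    # by inverting the mapping and running a top-down pass to the visible layer.
    import numpy as np

    H_top = features(toy_data, dbn, len(dbn))   # top-layer activations
    M = np.linalg.pinv(H_top) @ Y               # linear map: top layer -> labels

    def generate_prototype(class_index, layers, n_classes=10):
        label = np.eye(n_classes)[class_index][None, :]
        h = label @ np.linalg.pinv(M)           # label-conditioned top-layer state
        for W, a, b in reversed(layers):        # top-down generative pass
            h = sigmoid(h @ W.T + a)
        return h.reshape(-1)                    # visible-layer prototype (784 pixels)

    prototype = generate_prototype(3, dbn)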

References

    1. Ackley D., Hinton G. E., Sejnowski T. J. (1985). A learning algorithm for Boltzmann machines. Cogn. Sci. 9, 147–169. doi: 10.1207/s15516709cog0901_7
    2. Albert A. (1972). Regression and the Moore-Penrose pseudoinverse. New York, NY: Academic Press.
    3. Andrieu C., De Freitas N., Doucet A., Jordan M. I. (2003). An introduction to MCMC for machine learning. Mach. Learn. 50, 5–43. doi: 10.1023/A:1020281327116
    4. Baldi P. (2012). Autoencoders, unsupervised learning, and deep architectures. J. Mach. Learn. Res. 27, 37–50.
    5. Bengio Y. (2009). Learning deep architectures for AI. Found. Trends Mach. Learn. 2, 1–127. doi: 10.1561/2200000006
