Category learning through multimodality sensing

Neural Comput. 1998 Jul 1;10(5):1097-117. doi: 10.1162/089976698300017368.

Abstract

Humans and other animals learn to form complex categories without receiving a target output, or teaching signal, with each input pattern. In contrast, most computer algorithms that emulate such performance assume the brain is provided with the correct output at the neuronal level or require grossly unphysiological methods of information propagation. Natural environments do not contain explicit labeling signals, but they do contain important information in the form of temporal correlations between sensations to different sensory modalities, and humans are affected by this correlational structure (Howells, 1944; McGurk & MacDonald, 1976; MacDonald & McGurk, 1978; Zellner & Kautz, 1990; Durgin & Proffitt, 1996). In this article we describe a simple, unsupervised neural network algorithm that also uses this natural structure. Using only the co-occurring patterns of lip motion and sound signals from a human speaker, the network learns separate visual and auditory speech classifiers that perform comparably to supervised networks.
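To make the idea concrete, below is a minimal, hypothetical sketch of the general principle the abstract describes: two per-modality classifiers trained with no external labels, where each modality's current classification of a co-occurring input serves as the pseudo-label for the other. This is an illustration of cross-modal self-supervision with prototype (LVQ-style) classifiers on synthetic data, not a reproduction of the paper's actual network or its lip-motion/sound features; all names, dimensions, and learning rates are invented for the example.

```python
import numpy as np

# Illustrative sketch only (not the paper's exact algorithm): two prototype-based
# classifiers, one per modality, trained solely on co-occurring input pairs.
# Each modality's classification of its own input acts as the pseudo-label used
# to update the other modality's prototypes, so the only "teaching signal"
# is cross-modal agreement.

rng = np.random.default_rng(0)
n_classes, dim = 2, 2

# Synthetic co-occurring data: each hidden category emits a correlated pair of
# observations, one in the "visual" space and one in the "auditory" space.
centers_v = rng.normal(size=(n_classes, dim)) * 3.0
centers_a = rng.normal(size=(n_classes, dim)) * 3.0

def sample_pair(k):
    return (centers_v[k] + rng.normal(scale=0.5, size=dim),
            centers_a[k] + rng.normal(scale=0.5, size=dim))

# One prototype per class and per modality, randomly initialized.
proto_v = rng.normal(size=(n_classes, dim))
proto_a = rng.normal(size=(n_classes, dim))

def classify(protos, x):
    # Nearest-prototype classification.
    return int(np.argmin(((protos - x) ** 2).sum(axis=1)))

lr = 0.05
for step in range(5000):
    k = rng.integers(n_classes)            # hidden category, never shown to the learner
    xv, xa = sample_pair(k)
    cv, ca = classify(proto_v, xv), classify(proto_a, xa)
    # Cross-modal pseudo-labels: move each modality's prototype for the class
    # chosen by the *other* modality toward the current input.
    proto_v[ca] += lr * (xv - proto_v[ca])
    proto_a[cv] += lr * (xa - proto_a[cv])

# After training, the two classifiers should agree on most co-occurring pairs.
pairs = [sample_pair(rng.integers(n_classes)) for _ in range(1000)]
agreement = np.mean([classify(proto_v, v) == classify(proto_a, a) for v, a in pairs])
print(f"cross-modal agreement on held-out pairs: {agreement:.2f}")
```

On this toy data the two classifiers converge to a shared category structure even though neither ever sees a label; the paper's contribution is showing that the same kind of co-occurrence signal suffices for real lip-motion and acoustic speech data, with performance comparable to supervised networks.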

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Humans
  • Learning / physiology*
  • Lip / physiology
  • Models, Neurological
  • Movement / physiology
  • Neural Networks, Computer*
  • Speech / physiology
  • Speech Perception / physiology