Towards an intelligent framework for multimodal affective data analysis

Soujanya Poria; Erik Cambria; Amir Hussain; Guang-Bin Huang

doi:10.1016/j.neunet.2014.10.005

Neural Netw. 2015 Mar:63:104-16. doi: 10.1016/j.neunet.2014.10.005. Epub 2014 Nov 6.

Authors

Soujanya Poria¹, Erik Cambria², Amir Hussain³, Guang-Bin Huang⁴

Affiliations

¹ School of Natural Sciences, University of Stirling, UK. Electronic address: soujanya.poria@cs.stir.ac.uk.
² School of Computer Engineering, Nanyang Technological University, Singapore. Electronic address: cambria@ntu.edu.sg.
³ School of Natural Sciences, University of Stirling, UK. Electronic address: ahu@cs.stir.ac.uk.
⁴ School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore. Electronic address: egbhuang@ntu.edu.sg.

PMID: 25523041

Abstract

An increasingly large amount of multimodal content is posted on social media websites such as YouTube and Facebook every day. To cope with the growth of so much multimodal data, there is an urgent need for an intelligent multimodal analysis framework that can effectively extract information from multiple modalities. In this paper, we propose a novel multimodal information extraction agent that infers and aggregates the semantic and affective information associated with user-generated multimodal data in contexts such as e-learning, e-health, automatic video content tagging, and human-computer interaction. In particular, the agent adopts an ensemble feature extraction approach that jointly exploits tri-modal (text, audio, and video) features to enhance the multimodal information extraction process. In preliminary experiments on the eNTERFACE dataset, the proposed multimodal system achieves an accuracy of 87.95%, outperforming the best state-of-the-art system by more than 10%, which in relative terms corresponds to a 56% reduction in error rate.
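The two figures in the last sentence of the abstract are linked. The minimal sketch below checks them, assuming that "error rate" means 100% minus accuracy and that the 56% figure is a rounded relative reduction. The abstract does not name the baseline system or report its score, so the baseline computed here is only what those two numbers imply. The helper names are hypothetical.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction by which the error rate falls when accuracy (in %) rises from baseline_acc to new_acc."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, reduction: float) -> float:
    """Baseline accuracy (in %) implied by a new accuracy and a stated relative error reduction."""
    new_err = 100.0 - new_acc                  # 12.05% error for the proposed system
    baseline_err = new_err / (1.0 - reduction)  # error the baseline must have had
    return 100.0 - baseline_err


# Figures quoted in the abstract (the 56% is assumed to be rounded).
proposed_acc = 87.95
stated_reduction = 0.56

baseline = implied_baseline_accuracy(proposed_acc, stated_reduction)
print(f"implied baseline accuracy: about {baseline:.1f}%")
print(f"absolute gain: about {proposed_acc - baseline:.1f} percentage points")
```

Under these assumptions the baseline comes out at roughly 72.6% accuracy, so the gain is about 15 percentage points. This is consistent with the "more than 10%" claim. Because the 56% is rounded, the implied baseline is approximate.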

Keywords: Affective computing; Emotion analysis; Facial expressions; Multimodal; Multimodal sentiment analysis; Speech; Text.

MeSH terms

Algorithms*
Artificial Intelligence*
Biometric Identification / methods*
Humans
Information Storage and Retrieval / methods*