Emotion recognition plays an important role in several applications, such as human-computer interaction and understanding the affective state of users during certain tasks, e.g., within a learning process, in monitoring of the elderly, and in interactive entertainment. It may rely on several modalities, e.g., facial expressions, speech, electroencephalograms, and electrocardiograms. In certain applications, the only available modality is the user's (speaker's) voice. In this paper, we aim to analyze speakers' emotions based solely on paralinguistic information, i.e., without depending on the linguistic content of speech. We compare two machine learning approaches, namely a Convolutional Neural Network and a Support Vector Machine. The former is trained on raw speech information, while the latter is trained on a set of extracted low-level features. Since we aim to provide a multilingual approach, the training and testing datasets contain speech from different languages.
Keywords: Convolutional neural networks; Emotion recognition; Speech information; Support vector machines; Transfer learning.