PATTERNS 2018, The Tenth International Conference on Pervasive Patterns and Applications
Improving Speech Emotion Recognition Based on ToBI Phonological Representation
Authors:
Lingjie Shen
Wei Wang
Keywords: speech emotion recognition; acoustic features; phonology; deep learning
Abstract:
The performance of Speech Emotion Recognition (SER) depends on both the classifier and the features. For feature selection, most research so far uses only large sets of acoustic features, which cannot shed light on the relationship between emotion and phonology. In this study, we improve SER on the public IEMOCAP database by combining acoustic features with phonological representations, evaluated under a leave-one-speaker-out cross-validation framework. We experiment with a support vector machine, logistic regression, a multi-layer perceptron, and a deep learning method, a convolutional neural network (CNN). With phonological representations, the CNN achieves 60.22% unweighted average recall (UAR) on utterance-level categorical emotion recognition, which is the current state of the art. Compared with a conventional baseline system based only on acoustic features, the proposed system with combined features improves UAR by 7.15% in four-class basic emotion classification.
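The abstract relies on two evaluation ideas: unweighted average recall (UAR), the mean of the per-class recalls so that every emotion class counts equally regardless of how many utterances it has, and leave-one-speaker-out (LOSO) cross-validation, where each speaker is held out in turn as the test set. The sketch below illustrates both, plus a simple way of combining feature sets. It is not the authors' code: the function names are hypothetical, and fusing acoustic and phonological (e.g. ToBI-derived) features by plain concatenation per utterance is an assumption, since the abstract does not say how the features are combined.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Hashable, Iterator, Sequence


def unweighted_average_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recalls; each class counts equally regardless of its size."""
    hits: dict[Hashable, int] = defaultdict(int)
    totals: dict[Hashable, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1  # number of utterances whose true label is t
        if t == p:
            hits[t] += 1  # correctly recognized utterances of class t
    if not totals:
        raise ValueError("y_true is empty")
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def leave_one_speaker_out(speakers: Sequence[str]) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_speaker, train_indices, test_indices), one fold per distinct speaker."""
    for spk in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != spk]
        test = [i for i, s in enumerate(speakers) if s == spk]
        yield spk, train, test


def fuse_features(acoustic: Sequence[float], phonological: Sequence[float]) -> list[float]:
    """Hypothetical early fusion: concatenate one utterance's two feature vectors."""
    return [*acoustic, *phonological]
```

For example, with true labels `["ang", "ang", "hap", "neu", "sad", "sad"]` and predictions `["ang", "neu", "hap", "neu", "sad", "ang"]`, the per-class recalls are 0.5, 1.0, 1.0 and 0.5, so UAR is 0.75, while plain accuracy would be 4/6. This is why UAR is preferred when emotion classes are imbalanced: a classifier that favors the majority class is not rewarded for it.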
Pages: 1 to 5
Copyright: Copyright (c) IARIA, 2018
Publication date: February 18, 2018
Published in: conference
ISSN: 2308-3557
ISBN: 978-1-61208-612-5
Location: Barcelona, Spain
Dates: from February 18, 2018 to February 22, 2018