PATTERNS 2018, The Tenth International Conference on Pervasive Patterns and Applications


Improving Speech Emotion Recognition Based on ToBI Phonological Representation

Authors:
Lingjie Shen
Wei Wang

Keywords: speech emotion recognition; acoustic features; phonology; deep learning

Abstract:
Improving Speech Emotion Recognition (SER) depends on both the classifiers and the features. In terms of feature selection, most research so far uses only a large set of acoustic features, which cannot shed light on the relationship between emotion and phonology. In our study, we improve SER on the public IEMOCAP database by combining acoustic features with phonological representations under a leave-one-speaker-out cross-validation framework. Support vector machine, logistic regression, multi-layer perceptron, and a deep learning method, the convolutional neural network (CNN), are used in our experiments. With phonological representations, the CNN achieves 60.22% unweighted average recall (UAR) on utterance-level categorical emotion recognition, which is now the state of the art. Compared to the conventional baseline system based only on acoustic features, the proposed system with combined features obtains a 7.15% improvement in UAR on four-class basic emotion classification.
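
To make the evaluation terms in the abstract concrete, the following is a minimal sketch of unweighted average recall (the mean of per-class recalls) and of a leave-one-speaker-out split. It is not the authors' code. The function names, the toy labels, and the speaker IDs are hypothetical, and combining features by simple concatenation is an assumption, since the abstract does not say how the acoustic and phonological features are fused.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each emotion class counts equally."""
    total = defaultdict(int)    # utterances per true class
    correct = defaultdict(int)  # correctly predicted utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def leave_one_speaker_out(speakers):
    """Yield (held_out_speaker, train_indices, test_indices), one fold per speaker."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def combine_features(acoustic, phonological):
    """Concatenate two per-utterance feature vectors.

    Assumption: plain concatenation is used here only for illustration.
    """
    return list(acoustic) + list(phonological)


# Toy example with hypothetical labels: three of four predictions are right,
# but the single "happy" utterance is missed, so one class has zero recall.
y_true = ["angry", "angry", "angry", "happy"]
y_pred = ["angry", "angry", "angry", "sad"]
uar = unweighted_average_recall(y_true, y_pred)
```

In this toy case the plain accuracy is 0.75 while the UAR is 0.5, which illustrates why UAR is commonly preferred when emotion classes are imbalanced. Holding out every utterance of one speaker per fold keeps the test speaker unseen during training.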

Pages: 1 to 5

Copyright: Copyright (c) IARIA, 2018

Publication date: February 18, 2018

Published in: conference

ISSN: 2308-3557

ISBN: 978-1-61208-612-5

Location: Barcelona, Spain

Dates: from February 18, 2018 to February 22, 2018