SIGNAL 2023, The Eighth International Conference on Advances in Signal, Image and Video Processing


Comparison of Different Speech Features for Connected Number Recognition of Indian Vernacular Languages

Authors:
Mayurakshi Mukherji
Shreyas Kulkarni
Vivek Kumar
Senthil Raja G
Thiruvengadam Samon
Kingshuk Banerjee
Yuichi Nonaka

Keywords: MFCC; PLP; speech features; pitch; Indian Language

Abstract:
This paper presents experimental results and a comparative analysis of Connected Number Speech Recognition (CNR) models trained with four feature combinations: Mel Frequency Cepstral Coefficients (MFCC), MFCC+Pitch, Perceptual Linear Prediction (PLP), and PLP+Pitch. The experiments cover five native Indian languages: Bengali, Hindi, Tamil, Kannada, and Marathi. We collected connected number speech datasets for all five languages and trained speech recognition models on them. The Kaldi speech recognition toolkit was used to train the acoustic models, and the SRILM toolkit was used to build an N-gram language model for the recognition system. Model performance was compared using Word Error Rate (WER) and Sentence Error Rate (SER) as evaluation metrics. Although these languages are atonal, our experiments show that adding pitch features to MFCC features yields overall improvements in WER and SER for connected number recognition. Moreover, although all models were trained under identical conditions, their WER and SER differ significantly across languages.
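The abstract uses WER and SER without defining them. Under the standard definitions, WER is the total number of word substitutions, deletions and insertions (S + D + I) divided by the number of reference words N, and SER is the fraction of utterances that contain at least one error. The sketch below computes both from those definitions. It is an illustration only, not the paper's scoring pipeline (Kaldi provides its own scoring tools). The function names `edit_distance` and `wer_ser` and the English digit transcripts are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance: the minimum number of
    substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = distance between the first i-1 ref words and the first j hyp words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)]


def wer_ser(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (WER, SER) in percent for (reference, hypothesis) transcript pairs."""
    total_errors = 0
    total_ref_words = 0
    sentence_errors = 0
    for ref, hyp in pairs:
        ref_words, hyp_words = ref.split(), hyp.split()
        errors = edit_distance(ref_words, hyp_words)
        total_errors += errors
        total_ref_words += len(ref_words)
        # An utterance counts toward SER if it has at least one word error
        if errors > 0:
            sentence_errors += 1
    wer = 100.0 * total_errors / total_ref_words
    ser = 100.0 * sentence_errors / len(pairs)
    return wer, ser


if __name__ == "__main__":
    # Hypothetical connected-number transcripts (reference, hypothesis)
    pairs = [
        ("one two three four", "one two three four"),  # correct
        ("five six seven", "five six six seven"),      # one insertion
        ("eight nine zero", "eight five zero"),        # one substitution
    ]
    wer, ser = wer_ser(pairs)
    print(f"WER={wer:.2f}% SER={ser:.2f}%")  # WER=20.00% SER=66.67%
```

The example shows why the two metrics can diverge: two word errors over ten reference words give a WER of 20%, but because they fall in two of the three utterances, the SER is about 67%. For connected number strings, where a single wrong digit invalidates the whole number, SER is often the more telling figure.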

Pages: 56 to 62

Copyright: Copyright (c) IARIA, 2023

Publication date: March 13, 2023

Published in: conference

ISSN: 2519-8432

ISBN: 978-1-68558-057-5

Location: Barcelona, Spain

Dates: from March 13, 2023 to March 17, 2023