SIGNAL 2023, The Eighth International Conference on Advances in Signal, Image and Video Processing
Authors:
Mayurakshi Mukherji
Shreyas Kulkarni
Vivek Kumar
Senthil Raja G
Thiruvengadam Samon
Kingshuk Banerjee
Yuichi Nonaka
Keywords: MFCC; PLP; speech features; pitch; Indian Language
Abstract:
This paper presents the experimental results and a comparative analysis of Connected Number Speech Recognition (CNR) models trained using four feature combinations: Mel Frequency Cepstral Coefficient (MFCC), MFCC+Pitch, Perceptual Linear Prediction (PLP), and PLP+Pitch. The experiments were conducted for five native Indian languages: Bengali, Hindi, Tamil, Kannada, and Marathi. We collected connected number speech datasets for all five languages and trained speech recognition models on them. The Kaldi speech recognition toolkit was used to train the acoustic model, and the SRILM toolkit was used to build the N-gram language model for the speech recognition system. Model performance was compared and analyzed using Word Error Rate (WER) and Sentence Error Rate (SER) as accuracy metrics. Although the above-mentioned Indian languages are atonal, our experiments show that adding pitch features to MFCC features yields overall improvements in WER and SER values for connected number speech recognition. Moreover, although all the speech recognition models were trained under identical conditions, they show significantly different WER and SER across languages.
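The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and SER is the fraction of sentences containing at least one error. The function names and the sample transcripts are hypothetical and are not taken from the paper, whose scoring was presumably done with the toolkit's own tools.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_and_ser(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (WER, SER) for a list of (reference, hypothesis) transcripts."""
    total_errors = 0
    total_ref_words = 0
    wrong_sentences = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        errors = edit_distance(ref, hyp)
        total_errors += errors
        total_ref_words += len(ref)
        wrong_sentences += 1 if errors > 0 else 0
    return total_errors / total_ref_words, wrong_sentences / len(pairs)


# Hypothetical connected-number transcripts (illustrative only).
sample_pairs = [
    ("one two three", "one two three"),   # no errors
    ("four five six", "four nine six"),   # one substitution
    ("seven eight", "seven"),             # one deletion
]
wer, ser = wer_and_ser(sample_pairs)
```

Because SER counts a sentence as wrong when even one word is misrecognized, it is typically higher than WER on the same data, which is why the two metrics are reported together.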
Pages: 56 to 62
Copyright: Copyright (c) IARIA, 2023
Publication date: March 13, 2023
Published in: conference
ISSN: 2519-8432
ISBN: 978-1-68558-057-5
Location: Barcelona, Spain
Dates: from March 13, 2023 to March 17, 2023