CENTRIC 2021, The Fourteenth International Conference on Advances in Human-oriented and Personalized Mechanisms, Technologies, and Services


Neural Speech Synthesis in German

Authors:
Johannes Wirth
Pascal Puchtler
René Peinl

Keywords: Text-To-Speech; German; Tacotron 2; Multi-Band MelGAN

Abstract:
While many speech synthesis systems based on deep neural networks are thoroughly evaluated and released for free use in English, models for languages with far fewer active speakers, such as German, are rarely trained and most often not published for common use. This work covers specific challenges in training text-to-speech models for the German language, including dataset selection and data preprocessing, and presents the training process for multiple models of an end-to-end text-to-speech system based on a combination of Tacotron 2 and Multi-Band MelGAN. All model compositions were evaluated using the mean opinion score, which revealed results comparable to those of models in the literature that are trained and evaluated on English datasets. In addition, empirical analyses identified distinct aspects influencing the quality of such systems, based on subjective user experience. All trained models are released for public use.

Pages: 26 to 34

Copyright: Copyright (c) IARIA, 2021

Publication date: October 3, 2021

Published in: conference

ISSN: 2308-3492

ISBN: 978-1-61208-896-9

Location: Barcelona, Spain

Dates: from October 3, 2021 to October 7, 2021