ACHI 2023, The Sixteenth International Conference on Advances in Computer-Human Interactions
How Should We Define Voice Naturalness
Authors:
Sajad Shirali-Shahreza
Keywords: Text-to-Speech (TTS); Naturalness; Evaluation
Abstract:
Naturalness is a commonly used criterion in Text-To-Speech (TTS) evaluations. The goal is to measure how close a generated voice is to a real human voice, and this is measured through listening tests with human participants. However, participants are not given a definition of naturalness. In this paper, we aimed to identify what definition participants used when they ranked naturalness. We conducted a user study similar to TTS evaluations and analyzed the participants' responses. We found that users hold different and sometimes contradictory definitions of naturalness, and that a major dimension for them was how close the voice sounds to a real human. Our results show that we should explicitly define naturalness for participants. Furthermore, we should ask separate questions for different dimensions of naturalness, such as clarity and the presence of an accent.
Pages: 235 to 239
Copyright: Copyright (c) IARIA, 2023
Publication date: April 24, 2023
Published in: conference
ISSN: 2308-4138
ISBN: 978-1-68558-078-0
Location: Venice, Italy
Dates: from April 24, 2023 to April 28, 2023