ACHI 2023, The Sixteenth International Conference on Advances in Computer-Human Interactions


How Should We Define Voice Naturalness

Authors:
Sajad Shirali-Shahreza

Keywords: Text-to-Speech (TTS); Naturalness; Evaluation

Abstract:
Naturalness is a commonly used criterion in Text-To-Speech (TTS) evaluations. The goal is to measure how close the generated voice is to a real human voice. It is measured through listening tests with human participants. However, no definition of naturalness is provided to the participants. In this paper, we aim to identify which definition participants use when they rank naturalness. We conducted a user study similar to TTS evaluations and analyzed the participants' responses. We found that users hold different and sometimes contradictory definitions of naturalness, and that a major dimension for them was how close the voice sounds to a real human. Our results show that we should explicitly define naturalness for the participants. Furthermore, we should ask separate questions for different dimensions of naturalness, such as clarity and accent.

Pages: 235 to 239

Copyright: Copyright (c) IARIA, 2023

Publication date: April 24, 2023

Published in: conference

ISSN: 2308-4138

ISBN: 978-1-68558-078-0

Location: Venice, Italy

Dates: from April 24, 2023 to April 28, 2023