PATTERNS 2022, The Fourteenth International Conference on Pervasive Patterns and Applications
Tackling the "We have no Data" Challenge: Domain-Specific Machine Translation in SMEs
Authors:
Frederik S. Bäumer
Sergej Denisov
Bastian Sirvend
Jens Weber
Keywords: machine learning, machine translation
Abstract:
The use of translation software offers decisive advantages for companies. For example, it facilitates communication as well as the editing and creation of multilingual documents. In contrast to the services of a translation agency, the results are available immediately and can be adapted flexibly. Nevertheless, concerns remain, especially about translation quality for specialized vocabulary and industry-specific phrases, and about data security. Developing and deploying self-hosted, business-specific translation models can address both concerns: self-hosting keeps sensitive data within the company, while business-specific models increase speed and provide company-specific translations. However, companies often assume that they cannot provide the necessary training data. In fact, many companies are sitting on a veritable treasure trove of data that only needs to be unearthed. This paper aims to show how we support enterprises with processes and software tools for creating datasets for their translation solutions. To this end, we apply data acquisition and data preparation methods, sentence alignment, and human-in-the-loop tools.
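The abstract names sentence alignment as one data preparation step but does not say which method is used. As an illustration only, the sketch below shows the classic length-based approach of Gale and Church (1993), not the authors' own tooling: sentences in a document and its translation are paired by dynamic programming, preferring pairings whose character lengths are proportional. The function names `align` and `_bead_cost` are hypothetical, the bead priors and the constants `C` and `S2` are the approximate values reported by Gale and Church, and 2-2 beads are omitted for brevity.

```python
import math
from typing import Dict, List, Optional, Tuple

# Approximate bead priors (source sentences, target sentences) after
# Gale & Church (1993); 2-2 beads are omitted in this sketch.
BEAD_PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
}
C = 1.0   # expected target characters per source character
S2 = 6.8  # variance of that ratio


def _norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _bead_cost(src_len: int, tgt_len: int, bead: Tuple[int, int]) -> float:
    """Negative log-probability of aligning spans of these lengths as `bead`."""
    mean = (src_len + tgt_len / C) / 2.0
    delta = 0.0 if mean == 0 else (tgt_len - src_len * C) / math.sqrt(mean * S2)
    # Two-tailed probability of a length deviation at least this large.
    p_delta = max(2.0 * (1.0 - _norm_cdf(abs(delta))), 1e-12)  # avoid log(0)
    return -math.log(BEAD_PRIORS[bead]) - math.log(p_delta)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return beads as (source indices, target indices), in document order."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward relaxation in row-major order: every bead advances i or j,
    # so each cell is final before it is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in BEAD_PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                total = cost[i][j] + _bead_cost(s_len, t_len, (di, dj))
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from the end of both documents.
    beads: List[Tuple[List[int], List[int]]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        prev = back[i][j]
        assert prev is not None
        pi, pj = prev
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    de = ["Das ist ein Satz.", "Hier ist noch einer."]
    en = ["This is a sentence.", "Here is another one."]
    print(align(de, en))
```

One possible design, again an assumption rather than the paper's process: 1-1 beads go straight into the parallel training corpus, while 1-0, 0-1, and many-to-one beads are routed to a human reviewer, which is where a human-in-the-loop tool would fit.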
Pages: 1 to 4
Copyright: Copyright (c) IARIA, 2022
Publication date: April 24, 2022
Published in: conference
ISSN: 2308-3557
ISBN: 978-1-61208-953-9
Location: Barcelona, Spain
Dates: from April 24, 2022 to April 28, 2022