PATTERNS 2022, The Fourteenth International Conference on Pervasive Patterns and Applications


Tackling the "We have no Data" Challenge: Domain-Specific Machine Translation in SMEs

Authors:
Frederik S. Bäumer
Sergej Denisov
Bastian Sirvend
Jens Weber

Keywords: machine learning, machine translation

Abstract:
The use of translation software has decisive advantages for companies. For example, it facilitates communication as well as the creation and editing of multilingual documents. In contrast to the services of a translation agency, the results are immediately available and can be adapted flexibly. Nevertheless, concerns exist, especially regarding data security and translation quality for specialized vocabulary and industry-specific phrases. Developing and deploying self-hosted, business-specific translation models can address both concerns by increasing speed and providing company-specific translations. However, this often leads companies to assume that they cannot contribute the necessary training data. In fact, many companies are sitting on a veritable treasure trove of data that is waiting to be unearthed. This paper shows how we support enterprises with processes and software tools to create datasets for their translation solutions. For this purpose, we apply data acquisition techniques, data preparation methods, sentence alignment, and human-in-the-loop tools.

Pages: 1 to 4

Copyright: Copyright (c) IARIA, 2022

Publication date: April 24, 2022

Published in: conference

ISSN: 2308-3557

ISBN: 978-1-61208-953-9

Location: Barcelona, Spain

Dates: from April 24, 2022 to April 28, 2022