International Journal On Advances in Intelligent Systems, volume 14, numbers 1 and 2, 2021


Hybrid Knowledge-based and Data-driven Text Similarity Estimation based on Fuzzy Sets, Word Embeddings, and the OdeNet Ontology

Authors:
Tim vor der Brück
Michael Kaufmann

Keywords: OdeNet; fuzzy sets; targeted marketing; histogram equalization

Abstract:
Estimating the semantic similarity between texts is important for a wide range of application scenarios in natural language processing. With the increasing availability of large text corpora, data-driven approaches such as Word2Vec have become quite successful. In contrast, semantic methods that employ manually designed knowledge bases, such as ontologies, have lost some of their former popularity. However, manually designed expert knowledge can still be a valuable resource, since it can be leveraged to boost the performance of data-driven approaches. In this paper, we introduce a novel hybrid similarity estimate based on fuzzy sets that exploits both word embeddings and a lexical ontology. As the ontology, we use OdeNet, a freely available resource developed by the Darmstadt University of Applied Sciences. Our application scenario is targeted marketing, in which we aim to match people to the best-fitting marketing target group based on short German text snippets. The evaluation showed that using the ontology did indeed improve the overall result compared with a baseline data-driven estimate.

Pages: 114 to 120

Copyright: Copyright (c) to authors, 2021. Used with permission.

Publication date: December 31, 2021

Published in: journal

ISSN: 1942-2679