Employing BERT Embeddings for Customer Segmentation and Translation Matching
Authors:
Tim vor der Brück
Keywords: BERT embeddings; targeted marketing; translation matching
Abstract:
In this work, we investigate the performance of BERT (Bidirectional Encoder Representations from Transformers) embeddings in two natural language processing (NLP) scenarios based on semantic similarity and compare them with ordinary Word2Vec embeddings. The BERT embeddings are pretrained by Google on a multilingual dataset consisting of several Wikipedias. The semantic similarity between two input texts is estimated in the usual way, by applying the cosine measure to the centroids of the two embedding sets. In the case of BERT, these centroids are determined by two different approaches. In the first approach, we simply average the embeddings of all tokens in the associated sentence. In the second approach, we average only the embeddings of the special sentence-start token ([CLS]), which encodes a representation of the entire sentence. Surprisingly, ordinary Word2Vec embeddings turned out to perform considerably better in both scenarios and with both calculation methods.
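As an illustration only, the following minimal sketch shows the two BERT centroid strategies described in the abstract, using the Hugging Face transformers library. The checkpoint name (bert-base-multilingual-cased, Google's BERT model pretrained on multiple Wikipedias), the helper names, and the example sentences are assumptions for demonstration and are not taken from the paper.

```python
# Hedged sketch: cosine similarity between two sentences using
# (1) the mean of all token embeddings and (2) the [CLS] token.
# Model choice and helper names are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def bert_centroids(text: str):
    """Return (mean-pooled centroid, [CLS] embedding) for a single sentence."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (tokens, 768)
    mean_centroid = hidden.mean(dim=0)  # approach 1: average all token vectors
    cls_embedding = hidden[0]           # approach 2: the sentence-start [CLS] token
    return mean_centroid, cls_embedding

def cosine(a: torch.Tensor, b: torch.Tensor) -> float:
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

mean_a, cls_a = bert_centroids("I like red wine.")       # example sentences,
mean_b, cls_b = bert_centroids("Red wine tastes good.")  # not from the paper
print("mean-pooling similarity:", cosine(mean_a, mean_b))
print("[CLS] similarity:       ", cosine(cls_a, cls_b))
```

For texts spanning several sentences, the abstract's two centroid definitions would correspond to averaging these per-sentence vectors across all sentences before applying the cosine measure.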
Pages: 21 to 23
Copyright: Copyright (c) IARIA, 2020
Publication date: October 25, 2020
Published in: SEMAPRO 2020, The Fourteenth International Conference on Advances in Semantic Processing
ISSN: 2308-4510
ISBN: 978-1-61208-813-6
Location: Nice, France
Dates: October 25–29, 2020