SOTICS 2023, The Thirteenth International Conference on Social Media Technologies, Communication, and Informatics


From Unstructured Data to Digital Twins: From Tweets to Structured Knowledge

Authors:
Sergej Schultenkämper
Frederik Simon Bäumer
Yeong Su Lee
Michaela Geierhos

Keywords: Digital Twin; Data Privacy; Semantic Triple

Abstract:
This paper focuses on extracting relevant information from unstructured data, specifically text that users share on Twitter. The goal is to build a comprehensive knowledge graph by extracting implicit personal information from tweets, including interests, activities, events, family, health, relationships, and professional information. The extracted information is used to instantiate a digital twin and to develop a personalized alert system that protects users from threats such as social engineering or doxing. The paper evaluates how effectively state-of-the-art large language models, such as GPT-4, extract relevant triples from tweets. The study also explores the notion of digital twins in the context of cyber threats and reviews related work in information extraction. The approach comprises data collection, multi-label classification, relational triple extraction, and evaluation of the results. The dataset consists of tweets, and the study analyzes the challenges posed by user-generated data. The results report the accuracy of the extracted triples and identify the personal characteristics that can be derived from tweets for developing the digital twin. The results contribute to the ADRIAN research project, which focuses on machine learning-based methods for detecting potential threats to people's privacy.
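The abstract describes extracted personal information as relational (semantic) triples that are aggregated into a knowledge graph instantiating a user's digital twin. The Python sketch below shows one possible way to store such triples and group them per user. It is not the authors' implementation. Only the seven information categories come from the abstract. The `Triple` class, its fields, the predicate names, the dictionary-based graph, and the `SENSITIVE` set used for a toy alert check are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# The seven categories of personal information named in the abstract.
CATEGORIES = {"interests", "activities", "events", "family",
              "health", "relationships", "professional"}

# Hypothetical: categories an alert system might treat as especially risky
# (e.g., useful for social engineering or doxing). Not taken from the paper.
SENSITIVE = {"family", "health", "relationships"}


@dataclass(frozen=True)
class Triple:
    """A (subject, predicate, object) statement extracted from one tweet (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str
    category: str   # one of CATEGORIES, e.g. assigned by a multi-label classifier
    tweet_id: str   # provenance: the tweet the triple was extracted from


def build_twin(triples):
    """Group triples by subject into a minimal dict-based knowledge graph.

    Returns {subject: {category: set of (predicate, object)}}.
    Duplicate facts from different tweets collapse into one entry.
    """
    twin = defaultdict(lambda: defaultdict(set))
    for t in triples:
        if t.category not in CATEGORIES:
            raise ValueError(f"unknown category: {t.category}")
        twin[t.subject][t.category].add((t.predicate, t.obj))
    return twin


def exposed_categories(twin, subject):
    """Sorted categories for which the subject has disclosed at least one fact."""
    # .get() does not trigger the defaultdict factory, so unknown subjects stay absent.
    return sorted(twin.get(subject, {}).keys())


def sensitive_exposure(twin, subject):
    """Sorted exposed categories that fall into the hypothetical SENSITIVE set."""
    return sorted(set(exposed_categories(twin, subject)) & SENSITIVE)


if __name__ == "__main__":
    # Hypothetical example triples for a fictional account.
    triples = [
        Triple("@alice", "works_as", "nurse", "professional", "t1"),
        Triple("@alice", "has_child", "daughter", "family", "t2"),
        Triple("@alice", "likes", "marathon running", "interests", "t3"),
    ]
    twin = build_twin(triples)
    print(exposed_categories(twin, "@alice"))  # ['family', 'interests', 'professional']
    print(sensitive_exposure(twin, "@alice"))  # ['family']
```

Keying the graph by subject and then by category keeps each user's profile in one place and makes per-category checks cheap. A real system would more likely use an RDF store or graph database and keep per-tweet provenance for every fact rather than discarding it.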

Pages: 6 to 11

Copyright: Copyright (c) IARIA, 2023

Publication date: November 13, 2023

Published in: conference

ISSN: 2326-9294

ISBN: 978-1-68558-103-9

Location: Valencia, Spain

Dates: from November 13, 2023 to November 17, 2023