SOTICS 2023, The Thirteenth International Conference on Social Media Technologies, Communication, and Informatics
From Unstructured Data to Digital Twins: From Tweets to Structured Knowledge
Authors:
Sergej Schultenkämper
Frederik Simon Bäumer
Yeong Su Lee
Michaela Geierhos
Keywords: Digital Twin; Data Privacy; Semantic Triple
Abstract:
This paper focuses on extracting relevant information from unstructured data, specifically analyzing text shared by users on Twitter. The goal is to build a comprehensive knowledge graph by extracting implicit personal information from tweets, including interests, activities, events, family, health, relationships, and professional information. The extracted information is used to instantiate a digital twin and develop a personalized alert system to protect users from threats, such as social engineering or doxing. The paper evaluates the effectiveness of state-of-the-art large language models, such as GPT-4, for extracting relevant triples from tweets. The study also explores the notion of digital twins in the context of cyber threats and presents related work in information extraction. The approach includes data collection, multi-label classification, relational triple extraction, and evaluation of the results. The dataset is drawn from Twitter, and the study analyzes the challenges posed by user-generated data. The results show the accuracy of the extracted triples and the personal characteristics that can be identified from tweets for the development of the digital twin. The results contribute to the ADRIAN research project, which focuses on machine learning-based methods for detecting potential threats to people's privacy.
Pages: 6 to 11
Copyright: Copyright (c) IARIA, 2023
Publication date: November 13, 2023
Published in: conference
ISSN: 2326-9294
ISBN: 978-1-68558-103-9
Location: Valencia, Spain
Dates: from November 13, 2023 to November 17, 2023