DBKDA 2025, The Seventeenth International Conference on Advances in Databases, Knowledge, and Data Applications


Towards Extracting Entity Relationship Diagrams from Unstructured Text using Natural Language Processing

Authors:
Vaihunthan Vyramuthu
Gregor Grambow

Keywords: Entity-Relationship Model, Natural Language Processing, Named Entity Recognition, POS-Tagging, SpaCy, LSTM

Abstract:
In computer science, building applications usually involves abstracting real-world entities and relationships into models that software can process. A crucial part of this is data storage and management, and therefore the creation of data models. As a first step, the Entity-Relationship (ER) model is usually used. However, transforming real-world descriptions in natural language into standardized ER diagrams can be tedious and error-prone. Natural Language Processing (NLP) has recently gained much attention, but this specific task is still mostly performed manually. This paper describes a hybrid system for capturing ER model components from German texts using NLP, so that the time-consuming interpretation of textual database scenarios can be automated. We implemented and tested both rule-based and model-based approaches. The main extraction is performed by the rule-based variant, so that entities, attributes, relationships, and cardinalities can be identified systematically. The results of the model-based approach serve as a comparison to the rule-based results and can be used to check and improve them. Furthermore, we conducted a preliminary evaluation, which shows promising results. A hybrid approach can be better than a classical approach, as it combines the precision of the rule-based system with the flexibility of the model-based approach. This may lead to more robust and reliable extraction, as errors in one approach can be compensated by the other.
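
The hybrid idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two rules (nouns as entity candidates, a verb between two nouns as a relationship), the hand-tagged tokens, the function names `extract_rule_based` and `cross_check`, and the example model output are all assumptions made for illustration. A real pipeline would obtain the POS tags from a tagger such as SpaCy.

```python
from typing import List, Optional, Set, Tuple

# A token is (text, coarse POS tag). Tags are supplied by hand here;
# in practice they would come from a POS tagger.
Token = Tuple[str, str]
Relationship = Tuple[str, str, str]


def extract_rule_based(tokens: List[Token]) -> Tuple[Set[str], Set[Relationship]]:
    """Hypothetical rules: nouns are entity candidates, and a verb
    standing between two nouns links them as a relationship."""
    entities: Set[str] = set()
    relationships: Set[Relationship] = set()
    last_noun: Optional[str] = None
    pending_verb: Optional[str] = None
    for text, pos in tokens:
        if pos == "NOUN":
            entities.add(text)
            if last_noun is not None and pending_verb is not None:
                relationships.add((last_noun, pending_verb, text))
            last_noun = text
            pending_verb = None
        elif pos == "VERB" and last_noun is not None:
            pending_verb = text
    return entities, relationships


def cross_check(rule_entities: Set[str], model_entities: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare both result sets: entities found by both approaches are
    confirmed, entities found by only one are flagged for review."""
    confirmed = rule_entities & model_entities
    to_review = rule_entities ^ model_entities
    return confirmed, to_review


# "Ein Kunde bestellt ein Produkt." (A customer orders a product.)
sentence: List[Token] = [
    ("Ein", "DET"), ("Kunde", "NOUN"), ("bestellt", "VERB"),
    ("ein", "DET"), ("Produkt", "NOUN"), (".", "PUNCT"),
]
entities, relationships = extract_rule_based(sentence)

# Assumed output of a model-based extractor, invented for this example.
model_entities = {"Kunde", "Produkt", "Bestellung"}
confirmed, to_review = cross_check(entities, model_entities)
```

In this sketch the rule-based pass produces the main result, and the model-based output is used only to confirm it or to flag disagreements, mirroring the division of roles described above. Attributes and cardinalities are omitted for brevity.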

Pages: 42 to 47

Copyright: Copyright (c) IARIA, 2025

Publication date: March 9, 2025

Published in: conference

ISSN: 2308-4332

ISBN: 978-1-68558-244-9

Location: Lisbon, Portugal

Dates: from March 9, 2025 to March 13, 2025