DATA ANALYTICS 2020, The Ninth International Conference on Data Analytics


A Comprehensive Study of Recent Metadata Models for Data Lake

Authors:
Redha Benaissa
Omar Boussaid
Aicha Mokhtari
Farid Benhammadi

Keywords: Metadata; Metadata models; Data Lakes; Big Data.

Abstract:
In the era of Big Data, an unprecedented amount of heterogeneous and unstructured data is generated every day, which needs to be stored, managed, and processed to create new services and applications. This has brought new concepts in data management such as Data Lakes (DL), where raw data is stored without any transformation. Successful DL systems deploy efficient metadata techniques in order to organize the DL. This paper presents a comprehensive study of recent metadata models for Data Lakes that points out their rationales, strengths, and weaknesses. More precisely, we provide a layered taxonomy of recent metadata models and their specifications. This is followed by a survey of recent works dealing with metadata management in DL, which can be categorized into level, typology, and content metadata. Based on this study, an in-depth analysis of key features, strengths, and missing points is conducted. This, in turn, allowed us to find the gap in the literature and identify open research issues that require the attention of the community.

Pages: 78 to 83

Copyright: Copyright (c) IARIA, 2020

Publication date: October 25, 2020

Published in: conference

ISSN: 2308-4464

ISBN: 978-1-61208-816-7

Location: Nice, France

Dates: from October 25, 2020 to October 29, 2020