Efficient ETL+Q for Automatic Scalability in Big or Small Data Scenarios

Martins, Pedro; Abbasi, Maryam; Furtado, Pedro

ICSEA 2015, The Tenth International Conference on Software Engineering Advances

Authors:
Pedro Martins
Maryam Abbasi
Pedro Furtado

Keywords: algorithms; architecture; scalability; ETL; freshness; high-rate; performance; scale; parallel processing

Abstract:
In this paper, we investigate the problem of providing scalability to the Extraction, Transformation, Load and Querying (ETL+Q) process of data warehouses. In general, data loading, transformation and integration are heavy tasks that are performed only periodically. Parallel architectures and mechanisms can optimize the ETL process by speeding up each part of the pipeline as more performance is needed. We propose an approach that enables automatic scalability and data freshness for any data warehouse and ETL+Q process, suitable for both small-data and big-data businesses. A general framework for testing and implementing the system was developed to provide solutions for each part of ETL+Q automatic scalability. The results show that the proposed system can scale to provide the desired processing speed for both near-real-time results and offline ETL+Q processing.

Pages: 242 to 247

Copyright: Copyright (c) IARIA, 2015

Publication date: November 15, 2015

Published in: conference

ISSN: 2308-4235

ISBN: 978-1-61208-438-1

Location: Barcelona, Spain

Dates: from November 15, 2015 to November 20, 2015