ALLDATA 2016, The Second International Conference on Big Data, Small Data, Linked Data and Open Data
AScale: Simple and Fast ETL+Q Scaling for Small and Big Data
Authors:
Pedro Martins
Maryam Abbasi
Pedro Furtado
Keywords: algorithms; architecture; scalability; ETL; freshness; high-rate; performance; scale; parallel processing
Abstract:
In this paper, we investigate the problem of providing scalability (out and in) to the Extraction, Transformation, Load (ETL) and Querying (Q) process (ETL+Q) of data warehouses. In general, data loading, transformation, and integration are heavy tasks that are performed only periodically rather than row by row. Parallel architectures and mechanisms can optimize the ETL process by speeding up each part of the pipeline as more performance is needed. We propose parallelization solutions for each part of ETL+Q and integrate them into a framework that enables automatic scalability and freshness for any data warehouse and ETL+Q process. Our results show that the proposed system's algorithms scale to provide the desired processing speed in both big-data and small-data scenarios.
Pages: 1 to 6
Copyright: Copyright (c) IARIA, 2016
Publication date: February 21, 2016
Published in: conference
ISSN: 2519-8386
ISBN: 978-1-61208-457-2
Location: Lisbon, Portugal
Dates: from February 21, 2016 to February 25, 2016