DBKDA 2015, The Seventh International Conference on Advances in Databases, Knowledge, and Data Applications
Accelerating Data Mining on Incomplete Datasets by Bitmaps-based Missing Value Imputation
Authors:
Sameh Shohdy
Yu Su
Gagan Agrawal
Keywords: Missing Values; Bitmap Indexing; Indexing as a Service.
Abstract:
Among the research issues raised by "big data", the veracity challenge, which concerns the precision and accuracy of the data, has received comparatively little attention. It has long been known that data-quality problems, such as incomplete, redundant, inconsistent, and noisy data, pose a major challenge to data mining and data analysis. In particular, we note that existing methods for handling missing values do not scale to larger datasets. In other words, this particular veracity challenge has been addressed, but not in the context of the volume (and possibly the velocity) challenge of "big data". This paper focuses on speeding up missing-value imputation using bitmap indexing. The research takes two directions. First, bitmap indexes are used to directly access the records required by the imputation method (Direct Access Imputation (DAI)). Second, missing values are estimated from the pre-generated bitmap vectors alone, without accessing the dataset itself (Bitmap-Based Imputation (BBI)). Both approaches have been evaluated on several real and synthetic datasets with four common imputation algorithms. We show how our bitmap-based methods can accelerate classification of incomplete data while maintaining precision.
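To make the distinction between the two directions concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. The names `build_value_bitmaps`, `build_label_bitmaps`, `row_ids`, and `impute_class_mean` are hypothetical. It also assumes that plain Python ints serve as uncompressed bitmaps (bit i = record i), that the attribute is binned with caller-supplied bin edges, and that class-conditional mean imputation is the imputation method. The DAI-style path ANDs bitmaps to locate same-class, non-missing records and then reads only those values. The BBI-style path estimates the mean from per-bin bit counts and bin midpoints (an assumed estimator) without reading any raw values.

```python
import bisect
from typing import Dict, Hashable, List, Optional

def build_value_bitmaps(values: List[Optional[float]], edges: List[float]) -> Dict[int, int]:
    """One bitmap (Python int, bit i = record i) per bin; key -1 marks missing values."""
    nbins = len(edges) - 1
    bitmaps: Dict[int, int] = {}
    for i, v in enumerate(values):
        if v is None:
            key = -1
        else:
            # Clamp so out-of-range values (e.g. v == edges[-1]) land in an edge bin.
            key = min(max(bisect.bisect_right(edges, v) - 1, 0), nbins - 1)
        bitmaps[key] = bitmaps.get(key, 0) | (1 << i)
    return bitmaps

def build_label_bitmaps(labels: List[Hashable]) -> Dict[Hashable, int]:
    """One bitmap per class label."""
    bitmaps: Dict[Hashable, int] = {}
    for i, c in enumerate(labels):
        bitmaps[c] = bitmaps.get(c, 0) | (1 << i)
    return bitmaps

def row_ids(bitmap: int) -> List[int]:
    """Record ids whose bits are set, in ascending order."""
    ids = []
    while bitmap:
        low = bitmap & -bitmap          # isolate the lowest set bit
        ids.append(low.bit_length() - 1)
        bitmap ^= low                   # clear it
    return ids

def impute_class_mean(values, labels, edges, mode="dai"):
    """Fill None entries with a same-class mean (hypothetical helper).

    mode="dai": AND bitmaps to find same-class, non-missing records, then read only those.
    mode="bbi": estimate from per-bin counts and bin midpoints; raw values are not read.
    """
    value_bm = build_value_bitmaps(values, edges)
    label_bm = build_label_bitmaps(labels)
    present = 0                         # OR of all non-missing bin bitmaps
    for key, bm in value_bm.items():
        if key != -1:
            present |= bm
    out = list(values)
    for i in row_ids(value_bm.get(-1, 0)):
        same_class = label_bm[labels[i]]
        if mode == "dai":
            ids = row_ids(present & same_class)
            est = sum(values[j] for j in ids) / len(ids) if ids else None
        else:
            total, count = 0.0, 0
            for key, bm in value_bm.items():
                if key == -1:
                    continue
                n = bin(bm & same_class).count("1")   # records of this class in this bin
                total += n * (edges[key] + edges[key + 1]) / 2
                count += n
            est = total / count if count else None
        out[i] = est
    return out

values = [1.0, None, 4.0, 12.0, 15.0, None]
labels = ["a", "a", "a", "b", "b", "b"]
edges = [0.0, 5.0, 10.0, 20.0]
print(impute_class_mean(values, labels, edges, "dai"))  # [1.0, 2.5, 4.0, 12.0, 15.0, 13.5]
print(impute_class_mean(values, labels, edges, "bbi"))  # [1.0, 2.5, 4.0, 12.0, 15.0, 15.0]
```

In this toy run the BBI-style estimate for the last record (15.0) differs from the exact DAI-style mean (13.5), because both class-"b" values fall in the bin [10, 20) and only the bin midpoint is used. This illustrates the accuracy-versus-data-access trade-off the abstract alludes to. Production bitmap indexes are typically compressed, which this sketch ignores.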
Pages: 167 to 175
Copyright: Copyright (c) IARIA, 2015
Publication date: May 24, 2015
Published in: conference
ISSN: 2308-4332
ISBN: 978-1-61208-408-4
Location: Rome, Italy
Dates: from May 24, 2015 to May 29, 2015