DBKDA 2015, The Seventh International Conference on Advances in Databases, Knowledge, and Data Applications


Accelerating Data Mining on Incomplete Datasets by Bitmaps-based Missing Value Imputation

Authors:
Sameh Shohdy
Yu Su
Gagan Agrawal

Keywords: Missing Values; Bitmap Indexing; Indexing as a Service.

Abstract:
Among all 'big data' research issues, the veracity challenge, which refers to the precision and accuracy of the data, has received comparatively little attention. It has long been known that data quality problems, such as incomplete, redundant, inconsistent, and noisy data, pose a major challenge to data mining and data analysis. In particular, we note that existing methods for handling missing values cannot scale to larger datasets. In other words, this particular veracity challenge has been addressed, but not in the context of also handling the volume (and possibly velocity) challenges of 'big data'. This paper focuses on speeding up the missing value imputation process using the bitmap indexing technique. The research takes two directions. First, bitmap indexing is used to directly access the records required by the imputation method (Direct Access Imputation, DAI). Second, bitmap indexing is used to estimate missing values from the pre-generated bitmap index vectors without accessing the dataset itself (Bitmap-Based Imputation, BBI). Both approaches have been evaluated using different real and synthetic datasets and four common imputation algorithms. We show how our bitmap-based methods can accelerate data mining classification of incomplete data while also maintaining precision.
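
The general idea of imputing from a bitmap index can be illustrated with a small sketch. This is not the paper's DAI or BBI algorithm, which the abstract does not specify. It is a minimal illustration that assumes one bit vector per distinct value of a single column, plus one bit vector for missing entries, with Python integers standing in for bit vectors. The names `build_bitmap_index`, `rows_of`, `mode_from_bitmaps`, and `mean_from_bitmaps` are hypothetical.

```python
from collections import defaultdict

MISSING = None  # assumed marker for a missing entry


def build_bitmap_index(column):
    """Map each distinct value (including MISSING) to an int used as a bit vector.

    Bit i of a vector is set when row i holds that value.
    """
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Return the row numbers whose bit is set, in ascending order."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


def observed_counts(index):
    """Count the set bits of every non-missing value's bit vector."""
    return {v: bin(b).count("1") for v, b in index.items() if v is not MISSING}


def mode_from_bitmaps(index):
    """Most frequent observed value, computed from bit counts only."""
    counts = observed_counts(index)
    return max(counts, key=counts.get)


def mean_from_bitmaps(index):
    """Mean of the observed values, computed from bit counts only."""
    counts = observed_counts(index)
    return sum(v * c for v, c in counts.items()) / sum(counts.values())


column = [3, 1, MISSING, 3, 2, MISSING, 3, 1]
index = build_bitmap_index(column)

# Direct access: the missing-value bit vector names the rows to fill.
missing_rows = rows_of(index.get(MISSING, 0))

# Estimation from the index alone: the column is not rescanned here.
mode_value = mode_from_bitmaps(index)
mean_value = mean_from_bitmaps(index)

imputed = list(column)
for row in missing_rows:
    imputed[row] = mode_value
```

In this sketch the estimate comes from counting bits in the index, and the original column is only touched when the filled-in values are written back. A practical bitmap index would typically use compressed bit vectors and binning for continuous attributes, neither of which is shown here.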

Pages: 167 to 175

Copyright: Copyright (c) IARIA, 2015

Publication date: May 24, 2015

Published in: conference

ISSN: 2308-4332

ISBN: 978-1-61208-408-4

Location: Rome, Italy

Dates: from May 24, 2015 to May 29, 2015