On the Number of Conditions in Mining Incomplete Data Using Characteristic Sets and Maximal Consistent Blocks

Clark, Patrick G.; Gao, Cheng; Grzymala-Busse, Jerzy W.; Mroczek, Teresa

ALLDATA 2018, The Fourth International Conference on Big Data, Small Data, Linked Data and Open Data

Authors:
Patrick G. Clark
Cheng Gao
Jerzy W. Grzymala-Busse
Teresa Mroczek

Keywords: Data mining; rough set theory; probabilistic approximations; MLEM2 rule induction algorithm

Abstract:
In this paper, we discuss incomplete data sets with two interpretations of missing attribute values: lost values and "do not care" conditions. For such incomplete data sets, we apply data mining based on characteristic sets and maximal consistent blocks. Our previous research shows that the error rate, evaluated by ten-fold cross-validation, is sometimes smaller for characteristic sets and sometimes smaller for maximal consistent blocks. Therefore, we take the next step and compare the quality of both approaches to mining incomplete data in terms of the complexity of the induced rule sets. We show that for data sets with lost values the differences are insignificant, while for data sets with "do not care" conditions the rule sets are simplest for upper approximations based on characteristic sets or maximal consistent blocks.

Pages: 84 to 89

Copyright: Copyright (c) IARIA, 2018

Publication date: April 22, 2018

Published in: conference

ISSN: 2519-8386

ISBN: 978-1-61208-631-6

Location: Athens, Greece

Dates: from April 22, 2018 to April 26, 2018