ALLDATA 2018, The Fourth International Conference on Big Data, Small Data, Linked Data and Open Data
Authors:
Patrick G. Clark
Cheng Gao
Jerzy W. Grzymala-Busse
Teresa Mroczek
Keywords: Data mining; rough set theory; probabilistic approximations; MLEM2 rule induction algorithm
Abstract:
In this paper, we discuss incomplete data sets with two interpretations of missing attribute values: lost values and "do not care" conditions. For such incomplete data sets, we apply data mining based on characteristic sets and maximal consistent blocks. Our previous research shows that the error rate, evaluated by ten-fold cross validation, is sometimes smaller for characteristic sets and sometimes smaller for maximal consistent blocks. Therefore, we take the next step and compare the quality of both approaches to mining incomplete data in terms of the complexity of the induced rule sets. We show that for data sets with lost values the differences are insignificant, while for data sets with "do not care" conditions the rule sets are simplest for upper approximations based on characteristic sets or maximal consistent blocks.
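The abstract names characteristic sets without defining them. As a rough illustration only, the sketch below follows the definition commonly used in the rough set literature on incomplete data, which is not stated on this page: a lost value (written "?" here) keeps a case out of every block of that attribute, a "do not care" condition (written "*") places it in every block of that attribute, and the characteristic set of a case is the intersection of the blocks for its specified values. The toy table, the attribute names, and the function names are hypothetical and are not taken from the paper.

```python
LOST = "?"        # assumed marker for a lost value
DONT_CARE = "*"   # assumed marker for a "do not care" condition


def attribute_value_blocks(cases, attributes):
    """Map each (attribute, value) pair to the set of cases in its block."""
    blocks = {}
    for a in attributes:
        # Only specified values define blocks.
        values = {row[a] for row in cases.values()} - {LOST, DONT_CARE}
        for v in values:
            # "*" joins every block of the attribute; "?" joins none.
            blocks[(a, v)] = {
                x for x, row in cases.items()
                if row[a] == v or row[a] == DONT_CARE
            }
    return blocks


def characteristic_sets(cases, attributes):
    """Intersect, for each case, the blocks of its specified values."""
    universe = set(cases)
    blocks = attribute_value_blocks(cases, attributes)
    result = {}
    for x, row in cases.items():
        k = set(universe)
        for a in attributes:
            # A missing value of either kind leaves the set unrestricted.
            if row[a] not in (LOST, DONT_CARE):
                k &= blocks[(a, row[a])]
        result[x] = k
    return result


# Hypothetical four-case table with one lost value and one "do not care".
cases = {
    1: {"Temperature": "high", "Headache": "yes"},
    2: {"Temperature": LOST, "Headache": "yes"},
    3: {"Temperature": DONT_CARE, "Headache": "no"},
    4: {"Temperature": "normal", "Headache": "no"},
}
attributes = ["Temperature", "Headache"]
k_sets = characteristic_sets(cases, attributes)
```

In this toy table, case 1 has the characteristic set {1}, case 2 has {1, 2}, and cases 3 and 4 both have {3, 4}. Maximal consistent blocks, probabilistic approximations, and MLEM2 rule induction are outside the scope of this sketch.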
Pages: 84 to 89
Copyright: Copyright (c) IARIA, 2018
Publication date: April 22, 2018
Published in: conference
ISSN: 2519-8386
ISBN: 978-1-61208-631-6
Location: Athens, Greece
Dates: from April 22, 2018 to April 26, 2018