ALLDATA 2015, The First International Conference on Big Data, Small Data, Linked Data and Open Data
Authors:
Patrick G. Clark
Jerzy W. Grzymala-Busse
Keywords: Data mining; rough set theory; probabilistic approximations; MLEM2 rule induction algorithm; lost values and "do not care" conditions
Abstract:
In this paper, we compare four strategies used in classification systems. A classification system applies a rule set, induced from the training data set, to classify each testing case as a member of one of the concepts. We assume that both the training and testing data sets are incomplete, i.e., some attribute values are missing. We discuss two interpretations of missing attribute values: lost values and "do not care" conditions. In our experiments, rule sets were induced using probabilistic approximations. Our main results are that, for data sets with lost values, the strength only strategy is better than conditional probability without support and that, for data sets with "do not care" conditions, the conditional probability with support strategy is better than strength only.
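The abstract names the strategies but does not define them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a rule's strength is the number of training cases matching both its conditions and its concept, that its conditional probability is strength divided by the number of training cases matching its conditions, and that "support" means summing rule quality over all matching rules of the same concept (a simplification). It further assumes that a lost value (`?`) in a testing case satisfies no condition, while a "do not care" value (`*`) satisfies any condition. All names (`Rule`, `classify`, `by_strength`, `by_cond_prob`) and the toy rules are hypothetical.

```python
from dataclasses import dataclass

LOST = "?"         # lost value: assumed here to satisfy no rule condition
DO_NOT_CARE = "*"  # "do not care": assumed here to satisfy any rule condition


@dataclass
class Rule:
    conditions: dict  # attribute name -> required value
    concept: str      # concept indicated by the rule
    strength: int     # training cases matching conditions and concept (assumed definition)
    matching: int     # training cases matching the conditions (assumed definition)


def by_strength(rule):
    return rule.strength


def by_cond_prob(rule):
    return rule.strength / rule.matching


def matches(rule, case):
    """Check every condition of the rule against a possibly incomplete case."""
    for attr, wanted in rule.conditions.items():
        value = case.get(attr, LOST)
        if value == DO_NOT_CARE:
            continue  # wildcard: condition counts as satisfied
        if value != wanted:
            return False  # covers both a different value and a lost value
    return True


def classify(rules, case, quality, with_support):
    """Return the chosen concept, or None when no rule matches."""
    matched = [r for r in rules if matches(r, case)]
    if not matched:
        return None
    if not with_support:
        # The single best matching rule decides.
        return max(matched, key=quality).concept
    # With support: add up rule quality per concept, then pick the largest total.
    totals = {}
    for r in matched:
        totals[r.concept] = totals.get(r.concept, 0) + quality(r)
    return max(totals, key=totals.get)


# Hypothetical toy rule set, not taken from the paper.
rules = [
    Rule({"temperature": "high"}, "flu", strength=4, matching=5),
    Rule({"cough": "yes"}, "flu", strength=3, matching=6),
    Rule({"headache": "no"}, "healthy", strength=5, matching=5),
]
case = {"temperature": "high", "cough": "yes", "headache": "no"}

results = {
    "strength only": classify(rules, case, by_strength, False),
    "strength with support": classify(rules, case, by_strength, True),
    "conditional probability only": classify(rules, case, by_cond_prob, False),
    "conditional probability with support": classify(rules, case, by_cond_prob, True),
}
```

On this toy case the two strategies without support follow the single strongest rule, while the two strategies with support let several weaker rules for the same concept outvote it, which is the kind of disagreement a comparison of such strategies has to resolve.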
Pages: 46 to 51
Copyright: Copyright (c) IARIA, 2015
Publication date: April 19, 2015
Published in: conference
ISSN: 2519-8386
ISBN: 978-1-61208-445-9
Location: Barcelona, Spain
Dates: from April 19, 2015 to April 24, 2015