A Comparison of Classification Systems for Rule Sets Induced from Incomplete Data by Probabilistic Approximations

Clark, Patrick G.; Grzymala-Busse, Jerzy W.

ALLDATA 2015, The First International Conference on Big Data, Small Data, Linked Data and Open Data

Authors:
Patrick G. Clark
Jerzy W. Grzymala-Busse

Keywords: Data mining; rough set theory; probabilistic approximations; MLEM2 rule induction algorithm; lost values and "do not care" conditions

Abstract:
In this paper, we compare four strategies used in classification systems. A classification system applies a rule set, induced from the training data set, to classify each testing case as a member of one of the concepts. We assume that both the training and testing data sets are incomplete, i.e., some attribute values are missing. We discuss two interpretations of missing attribute values: lost values and "do not care" conditions. In our experiments, rule sets were induced using probabilistic approximations. Our main results are that for data sets with lost values the strength-only strategy is better than conditional probability without support, and that for data sets with "do not care" conditions the conditional probability with support strategy is better than strength only.
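
The abstract does not define the two interpretations of missing values or the probabilistic approximations. The Python sketch below shows one way these notions are commonly formalized in the rough-set literature on incomplete data, and it should not be read as the authors' implementation. It assumes that a lost value ("?") belongs to no attribute-value block, that a "do not care" condition ("*") belongs to every block of its attribute, and that the concept probabilistic approximation is used. The function names, the toy data, and the threshold name `alpha` are hypothetical.

```python
from fractions import Fraction

LOST, DONT_CARE = "?", "*"  # assumed markers for the two interpretations


def characteristic_set(cases, x):
    """Cases indistinguishable from case x, given x's specified values."""
    result = set(range(len(cases)))
    for a, v in enumerate(cases[x]):
        if v in (LOST, DONT_CARE):
            continue  # a missing value in x places no restriction on attribute a
        # Block of (a, v): cases with value v, plus "do not care" cases,
        # which may stand for any value. Lost values belong to no block.
        block = {y for y, row in enumerate(cases)
                 if row[a] == v or row[a] == DONT_CARE}
        result &= block
    return result


def concept_probabilistic_approximation(cases, concept, alpha):
    """Union of K(x) over x in the concept with Pr(concept | K(x)) >= alpha."""
    approx = set()
    for x in concept:
        k = characteristic_set(cases, x)  # never empty: it always contains x
        if Fraction(len(k & concept), len(k)) >= alpha:
            approx |= k
    return approx


# Toy data set (hypothetical): two attributes, four cases.
cases = [
    ("high", "yes"),  # case 0
    ("high", "?"),    # case 1: second value is lost
    ("*", "no"),      # case 2: first value is a "do not care" condition
    ("low", "no"),    # case 3
]
concept = {0, 1}

lower = concept_probabilistic_approximation(cases, concept, 1)
middle = concept_probabilistic_approximation(cases, concept, Fraction(1, 2))
```

With `alpha = 1` the result reduces to a lower approximation, and smaller thresholds admit characteristic sets that only partly overlap the concept. Rules would then be induced from such approximations, for example by MLEM2, which is outside the scope of this sketch.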

Pages: 46 to 51

Copyright: Copyright (c) IARIA, 2015

Publication date: April 19, 2015

Published in: conference

ISSN: 2519-8386

ISBN: 978-1-61208-445-9

Location: Barcelona, Spain

Dates: from April 19, 2015 to April 24, 2015