Complexity of Rule Sets Induced from Incomplete Data with Lost Values and Attribute-concept Values

Clark, Patrick G.; Grzymala-Busse, Jerzy W.

INTELLI 2014, The Third International Conference on Intelligent Systems and Applications

Authors:
Patrick G. Clark
Jerzy W. Grzymala-Busse

Keywords: Data mining; rough set theory; probabilistic approximations; MLEM2 rule induction algorithm; lost values; attribute-concept values

Abstract:
This paper presents novel research on the complexity of rule sets induced from incomplete data sets with two interpretations of missing attribute values: lost values and attribute-concept values. Experiments were conducted on 176 data sets, using three kinds of probabilistic approximations (lower, middle and upper) and the Modified Learning from Examples Module, version 2 (MLEM2) rule induction system. In our experiments, the size of the rule set was always smaller for attribute-concept values than for lost values (5% significance level). The total number of conditions was smaller for attribute-concept values than for lost values for 17 of the 24 combinations of data set type and approximation type. In the remaining 7 combinations, the difference was not statistically significant. Thus, we may claim that attribute-concept values are better than lost values in terms of rule complexity.
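
The abstract names the two interpretations of missing values and the three probabilistic approximations without defining them. The sketch below illustrates them under definitions commonly used in the rough set literature, which are assumptions here rather than text from this record: a lost value ("?") places no constraint on a case, an attribute-concept value ("-") stands for any value of that attribute seen among cases of the same concept, and a concept probabilistic approximation is the union of characteristic sets K(x), x in the concept X, with Pr(X | K(x)) >= alpha. Lower, middle and upper are taken as alpha = 1, 0.5 and a small positive number (0.001 here). The six-case data set, the attribute names and all function names are hypothetical. The sketch does not induce rules, so it shows nothing about rule set complexity.

```python
from fractions import Fraction

LOST, ATTR_CONCEPT = "?", "-"   # markers assumed for this sketch


def characteristic_sets(cases, decisions):
    """Return K(x) for every case index x; cases is a list of attribute->value dicts."""
    universe = set(range(len(cases)))
    attributes = list(cases[0])

    def concept_values(x, a):
        # Specified values of attribute a among cases sharing x's decision.
        return {c[a] for c, d in zip(cases, decisions)
                if d == decisions[x] and c[a] not in (LOST, ATTR_CONCEPT)}

    # Blocks [(a, v)]: lost values join no block; attribute-concept values
    # join the block of every value seen in their own concept.
    blocks = {}
    for x, case in enumerate(cases):
        for a in attributes:
            v = case[a]
            if v == LOST:
                continue
            values = concept_values(x, a) if v == ATTR_CONCEPT else {v}
            for w in values:
                blocks.setdefault((a, w), set()).add(x)

    result = []
    for x, case in enumerate(cases):
        k = set(universe)
        for a in attributes:
            v = case[a]
            if v == LOST:
                continue                  # K(x, a) is the whole universe
            values = concept_values(x, a) if v == ATTR_CONCEPT else {v}
            if values:                    # an empty value set also means the universe
                k &= set().union(*(blocks[(a, w)] for w in values))
        result.append(k)
    return result


def approximation(char_sets, concept, alpha):
    """Union of K(x), x in concept, with Pr(concept | K(x)) >= alpha."""
    approx = set()
    for x in concept:
        k = char_sets[x]
        if Fraction(len(k & concept), len(k)) >= alpha:
            approx |= k
    return approx


# Hypothetical data: (temperature, headache, flu); None marks a missing value.
rows = [("high", "yes", "yes"), (None, "no", "yes"), ("normal", None, "yes"),
        ("high", "no", "no"), ("normal", "yes", "no"), ("normal", "yes", "no")]
decisions = [d for _, _, d in rows]
concept = {x for x, d in enumerate(decisions) if d == "yes"}


def build(marker):
    # Interpret every missing value in rows with the given marker.
    return [{"temperature": marker if t is None else t,
             "headache": marker if h is None else h} for t, h, _ in rows]


alphas = (("lower", 1), ("middle", Fraction(1, 2)), ("upper", Fraction(1, 1000)))
for marker in (LOST, ATTR_CONCEPT):
    sets = characteristic_sets(build(marker), decisions)
    for name, alpha in alphas:
        print(marker, name, sorted(approximation(sets, concept, alpha)))
```

On this toy data the two interpretations yield different characteristic sets and different middle approximations of the same concept, which is the kind of difference the rule induction step then works from.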

Pages: 91 to 96

Copyright: Copyright (c) IARIA, 2014

Publication date: June 22, 2014

Published in: conference

ISSN: 2308-4065

ISBN: 978-1-61208-352-0

Location: Seville, Spain

Dates: from June 22, 2014 to June 26, 2014