A Comparison of Two MLEM2 Rule Induction Algorithms Applied to Data with Many Missing Attribute Values

Clark, Patrick G.; Gao, Cheng; Grzymala-Busse, Jerzy W.

DBKDA 2016, The Eighth International Conference on Advances in Databases, Knowledge, and Data Applications

Authors:
Patrick G. Clark
Cheng Gao
Jerzy W. Grzymala-Busse

Keywords: Probabilistic approximations; generalization of probabilistic approximations; concept probabilistic approximations; true MLEM2 algorithm; emulated MLEM2 algorithm

Abstract:
We present results of novel experiments conducted on 18 data sets with many missing attribute values, interpreted as lost values, attribute-concept values, and "do not care" conditions. The main objective was to compare two versions of the Modified Learning from Examples, version 2 (MLEM2) rule induction algorithm, emulated and true, using concept probabilistic approximations. Our secondary objective was to check which interpretation of missing attribute values provides the smallest error rate, estimated by ten-fold cross validation. Results of our experiments show that the two versions of the MLEM2 rule induction algorithm do not differ much. On the other hand, there is some evidence that the lost value interpretation of missing attribute values is the best: in seven cases this interpretation was significantly better (two-tailed test, 5% significance level) than attribute-concept values, and in eight cases it was better than "do not care" conditions. Additionally, attribute-concept values and "do not care" conditions were never significantly better than lost values.
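
The abstract names three interpretations of missing attribute values and the concept probabilistic approximation without defining them. The sketch below illustrates how they are commonly defined in the rough-set literature; it is not taken from this page and is not the authors' implementation. The symbols "?", "-", and "*" for lost, attribute-concept, and "do not care" values, the toy table, and all function names are assumptions made for illustration.

```python
from fractions import Fraction

# Assumed symbols for the three interpretations of a missing value.
LOST, ATTR_CONCEPT, DO_NOT_CARE = "?", "-", "*"
MISSING = {LOST, ATTR_CONCEPT, DO_NOT_CARE}

# Hypothetical toy data: case id -> attribute values, and case id -> decision.
TABLE = {
    1: {"Temp": "high", "Headache": "yes"},
    2: {"Temp": LOST, "Headache": "yes"},
    3: {"Temp": ATTR_CONCEPT, "Headache": "no"},
    4: {"Temp": "normal", "Headache": DO_NOT_CARE},
}
DECISION = {1: "flu", 2: "flu", 3: "healthy", 4: "healthy"}


def concept_values(table, decision, x, a):
    """Specified values of attribute a among cases in the same concept as x."""
    return {row[a] for c, row in table.items()
            if decision[c] == decision[x] and row[a] not in MISSING}


def block(table, decision, a, v):
    """Block of the pair (a, v): all cases treated as matching value v on a."""
    members = set()
    for c, row in table.items():
        val = row[a]
        if val == v or val == DO_NOT_CARE:
            members.add(c)  # "do not care" matches every value
        elif val == ATTR_CONCEPT and v in concept_values(table, decision, c, a):
            members.add(c)  # attribute-concept matches values seen in c's concept
        # a lost value belongs to no block
    return members


def characteristic_set(table, decision, x):
    """Intersect, over all attributes, the sets of cases compatible with x."""
    k = set(table)  # start from the whole universe
    for a, val in table[x].items():
        if val in (LOST, DO_NOT_CARE):
            continue  # no restriction from this attribute
        if val == ATTR_CONCEPT:
            vals = concept_values(table, decision, x, a)
            if not vals:
                continue  # nothing known in the concept: no restriction
            k &= set().union(*(block(table, decision, a, v) for v in vals))
        else:
            k &= block(table, decision, a, val)
    return k


def concept_approximation(table, decision, concept, alpha):
    """Union of K(x) for x in the concept with Pr(concept | K(x)) >= alpha."""
    result = set()
    for x in concept:
        k = characteristic_set(table, decision, x)
        if Fraction(len(k & concept), len(k)) >= alpha:
            result |= k
    return result
```

For the toy table, `concept_approximation(TABLE, DECISION, {1, 2}, 1)` gives `{1}`, while lowering the threshold to `0.5` gives `{1, 2, 4}`, because the characteristic set of case 2 is `{1, 2, 4}` and two of its three members belong to the concept. Under these definitions, a threshold of 1 corresponds to a lower approximation and a threshold just above 0 to an upper approximation.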

Pages: 60 to 65

Copyright: Copyright (c) IARIA, 2016

Publication date: June 26, 2016

Published in: conference

ISSN: 2308-4332

ISBN: 978-1-61208-486-2

Location: Lisbon, Portugal

Dates: from June 26, 2016 to June 30, 2016