Mining Incomplete Data with Many Missing Attribute Values: A Comparison of Probabilistic and Rough Set Approaches

Clark, Patrick G.; Grzymala-Busse, Jerzy W.; Kuehnhausen, Martin

INTELLI 2013, The Second International Conference on Intelligent Systems and Applications

Authors:
Patrick G. Clark
Jerzy W. Grzymala-Busse
Martin Kuehnhausen

Keywords: Data mining; probabilistic approaches to missing attribute values; rough set theory; probabilistic approximations; parameterized approximations

Abstract:
In this paper, we study probabilistic and rough set approaches to missing attribute values. Probabilistic approaches are based on imputation: a missing attribute value is replaced either by the most probable known attribute value or by the most probable attribute value restricted to a concept. In the rough set approach, we consider two interpretations of missing attribute values: lost and "do not care". Additionally, we apply three definitions of approximations (singleton, subset and concept) and use an additional parameter called alpha. Our main objective was to compare probabilistic and rough set approaches to missing attribute values for incomplete data sets with many missing attribute values. We conducted experiments on six incomplete data sets with as many missing attribute values as possible: in these data sets, any additional incremental replacement of known values by missing attribute values would have resulted in entire records filled with only missing attribute values. Rough set approaches were better for five data sets; for one data set, a probabilistic approach was more successful.
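
The two imputation variants mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "most probable" means "most frequent among the known values", that a missing value is marked with "?", and that a concept is the set of records sharing one decision value. The function names, the toy attributes and the data are hypothetical.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing attribute value


def most_common(values):
    """Return the most frequent known value, or None if every value is missing."""
    known = [v for v in values if v != MISSING]
    if not known:
        return None
    return Counter(known).most_common(1)[0][0]


def impute(rows, decisions=None):
    """Fill each missing value with the most frequent known value of its attribute.

    If `decisions` is given, the frequency is computed only over the rows that
    share the decision value (concept) of the row being filled.
    """
    result = [list(r) for r in rows]
    for a in range(len(rows[0])):
        for i, row in enumerate(rows):
            if row[a] != MISSING:
                continue
            if decisions is None:
                pool = [r[a] for r in rows]
            else:
                pool = [r[a] for r, d in zip(rows, decisions) if d == decisions[i]]
            fill = most_common(pool)
            if fill is not None:  # leave the gap if nothing is known
                result[i][a] = fill
    return result


# Hypothetical toy data: attributes (temperature, headache), decision = flu.
rows = [
    ["high", "yes"],
    ["?", "yes"],
    ["normal", "no"],
    ["normal", "?"],
    ["normal", "no"],
    ["high", "yes"],
]
decisions = ["yes", "yes", "no", "no", "no", "yes"]

global_filled = impute(rows)              # most frequent value over all rows
concept_filled = impute(rows, decisions)  # most frequent value within the concept
print(global_filled[1], global_filled[3])
print(concept_filled[1], concept_filled[3])
```

On this toy data the two variants disagree: the global variant fills the second record with "normal" and the fourth with "yes", while the concept-restricted variant fills them with "high" and "no", because each is computed only from records with the same decision.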

Pages: 12 to 17

Copyright: Copyright (c) IARIA, 2013

Publication date: April 21, 2013

Published in: conference

ISSN: 2308-4065

ISBN: 978-1-61208-269-1

Location: Venice, Italy

Dates: from April 21, 2013 to April 26, 2013