A Comparison of Global and Saturated Probabilistic Approximations Using Characteristic Sets in Mining Incomplete Data

Clark, Patrick G.; Grzymala-Busse, Jerzy W.; Mroczek, Teresa; Niemiec, Rafal

INTELLI 2019, The Eighth International Conference on Intelligent Systems and Applications

Authors:
Patrick G. Clark
Jerzy W. Grzymala-Busse
Teresa Mroczek
Rafal Niemiec

Keywords: Data mining; rough set theory; probabilistic approximations; MLEM2 rule induction algorithm; lost values; "do not care" conditions

Abstract:
Data mining systems form granules of information from data sets, and the method used to construct these granules can significantly affect the accuracy of the resulting model. In this paper, we study incomplete data sets under two interpretations of missing attribute values, lost values and "do not care" conditions. For such data sets, we apply data mining based on two kinds of probabilistic approximations, global and saturated, and ask which yields the higher accuracy. The main objective of the paper is to compare the two approaches in terms of error rate, evaluated by ten-fold cross validation. Saturated probabilistic approximations are closer to the concept than global probabilistic approximations, so the corresponding error rate should be smaller. At the 5% level of significance, our main result is that the two approaches differ. However, neither is better for all data sets, so both should be tried on each data set and the better one selected for rule induction.
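
The abstract does not define characteristic sets, so the following is only a minimal sketch of the usual rough-set construction. It assumes that a lost value is written "?", a "do not care" condition is written "*", and that the characteristic set of a case is the intersection, over attributes with a specified value, of the matching attribute-value blocks. The toy data, the function names and the attribute names are hypothetical. The sketch stops at the conditional probability of a concept given a characteristic set, which is the quantity a probabilistic approximation thresholds; it does not reproduce the global or saturated constructions compared in the paper.

```python
from fractions import Fraction

LOST = "?"          # assumed marker for a lost value
DO_NOT_CARE = "*"   # assumed marker for a "do not care" condition


def characteristic_sets(cases, attributes):
    """Return the characteristic set K(x) for every case x, as sets of indices."""
    universe = set(range(len(cases)))

    # Attribute-value blocks: a "*" case joins every block of that attribute,
    # a "?" case joins none of them.
    blocks = {}
    for a in attributes:
        specified = {row[a] for row in cases} - {LOST, DO_NOT_CARE}
        for v in specified:
            blocks[(a, v)] = {
                i for i, row in enumerate(cases) if row[a] in (v, DO_NOT_CARE)
            }

    result = []
    for row in cases:
        k = set(universe)
        for a in attributes:
            # A missing value places no restriction on K(x).
            if row[a] not in (LOST, DO_NOT_CARE):
                k &= blocks[(a, row[a])]
        result.append(k)
    return result


def conditional_probability(concept, k):
    """Pr(concept | k) = |concept & k| / |k|, kept exact with Fraction."""
    return Fraction(len(concept & k), len(k))


# Hypothetical toy data set with one lost value and one "do not care" condition.
attributes = ["Temperature", "Headache"]
cases = [
    {"Temperature": "high", "Headache": "yes"},
    {"Temperature": "?", "Headache": "yes"},
    {"Temperature": "high", "Headache": "*"},
    {"Temperature": "normal", "Headache": "no"},
]
ksets = characteristic_sets(cases, attributes)
concept = {0, 1}  # hypothetical concept: cases 0 and 1 share a decision value
probabilities = [conditional_probability(concept, k) for k in ksets]
```

In this toy example, a characteristic set would enter a probabilistic approximation of the concept only if its conditional probability reaches the chosen threshold; how the qualifying sets are then selected and combined is where the global and saturated variants differ, and that step is left out here.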

Pages: 10 to 15

Copyright: Copyright (c) IARIA, 2019

Publication date: June 30, 2019

Published in: conference

ISSN: 2308-4065

ISBN: 978-1-61208-723-8

Location: Rome, Italy

Dates: from June 30, 2019 to July 4, 2019