Fuzzy Outlier Detection by Applying the ECF-Means Algorithm. A clustering ensemble approach for mining large datasets

Zazzaro, Gaetano; Martone, Angelo

International Journal On Advances in Software, volume 12, numbers 1 and 2, 2019

Authors:
Gaetano Zazzaro
Angelo Martone

Keywords: ECF-means; Fuzzy Outlier Detection; Data Mining; Ensemble Clustering; k-means; Weka

Abstract:
This paper shows how to mine large datasets with the ECF-means algorithm in order to detect potential outliers. ECF-means is an ensemble clustering algorithm: it combines the clustering results obtained from different runs of a chosen algorithm into a single final clustering configuration. ECF is also a way to “fuzzify” a clustering algorithm, since it assigns each point a membership degree for each obtained cluster. A new kind of outlier, called o-rank fuzzy outlier, is introduced: an element that does not strongly belong to any cluster and therefore needs to be observed more closely. A novel validation index, called o.FOUI, is also defined on the basis of this new kind of fuzzy outlier. The proposed fuzzification method is applied to the k-means clustering algorithm by using its Weka implementation and an ad hoc software application. Three case studies on real-world datasets are presented, and their outcomes are compared with the results of other outlier detection methods; the proposed algorithm seems to provide other, deeper types of detection. The first case study concerns the well-known Wine dataset from the UCI Machine Learning Repository. The second involves the analysis and exploration of data in the meteorological domain. The third explores the well-known Iris dataset, which is traditionally considered to have no outliers and in which the ECF-means algorithm nevertheless discovers new information.
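The general idea in the abstract (several k-means runs combined into per-cluster membership degrees, then a search for points that belong strongly to no cluster) can be illustrated with a minimal sketch. This is not the authors' ECF-means definition, nor their Weka-based application: the helper names (`kmeans`, `align`, `ensemble_memberships`, `weak_members`), the label-alignment step, the use of vote fractions as membership degrees, and the `threshold` parameter are all assumptions made for illustration, and the paper's o-rank criterion and o.FOUI index are not reproduced here.

```python
import random
from itertools import permutations


def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers, seed-dependent
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def align(labels, ref, k):
    """Relabel one run so that it agrees as much as possible with a reference run.

    Assumption: brute force over all label permutations, practical only for small k.
    """
    best = max(permutations(range(k)),
               key=lambda perm: sum(perm[l] == r for l, r in zip(labels, ref)))
    return [best[l] for l in labels]


def ensemble_memberships(points, k, seeds):
    """Membership degree of point i in cluster c = fraction of runs voting for c."""
    runs = [kmeans(points, k, s) for s in seeds]
    aligned = [align(run, runs[0], k) for run in runs]
    n_runs = len(aligned)
    return [[sum(run[i] == c for run in aligned) / n_runs for c in range(k)]
            for i in range(len(points))]


def weak_members(memberships, threshold):
    """Indices of points whose strongest membership is below a hypothetical threshold."""
    return [i for i, row in enumerate(memberships) if max(row) < threshold]


# Toy data: two small groups and one point lying between them.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
          (2.5, 2.6)]
memberships = ensemble_memberships(points, k=2, seeds=range(10))
candidates = weak_members(memberships, threshold=0.9)
```

In this sketch each row of `memberships` sums to 1, so a point on which the runs disagree has no dominant entry and is returned by `weak_members` as a candidate for closer inspection. Which points end up flagged depends on the data, the seeds, and the chosen threshold.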

Pages: 11 to 29

Publication date: June 30, 2019

Published in: journal

ISSN: 1942-2628