IMMM 2011, The First International Conference on Advances in Information Mining and Management
An Equivalence Class Based Clustering Algorithm for Categorical Data
Authors:
Qingbao Liu
Wanjun Wang
Su Deng
Guozhu Dong
Keywords: clustering analysis; categorical data; equivalence class
Abstract:
Most traditional clustering methods rely on a distance function. However, a distance between categorical data objects is hard to define, especially in exploratory situations where the data is not well understood. As a result, many clustering methods do not perform well on categorical datasets. In this paper we propose a novel Equivalence Class based Clustering Algorithm for Categorical data (ECCC). ECCC takes the support transaction sets of selected frequent closed patterns as the candidate clusters. We define a novel quality measure to evaluate the suitability of frequent closed patterns to form clusters. The measure is based on two factors: cluster coherence, expressed in terms of closed patterns, and cluster discrimination, expressed in terms of the quality and diversity of minimal generator patterns. ECCC uses this measure to select the high-quality frequent closed patterns that form the final clusters.
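To make the terminology in the abstract concrete, the following is a minimal sketch of how patterns can be grouped into equivalence classes by their support transaction set, with each class having one closed pattern and one or more minimal generators. It is an illustration only, not the ECCC implementation: the function name `equivalence_classes`, the toy rows, and the brute-force enumeration are assumptions made here, and the quality measure is not reproduced because the abstract does not define it.

```python
from itertools import combinations


def equivalence_classes(rows, min_support):
    """Group frequent patterns by their support transaction set.

    rows: list of dicts mapping attribute -> categorical value.
    Returns {frozenset(row indices): {"closed": ..., "generators": [...]}}.
    Hypothetical helper for illustration; brute force, small inputs only.
    """
    # Encode each row as a set of (attribute, value) items.
    transactions = [frozenset(r.items()) for r in rows]
    items = sorted(set().union(*transactions))

    # Enumerate non-empty patterns and bucket them by the rows they match.
    by_support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            tids = frozenset(
                i for i, t in enumerate(transactions) if pattern <= t
            )
            if len(tids) >= min_support:
                by_support.setdefault(tids, []).append(pattern)

    classes = {}
    for tids, patterns in by_support.items():
        # The closed pattern is the largest pattern of the class,
        # which equals the union of all patterns in the class.
        closed = frozenset().union(*patterns)
        # Minimal generators have no proper subset in the same class.
        generators = [
            p for p in patterns if not any(q < p for q in patterns)
        ]
        classes[tids] = {"closed": closed, "generators": generators}
    return classes


# Toy categorical dataset (assumed for illustration).
rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "square"},
    {"color": "blue", "shape": "square"},
]
classes = equivalence_classes(rows, min_support=2)
for tids, info in classes.items():
    print(sorted(tids), sorted(info["closed"]), info["generators"])
```

Each key of `classes` is a candidate cluster in the sense of the abstract: the set of rows supporting a frequent closed pattern. A selection step based on the paper's quality measure would then choose among these candidates.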
Pages: 127 to 130
Copyright: Copyright (c) IARIA, 2011
Publication date: October 23, 2011
Published in: conference
ISSN: 2326-9332
ISBN: 978-1-61208-162-5
Location: Barcelona, Spain
Dates: from October 23, 2011 to October 29, 2011