IMMM 2011, The First International Conference on Advances in Information Mining and Management


An Equivalence Class Based Clustering Algorithm for Categorical Data

Authors:
Qingbao Liu
Wanjun Wang
Su Deng
Guozhu Dong

Keywords: clustering analysis; categorical data; equivalence class

Abstract:
Most traditional clustering methods rely on a distance function. However, the distance between categorical data is hard to define, especially in exploratory situations where the data is not well understood. As a result, many clustering methods do not perform well on categorical datasets. In this paper we propose a novel Equivalence Class based Clustering Algorithm for Categorical data (ECCC). ECCC takes the support transaction sets of selected frequent closed patterns as the candidate clusters. We define a novel quality measure to evaluate the suitability of frequent closed patterns for forming clusters. The measure is based on two factors: cluster coherence, expressed in terms of closed patterns, and cluster discrimination, expressed in terms of the quality and diversity of minimal generator patterns. ECCC uses this measure to select the high-quality frequent closed patterns that form the final clusters.
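
The notions the abstract relies on (support transaction set, equivalence class, closed pattern, minimal generator) can be illustrated with a small sketch. This is not the ECCC implementation: the abstract does not specify the quality measure or the mining procedure, so the code below only groups frequent patterns by their support transaction set using brute-force enumeration. The toy dataset, the `min_support` value, and all function names are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of "attribute=value" items.
transactions = [
    frozenset({"color=red", "shape=round", "size=small"}),
    frozenset({"color=red", "shape=round", "size=large"}),
    frozenset({"color=blue", "shape=square", "size=small"}),
    frozenset({"color=blue", "shape=square", "size=large"}),
    frozenset({"color=red", "shape=round", "size=small"}),
]


def support_set(pattern, transactions):
    """Indices of the transactions that contain every item of `pattern`."""
    return frozenset(i for i, t in enumerate(transactions) if pattern <= t)


def equivalence_classes(transactions, min_support):
    """Group frequent patterns by support transaction set.

    Patterns with the same support set belong to the same equivalence class.
    Brute-force enumeration: only suitable for tiny examples.
    """
    items = sorted(set().union(*transactions))
    classes = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            tids = support_set(pattern, transactions)
            if len(tids) >= min_support:
                classes.setdefault(tids, []).append(pattern)
    return classes


def summarize(classes):
    """For each class, return its closed pattern and its minimal generators."""
    summary = {}
    for tids, patterns in classes.items():
        # The closed pattern is the unique largest pattern of the class.
        closed = max(patterns, key=len)
        # Minimal generators have no proper subset inside the same class.
        generators = [p for p in patterns if not any(q < p for q in patterns)]
        summary[tids] = (closed, generators)
    return summary


summary = summarize(equivalence_classes(transactions, min_support=2))
for tids, (closed, generators) in summary.items():
    print(sorted(tids), sorted(closed), [sorted(g) for g in generators])
```

Each support set printed here is a candidate cluster in the sense of the abstract. Choosing among the candidates would require the paper's quality measure, which this sketch deliberately leaves out.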

Pages: 127 to 130

Copyright: Copyright (c) IARIA, 2011

Publication date: October 23, 2011

Published in: conference

ISSN: 2326-9332

ISBN: 978-1-61208-162-5

Location: Barcelona, Spain

Dates: from October 23, 2011 to October 29, 2011