DBKDA 2013, The Fifth International Conference on Advances in Databases, Knowledge, and Data Applications


Clustering XML Data Streams by Structure based on Sliding Windows and Exponential Histograms

Authors:
Mingxia Gao
Furong Chen

Keywords: XML data stream; temporal cluster feature

Abstract:
To group online XML data streams by structure, this paper introduces CXDSS-SWEH, a dynamic clustering algorithm based on sliding windows and exponential histograms. First, the algorithm summarizes each XML document as a structural synopsis called the Temporal Cluster Feature for XML Structure (TCFXS). Second, it assigns the TCFXS to a cluster by measuring the similarity between the TCFXS and each existing cluster. Finally, the clusters in the sliding window are updated in real time according to the criteria of false-positive exponential histograms. We conducted a series of experiments on real and synthetic XML data streams to evaluate clustering quality, memory consumption, and running time. The results show that (1) the clustering quality of CXDSS-SWEH is close to that of XCLS and SW-XSCLS, and (2) CXDSS-SWEH compares favorably with SW-XSCLS in memory and time consumption.
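As a rough illustration of the exponential-histogram idea the abstract refers to, the Python sketch below keeps an approximate count of events inside a sliding time window. It is not the CXDSS-SWEH algorithm and does not model the TCFXS or the paper's false-positive criteria. The class name `ExponentialHistogram`, the `max_per_size` parameter, and the merge rule are assumptions that follow the generic textbook form of an exponential histogram.

```python
class ExponentialHistogram:
    """Approximate count of events seen in the last `window` time units.

    Illustrative sketch only: buckets hold power-of-two event counts,
    and at most `max_per_size` buckets of each size are kept.
    """

    def __init__(self, window, max_per_size=2):
        self.window = window
        self.max_per_size = max_per_size
        # Newest bucket first; each bucket is (timestamp of its newest event, size).
        self.buckets = []

    def _expire(self, now):
        # Drop buckets whose newest event has left the window (now - window, now].
        while self.buckets and self.buckets[-1][0] <= now - self.window:
            self.buckets.pop()

    def add(self, now):
        """Record one event at time `now` (timestamps must not decrease)."""
        self._expire(now)
        self.buckets.insert(0, (now, 1))
        size = 1
        while True:
            same = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(same) <= self.max_per_size:
                break
            # Merge the two oldest buckets of this size into one of double size.
            newer, older = same[-2], same[-1]
            timestamp = self.buckets[newer][0]
            del self.buckets[older]
            self.buckets[newer] = (timestamp, size * 2)
            size *= 2

    def estimate(self, now):
        """Estimated number of events in the window ending at `now`."""
        self._expire(now)
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only part of the oldest bucket may still be inside the window,
        # so count half of it.
        return total - self.buckets[-1][1] // 2


eh = ExponentialHistogram(window=8, max_per_size=2)
for t in range(1, 9):
    eh.add(t)
print(eh.buckets)
print(eh.estimate(8))
```

The sketch stores a logarithmic number of buckets instead of every event, which is the kind of memory saving that makes histogram-based window maintenance attractive for streams. A larger `max_per_size` gives a tighter estimate at the cost of more buckets.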

Pages: 224 to 230

Copyright: Copyright (c) IARIA, 2013

Publication date: January 27, 2013

Published in: conference

ISSN: 2308-4332

ISBN: 978-1-61208-247-9

Location: Seville, Spain

Dates: from January 27, 2013 to February 1, 2013