IMMM 2015, The Fifth International Conference on Advances in Information Mining and Management
Real-time Partition of Streamed Graphs for Data Mining Over Large Scale Data
Authors:
Víctor Medel
Unai Arronategui
Keywords: Big Graphs; Data Streaming; Graph Partition; Sampling.
Abstract:
Mining data in real time from large graphs requires a large amount of memory to obtain a good distribution of information. Current state-of-the-art solutions for streamed graphs are not scalable and work with a single stream source. We propose a new reduced-memory model to partition large graphs over big streams to improve mining algorithms. The aim of our work is to support data mining algorithms over large-scale structured data (e.g., Web structure, social networks) by minimising communication among partitions. In our architecture, the incoming graph elements are sampled to reduce total memory usage, and the information in each partitioner is updated in a feedback scheme to allow multiple entry points. We have experimented with real-world graphs and discuss the suitability of different sampling strategies depending on the graph structure. In addition, we have executed the PageRank algorithm over the partitioned graph in order to measure the influence of the partition on the execution of a mining algorithm.
Pages: 41 to 47
Copyright: Copyright (c) IARIA, 2015
Publication date: June 21, 2015
Published in: conference
ISSN: 2326-9332
ISBN: 978-1-61208-415-2
Location: Brussels, Belgium
Dates: from June 21, 2015 to June 26, 2015
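
Illustrative sketch: the abstract describes sampling incoming graph elements so that a partitioner holds only part of the stream in memory. The Python below is a minimal, hypothetical illustration of that general idea, not the authors' model. It swaps in a simple greedy heuristic with random eviction, and it omits the feedback scheme between partitioners. The class name `SampledStreamPartitioner` and the parameters `memory_limit` and `slack` are assumptions introduced here for illustration.

```python
import random


class SampledStreamPartitioner:
    """Greedy streaming vertex partitioner with a bounded, sampled memory.

    Hypothetical sketch: names and heuristics are assumptions, not taken
    from the paper.
    """

    def __init__(self, num_partitions, memory_limit, slack=1, seed=0):
        if num_partitions < 1 or memory_limit < 1:
            raise ValueError("num_partitions and memory_limit must be >= 1")
        self.k = num_partitions
        self.memory_limit = memory_limit  # max number of remembered vertices
        self.slack = slack                # tolerated load imbalance
        self.known = {}                   # sampled vertex -> partition
        self.load = [0] * num_partitions  # placements made per partition
        self.rng = random.Random(seed)

    def _remember(self, vertex, part):
        # When the sample is full, evict one remembered vertex at random.
        if vertex not in self.known and len(self.known) >= self.memory_limit:
            victim = self.rng.choice(list(self.known))
            del self.known[victim]
        self.known[vertex] = part

    def _place(self, vertex, neighbour):
        # A remembered vertex keeps its partition.
        if vertex in self.known:
            return self.known[vertex]
        # Otherwise prefer the neighbour's partition (avoids a cut edge)
        # unless that partition is too loaded; fall back to the lightest.
        target = self.known.get(neighbour)
        lightest = min(range(self.k), key=lambda p: self.load[p])
        if target is None or self.load[target] > self.load[lightest] + self.slack:
            target = lightest
        # load counts placements, so an evicted vertex that reappears
        # is counted again: the price of the reduced memory.
        self.load[target] += 1
        self._remember(vertex, target)
        return target

    def add_edge(self, u, v):
        """Consume one streamed edge and return the partitions of u and v."""
        pu = self._place(u, v)
        pv = self._place(v, u)
        return pu, pv
```

Feeding edges one at a time through `add_edge` keeps at most `memory_limit` vertex placements in memory. A smaller limit saves memory but forgets earlier placements, which may increase the number of edges cut between partitions.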