How an Optimized DB-SCAN Implementation Reduces Execution Time and Memory Requirements for Large Data Sets

Ekseth, Ole Kristian; Hvasshovd, Svein-Olaf

Venue: PATTERNS 2018, The Tenth International Conference on Pervasive Patterns and Applications

Keywords: Clustering, similarity metrics, data analysis, performance.

Abstract:
In data analysis, approximate clustering algorithms have gained broad popularity. A popular clustering algorithm is DBSCAN. While a number of software libraries support DBSCAN, they perform poorly when analysing high-dimensional data. In this work, we address this issue. We present a novel method and implementation which significantly boosts the performance of DBSCAN. The resulting software reduces memory consumption by 103 GB for large data sets while reducing execution time by more than 600x (for important similarity metrics). This article presents a high-performance approach to identifying answers to region-based similarity queries. While our work is tuned towards the application of DBSCAN, our novel approach for high-performance filtering of pairwise similarity scores may be used in a number of clustering algorithms. The proposed method and software therefore address issues which are known to hamper high-dimensional data analysis.
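To make the "region-based similarity query" in the abstract concrete, the following is a minimal sketch of textbook DBSCAN in Python, assuming Euclidean distance and a dense, precomputed pairwise distance matrix. It is not the authors' optimized implementation: the names `pairwise_distances`, `region_query` and `dbscan` are hypothetical, and the region query here is simply a filter over one row of the distance matrix that returns every point within `eps`.

```python
import math
from typing import List, Sequence

NOISE = -1      # label for outliers
UNVISITED = -2  # sentinel for points not yet processed


def pairwise_distances(points: Sequence[Sequence[float]]) -> List[List[float]]:
    """Dense n x n matrix of Euclidean distances (assumption: Euclidean metric)."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist


def region_query(dist: List[List[float]], i: int, eps: float) -> List[int]:
    """Filter row i of the matrix: indices within eps of point i (i included)."""
    return [j for j, d in enumerate(dist[i]) if d <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    dist = pairwise_distances(points)
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbours = region_query(dist, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] != UNVISITED:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbours = region_query(dist, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels
```

For example, with the seven points `(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)`, `eps=1.5` and `min_pts=3`, the sketch labels the points `[0, 0, 0, 1, 1, 1, -1]`. The dense matrix needs O(n^2) memory and every region query scans a full row, which illustrates why a naive DBSCAN scales poorly on large data sets; the sketch does not reproduce the memory and execution-time optimizations reported in the paper.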

Pages: 6 to 11

Copyright: (c) IARIA, 2018

Publication date: February 18, 2018

Published in: conference

ISSN: 2308-3557

ISBN: 978-1-61208-612-5

Location: Barcelona, Spain

Dates: from February 18, 2018 to February 22, 2018