CLOUD COMPUTING 2017, The Eighth International Conference on Cloud Computing, GRIDs, and Virtualization
Closest-Pairs Query Processing in Apache Spark
Authors:
George Mavrommatis
Panagiotis Moutafis
Michael Vassilakopoulos
Keywords: Closest-Pairs Query; Spatial Query Processing; Apache Spark
Abstract:
Spatial queries over big datasets can be processed efficiently in a parallel and distributed environment. The (K) Closest-Pair(s) Query, KCPQ, is a common query in many real-life applications involving geographical or, more generally, spatial data. It consists of finding the (K) closest pair(s) of objects between two spatial datasets. Although processing of this query has been studied extensively for centralized environments, few solutions have appeared for parallel and distributed frameworks. Apache Spark is one such framework, and it has several advantages over other popular ones, like Hadoop MapReduce. In this work, we present an algorithm for processing the KCPQ in Apache Spark and experimentally study its efficiency and scalability using big real-world datasets.
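To make the query definition concrete, the following is a minimal single-machine sketch of the KCPQ, assuming 2D points given as (x, y) tuples and Euclidean distance. It is a brute-force illustration of what the query returns, not the Spark algorithm presented in the paper; the function name `k_closest_pairs` and the sample points are hypothetical.

```python
import heapq
from itertools import product
from math import hypot


def k_closest_pairs(P, Q, k):
    """Return the k closest (distance, p, q) triples with p in P and q in Q.

    Brute force: every cross pair is examined, so the cost is O(|P| * |Q|).
    Assumption: points are (x, y) tuples and distance is Euclidean.
    """
    candidates = (
        (hypot(p[0] - q[0], p[1] - q[1]), p, q)
        for p, q in product(P, Q)
    )
    # nsmallest keeps only k candidates in memory at a time.
    return heapq.nsmallest(k, candidates)


# Hypothetical sample data, for illustration only.
P = [(0.0, 0.0), (5.0, 5.0)]
Q = [(1.0, 0.0), (5.0, 7.0), (10.0, 10.0)]
result = k_closest_pairs(P, Q, 2)
print(result)
```

With K = 2, the sketch returns the pair at distance 1.0 followed by the pair at distance 2.0. The quadratic cost of examining every cross pair is what makes a parallel and distributed approach attractive for big datasets.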
Pages: 26 to 31
Copyright: Copyright (c) IARIA, 2017
Publication date: February 19, 2017
Published in: conference
ISSN: 2308-4294
ISBN: 978-1-61208-529-6
Location: Athens, Greece
Dates: from February 19, 2017 to February 23, 2017