CLOUD COMPUTING 2013, The Fourth International Conference on Cloud Computing, GRIDs, and Virtualization
Using MapReduce to Speed Up Storm Identification from Big Raw Rainfall Data
Authors:
Kulsawasd Jitkajornwanich
Upa Gupta
Ramez Elmasri
Leonidas Fegaras
John McEnery
Keywords: storm analysis; rainfall; big data; MapReduce; distributed computing; CUAHSI
Abstract:
This paper describes an efficient MapReduce algorithm for converting raw rainfall data into meaningful storm information that can then be easily analyzed and mined. Our previous work proposed a method for identifying relevant storm characteristics from raw rainfall data. That original storm identification system takes too long to produce the summarized storm characteristics, for two reasons: (1) the raw rainfall data, which qualifies as big data, is stored in a traditional relational database based on the CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc.) ODM (Observations Data Model), which leads to substantial disk I/O; and (2) the storm identification algorithm is based on recursion and regular depth-first search (DFS), which leads to multiple retrievals of parts of the data. In this paper, we obtain a substantial improvement in performance by using MapReduce and by reading the original raw rainfall text files directly instead of the data in the relational database. In our experiments, the new storm identification system performs significantly better than the previous one. The new system should substantially benefit hydrologists by helping them perform rainfall-related analyses (both location-specific and storm-specific), such as flood prediction, using our identified storms.
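The general map, group, and reduce pattern the abstract refers to can be illustrated with a minimal in-memory sketch. This is not the paper's algorithm. The record format (`site_id,hour_index,rainfall_mm`), the `MAX_DRY_GAP` threshold, and every function name below are assumptions made only for illustration. The sketch shows how keying text records by site lets each site's storms be found in one ordered pass, without recursion or repeated retrievals.

```python
from collections import defaultdict

# Hypothetical raw text records: "site_id,hour_index,rainfall_mm".
# This format is an assumption, not the format used in the paper.
RAW_LINES = [
    "A,0,0.0", "A,1,2.5", "A,2,1.0", "A,3,0.0", "A,4,0.0", "A,5,3.0",
    "B,0,0.4", "B,1,0.0", "B,2,0.0", "B,3,0.0",
]

# Assumed rule: a storm continues across at most this many dry hours.
MAX_DRY_GAP = 1


def map_record(line):
    """Map step: parse one text line into (site, (hour, rainfall))."""
    site, hour, rain = line.split(",")
    return site, (int(hour), float(rain))


def reduce_site(site, readings):
    """Reduce step: one ordered pass over a single site's readings."""
    storms = []
    current = None  # [start_hour, end_hour, total_rainfall]
    for hour, rain in sorted(readings):
        if rain <= 0:
            continue  # dry hours only matter through the gap check below
        if current is not None and hour - current[1] - 1 <= MAX_DRY_GAP:
            current[1] = hour      # extend the running storm
            current[2] += rain
        else:
            if current is not None:
                storms.append(tuple(current))
            current = [hour, hour, rain]  # start a new storm
    if current is not None:
        storms.append(tuple(current))
    return [(site, start, end, total) for start, end, total in storms]


def run(lines):
    """Simulate the shuffle phase in memory by grouping mapped values by key."""
    groups = defaultdict(list)
    for line in lines:
        key, value = map_record(line)
        groups[key].append(value)
    results = []
    for site in sorted(groups):
        results.extend(reduce_site(site, groups[site]))
    return results


STORMS = run(RAW_LINES)
for storm in STORMS:
    print(storm)
```

In a real MapReduce deployment the grouping done in `run` would be handled by the framework's shuffle phase, and each reducer would receive one site's readings. The storm definition used here is deliberately simplistic.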
Pages: 49 to 55
Copyright: Copyright (c) IARIA, 2013
Publication date: May 27, 2013
Published in: conference
ISSN: 2308-4294
ISBN: 978-1-61208-271-4
Location: Valencia, Spain
Dates: from May 27, 2013 to June 1, 2013