CLOUD COMPUTING 2013, The Fourth International Conference on Cloud Computing, GRIDs, and Virtualization


Using MapReduce to Speed Up Storm Identification from Big Raw Rainfall Data

Authors:
Kulsawasd Jitkajornwanich
Upa Gupta
Ramez Elmasri
Leonidas Fegaras
John McEnery

Keywords: storm analysis; rainfall; big data; MapReduce; distributed computing; CUAHSI

Abstract:
This paper describes an efficient MapReduce algorithm for converting raw rainfall data into meaningful storm information that can then be easily analyzed and mined. Our previous work proposed a method for identifying relevant storm characteristics from raw rainfall data. The original storm identification system takes too long to produce the summarized storm characteristics for two reasons: (1) the raw rainfall data, which qualifies as big data, is stored in a traditional relational database based on the CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc.) ODM (Observations Data Model), which leads to substantial disk I/O; and (2) the storm identification algorithm is based on recursion and regular depth-first search (DFS), which leads to repeated retrievals of parts of the data. In this paper, we obtain a substantial improvement in performance by utilizing MapReduce and by reading the original raw rainfall data text files directly instead of the data in the relational database. In our experiments, the new storm identification system performs significantly better than the previous one. The new system will benefit hydrologists by helping them perform rainfall-related analyses (both location-specific and storm-specific), such as flood prediction, using our identified storms.
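
To illustrate the general idea of expressing storm identification as a map step and a reduce step over raw text records, here is a minimal single-process sketch. It is not the algorithm from the paper. The record layout (`site_id,hour_index,rainfall_mm`), the function names, and the storm rule (wet hours at one site belong to the same storm when they are at most `max_gap` hours apart) are all assumptions made for illustration only.

```python
from collections import defaultdict


def map_record(line):
    """Map step: parse one raw text line and key it by site.

    Assumed (hypothetical) layout: "site_id,hour_index,rainfall_mm".
    """
    site, hour, rain = line.strip().split(",")
    return site, (int(hour), float(rain))


def reduce_site(site, values, max_gap=6):
    """Reduce step: merge one site's wet hours into storms.

    Assumed rule (not taken from the paper): a wet hour joins the
    current storm if it is at most max_gap hours after the storm's
    last wet hour; otherwise it starts a new storm.
    """
    storms = []
    current = None
    for hour, rain in sorted(values):
        if rain <= 0:
            continue  # dry hours never start or extend a storm
        if current is not None and hour - current["end"] <= max_gap:
            current["end"] = hour
            current["total"] += rain
        else:
            current = {"site": site, "start": hour, "end": hour, "total": rain}
            storms.append(current)
    return storms


def run_job(lines, max_gap=6):
    """Local stand-in for the shuffle phase: group mapped pairs by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        if line.strip():
            site, value = map_record(line)
            grouped[site].append(value)
    result = []
    for site in sorted(grouped):
        result.extend(reduce_site(site, grouped[site], max_gap))
    return result
```

In this sketch each site's records are read once and processed in a single sorted pass, in contrast to the repeated retrievals that the abstract attributes to the recursive DFS approach. A real deployment would run the map and reduce functions on a MapReduce framework rather than in one process.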

Pages: 49 to 55

Copyright: Copyright (c) IARIA, 2013

Publication date: May 27, 2013

Published in: conference

ISSN: 2308-4294

ISBN: 978-1-61208-271-4

Location: Valencia, Spain

Dates: from May 27, 2013 to June 1, 2013