Grid Spider: a Framework for Data Intensive Research with Data Process Memoization Cache

Yamada, Daichi; Sonobe, Tomohiro; Tezuka, Hiroshi; Inaba, Mary

INTENSIVE 2012, The Fourth International Conference on Resource Intensive Applications and Services

Authors:
Daichi Yamada
Tomohiro Sonobe
Hiroshi Tezuka
Mary Inaba

Keywords: data intensive; memoization; file cache; cache replacement

Abstract:
As computational power grows, the new field of "Data Intensive Computation" has emerged, in which vast amounts of data generated by radio telescopes, particle accelerators, electron microscopes, genomics and Earth observation equipment are processed. In most cases, once the data has been accumulated, it is not overwritten. It has also been observed that in many cases the very same software is used to pre-process the very same data, leading to identical results. To exploit these observations, we propose "Grid Spider", a framework for data intensive scientific research that is optimized to avoid re-computation by means of our file cache mechanism, called "Data Process Memoization Cache" or DPMCache. This mechanism requires pre-processing applications to maintain referential transparency. Both the data and the application are registered with Grid Spider prior to processing, and for each execution of the application, Grid Spider records the history of the coupling of the application, the input data file, and the output data file. To evaluate Grid Spider, we have implemented "GEO Grid Spider II", a framework within which geo-scientists can evaluate satellite data archives.
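
The memoization idea in the abstract can be illustrated with a small Python sketch. It is not the Grid Spider implementation or its API: the class MemoCache, the function file_digest, the application identifier "upper-v1" and the choice of a SHA-256 content hash as the cache key are all assumptions made for illustration. The sketch only shows the general technique of recording the coupling of application, input file and output file, and reusing the output when the same referentially transparent application is run again on the same data.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Content hash of a file, so identical data maps to the same key (assumed keying scheme)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class MemoCache:
    """Hypothetical stand-in for a memoization file cache; not the DPMCache API."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # History of couplings: (application id, input digest) -> output file.
        self.history = {}
        # Number of times an application was actually executed (cache misses).
        self.executions = 0

    def run(self, app_id: str, app, input_path: Path) -> Path:
        """Run app on input_path unless the same coupling was already recorded."""
        key = (app_id, file_digest(input_path))
        cached = self.history.get(key)
        if cached is not None and cached.exists():
            # Same application, same data: reuse the stored result.
            return cached
        # Cache miss: execute the application and record the coupling.
        output_path = self.cache_dir / f"{app_id}-{key[1][:16]}.out"
        output_path.write_bytes(app(input_path.read_bytes()))
        self.history[key] = output_path
        self.executions += 1
        return output_path


def demo() -> tuple:
    """Process the same input twice and report whether the second run was avoided."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        scene = root / "scene.dat"
        scene.write_bytes(b"raw satellite data")
        cache = MemoCache(root / "cache")
        # bytes.upper stands in for a referentially transparent pre-processing step.
        first = cache.run("upper-v1", bytes.upper, scene)
        second = cache.run("upper-v1", bytes.upper, scene)
        return first == second, cache.executions, first.read_bytes()


if __name__ == "__main__":
    print(demo())
```

The sketch relies on the application being referentially transparent: if the same application could produce different output for the same input, returning the recorded output file would be incorrect. Cache replacement, which the keywords mention, is left out here.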

Pages: 5 to 8

Copyright: Copyright (c) IARIA, 2012

Publication date: March 25, 2012

Published in: conference

ISBN: 978-1-61208-188-5

Location: St. Maarten, The Netherlands Antilles

Dates: from March 25, 2012 to March 30, 2012