ICAS 2014, The Tenth International Conference on Autonomic and Autonomous Systems
Authors:
Alcilene Dalília de Sousa
Luciano Reis Coutinho
Keywords: fault tolerance; grid computing; opportunistic grids; autonomic computing; reinforcement learning.
Abstract:
Fault tolerance is a longstanding problem. Two basic solutions are replication and checkpointing, each with its own advantages and drawbacks. In this paper, we put forward an approach that balances replication and checkpointing to provide fault tolerance in opportunistic grid computing systems. We aim to retain the benefits of both techniques while avoiding their downsides. The approach combines reinforcement learning with the MAPE-K architecture for autonomic computing. To validate our proposal, we performed simulation-based experiments using the Autonomic Grid Simulator Tool (AGST). The results are promising: the proposed approach is able to learn suitable switching thresholds between checkpointing and replication. We verify this suitability by comparing the average completion time and the success rate of applications under our proposal against the values reported for other approaches in the literature.
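The idea of learning a switching threshold can be illustrated with a minimal sketch. This is not the authors' algorithm and it does not use AGST: it assumes a hypothetical failure-rate metric observed per job, a made-up set of candidate thresholds, a toy cost model, and a simple epsilon-greedy bandit standing in for the reinforcement learning component. The comments mark where each MAPE-K phase (Monitor, Analyze, Plan, Execute, Knowledge) would roughly sit.

```python
import random

# Hypothetical candidate switching thresholds on an assumed failure-rate metric.
THRESHOLDS = [0.1, 0.3, 0.5, 0.7, 0.9]


def choose_technique(failure_rate, threshold):
    """Analyze/Plan: replicate once the observed failure rate reaches the threshold."""
    return "replication" if failure_rate >= threshold else "checkpointing"


def simulated_completion_time(failure_rate, technique):
    """Toy cost model (an assumption, not AGST): checkpointing is cheap on
    reliable nodes but degrades with failures; replication has a flat cost."""
    if technique == "checkpointing":
        return 10.0 + 100.0 * failure_rate
    return 40.0


def learn_threshold(episodes=20000, epsilon=0.2, seed=0):
    """Epsilon-greedy bandit over thresholds; reward is negative completion time."""
    rng = random.Random(seed)
    # Knowledge: running mean reward and visit count per candidate threshold.
    value = {t: 0.0 for t in THRESHOLDS}
    count = {t: 0 for t in THRESHOLDS}
    for _ in range(episodes):
        if rng.random() < epsilon:
            threshold = rng.choice(THRESHOLDS)  # explore
        else:
            threshold = max(THRESHOLDS, key=lambda t: value[t])  # exploit
        failure_rate = rng.random()  # Monitor: simulated observation
        technique = choose_technique(failure_rate, threshold)
        reward = -simulated_completion_time(failure_rate, technique)  # Execute
        count[threshold] += 1
        # Incremental update of the running mean reward.
        value[threshold] += (reward - value[threshold]) / count[threshold]
    best = max(THRESHOLDS, key=lambda t: value[t])
    return best, value


if __name__ == "__main__":
    best, value = learn_threshold()
    print("learned threshold:", best)
```

Under this toy cost model the two techniques cost the same at a failure rate of 0.3, so the learner should tend to favor thresholds near that value. A real system would replace the simulated observation and cost with measurements from the grid, and could also fold the application success rate into the reward.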
Pages: 11 to 17
Copyright: Copyright (c) IARIA, 2014
Publication date: April 20, 2014
Published in: conference
ISSN: 2308-3913
ISBN: 978-1-61208-331-5
Location: Chamonix, France
Dates: from April 20, 2014 to April 24, 2014