A Fault Tolerance Approach Based on Reinforcement Learning in the Context of Autonomic Opportunistic Grids

de Sousa, Alcilene Dalília; Coutinho, Luciano Reis

Venue: ICAS 2014, The Tenth International Conference on Autonomic and Autonomous Systems

Keywords: fault tolerance; grid computing; opportunistic grids; autonomic computing; reinforcement learning.

Abstract:
Fault tolerance is a longstanding problem. Two basic solutions are replication and checkpointing, each with its own advantages and drawbacks. In this paper, we put forward an approach that balances replication and checkpointing to provide fault tolerance in opportunistic grid computing systems, aiming to retain the benefits of both techniques while avoiding their downsides. The approach combines reinforcement learning with the MAPE-K (Monitor, Analyze, Plan, Execute, Knowledge) architecture for autonomic computing. To validate our proposal, we performed simulation-based experiments using the Autonomic Grid Simulator Tool (AGST). The results are promising: they show that the proposed approach is able to learn suitable thresholds for switching between checkpointing and replication. We verify this suitability by comparing the average completion time and success rate of applications under our approach against those obtained with other approaches from the literature.
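The abstract does not spell out the learning formulation. As a rough illustration only, the sketch below shows how a MAPE-K loop could use an epsilon-greedy, sample-average action-value learner (a single-state, bandit-style simplification of Q-learning, not necessarily the method used in the paper) to pick a switching threshold: jobs on nodes whose estimated failure probability exceeds the threshold are replicated, and the rest are checkpointed. The cost constants, the candidate thresholds, the uniform failure-probability model, and the names `job_cost`, `Knowledge`, `mape_k_episode` and `learn_threshold` are all hypothetical; none of them comes from the paper or from AGST.

```python
import random
from dataclasses import dataclass, field

# Hypothetical cost model (NOT taken from the paper): relative completion-time
# cost of one job on a node whose estimated failure probability is p.
CHECKPOINT_OVERHEAD = 0.1   # periodic checkpoint writes
RECOVERY_COST = 2.0         # expected rework after a failure, scaled by p
REPLICATION_OVERHEAD = 0.6  # cost of running and coordinating a replica


def job_cost(p: float, strategy: str) -> float:
    """Expected cost of one job under the chosen fault-tolerance strategy."""
    if strategy == "checkpoint":
        return 1.0 + CHECKPOINT_OVERHEAD + RECOVERY_COST * p
    return 1.0 + REPLICATION_OVERHEAD


@dataclass
class Knowledge:
    """The K in MAPE-K: action-value estimates for each candidate threshold."""
    thresholds: list
    q: dict = field(default_factory=dict)
    n: dict = field(default_factory=dict)

    def __post_init__(self):
        for t in self.thresholds:
            self.q[t] = 0.0
            self.n[t] = 0

    def best(self) -> float:
        return max(self.thresholds, key=lambda t: self.q[t])

    def update(self, t: float, reward: float) -> None:
        # Incremental sample-average update: Q <- Q + (r - Q) / n
        self.n[t] += 1
        self.q[t] += (reward - self.q[t]) / self.n[t]


def mape_k_episode(kb: Knowledge, rng: random.Random,
                   epsilon: float = 0.1, jobs: int = 50) -> float:
    # Learning step: choose a switching threshold epsilon-greedily.
    if rng.random() < epsilon:
        t = rng.choice(kb.thresholds)
    else:
        t = kb.best()
    total = 0.0
    for _ in range(jobs):
        p = rng.random()                                   # Monitor: failure probability
        risky = p > t                                      # Analyze: compare to threshold
        strategy = "replicate" if risky else "checkpoint"  # Plan
        total += job_cost(p, strategy)                     # Execute
    reward = -total / jobs  # lower average cost means higher reward
    kb.update(t, reward)    # Knowledge: learn from the observed outcome
    return t


def learn_threshold(episodes: int = 2000, seed: int = 0) -> Knowledge:
    rng = random.Random(seed)
    kb = Knowledge(thresholds=[0.0, 0.25, 0.5, 0.75, 1.0])
    for _ in range(episodes):
        mape_k_episode(kb, rng)
    return kb


if __name__ == "__main__":
    kb = learn_threshold()
    print("learned threshold:", kb.best())
    for t in kb.thresholds:
        print(f"  t={t:.2f}  Q={kb.q[t]:.3f}  pulls={kb.n[t]}")
```

Under these made-up costs, checkpointing and replication break even at p = 0.25 (since 1.1 + 2p = 1.6), so a learner that explores enough should settle on the 0.25 threshold. Changing the constants moves the break-even point, and the same loop would then converge to a different threshold without manual retuning.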

Pages: 11 to 17

Copyright: (c) IARIA, 2014

Publication date: April 20, 2014

Published in: conference proceedings

ISSN: 2308-3913

ISBN: 978-1-61208-331-5

Location: Chamonix, France

Dates: April 20 to 24, 2014