ADVCOMP 2012, The Sixth International Conference on Advanced Engineering Computing and Applications in Sciences


RADIC-based Message Passing Fault Tolerance System

Authors:
Marcela Castro
Dolores Rexachs
Emilio Luque

Keywords: Fault-tolerance; High-Availability; RADIC; message passing; socket.

Abstract:
We present the analysis and design of a transparent fault tolerance system incorporated at socket level for message passing applications. The novel design changes the default socket model so that a connection is not unexpectedly closed when a remote node fails. Moreover, a pessimistic log-based rollback recovery protocol added at this level makes it possible to restart and re-execute a failed parallel process up to the point of failure, independently of the rest of the processes. This paper explains and analyzes the design-time decisions. We tested and assessed them by executing master-worker (M/W) and Single Program Multiple Data (SPMD) applications, which follow different communication patterns. The results show promising robustness in interprocess communication.

Pages: 59 to 64

Copyright: Copyright (c) IARIA, 2012

Publication date: September 23, 2012

Published in: conference

ISSN: 2308-4499

ISBN: 978-1-61208-237-0

Location: Barcelona, Spain

Dates: from September 23, 2012 to September 28, 2012