ADVCOMP 2012, The Sixth International Conference on Advanced Engineering Computing and Applications in Sciences
RADIC-based Message Passing Fault Tolerance System
Authors:
Marcela Castro
Dolores Rexachs
Emilio Luque
Keywords: Fault-tolerance; High-Availability; RADIC; message passing; socket.
Abstract:
We present the analysis and design of a transparent fault tolerance system incorporated at socket level for message passing applications. The novel design changes the default socket model so that a connection is not unexpectedly closed when a remote node fails. Moreover, a pessimistic log-based rollback recovery protocol added at this level makes it possible to restart a failed parallel process and re-execute it up to the point of failure, independently of the rest of the processes. This paper explains and analyzes the design-time decisions. We tested and assessed them by executing master-worker (M/W) and Single Program Multiple Data (SPMD) applications, which follow different communication patterns. The results show promising robustness in interprocess communication.
Pages: 59 to 64
Copyright: Copyright (c) IARIA, 2012
Publication date: September 23, 2012
Published in: conference
ISSN: 2308-4499
ISBN: 978-1-61208-237-0
Location: Barcelona, Spain
Dates: from September 23, 2012 to September 28, 2012
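
The pessimistic log-based recovery mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' RADIC implementation: the class `PessimisticLogReceiver`, the helper `run_worker`, the JSON log format, and the use of an in-memory `deque` in place of a real socket are all assumptions made for illustration. The sketch only shows the general idea of receiver-side pessimistic logging: each message is written to stable storage before it is delivered, so a restarted process can replay its log and reach the point of failure without involving the other processes.

```python
import json
import os
import tempfile
from collections import deque


class PessimisticLogReceiver:
    """Hypothetical receiver that logs each message durably before delivery."""

    def __init__(self, log_path, source):
        self.log_path = log_path
        self.source = source  # assumption: a deque standing in for a socket
        # On (re)start, messages logged before a failure are replayed first.
        self.replay = deque(self._read_log())

    def _read_log(self):
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path, "r", encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def recv(self):
        # Recovery phase: deliver logged messages in their original order.
        if self.replay:
            return self.replay.popleft()
        # Normal phase: log first, force to disk, then deliver (pessimistic).
        msg = self.source.popleft()
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(msg) + "\n")
            f.flush()
            os.fsync(f.fileno())
        return msg


def run_worker(receiver, count, crash_after=None):
    """Deterministic worker: sums `count` messages, optionally failing midway."""
    total = 0
    for i in range(count):
        if crash_after is not None and i == crash_after:
            raise RuntimeError("simulated node failure")
        total += receiver.recv()["value"]
    return total


def demo():
    log_path = os.path.join(tempfile.mkdtemp(), "worker0.log")
    # Messages still in flight; the sender never has to resend or roll back.
    network = deque({"seq": i, "value": v} for i, v in enumerate([10, 20, 30, 40]))
    try:
        run_worker(PessimisticLogReceiver(log_path, network), 4, crash_after=2)
    except RuntimeError:
        pass  # the process failed after consuming two messages
    # Restart: the first two messages come from the log, the rest from the network.
    return run_worker(PessimisticLogReceiver(log_path, network), 4)
```

Calling `demo()` runs the worker once with a simulated failure and once as a restarted process; the restarted run obtains the same sum a failure-free run would. The sketch assumes the worker is deterministic given its message sequence, which is the usual precondition for log-based recovery.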