SIMUL 2012, The Fourth International Conference on Advances in System Simulation


A Markov Random Field Approach for Modeling Correlated Failures in Distributed Systems

Authors:
Jorge E. Pezoa

Keywords: Distributed computing; Reliability; Markov Random Fields

Abstract:
In this paper, logically and spatially correlated failures affecting a distributed-computing system (DCS) are modeled stochastically by means of a Markov random field (MRF). The MRF is induced by the topology of the communication network and is specified locally by the reliability of each node and the degree of interaction between a node and its nearest neighbors. The MRF thus defines a global probability distribution over the failure patterns of the nodes in the DCS, which is parameterized using n values per node, where n is the number of nodes in the DCS. A statistical analysis conducted on test networks shows that, compared with independent failures, correlated failures increase (i) the average number of failed nodes, because failures propagate among the nodes, and (ii) the probability of observing a large fraction of failed computing nodes.
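The abstract does not spell out the functional form of the local specification. As a minimal sketch, assume an auto-logistic (Ising-type) MRF over binary failure states x_i in {0, 1} (1 = failed): P(x) ∝ exp(Σ_i α_i x_i + β Σ_{(i,j)∈E} x_i x_j), where E is the edge set of the communication network, α_i = logit(1 − r_i) is derived from the reliability r_i of node i, and a single coupling β ≥ 0 stands in for the degree of interaction between neighbors. This form, the shared β, the ring topology, and all function names below are hypothetical illustrations, not the paper's parameterization. For a small network the distribution can be enumerated exactly, which makes the two reported effects easy to check: with β = 0 failures are independent, and with β > 0 both the mean number of failed nodes and the probability of many simultaneous failures grow.

```python
import itertools
import math


def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1.0 - p))


def failure_distribution(n, edges, reliability, beta):
    """Exact joint pmf over failure patterns x in {0,1}^n (1 = failed).

    Hypothetical auto-logistic (Ising-type) form, assumed for illustration:
        P(x) ∝ exp(sum_i alpha_i x_i + beta * sum_{(i,j) in edges} x_i x_j)
    with alpha_i = logit(1 - reliability[i]). With beta = 0 the nodes fail
    independently, each with probability 1 - reliability[i].
    """
    alpha = [logit(1.0 - r) for r in reliability]
    weights = {}
    for x in itertools.product((0, 1), repeat=n):
        # Node terms plus pairwise terms from the network topology.
        energy = sum(a * xi for a, xi in zip(alpha, x))
        energy += beta * sum(x[i] * x[j] for i, j in edges)
        weights[x] = math.exp(energy)
    z = sum(weights.values())  # partition function (normalizing constant)
    return {x: w / z for x, w in weights.items()}


def mean_failed(pmf):
    """Expected number of failed nodes."""
    return sum(p * sum(x) for x, p in pmf.items())


def prob_at_least(pmf, k):
    """Probability that at least k nodes have failed."""
    return sum(p for x, p in pmf.items() if sum(x) >= k)


# Hypothetical test network: a ring of 8 nodes, each with reliability 0.9.
n = 8
edges = [(i, (i + 1) % n) for i in range(n)]
reliability = [0.9] * n

indep = failure_distribution(n, edges, reliability, beta=0.0)  # independent
corr = failure_distribution(n, edges, reliability, beta=1.5)   # correlated

print("mean failed:", mean_failed(indep), mean_failed(corr))
print("P(>= 4 failed):", prob_at_least(indep, 4), prob_at_least(corr, 4))
```

In the independent case the mean is 8 × 0.1 ≈ 0.8 failed nodes; the correlated case yields a larger mean and a larger probability of four or more failures. Exhaustive enumeration costs 2^n evaluations, so it only suits toy networks; larger systems would call for Monte Carlo methods such as Gibbs sampling.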

Pages: 131 to 137

Copyright: Copyright (c) IARIA, 2012

Publication date: November 18, 2012

Published in: conference

ISSN: 2308-4537

ISBN: 978-1-61208-234-9

Location: Lisbon, Portugal

Dates: from November 18, 2012 to November 23, 2012