International Journal On Advances in Systems and Measurements, volume 13, numbers 3 and 4, 2020
Challenges in Mitigating Errors in 1oo2D Safety Architecture with COTS Micro-controllers
Authors:
Amer Kajmaković
Konrad Diwold
Nermin Kajtazović
Robert Zupanc
Keywords: soft errors; mixed-criticality; fail-safe; 1oo2D; COTS
Abstract:
The number of Commercial-Off-The-Shelf (COTS) micro-controllers used in safety applications has increased significantly over the last decade. In contrast to safety-certified micro-controllers, they are produced without integrated protection against memory soft errors and are limited in available memory and computation power. At the same time, due to the constant optimization of the memory's physical size and voltage margins, the probability that external factors, such as magnetic fields or cosmic rays, temporarily alter a memory state (and thus cause a soft error) is rising. It is crucial to address such errors within safety-critical systems, and consequently, a wide range of error mitigation strategies has been proposed. In established brownfield automation systems, redesigning and deploying new hardware is usually not feasible. Instead, other approaches can be applied to existing fail-safe architectures to further improve their performance without the need for partial rework or conceptual changes. This article identifies challenges associated with soft error detection and correction strategies in the 1-out-of-2 with diagnostics (1oo2D) safety architecture. Moreover, it investigates mitigation strategies and their deployment challenges across the different production phases of new systems (i.e., greenfield), as well as the requirements and limitations that apply when working with existing systems (i.e., brownfield). Among other parameters, the memory usage profile and its effect on the mitigation strategies are explained. A brief overview and evaluation of available hardware-based strategies is presented, along with an evaluation of the most prominent software-based strategies. In addition, potential mitigation strategies that rely on underlying hardware features are discussed.
The article demonstrates how to identify and assess trade-offs associated with different strategies to decide on suitable methods to enhance fault tolerance in existing and future automation systems.
Pages: 250 to 263
Copyright: Copyright (c) to authors, 2020. Used with permission.
Publication date: December 30, 2020
Published in: journal
ISSN: 1942-261x