SEPTEMBER 2025 | Volume 46, Issue 3

Eucalyptus – An Analysis Suite for Fault Trees with Uncertainty Quantification

Imène Goumiri

Lawrence Livermore National Laboratory, Livermore, CA

Luc Peterson

Lawrence Livermore National Laboratory, Livermore, CA

Adam G Taylor

Lawrence Livermore National Laboratory, Livermore, CA

DOI: 10.61278/itea.46.3.1009

Abstract

Eucalyptus is a novel code developed at Lawrence Livermore National Laboratory to incorporate uncertainty quantification into Fault Tree Analysis (FTA). This tool addresses the challenge of imperfect knowledge in “grey-box” systems by allowing analysts to incorporate and propagate uncertainty from component-level assessments to system-level effects. Eucalyptus facilitates a consistent evaluation of the impact of subject matter expert judgment and knowledge gaps on overall system response by Monte Carlo generation of possible system fault trees, sampling probabilities of the existence of subsystems and components. The code supports the specification of fault trees through text and allows export to various formats, including auto-generated images, easing analysis and reducing errors. It has undergone extensive verification testing, demonstrating its reliability and readiness for deployment, and leverages on-node parallelism for rapid analysis. Example analyses are shown that include the identification of system failure paths and quantification of the value of further information about system components.

Keywords: Fault Tree Analysis, Uncertainty Quantification, Monte Carlo

Introduction

Fault Tree Analysis (FTA) is a well-known method of failure analysis that translates the failure modes of a physical or conceptual system into a visual tree diagram with corresponding failure logic [1]. FTA can be used to identify specific failure modes in complex systems and aids in understanding safety, survivability, and likelihood of functionality given specific events (e.g., downstream failure of one or many subsystems). For these reasons, fault trees are popular tools for system design, risk mitigation, and robustness studies.

Figure 1. (a) Imperfect knowledge in system design leads to uncertainty in system response. (b) Eucalyptus solves this problem and allows the analyst to quantify the effect of further system knowledge.

One key limitation of FTA is that it relies on detailed knowledge of the system being analyzed. This includes perfect knowledge of the number and types of components that make up the system, as well as of the logical connections and dependencies between levels of the tree. Past work has investigated propagating uncertainty around states of knowledge about events and failure states into FTA [2-4], but no general tool or methodology exists to assess the effects of uncertainty in the system architecture itself. A system that is still being designed, or that is studied from the outside under uncertainty, cannot be sufficiently analyzed with traditional FTA using a static fault tree. The purpose of the present work is to introduce a quantitative methodology and corresponding computational tool for the analysis of these “grey-box” systems in which the analyst may have imperfect knowledge of the existence and dependence of system components (Figure 1(a)).

Eucalyptus is a Python-based computational tool for performing Monte Carlo analysis on the structure of fault trees. A major feature of Eucalyptus is the systematic incorporation of component-level existential uncertainty into the process of FTA. Uncertainty in the existence of possible components in a system (including possible backup systems) leads to a family of fault trees with differing nodes and failure logic. Performing FTA on the ensemble of possible trees allows for the quantification of the effect of system uncertainty on the failure analysis, along with the effect of learning more about the system at the component level (Figure 1(b)).

Figure 2. Eucalyptus uses Monte Carlo sampling to generate possible system fault trees based on the analyst’s assessment of the probabilities of the existence of subcomponents. It then evaluates (either random or specified) failure states to determine probabilistic system responses.

Technical Content

Eucalyptus Code Overview

Eucalyptus uses numerical assessments of the probability of existence of each component to statistically generate likely fault trees. The family of trees is evaluated for the probability of system failure given either randomly generated or user-specified component failure states. The probabilities of component existence can be tuned to reflect the analyst’s knowledge of a grey-box system. For example, the analyst may be 100% certain of the existence of a propeller on an aircraft because she has seen an image of the outside of the vehicle. There may be nearly 100% certainty that the system contains an internal battery based on engineering judgement of system requirements. However, the analyst may assess only a 50% chance that the system has a second, backup battery, and a 25% chance of having a third redundant battery, based on any number of geometric, weight, and cost considerations. Once each component in the model has been assigned a probability of existence, Eucalyptus samples these probabilities to generate Monte Carlo samples of possible system fault trees from a global fault tree template containing these uncertainty estimates (Figure 2).
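The sampling step described above can be illustrated with a minimal Python sketch. It is not the Eucalyptus implementation: the component names, the dictionary layout, and the value 0.99 standing in for "nearly 100%" are assumptions made here for illustration only.

```python
import random

# Hypothetical existence probabilities following the aircraft example in the text.
# 0.99 is an assumed stand-in for "nearly 100% certainty".
existence_prob = {
    "propeller": 1.0,
    "battery": 0.99,
    "backup_battery": 0.5,
    "third_battery": 0.25,
}

def sample_existing_components(existence_prob, rng):
    """Draw one possible system: keep each component with its probability of existence."""
    return {name for name, p in existence_prob.items() if rng.random() < p}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_existing_components(existence_prob, rng) for _ in range(10_000)]

# Fraction of sampled systems in which each component appears.
freq = {name: sum(name in s for s in samples) / len(samples) for name in existence_prob}
for name, value in freq.items():
    print(f"{name:15s} {value:.3f}")
```

Each sampled set corresponds to one member of the family of fault trees; the observed frequencies should scatter around the assigned probabilities.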

The legend in Figure 2 shows the types of logical and ontological information that can be input to Eucalyptus in a fault tree template. The tree consists of a network of nodes linked by “events”, which are logical conditions under which failure can flow up the tree. The base of the tree consists of childless “Base” events, which can be thought of as the core components of a system. These are linked upwards to events that serve as logic gates. Eucalyptus supports “AND” gates, for which all child nodes must fail to satisfy the event, “OR” gates, which any failed child satisfies, and “>=N” gates, which require at least N failed child nodes to be satisfied. “Pass-Through” events are automatically satisfied and are included mainly to aid in the definition of subsystems. Each node in the tree can be assigned one of two statuses at input, namely “OK” or “FAILED”. As stated above, the main contribution of Eucalyptus is the inclusion of nodal uncertainty as an input in the tree template, defined as a probability of existence per node. The fault tree template thus defined serves as the main input to Eucalyptus: its core function is to sample the input probabilities to generate families of likely trees and to calculate failure odds given the input assumptions.
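The gate rules above can be sketched as a small recursive evaluator. The `Node` class, its field names, and the example system are hypothetical and do not reflect the Eucalyptus input format. “Pass-Through” events are left out because the text does not spell out their evaluation rule.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    gate: str = "BASE"          # "BASE", "AND", "OR", or ">=N"
    children: list = field(default_factory=list)
    n: int = 0                  # threshold, used only by ">=N" gates
    failed: bool = False        # input status of a Base event: False = OK, True = FAILED

def is_failed(node):
    """Propagate failure up the tree according to each gate's rule."""
    if node.gate == "BASE":
        return node.failed
    child_states = [is_failed(child) for child in node.children]
    if node.gate == "AND":
        return all(child_states)              # every child must have failed
    if node.gate == "OR":
        return any(child_states)              # any failed child is enough
    if node.gate == ">=N":
        return sum(child_states) >= node.n    # at least N failed children
    raise ValueError(f"unknown gate type: {node.gate}")

# Hypothetical system: power is lost only if both batteries fail;
# sensing is lost if at least 2 of 3 sensors fail; either loss fails the system.
battery = Node("battery", failed=True)
backup = Node("backup_battery")
sensors = [Node(f"sensor_{i}", failed=(i < 2)) for i in range(3)]
power = Node("power", gate="AND", children=[battery, backup])
sensing = Node("sensing", gate=">=N", n=2, children=sensors)
system = Node("system", gate="OR", children=[power, sensing])

print(is_failed(power), is_failed(sensing), is_failed(system))
```

Here the working backup battery keeps the power subsystem alive, while two failed sensors trip the “>=N” gate and fail the system through the top-level “OR”.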

Two natural uses of Eucalyptus are sensitivity analysis and scenario modeling. To perform a sensitivity analysis, a given number of components may be assumed to fail at random. Eucalyptus will generate random failure scenarios given a specified number of failed components. The resulting system failure states are evaluated in the family of Monte Carlo fault trees resulting from the component existence probabilities as described above. This allows the analyst to quantify the sensitivity of failure assessments to component-level existence uncertainty, as well as the relative value of more information about model subsystems. The results provide a probabilistic assessment of the robustness of the system to subsystem failures given imperfect knowledge.
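A sensitivity analysis of this kind might look like the sketch below. Several choices are assumptions made here and not confirmed Eucalyptus behavior: the nested-tuple template, the pruning of nodes that do not exist in a sampled tree, and drawing the k failed components only from components that exist in that tree.

```python
import random

# Hypothetical template: the system fails if pumping is lost (all existing pumps
# failed) OR sensing is lost (both sensors failed).
TEMPLATE = ("OR", [("AND", ["main_pump", "backup_pump"]),
                   ("AND", ["sensor_a", "sensor_b"])])
EXISTENCE = {"main_pump": 1.0, "backup_pump": 0.5, "sensor_a": 1.0, "sensor_b": 1.0}

def evaluate(node, existing, failed):
    """Return True (failed), False (OK), or None if the node is absent from this tree."""
    if isinstance(node, str):                          # Base event
        return (node in failed) if node in existing else None
    gate, children = node
    states = [s for s in (evaluate(c, existing, failed) for c in children)
              if s is not None]
    if not states:
        return None                                    # gate with no existing children is pruned
    return all(states) if gate == "AND" else any(states)

def sensitivity(template, existence, k, n_trees, n_scenarios, seed=0):
    """Estimate P(system failure) when k existing components fail at random."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trees):                           # sample one possible tree
        existing = [c for c, p in existence.items() if rng.random() < p]
        existing_set = set(existing)
        for _ in range(n_scenarios):                   # random k-component failure scenario
            failed = set(rng.sample(existing, k))
            failures += evaluate(template, existing_set, failed) is True
    return failures / (n_trees * n_scenarios)

p_fail_k1 = sensitivity(TEMPLATE, EXISTENCE, k=1, n_trees=1000, n_scenarios=100)
print(f"estimated failure probability for k=1: {p_fail_k1:.3f}")
```

For this toy template a single failure defeats the system only when the backup pump is absent and the main pump is the failed component, so the estimate should scatter around 0.5 × 1/3 = 1/6.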

The second natural use of Eucalyptus is scenario modeling, where the probabilities of component failures are derived from some specific analysis of the system. This allows for the use of component failure probabilities that are correlated and associated with known damage or failure mechanisms. These can be assumed by the analyst or derived from an analysis using an external model. For each specified scenario (a vector of component failure probabilities provided by the user as an additional input), Eucalyptus performs a second Monte Carlo sampling of these failure states, ultimately delivering quantitative failure results from specified probabilities of failures for each of the sampled trees. This allows for quantitative assessments for cases where uncertainties about both component existence and failure states can be quantified.
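The nested sampling in scenario modeling can be sketched as two loops: an outer loop over sampled trees and an inner loop over sampled failure states. The sketch below is illustrative only. The two-component system, the scenario values, and the independent sampling of each component's failure are assumptions, and a real scenario may encode correlated failures.

```python
import random

EXISTENCE = {"primary": 1.0, "backup": 0.5}   # hypothetical existence probabilities
SCENARIO = {"primary": 0.3, "backup": 0.3}    # hypothetical failure probabilities

def and_gate_failed(existing, failed):
    """A single AND gate over the components present in the sampled tree."""
    return all(c in failed for c in existing)

def scenario_failure_probability(existence, scenario, n_trees, n_states, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trees):                  # first Monte Carlo: tree structure
        existing = [c for c, p in existence.items() if rng.random() < p]
        for _ in range(n_states):             # second Monte Carlo: failure states
            failed = {c for c in existing if rng.random() < scenario[c]}
            hits += and_gate_failed(existing, failed)
    return hits / (n_trees * n_states)

p_fail = scenario_failure_probability(EXISTENCE, SCENARIO, n_trees=1000, n_states=100)
print(f"estimated system failure probability: {p_fail:.3f}")
```

With these assumed numbers the exact answer is 0.5 × 0.3 + 0.5 × 0.3² = 0.195, which gives a quick check on the estimate.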

Figure 3. Eucalyptus correctly solves the problem of a simple system consisting of a failed component and a possibly existing backup component via a Monte Carlo sampling of the tree template using the backup system’s assigned existence probability.

Example 1: Model Convergence for a Simple System with One Possible Redundancy

The core algorithm can be better explained with a simple example. Consider the input template tree in Figure 3(a) which consists of a simple system containing one component that has failed and a possible functioning backup system. Trivially, if there is a 50% chance that the backup system exists, there is also a 50% chance that the system has failed at the top level of the tree. In fact, for this case the probability of system function is exactly equal to the probability of the existence of the backup component.

Eucalyptus solves this problem via the Monte Carlo method by directly sampling possible trees from the specified nodal existence probabilities and then calculating the top-level system functionality for each instance of the fault tree. For this simple example there are only two possible fault trees, as shown in Figure 3(b). Figure 3(c) shows the results of sample trials of this problem, assuming the backup system exists with 50% probability. As expected, the trial runs converge to the correct probability of system failure as the size of the Monte Carlo tree family increases. Fixing the number of trials to 1000, Figure 3(d) shows that Eucalyptus accurately solves the redundancy problem for each assumed probability of existence of the backup.
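This convergence behavior is easy to reproduce in a few lines of Python. The sketch below is not the Eucalyptus code. It hard-codes the logic of this one example (a failed primary component and a functioning backup that exists with probability p), so the system fails exactly when the backup is absent and the exact failure probability is 1 − p.

```python
import random

def estimate_failure_probability(p_backup_exists, n_trees, seed=0):
    """Monte Carlo estimate of P(system failure) for the one-redundancy example."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trees):
        backup_exists = rng.random() < p_backup_exists
        # Primary is FAILED and the backup, when present, is OK:
        # the sampled tree fails only if it contains no backup.
        failures += not backup_exists
    return failures / n_trees

# Convergence with the size of the tree family, for a 50% existence probability.
for n in (10, 100, 1000, 10_000):
    print(n, estimate_failure_probability(0.5, n))

# Sweep over the assumed existence probability with 1000 trees per point.
sweep = {p: estimate_failure_probability(p, 1000) for p in (0.0, 0.25, 0.5, 0.75, 1.0)}
for p, estimate in sweep.items():
    print(f"p_exist={p:.2f}  estimated={estimate:.3f}  exact={1 - p:.3f}")
```

Because each sampled tree is an independent Bernoulli trial, the standard error of the estimate is sqrt(p(1 − p)/N), which is about 0.016 for p = 0.5 and N = 1000.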

Quantification of Future Knowledge States

Within Eucalyptus, the entire analysis described above is repeated under different existence assumptions to quantify the effects of different states of knowledge about the system. Given a fault tree with specified uncertainties and failure scenarios, Eucalyptus automatically performs the abovementioned Monte Carlo analysis for the following cases:

  • Baseline Knowledge – The specified number of fault trees are sampled using the specified component existence probabilities and the failure scenarios are evaluated.
  • Component-specific Certainty – For each possibly failed component (“Base” node), the specified number of fault tree samples are generated assuming 100% certainty in this component’s existence. Failure scenarios are re-evaluated for each family of trees.
  • Knowledge of Everything – The failure scenarios are evaluated for a fixed fault tree in which all components are assumed to exist.
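The three knowledge states listed above can be generated mechanically from one set of existence probabilities. The following sketch assumes a hypothetical three-component system with hard-coded failure logic, one random failure per scenario (k = 1), and failures drawn only among components that exist in the sampled tree. None of the names correspond to the Eucalyptus API.

```python
import random

EXISTENCE = {"main": 1.0, "backup": 0.5, "valve": 0.5}   # hypothetical template

def system_failed(existing, failed):
    """Hypothetical logic: power is lost if every existing power source has failed;
    the valve, if present, is an extra single point of failure."""
    power_lost = all(c in failed for c in ("main", "backup") if c in existing)
    valve_failed = "valve" in existing and "valve" in failed
    return power_lost or valve_failed

def failure_probability(existence, k, n_trees, n_scenarios, seed):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trees):
        existing = [c for c, p in existence.items() if rng.random() < p]
        for _ in range(n_scenarios):
            hits += system_failed(existing, set(rng.sample(existing, k)))
    return hits / (n_trees * n_scenarios)

def knowledge_cases(existence):
    """Build the existence assumptions for each state of knowledge."""
    cases = {"Baseline": dict(existence)}
    for name in existence:                                # one case per Base node
        cases[f"Certain: {name}"] = {**existence, name: 1.0}
    cases["Everything"] = dict.fromkeys(existence, 1.0)   # static tree, all components exist
    return cases

results = {
    label: failure_probability(probs, k=1, n_trees=4000, n_scenarios=25, seed=i)
    for i, (label, probs) in enumerate(knowledge_cases(EXISTENCE).items())
}
for label, p in results.items():
    print(f"{label:16s} {p:.3f}")
```

For this toy system the exact values are 7/12 for the baseline, 1/6 with a certain backup, 2/3 with a certain valve, and 1/3 with everything known. Confirming a redundancy lowers the assessed failure probability, while confirming an extra failure mode raises it.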


Figure 4. An example complex fault tree, where existential uncertainties are assigned to redundant or backup components in the subsystems.

In this way, the present component existence uncertainty is propagated through the FTA and compared with other states of knowledge. The outcome of future improved knowledge states is calculated directly. This allows the analyst to quantify the effect of more knowledge about specific components on the global system failure probabilities. The following example is intended to demonstrate the usefulness of this approach for increasingly complex systems in which the effects of uncertainty on the failure probabilities are not obvious.

Example 2: Sensitivity Analysis for a Generic Complex Fault Tree

This section provides an example of a sensitivity analysis using Eucalyptus on the fault tree shown in Figure 4. The tree contains four main “failure modes”, which can be thought of as major systems, the failure of which would lead to system failure. Each mode contains different subsystems with components logically connected with “AND”, “OR”, or “>=N” gates. The tree contains 21 “Base” nodes which can be thought of as possibly failed components. Of these, 12 are assumed to certainly exist (Probability of Existence = 1.0), while the other 9 are possible system redundancies and are assigned varying probabilities of existence as shown in Figure 4.

Assuming k component failures, there are

$$\binom{21}{k} = \frac{21!}{k!\,(21-k)!}$$

possible failure combinations for a system of 21 components. Before engaging in the full-scale Monte Carlo analysis, it is useful to get a handle on the sensitivity of the tree to component failures by evaluating the possible failure cases for small k in the static fault tree; this is equivalent to an exhaustive rather than probabilistic evaluation of the “Knowledge of Everything” cases as described in the last section. The results for all possible failure states for k = 1, …, 5 are summarized in Table 1.

k    # Possible scenarios    # Failed states    % Failed states
1    21                      2                  9.52
2    210                     46                 21.90
3    1330                    485                36.47
4    5985                    3113               52.01
5    20349                   13615              66.91

Table 1. The results of all possible failure states up to 5 component failures for the static “Knowledge of Everything” case (existence certainty for all components) for the fault tree depicted in Figure 4.
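The scenario counts and percentages in Table 1 follow directly from the binomial coefficient and can be re-derived with the Python standard library. The failed-state counts below are copied from Table 1. The exhaustive enumeration at the end runs on a hypothetical 4-component tree, not the tree in Figure 4, and only shows the mechanics of an exhaustive evaluation.

```python
from itertools import combinations
from math import comb

N_COMPONENTS = 21
FAILED_STATES = {1: 2, 2: 46, 3: 485, 4: 3113, 5: 13615}   # copied from Table 1

rows = []
for k, n_failed in FAILED_STATES.items():
    total = comb(N_COMPONENTS, k)                  # number of k-component failure scenarios
    rows.append((k, total, n_failed, round(100 * n_failed / total, 2)))
    print(rows[-1])

# Exhaustive evaluation on a hypothetical 4-component tree:
# the system fails if both "a" and "b" fail, or if "c" fails.
def toy_system_failed(failed):
    return {"a", "b"} <= failed or "c" in failed

def count_failed_states(components, k):
    """Count the k-component failure combinations that fail the toy system."""
    return sum(toy_system_failed(set(combo)) for combo in combinations(components, k))

toy_counts = {k: count_failed_states("abcd", k) for k in (1, 2)}
print(toy_counts)
```

The number of scenarios grows quickly with k, which is why exhaustive evaluation is practical only for small k and a static tree.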

A full Monte Carlo analysis was then performed on the fault tree in Figure 4, sampling 1000 trees for each of k=1…5 component failures and for each assumption of component existence probability. For each tree, 10000 randomly generated failure scenarios were evaluated. This yields the stem plots shown in Figure 5. The grey vertical lines in Figures 5(a)-(e) represent the “Baseline Knowledge” result, that is, the failure probability averaged over 10000 k-component failure scenarios for each of 1000 Monte Carlo trees sampled from the existence probabilities shown in Figure 4. Each point on the graph off this line represents the result of another 10000 k-component failure scenarios for another 1000 tree samples with a fixed 100% probability of existence for the component on the y-axis. The “Everything” datum corresponds to perfect certainty of the existence of all components and is therefore the result of 10000 k-component failure scenarios for one static tree containing all components.
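Read literally, the averaging described above corresponds to the following estimator. The symbols N_T (number of sampled trees), N_S (number of failure scenarios per tree), T_i (the i-th sampled tree), and s_ij (the j-th k-component failure scenario drawn for tree T_i) are notation introduced here for illustration and are not taken from the Eucalyptus documentation.

$$\hat{P}_{\mathrm{fail}}(k) = \frac{1}{N_T N_S} \sum_{i=1}^{N_T} \sum_{j=1}^{N_S} \mathbf{1}\left[\, T_i \text{ fails under } s_{ij} \,\right], \qquad N_T = 1000, \quad N_S = 10000$$

Under this reading, each baseline or component-specific point in Figure 5 averages 10^7 tree evaluations, while the “Everything” datum uses N_T = 1 and therefore 10^4 evaluations.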

As expected, the system failure probabilities for each k are distributed around the deterministic fault evaluations listed in Table 1. Note that components whose existence decreases the probability of failure have blue stems in Figure 5, while those that increase it have red stems. More knowledge can shift the failure assessment either way, depending on whether a component serves as a system redundancy or an extra failure mode.

Figure 5. Results from a full Monte Carlo evaluation for the fault tree in Figure 4. 1000 fault trees were sampled from the baseline component probabilities assumed in the tree, and again for each component with its existence fixed at 100%. 10000 random failure states were evaluated in each tree, assuming k component failures, where (a) k=1, (b) k=2, (c) k=3, (d) k=4, and (e) k=5. The stem plots provide a visualization of the effects of more knowledge of the system on the assessed probabilities of failure.

Conclusions

As design and analysis demands become more complex, the need to quantify the effects of an analyst’s uncertainty and engineering assumptions in system design becomes more crucial. Eucalyptus provides a simple computational tool for the propagation of uncertainty in subsystem/component existence and failure states through FTA. It is agnostic to the type of system or the details of the mode of failure and can therefore easily be deployed across many domains and can be coupled with other domain-specific tools and engineering models.

The cases studied in the present paper have been kept relatively simple to facilitate understanding of the code’s basic functionality. More complex fault trees which reflect real-world systems may require orders of magnitude more fault tree samples and failure cases to fully quantify the effects of uncertainties. For this reason, Eucalyptus was designed to take advantage of High-Performance Computing and on-node parallelism across members of an ensemble. Additionally, Eucalyptus supports directed acyclic graphs which best represent the complex dependencies of real-world systems.
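Because ensemble members are independent, the Monte Carlo work splits naturally into chunks that can run side by side. The sketch below shows only that decomposition, using the one-redundancy example from earlier in the paper. A thread pool from the Python standard library stands in for whatever executor is actually used; the chunking scheme, function names, and per-chunk seeding are assumptions and not a description of how Eucalyptus is parallelized.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_chunk(task):
    """Count failed trees in one chunk of the ensemble (failed primary, possible backup)."""
    seed, n_trees, p_backup_exists = task
    rng = random.Random(seed)                 # independent stream per chunk
    # A sampled tree fails when the backup does not exist.
    return sum(rng.random() >= p_backup_exists for _ in range(n_trees))

def parallel_failure_probability(p_backup_exists, n_trees, n_workers=4):
    base, extra = divmod(n_trees, n_workers)
    sizes = [base + (1 if i < extra else 0) for i in range(n_workers)]
    tasks = [(seed, size, p_backup_exists) for seed, size in enumerate(sizes)]
    # For CPU-bound pure-Python work, a process pool or an MPI-style launcher
    # would be the usual substitute for this thread pool.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        failures = sum(pool.map(run_chunk, tasks))
    return failures / n_trees

p_parallel = parallel_failure_probability(0.5, n_trees=20_000)
print(f"estimated failure probability: {p_parallel:.3f}")
```

Each chunk returns only a count, so combining results is a single sum and the decomposition needs no communication between workers.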

Eucalyptus has undergone a series of verification tests for trees of increasing complexity and has been validated against analytical solutions for simple tree structures. It supports the import and export of fault trees in various text formats and allows for the visualization of complex fault trees with image export functionality. It has been designed to perform alone or to be integrated into already-existing systems analysis workflows. Probabilities of component failure from any external analysis software can be internally sampled and evaluated in the Monte Carlo fault trees to include system configuration uncertainty in domain-specific scenario analysis. It is already being deployed for real world problems.

References

1. Ericson, Clifton. “Fault Tree Analysis – A History.” Proceedings of the 17th International Systems Safety Conference (1999).

2. Prabhu, Saurabh, et al. “Uncertainty quantification in fault tree analysis: Estimating business interruption due to seismic hazard.” Natural Hazards Review 21.2 (2020): 04020015.

3. Mahmood, Yasser A., et al. “Fuzzy fault tree analysis: a review of concept and application.” International Journal of System Assurance Engineering and Management 4 (2013): 19-32.

4. Yazdi, Mohammad, and Esmaeil Zarei. “Uncertainty handling in the safety risk analysis: an integrated approach based on fuzzy fault tree analysis.” Journal of Failure Analysis and Prevention 18 (2018): 392-404.

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

Author Biographies

Dr. Imène Goumiri is a staff scientist in the Computational Engineering Group at LLNL working on applications of Gaussian processes at scales ranging from atomic structures to astronomy.  Her expertise includes control theory, plasma physics, and numerical analysis.  Dr. Goumiri holds an M.A. and Ph.D. in mechanical and aerospace engineering from Princeton University, an M.Eng. in mathematical and mechanical modeling from the Polytechnic Institute of Bordeaux, and an M.S. in mathematics, statistics, and economics (with a specialty in modeling, computation, and environment) from the Université de Bordeaux.

Dr. Jayson “Luc” Peterson is the Associate Program Leader for Data Science within the Space Science and Security Program at Lawrence Livermore National Laboratory, overseeing a portfolio of several projects at the intersection of data science and space. Dr. Peterson has worked across modeling & simulation, experimental design, digital engineering, uncertainty quantification, verification & validation, data analytics, high-performance computing, and machine learning in a variety of applications, from nuclear fusion to COVID-19 response. He holds a Ph.D. and M.S. in Astrophysical Sciences (Plasma Physics) from Princeton University and a B.A. in Physics and Science, Technology, and Society from Vassar College.

Dr. Adam Taylor is a computational analyst within the Computational Engineering Division at Lawrence Livermore National Laboratory, specializing in computational mechanics and numerical analysis. He holds a Ph.D. in Civil/Geotechnical Engineering with a focus on the mechanics of granular materials, as well as a B.S. in Mathematics and B.A. in Philosophy from the University of Florida.

ISSN: 1054-0229, ISSN-L: 1054-0229
Dewey Classification: L 681 12
