Random versus Systematic Failures – Issues and Solutions

Functional safety standards define two categories of failures: random failures and systematic failures. These categories were created during standards committee discussions of which failure types should be modeled in probabilistic failure analysis. It was decided that random failures are counted in the probabilistic failure rate analysis and systematic failures are not.

Systematic failures were considered to be a direct result of some design or procedure problem. They occur when a set of circumstances happens to reveal the fault. The committee's thinking was that systematic failures could be permanently “fixed” by a change in a design or a procedure. It was assumed that the fix would always be completely effective: after the fix, the failure would not happen again, and therefore such failures should not be counted.

Many companies establish programs to record and analyze failures, and a failure rate analysis is performed on that data to determine device failure rates. One problem observed while reviewing these studies is that people interpret the definitions of random versus systematic failures very differently. In some cases most failures are classified as systematic. Because systematic failures are excluded from the count, this creates a dangerous bias in field failure rate analysis: the calculated failure rate comes out lower than the rate at which devices actually fail.

At some sites, those performing the analysis have realized that failures classified as systematic do prevent safety devices from performing their safety function and are therefore dangerous. These failures arise under conditions that appear to occur randomly, so they can be modeled with exactly the same probabilistic analysis. They affect the probability of dangerous failure and certainly should be counted in any failure rate analysis.
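The size of the bias can be illustrated with a minimal sketch. It assumes a simple point estimate (number of dangerous failures divided by cumulative operating hours); the `FailureReport` record, the `failure_rate` helper, and all of the numbers are hypothetical and are not taken from any real field study.

```python
from dataclasses import dataclass


@dataclass
class FailureReport:
    """Hypothetical field failure record (illustrative only)."""
    device_id: str
    dangerous: bool        # did the failure prevent the safety function?
    classification: str    # "random" or "systematic", as assigned by the analyst


def failure_rate(reports, unit_hours, include_systematic=True):
    """Point estimate: counted dangerous failures per cumulative operating hour."""
    if unit_hours <= 0:
        raise ValueError("unit_hours must be positive")
    counted = [
        r for r in reports
        if r.dangerous and (include_systematic or r.classification == "random")
    ]
    return len(counted) / unit_hours


# Assumed data: 10 dangerous failures, 7 of which were labeled systematic.
reports = [
    FailureReport(f"dev-{i}", True, "systematic" if i < 7 else "random")
    for i in range(10)
]
unit_hours = 2_000_000.0  # assumed cumulative operating hours of the population

rate_all = failure_rate(reports, unit_hours)
rate_random_only = failure_rate(reports, unit_hours, include_systematic=False)

print(f"all failures counted:   {rate_all:.2e} per hour")
print(f"systematic excluded:    {rate_random_only:.2e} per hour")
print(f"underestimation factor: {rate_all / rate_random_only:.1f}x")
```

With these assumed figures, excluding the failures labeled systematic makes the devices look more than three times as reliable as the field data indicates, even though every one of those failures defeated the safety function.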

This thinking is realistic, as systematic failures may not be effectively corrected even when changes to the design or the procedures are made. If a systematic failure is effectively corrected, then the number of failure reports in future data collection will decrease and reflect the change. If the change was not effective, the data will show that as well. Either way, any updated field failure rate analysis will reflect the outcome. So most engineers now understand that, to improve safety and achieve a realistic measurement of safety:

  •   All failures must be counted in failure rate analysis and
  •   All failures must be reviewed to determine if the failure can be practically prevented in the future. 
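The earlier point that an effective fix shows up in later field data can be sketched in the same way. This is an illustrative comparison only: the `period_rate` helper and the before/after figures are hypothetical, and a real study would also need confidence bounds before concluding that a change was effective.

```python
def period_rate(failures, unit_hours):
    """Failures per cumulative operating hour for one data collection period."""
    if unit_hours <= 0:
        raise ValueError("unit_hours must be positive")
    return failures / unit_hours


# Assumed data: every failure is counted in both periods, whatever its label.
before_fix = {"failures": 12, "unit_hours": 3_000_000.0}
after_fix = {"failures": 4, "unit_hours": 4_000_000.0}

rate_before = period_rate(**before_fix)
rate_after = period_rate(**after_fix)

# If the design or procedure change worked, the updated rate is lower;
# if it did not, the two rates stay close and the data shows that too.
fix_appears_effective = rate_after < rate_before

print(f"before fix: {rate_before:.2e} per hour")
print(f"after fix:  {rate_after:.2e} per hour")
print(f"fix appears effective: {fix_appears_effective}")
```

Because all failures stay in the count, nobody has to assume in advance that the fix worked: the updated rate reports the outcome directly.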
