What Does Root Cause Analysis Really Mean?

This is an intriguing question, and one that I often ask my classes when I’m teaching our FSE100 Functional Safety course. Very often, students do not know, or do not fully understand, what it means and why it is important.

The IEC 61511 standard requires that any failures that occur within the Safety Instrumented System (SIS) be investigated. The reason is that we need to understand whether each failure was dangerous or safe, and whether it was random or systematic. This matters for several reasons. Not least of these is being able to properly count the random failures that feed future PFDavg or PFH calculations, rather than labelling them as systematic and leaving them out of the random failure count. Leaving them out would produce dangerously low and optimistic random failure rates. The 2nd edition of IEC 61511 now includes Clause 11.9.3, which states that the failure rate data used in reliability calculations (PFDavg/PFH) must be credible, traceable, documented and justified. Its purpose is to prevent unrealistically low random failure rates from being used in SIL verification calculations. Such rates would produce lower PFDavg or PFH results, implying a higher risk reduction than the SIF actually provides, which could lead to a false sense of safety.
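To make that effect concrete, here is a minimal Python sketch. It assumes a single-channel (1oo1) SIF in low-demand mode and uses the common simplified approximation PFDavg ≈ λDU × TI / 2. This approximation ignores proof-test coverage, repair time and common cause. The device population, operating hours, failure counts and proof-test interval are hypothetical numbers chosen for illustration, not field data. The helper names are also illustrative.

```python
# Minimal sketch: how wrongly labelling random failures as "systematic"
# understates the dangerous undetected failure rate (lambda_DU) and inflates
# the apparent risk reduction. All numbers are HYPOTHETICAL, for illustration.

HOURS_PER_YEAR = 8760


def failure_rate(failures: int, device_hours: float) -> float:
    """Point estimate of a constant failure rate, in failures per hour."""
    return failures / device_hours


def pfd_avg_1oo1(lambda_du: float, proof_test_interval_h: float) -> float:
    """Simplified 1oo1 low-demand approximation: PFDavg ~= lambda_DU * TI / 2.
    Ignores proof-test coverage, repair time and common cause."""
    return lambda_du * proof_test_interval_h / 2


# Hypothetical field data: 200 valves, each in service for 5 years.
device_hours = 200 * 5 * HOURS_PER_YEAR   # 8,760,000 device-hours
random_du_failures = 8                    # dangerous undetected, random per RCA
wrongly_excluded = 6                      # random failures mislabelled as systematic
ti_hours = 1 * HOURS_PER_YEAR             # annual proof test

for label, count in [
    ("all random failures counted", random_du_failures),
    ("mislabelled failures excluded", random_du_failures - wrongly_excluded),
]:
    lam = failure_rate(count, device_hours)
    pfd = pfd_avg_1oo1(lam, ti_hours)
    print(f"{label}: lambda_DU={lam:.2e}/h  PFDavg={pfd:.2e}  RRF={1 / pfd:.0f}")
```

With these assumed inputs, excluding six of the eight random failures cuts the estimated λDU and PFDavg by a factor of four. The SIF then appears to deliver a risk reduction factor of about 1000 instead of about 250.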

As mentioned, we need to know whether a failure resulted in a safe or a dangerous situation. Safe failures contribute to the Mean Time To Fail Safe (MTTFS). This is a target defined in the Safety Requirements Specification (SRS), and it must be met through the SIL verification of the SIFs. That is why we need to record the safe failures of the equipment, not just the dangerous ones. Dangerous failures, on the other hand, contribute to the probability of failure on demand of the SIFs, reflected in the Mean Time To Fail Dangerous (MTTFD). If more dangerous failures are encountered than were assumed during design, the PFDavg or PFH of the SIF(s) will be worse than calculated, and the SIF's performance will be degraded.
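As a rough illustration, the Python sketch below estimates MTTFS and MTTFD for one SIF from its observed operating time and classified failure counts. It then compares them with the SRS target and the design assumption. It uses the constant-failure-rate simplification MTTF ≈ 1/λ. The `FieldRecord` type, the `check_against_design` function, and all hours, counts and targets are hypothetical.

```python
# Minimal sketch: compare field-observed MTTFS and MTTFD for one SIF against
# SRS/design values, assuming constant failure rates (MTTF ~= 1/lambda).
# All names and numbers are HYPOTHETICAL, for illustration only.
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class FieldRecord:
    operating_hours: float
    safe_failures: int        # e.g. spurious trips, classified after RCA
    dangerous_failures: int   # classified after RCA


def observed_mttf(hours: float, failures: int) -> float:
    """Point estimate of MTTF in hours; undefined if no failures were seen."""
    if failures == 0:
        raise ValueError("no failures observed; point estimate undefined")
    return hours / failures


def check_against_design(rec: FieldRecord, mttfs_target_years: float,
                         design_mttfd_years: float) -> dict:
    mttfs = observed_mttf(rec.operating_hours, rec.safe_failures) / HOURS_PER_YEAR
    mttfd = observed_mttf(rec.operating_hours, rec.dangerous_failures) / HOURS_PER_YEAR
    return {
        "MTTFS_years": mttfs,
        # Too many safe failures (spurious trips) means the SRS target is missed.
        "MTTFS_target_met": mttfs >= mttfs_target_years,
        "MTTFD_years": mttfd,
        # A shorter MTTFD than assumed means more dangerous failures than
        # the PFDavg/PFH calculation allowed for.
        "MTTFD_as_designed": mttfd >= design_mttfd_years,
    }


rec = FieldRecord(operating_hours=10 * HOURS_PER_YEAR, safe_failures=4,
                  dangerous_failures=1)
print(check_against_design(rec, mttfs_target_years=5, design_mttfd_years=50))
```

A point estimate built from a handful of failures is statistically weak. The sketch only shows the comparison logic, which depends entirely on each failure having been correctly classified as safe or dangerous.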

For this reason, the IEC 61511 standard requires a stage 4 Functional Safety Assessment (FSA 4) to be undertaken on a “periodic” basis, after some time in operation, to assess SIS performance. The purpose of this performance assessment is to ensure that the original design targets are still being met. Unfortunately, here in the US, few companies are conducting FSA 4. Moreover, many either do not understand the requirement to investigate SIS failures to root cause, or do not understand what root cause analysis actually is. In many instances, production pressures, lack of resources or time constraints prevent failures from being properly investigated to determine the root cause. In many ways, it comes down to the site’s safety culture and its commitment to functional safety.
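One check that such a performance assessment might include is sketched below in Python. It asks whether the PFDavg recalculated from field failure data still falls within the SIL band required by the SRS. The bands are the low-demand-mode PFDavg ranges from IEC 61511 / IEC 61508; high-demand (PFH) bands differ and are not covered. The SIF tags and PFDavg values are hypothetical, and a real FSA 4 covers far more than this single comparison.

```python
# Minimal sketch of one FSA 4-style check: does the PFDavg recalculated from
# field data still fall within the SIL band required by the SRS?
# Low-demand-mode bands only. SIF tags and PFDavg values are HYPOTHETICAL.

# (SIL, lower bound inclusive, upper bound exclusive) for PFDavg, low demand
SIL_BANDS = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]


def achieved_sil(pfd_avg: float) -> int:
    """Return the SIL whose low-demand PFDavg band contains pfd_avg."""
    if pfd_avg < 1e-5:
        return 4  # better than the SIL 4 band; SIL 4 is the highest claimable
    for sil, lower, upper in SIL_BANDS:
        if lower <= pfd_avg < upper:
            return sil
    return 0  # PFDavg >= 0.1: no SIL can be claimed


def review(sifs: dict) -> None:
    """sifs maps tag -> (required SIL, design PFDavg, field-based PFDavg)."""
    for tag, (required_sil, design_pfd, field_pfd) in sifs.items():
        sil_now = achieved_sil(field_pfd)
        status = "OK" if sil_now >= required_sil else "TARGET NOT MET"
        print(f"{tag}: required SIL {required_sil}, design PFDavg {design_pfd:.1e}, "
              f"field PFDavg {field_pfd:.1e} -> SIL {sil_now} {status}")


review({
    "SIF-101": (2, 2.0e-3, 4.0e-3),   # degraded, but still inside the SIL 2 band
    "SIF-205": (2, 5.0e-3, 1.2e-2),   # field data pushes it down to SIL 1
})
```

The comparison is only as good as the field PFDavg that feeds it. If random failures were never traced to root cause and counted, the recalculated figure, and therefore the verdict, can be misleading.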

Unless the root cause is established, we may end up with misleading metrics on how well the SIS and its SIFs are performing.

If you would like to know more, then look out for the upcoming webinar on this topic.


exida explains Blog


Dr. Steve Gandy, CFSP
