Random versus Systematic Faults: What’s the difference?

I saw and responded to a LinkedIn discussion on this very issue, where someone had asked, “If I have a misaligned limit switch that fails dangerously, is it random or systematic?” This is an intriguing question because many view human error as systematic and, although this is sometimes true, it’s not always the case. When teaching our FSE100 course, we discuss the differences and why it’s important to categorise failures this way.

We tend to think of random failures as failures that occur at unpredictable, random time intervals (usually hardware related). In probabilistic analysis of low demand process applications, where we try to predict the likelihood of a failure on demand, we use average failure rates in our PFDavg calculations, based upon a constant failure rate during useful life. Over 200 billion unit operating hours of failure rate data have now been collected, which gives us a pretty accurate value for certain types of equipment to use in PFDavg calculations (such as those in exSILentia).
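To make the role of a constant failure rate concrete, here is a minimal sketch using the common textbook simplification for a single (1oo1) device, PFDavg ≈ λDU × TI / 2. This approximation, the function name and the numbers are assumptions for illustration; it is not how exSILentia or any particular tool performs the calculation, and it ignores factors such as imperfect proof testing, repair time and common cause.

```python
HOURS_PER_YEAR = 8760


def pfd_avg_simplified(lambda_du_per_hour: float, proof_test_interval_hours: float) -> float:
    """Simplified 1oo1 approximation: PFDavg ~ lambda_DU * TI / 2.

    Assumes a constant dangerous undetected failure rate (useful life),
    a perfect proof test, and lambda_DU * TI much smaller than 1.
    """
    if lambda_du_per_hour < 0 or proof_test_interval_hours < 0:
        raise ValueError("failure rate and test interval must be non-negative")
    return lambda_du_per_hour * proof_test_interval_hours / 2


# Hypothetical values, chosen only to illustrate the arithmetic.
lambda_du = 1e-6                  # dangerous undetected failures per hour
test_interval = HOURS_PER_YEAR    # proof test once a year

pfd = pfd_avg_simplified(lambda_du, test_interval)
print(f"PFDavg ~ {pfd:.2e}")
```

Because the result scales linearly with the failure rate, any error in the failure rate data carries straight through to the PFDavg.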

Systematic failures, on the other hand, are insidious and can only be eliminated by a change in design, manufacturing, procedures or training. I like to assess them against what I call the 3 Ps:

The 3 Ps

People – are they competent and trained?
Procedures – are there well-defined procedures, and are they followed?
Paperwork – do we have an audit trail to demonstrate that the first two are being adhered to?

This means that systematic failures are not considered in probabilistic calculations. Therefore, if a site categorises too many failures as systematic, it could end up with unrealistically low failure rates when measuring SIF performance. For this reason, it’s good policy to categorise all field failures as random until proven otherwise. That way, we won’t throw away any failures unnecessarily.
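The effect of discarding failures can be sketched with a simple point estimate, failure rate = number of failures / unit operating hours. The estimator, the function name and every number below are hypothetical and only illustrate the arithmetic; a real field failure study would be more involved.

```python
HOURS_PER_YEAR = 8760


def estimated_failure_rate(failure_count: int, unit_operating_hours: float) -> float:
    """Point estimate of a constant failure rate (failures per hour)."""
    if unit_operating_hours <= 0:
        raise ValueError("unit operating hours must be positive")
    return failure_count / unit_operating_hours


# Hypothetical site data: 50 devices in service for 5 years.
unit_hours = 50 * 5 * HOURS_PER_YEAR

dangerous_failures_found = 6     # everything recorded in the field
discarded_as_systematic = 4      # thrown out without a proper review

rate_all = estimated_failure_rate(dangerous_failures_found, unit_hours)
rate_kept = estimated_failure_rate(
    dangerous_failures_found - discarded_as_systematic, unit_hours
)

print(f"All failures counted:   {rate_all:.2e} per hour")
print(f"After discarding four:  {rate_kept:.2e} per hour")
```

In this made-up case the estimated failure rate drops by a factor of three purely because of how the failures were labelled, not because the equipment performed any better.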

For example, let’s say a well-trained instrument technician, who had performed this task many times without error, mis-calibrated a sensor so that it could not detect a high level (a dangerous condition), although the calibration procedure and paperwork were correct. Would this be categorised as a systematic error or a random error?

Many would argue that, because it is human error, it would be a systematic issue.

So, let’s see how this measures up to the 3 Ps:

People – the technician is well trained and so is competent
Procedure – the procedure is correct
Paperwork – the paperwork is correct

In this case, the failure would be categorised as random, not systematic. Perhaps the technician was distracted, tired, having a bad day, etc. The technician just made a mistake. It’s that simple.

However, it could be argued that, for safety-related equipment, the procedure should be changed to require a four-eyes policy, which would help prevent the error. That would be a systematic improvement.

It’s easy to see how confusing it can be to determine whether a fault is random or systematic, which is why we recommend capturing the failure as random until proven otherwise.

So, coming back to the misaligned limit switch, we would initially categorise the failure as random so that it’s captured, and then analyse whether it is actually a systematic fault by looking into the 3 Ps.
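The "random until proven otherwise" rule can be sketched as a small decision function. The class, field and function names below are hypothetical, and reducing each of the 3 Ps to a yes/no answer is a deliberate simplification; as the four-eyes example shows, a real review involves judgement.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FaultCategory(Enum):
    RANDOM = "random"
    SYSTEMATIC = "systematic"


@dataclass
class ThreePsReview:
    """Outcome of reviewing a field failure against the 3 Ps (hypothetical record)."""
    people_competent_and_trained: bool
    procedures_adequate: bool
    paperwork_shows_adherence: bool


def categorise(review: Optional[ThreePsReview]) -> FaultCategory:
    """Capture every failure as random unless a 3 Ps review shows a shortfall."""
    if review is None:
        # Not yet investigated: keep the failure in the data set as random.
        return FaultCategory.RANDOM
    all_ps_satisfied = (
        review.people_competent_and_trained
        and review.procedures_adequate
        and review.paperwork_shows_adherence
    )
    return FaultCategory.RANDOM if all_ps_satisfied else FaultCategory.SYSTEMATIC


# The mis-calibrated sensor example: all three Ps check out.
technician_case = ThreePsReview(True, True, True)
print(categorise(technician_case).value)

# A misaligned limit switch that has not been investigated yet.
print(categorise(None).value)
```

The key design choice is the default: a failure with no review stays in the data set as random, so it is never lost from the failure rate data.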

Why not check out some of the webinars on this subject that are archived on the exida website?

Steve Gandy, CFSP