Random vs. Systematic?

Most of you know that exida gathers field failure data from many sources including manufacturers’ warranty return data and end user maintenance/failure records. At this point we have nearly 100 billion unit operating hours of data. This is probably the largest process industry data set in the world. And we use this data to calibrate the exida Failure Modes Effects and Diagnostic Analysis (FMEDA) component database which predicts future failure rates of new instruments. We also use the data in combination with a collection of FMEDA data sets to establish exida’s Predictive Analytic Benchmarks which we use to establish generic instrument failure rates for our exSILentia toolset. The objective in all this research is to generate predictive failure data that is the most realistic in the world.

But every year some engineers say “your failure rates are too high.” Others say “your failure rates are too low,” or “exida failure rates are half as much as OREDA therefore must be wrong.” Why do different technically competent groups provide failure rates that are different? I have a theory that the main reason is the definition of random versus systematic. This is important because “systematic” failures are excluded from the random failure rate analysis.

IEC 61508-4:2010, 3.6.6 defines a systematic failure as “failure, related in a deterministic way to a certain cause, which can only be eliminated by a modification of the design or of the manufacturing process, operational procedures, documentation or other relevant factors.” Compare this to the definition of random failures from IEC 61508-4:2010, 3.6.5, which states “a failure occurring at a random time, which results from one or more degradation mechanisms.” I do not believe I could write a better definition, but I had another theory: many people categorize examples quite differently after reading those definitions.

So exida did a survey using 16 failure examples. For each example, the person answering the survey would enter a failure classification from three choices: random failure, systematic failure, or end of life failure. The results confirm the theory. Fifty-four (54) people entered their opinions. Disagreement occurred in all sixteen examples. Not one had complete agreement as to the category! So it is no wonder that failure rate studies give different failure rates. How many failures are thrown away by the analyst who classifies them as “systematic”?
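To make the effect of classification concrete, here is a minimal sketch in Python. Every number in it is invented for illustration; none of it is exida or OREDA data. It assumes two hypothetical analysts review the same 100 failure records over the same unit operating hours, and that each simply drops whatever they label "systematic" before dividing by hours. The end of life category is left out to keep the arithmetic simple.

```python
# Hypothetical illustration only: all counts and hours are invented,
# not taken from exida, OREDA, or any real data set.

def random_failure_rate(classifications, unit_operating_hours):
    """Failures per hour, counting only records NOT labeled 'systematic'."""
    counted = [c for c in classifications if c != "systematic"]
    return len(counted) / unit_operating_hours

UNIT_HOURS = 1e7  # assumed unit operating hours behind the records

# The same 100 field failure records, labeled by two hypothetical analysts.
analyst_a = ["systematic"] * 20 + ["random"] * 80  # excludes 20 failures
analyst_b = ["systematic"] * 60 + ["random"] * 40  # excludes 60 failures

rate_a = random_failure_rate(analyst_a, UNIT_HOURS)
rate_b = random_failure_rate(analyst_b, UNIT_HOURS)

print(f"Analyst A: {rate_a:.1e} failures per hour")
print(f"Analyst B: {rate_b:.1e} failures per hour")
print(f"Ratio A/B: {rate_a / rate_b:.1f}")
```

Under these assumed labels, one analyst reports twice the failure rate of the other from identical field records, purely because of where each drew the line between random and systematic.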

When I first attended an ISA SP84 safety system standards committee meeting in 1989, an experienced committee member explained the purpose of the new standard being written. He explained that a set of procedures would be written to significantly reduce systematic failures if these procedures were followed. He also explained that everyone understood that even rigorous procedures could not eliminate all failures. These remaining failures were called random failures. A probabilistic method would be used to create designs with sufficient redundancy and design strength to reduce random failures to a tolerable level. Different “safety integrity levels” would be defined with different probability limits. The intent was to reduce both kinds of failures to tolerable levels.

I feel it is implied that real failures not related to faulty procedures are classified as “random.” The objective of this thinking is to provide a realistic appraisal of the ability of a set of equipment to provide automatic protection. But not everyone agrees, so we need to work on that. Before going too much further we need to answer some questions. Who wants this data? What is the purpose of this data? I submit that the data is for the owner-operators of an industrial process so they can use the methods of IEC 61511 and IEC 61508 to reduce their risk to tolerable levels. If you agree, it is clear that the data should be as realistic as possible. The purpose of the data is not to blame someone for each failure. The purpose of the data is not to make one manufacturer’s product look better than another. Think about these questions…

Dr. William Goble, CFSE
