At exida we have studied hundreds of sets of field failure data from various sources. Some of these data sets indicate failure rates that differ by two orders of magnitude for the same product type! After tracing through the data collection process for many of these field failure data sets, it is becoming clear that one significant variable is how each study answers the question "What is a random failure?"
In some data studies, very few failures are recorded. In one study it was discovered that many of the "possible failures" were classified as "systematic" and therefore not counted in the random failure rate. In a presentation from a well-known Certification Body from Germany, the presenter explained how "most mechanical failures are systematic." I heard the same comment from a valve manufacturer. The reasoning was explained in several examples:
- This valve failed because it was not designed for the application. Therefore, it is the customer's problem, a systematic failure caused by a bad design process at the customer site.
- This actuator failed because of a defect in the manufacturing process, a systematic failure of the manufacturing process design.
- This valve failed because the product designer did not use enough design strength, a systematic failure of the requirements specification.
Over and over again, justifications are made to throw out real failures, many of which had both random and systematic root causes. Taken to their conclusion, these justifications would imply that ALL failures are systematic!
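To make the effect of this reclassification concrete, here is a minimal sketch of the arithmetic. Every number in it is hypothetical and chosen only for illustration, and the function name `failure_rate` and the simple estimate of failures divided by unit operating hours are assumptions, not a description of any particular study or tool.

```python
# Illustrative only: all counts below are invented, not real field data.
HOURS_PER_YEAR = 8760


def failure_rate(failures: int, units: int, years: float) -> float:
    """Simple point estimate: failures per unit operating hour."""
    return failures / (units * years * HOURS_PER_YEAR)


units = 1000             # hypothetical installed population
years = 5.0              # hypothetical observation period
reported = 200           # every real failure reported from the field
called_systematic = 198  # failures reclassified as "systematic" and dropped

# Rate when every real failure is counted.
rate_all = failure_rate(reported, units, years)

# Rate when the reclassified failures are thrown out.
rate_filtered = failure_rate(reported - called_systematic, units, years)

ratio = rate_all / rate_filtered
print(f"all failures counted:  {rate_all:.2e} per hour")
print(f"'systematic' excluded: {rate_filtered:.2e} per hour")
print(f"ratio between the two: {ratio:.0f}x")
```

With these made-up counts, the same population and the same operating hours give two estimates that differ by a factor of 200/2 = 100, purely because of how the failures were classified.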
This view is especially common among engineers working for manufacturers. I worked in new product development for a manufacturing company for many years. We did not design products to fail at random times. There was no product requirement to generate random failures in any requirements document that I have ever seen. To the best of our ability, we designed the products to work. Therefore, we often concluded that most failures are systematic in one way or another. But this view is very unrealistic.
IEC 61508 defines a random failure as “A failure occurring at a random time, which results from one or more degradation mechanisms.” This applies nicely to many of the things we really see, including bad air, inadvertent poor selection of materials, material defects, random bad power events, inadvertent environmental stress, accidental calibration errors, inadvertent maintenance mistakes, and random failures in a manufacturing process. All failures due to these realistic causes should be considered when estimating random failure rates.
The purpose of a random failure rate is to realistically calculate the probability that a set of safety equipment will fail to perform a safety function. Since any designer may select equipment that was not designed for the application or the environment, realistic failure rate analysis must count all real failures. Until we have manufacturing processes where no defect can ever occur, all real failures must be counted. At exida we use a very realistic definition of random failure. We also recognize that failure rates vary from site to site, and we can model that variation in our calculation tools.