Over the course of several blogs, I will talk about getting realistic failure rate data, where this failure data comes from, and how different methods of failure data analysis compare. If you understand this, you will begin to get a very good feel for what it takes to generate realistic failure data. This is a subject I find very important, and I hope you will find your time well spent reading this.
In Part 1, I wrote about the fundamental concepts of IEC 61511, the functional safety standard for the process industries, as well as the design phase of the safety lifecycle.
In this blog, I will continue by talking about two fundamental techniques that have been developed in the field of reliability engineering: failure rate estimation and failure rate prediction.
Getting Failure Data
There are two fundamental techniques that have been developed in the field of reliability engineering: failure rate estimation techniques and failure rate prediction techniques. Both of these have been around since virtually the earliest days of the reliability engineering science. The oldest textbook that I was able to find in my search was from 1953. It has chapters on both subjects. Of course they have been refined, studied, and improved substantially, especially in the last decade.
Failure rate estimation is based on analyzing failure data from field operation. Failure rate prediction is based on design strength analysis or test results. Prediction is more of a “look forward” technique, whereas estimation is a “look backward” technique.
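To make the “look backward” idea concrete, here is a minimal sketch of a failure rate point estimate. All the numbers are hypothetical, and this ignores the confidence-bound math a real study would apply:

```python
# Failure rate estimation: a "look backward" point estimate.
# Hypothetical field data: 14 failures observed across a fleet of
# 500 devices, each running for 3 years (8760 hours per year).
failures = 14
unit_hours = 500 * 3 * 8760  # cumulative operating hours

lambda_est = failures / unit_hours  # failures per hour
mtbf_hours = 1 / lambda_est         # mean time between failures

# Failure rates are often quoted in FITs (failures per 1e9 hours).
fits = lambda_est * 1e9
print(f"lambda = {lambda_est:.3e} /h  ({fits:.0f} FITs)")
```

The whole game, as we will see, is whether the failure count in the numerator and the operating hours in the denominator are trustworthy.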
Getting Failure Data - Estimation
Failure rate estimation techniques are the oldest. They’re based on gathering up failure data. I classify them into three categories.
- Manufacturers field return data studies
- Industry databases (where field failure data is gathered up from a group of companies)
- End-user field failure data (where a particular company or perhaps even a particular site, gathers up the failure rate information from their specific situation)
All of them have pluses and minuses, but all of them can provide tremendously useful information.
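Whichever category the data comes from, field observations are sparse, so an estimated rate always carries uncertainty. A common way to bound that uncertainty, found in reliability engineering handbooks for time-truncated observation, is a chi-squared upper confidence limit. A sketch, with hypothetical numbers and assuming a constant failure rate:

```python
from scipy.stats import chi2

# Hypothetical field observation: 2 failures in 1,000,000 unit-hours.
n_failures = 2
unit_hours = 1_000_000

# Point estimate: failures per hour.
lam_point = n_failures / unit_hours

# One-sided 95% upper confidence bound for time-truncated observation:
#   lambda_upper = chi2_inv(0.95, 2 * (n + 1)) / (2 * T)
lam_upper = chi2.ppf(0.95, 2 * (n_failures + 1)) / (2 * unit_hours)

print(f"point estimate:  {lam_point:.2e} /h")
print(f"95% upper bound: {lam_upper:.2e} /h")
```

With only two failures observed, the conservative upper bound is roughly three times the point estimate, which is exactly why the questions about data quality in the rest of this blog matter so much.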
Manufacturer Field Return Studies
Manufacturer field return studies, the first category of estimation, have pluses and minuses. The plus is, of course, that it is real data. There’s no denying that. The minus is the calculation methods, or perhaps I should just call them “the assumptions.” They vary wildly because a manufacturer can never know for sure what percentage of actual failures are returned. You won't believe this, but manufacturers use very different definitions of failure from site to site.
I used to work for a manufacturer. I was taught that a returned item is classified as a failure only if we discovered a manufacturing defect. Many of the problems were classified as “customer abuse -- not a failure”. What do you mean, “customer abuse -- not a failure”? Some manufacturers didn’t particularly want to pay for warranty returns. Generally, I just can't tell you what percentage of returns were marked “no problem found,” “systematic failure,” or “customer abuse.” I see that over and over again in the databases that exida studies.
Calculations are often done where operational hours are estimated based on shipping records, and it is assumed that all failures are returned. You can easily imagine what happens when a manufacturer uses that approach: the failure rates come out very low. And what does an unrealistically low failure rate mean in a safety analysis? Yeah… it means danger: under-design. This is not good.
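The bias is easy to put numbers on. Here is a sketch with hypothetical figures, assuming only 30% of real failures ever come back to the factory while the study assumes 100%:

```python
# How the "all failures are returned" assumption biases the estimate low.
# Hypothetical: 10,000 units shipped, each with 10,000 operating hours.
unit_hours = 10_000 * 10_000

true_failures = 60        # failures that actually occurred in the field
return_fraction = 0.30    # only 30% ever make it back to the factory
returned = int(true_failures * return_fraction)

lambda_naive = returned / unit_hours       # assumes every failure returned
lambda_true = true_failures / unit_hours   # what really happened

print(f"naive:  {lambda_naive:.1e} /h")
print(f"actual: {lambda_true:.1e} /h")
print(f"underestimated by a factor of {lambda_true / lambda_naive:.1f}")
```

An optimistic factor of three flows straight through to the safety calculations, and that is the under-design problem.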
So in spite of all these problems, one real advantage of a manufacturer field return study is that the returns designated as failures are often investigated thoroughly, down to root cause. That data is tremendously valuable in helping to establish component failure rates and component failure modes. So while I'm very skeptical about using manufacturer field data for absolute failure rates, the information is highly valuable.
Industry Databases
The second category is industry databases. There have been many over the years, but today the most prominent, at least for the process industries, is OREDA (the Offshore Reliability Data association). This is a compilation of failure data from a number of different companies operating offshore in the North Sea. It is operated by DNV in Norway, and the data analysis is done by SINTEF, also in Norway. It provides very useful data on process equipment and is kept relatively up to date; the latest public release I was able to find is from 2009/2010. The volume I have is the one with the purple cover, and it is quite thick. It has a lot of information in it.
End User Field Failure Data Studies
The third category is end-user field failure studies. At exida, we've gathered a lot of field failure data studies from around the world. Some of them are high quality and some of them are not. We’ve gone through and studied many of them, and I have a large number of stories I could tell you. Like the one case where a failure was recorded only if the device had to be sent out for repair. Repairs made in-house were never counted. That’s because the system that generates the failure report is the same one that generates the purchase order, and if they repair in-house there is no reason for a purchase order! Obviously, that data set showed a very low failure rate, because only a small percentage of the total failures were ever recorded.
So you've got to ask, “When was the failure report written?” You have to ask what the definition of failure is. It shocks me sometimes how people classify things as random versus systematic. In some areas of the world and at some companies, I've even heard the argument that all failures are systematic, therefore the real failure rate is zero. I don't think that's very realistic. Do you? However, if these questions are asked and we discover that a good system is in place where all failures are recorded, end-user field failure data studies can generate very valuable information.