Over the course of several blogs, I will talk about getting realistic failure rate data, where this failure data comes from, and how different methods of failure data analysis compare. I think if you understand this, you will begin to get a very good feel for what it takes to generate realistic failure data. This is a subject I find very important, and I hope you will find your time well spent reading this.
In Part 1, I wrote about the fundamental concepts of the functional safety standard for the process industries, IEC 61511, as well as the design phase of the safety lifecycle.
In Part 2, I explained two fundamental techniques that have been developed in the field of reliability engineering: failure rate estimation techniques and failure rate prediction techniques.
In this blog, I will continue by talking about field data collection standards and tools, as well as prevalent prediction techniques like the B10 and FMEDA approaches.
Field Data Collection Standards
There are standards on field data collection. IEC 61508 references the ISO 14224 standard for the petroleum industries. IEC 60300-3-2:2004 is good and well written. These standards are not actually that detailed, but they are a very good start. There's also NAMUR NE 93, although it is not a standard. AIChE CCPS has formed the PERD (Process Equipment Reliability Database) committee, where failure taxonomies, in other words failure modes, have been developed along with software.
Field Data Collection Tools
So, there's a lot of help out there, and then of course there are tools specifically designed to help people collect data cost-effectively. The example I use is the exida SILStat software. So it's getting easier.
More and more people are collecting data. More and more data is coming in. That's good, but there are some problems. Fundamentally, we have to wait quite some time to get data. In some cases, the product is obsolete before enough data is ever gathered. Obviously, field data collection is great and it's important, but something more is needed.
That's why failure rate prediction methods were created, starting in the 1960s. Component failure rate databases were established decades ago.
Today, the most prevalent prediction techniques are the B10 data approach, which is based on cycle testing, and the FMEDA approach, a predictive technique developed by engineers at exida. I'll talk about both of those.
B10 (Cycle Test) Failure Data
Cycle testing… you think, cycle testing? How can that be valid? I thought that when I first read about it, but it is valid under some conditions. Here's how it works. A cycle test is done on a set of products: open, close, open, close. You open and close an actuator or solenoid valve as fast as you can. According to our statisticians, you probably need a minimum of 20 products. You run the test until 10% of the units have failed; with a set of 20, that means two of them fail. The total number of cycles at that point is called the B10 point. The number of cycles is converted into a time period by knowing the cycles per hour in any particular application. A failure rate is then calculated by dividing the 10% failure count (2 in our example) by the time period, which gives failures per hour. Then a statement is made, which I took from ISO 13849 from the machine industry: the dangerous failure rate is half and the safe failure rate is the other half. Actually, that's not a very good assumption, as detailed analysis has discovered, but I guess it's better than nothing.
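To make the arithmetic concrete, here is a minimal Python sketch of the B10 conversion described above. The function name, the parameter names, and all the numbers are hypothetical and chosen only for illustration. The sketch also assumes the rate is expressed per device, so it divides the failed fraction (0.1, which is 2 failures out of 20 units) by the time period rather than the raw count, and it applies the 50/50 dangerous/safe split as a default that you can override.

```python
def b10_failure_rates(b10_cycles, cycles_per_hour, dangerous_fraction=0.5):
    """Approximate per-device failure rates (per hour) from a B10 cycle count.

    Assumption for this sketch: 10% of the units have failed by the time the
    application accumulates `b10_cycles`, and the rate is treated as constant.
    """
    if b10_cycles <= 0 or cycles_per_hour <= 0:
        raise ValueError("b10_cycles and cycles_per_hour must be positive")
    if not 0.0 <= dangerous_fraction <= 1.0:
        raise ValueError("dangerous_fraction must be between 0 and 1")

    # Hours needed in this application to accumulate the B10 number of cycles.
    t10_hours = b10_cycles / cycles_per_hour

    # Fraction failed (10%) divided by the time period: failures per hour per device.
    lambda_total = 0.1 / t10_hours

    # Split into dangerous and safe parts (50/50 by default).
    lambda_dangerous = dangerous_fraction * lambda_total
    lambda_safe = lambda_total - lambda_dangerous
    return t10_hours, lambda_total, lambda_dangerous, lambda_safe


# Hypothetical example: B10 point of 1,000,000 cycles, application cycles 10 times per hour.
t10, lam, lam_d, lam_s = b10_failure_rates(b10_cycles=1_000_000, cycles_per_hour=10)
print(f"Time to B10 point: {t10:.0f} hours")
print(f"Total failure rate: {lam:.2e} per hour")
print(f"Dangerous: {lam_d:.2e} per hour, safe: {lam_s:.2e} per hour")
```

Notice how strongly the result depends on `cycles_per_hour`: the slower the application cycles, the lower the calculated failure rate, which is exactly where the wear-out assumption deserves a hard look.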
Now, what are the assumptions? When is this method valid? The B10 method assumes that the constant failure rate during useful life is due entirely to premature wear-out. Remember, the purpose of this is to show when more than 10% have failed because they've worn out. It's primarily applicable to mechanical and electromechanical components. The assumption is that wear-out is the only significant failure mode and that all other failure modes are totally insignificant and irrelevant. Uh oh! What happens when that assumption is not valid?
The research that we gathered in a number of technical papers and reports has clearly shown that other failure modes become significant when these products do not move frequently. Some failure modes become significant if a product is motionless for 100 hours. That's less than one week. Does your safety valve ever sit still for a week or more? Every time I ask that question, the answer is something like "ours hasn't moved in three years." Okay, then. Failure modes in your product are not due entirely to premature wear-out, and other failure modes are very significant.
B10 Failure Data – Solenoids, Actuators
When O-rings or other seals are part of a product, many failure modes are significant when a product remains static for a week or more. Stiction, cold welding, and corrosion binding all create what is generally a dangerous failure in a safety application. Therefore this B10 data is absolutely not applicable to applications in the process industry. The failure data that we have been able to study supports this statement entirely. I do think B10 data can have some real relevance in machine safety and robotics. I can imagine a production machine which is moving back and forth loading parts, stamping parts, and removing parts every few seconds, with people constantly trying to speed up the rate of the machine. There we might find premature wear-out as the primary failure mode during useful life, but not in the world of the process industry.
Failure Modes, Effects, & Diagnostic Analysis (FMEDA)
We kind of need a predictive method if cycle testing is no good. What do we use?
This was recognized many, many years ago, and of course MIL-HDBK-217F was an early version of a predictive failure rate method based on design strength analysis. Failure Modes, Effects, and Diagnostic Analysis is a predictive method that expands upon classic failure modes and effects analysis and generates a lot of very useful information. It's done by performing a study of every single component in a device and how each component failure will affect the device. That sounds really tedious and boring, doesn't it? It's a lot of work. It's very systematic. However, although it can be tedious and boring and takes a lot of man-hours, it can be terribly effective. In addition, the "D" part is where we add an estimate or a test of how well the automatic diagnostics will detect each failure mode of each component, or how well a proof test will detect the failure of each component in any particular failure mode. Not only can we generate failure rates as a function of failure mode, but we can also generate diagnostic coverage factors and proof test effectiveness, and it turns out we can generate the useful life of the device based on the useful life of its components.
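As a rough illustration of the bookkeeping an FMEDA involves, here is a minimal Python sketch that rolls up component failure modes into failure rate categories, a diagnostic coverage factor, and a proof test coverage figure. It is not the exida method or tool. Every component, failure mode, rate, and coverage number below is made up for the example, the `FailureMode` and `summarize` names are hypothetical, and the sketch simplifies things by applying diagnostics only to dangerous failures. Rates are in FIT (failures per 10^9 hours).

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str         # part being analyzed
    mode: str              # how it fails
    rate_fit: float        # failure rate of this mode, in FIT (failures per 1e9 hours)
    effect: str            # effect on the device: "safe" or "dangerous"
    diag_coverage: float   # fraction of this mode caught by automatic diagnostics (0..1)
    proof_coverage: float  # fraction of the undetected part caught by the proof test (0..1)


def summarize(modes):
    """Roll up per-mode rates into safe, dangerous detected, dangerous undetected."""
    safe = dd = du = du_found_by_proof = 0.0
    for m in modes:
        if m.effect == "safe":
            safe += m.rate_fit
        elif m.effect == "dangerous":
            detected = m.rate_fit * m.diag_coverage
            undetected = m.rate_fit - detected
            dd += detected
            du += undetected
            du_found_by_proof += undetected * m.proof_coverage
        else:
            raise ValueError(f"unknown effect: {m.effect!r}")
    dangerous = dd + du
    return {
        "safe_fit": safe,
        "dd_fit": dd,
        "du_fit": du,
        # Share of dangerous failures that automatic diagnostics detect.
        "diagnostic_coverage": dd / dangerous if dangerous else 0.0,
        # Share of dangerous undetected failures that the proof test reveals.
        "proof_test_coverage": du_found_by_proof / du if du else 0.0,
    }


# Entirely hypothetical data for a small solenoid-operated device.
example_modes = [
    FailureMode("resistor R1", "open", 10.0, "safe", 0.0, 0.0),
    FailureMode("solenoid coil", "short", 40.0, "dangerous", 0.9, 1.0),
    FailureMode("O-ring", "stiction", 60.0, "dangerous", 0.0, 0.5),
]

for name, value in summarize(example_modes).items():
    print(f"{name}: {value:.3f}")
```

A real FMEDA would have hundreds of rows like these and would draw the rates from a component database rather than from guesses, but the roll-up logic has the same shape.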
Anyone who has studied reliability engineering for some period of time understands that strength of the design is only half the story. Failures occur when stress is greater than the associated strength. Therefore failure rates are clearly a function of the environmental stress factors, and FMEDAs must be done for a defined environment. In order to make things practical, exida has defined six different environmental profiles very characteristic of process industry applications. FMEDAs are done per one of these profiles. An FMEDA could be done for a specific custom environment; however, that's even more work. In this way, the whole concept of FMEDA becomes practical.
The FMEDA is a study of design strength for a given environmental stress. Using a component database, the failure rates and failure modes for a product, whether a transmitter, an I/O module, a solenoid, an actuator, or a valve, can be determined. Useful life can be predicted. And perhaps most important, now that the world has realized that many proof tests are not at all effective, the FMEDA can answer the question: what is the proof test coverage?
In the next blog, I will cover FMEDA results and accuracy.