The adoption of functional safety standards continues to gain momentum in turbine applications. Both industrial and power turbine sites are now requiring compliance with IEC 61511. This blog will review both technical requirements and market trends related to functional safety system design. Market trends will cover which standards are required by region, turbine size, and industry.
In Part 1, we discussed the application of IEC 61511 to turbine applications and how we demonstrate compliance. In Part 2, we took a high-level look at the safety lifecycle, looked at the IEC 61511 lifecycle, and discussed hazard matrices, risk graphs, and LOPAs.
In this blog, we will look at the implications of IEC 61511 and at effective implementation.
Implications of IEC 61511
Because of IEC 61511, you will have to use an appropriate SIL determination methodology and a high-integrity automated safety system as the means of protecting against hazards. You also have to intentionally separate the safety system from basic process control, both physically and electrically. Periodic proof testing should be completed in accordance with the procedures established during the protection system's design. Lastly, there should be documented proof that regularly scheduled protection system reviews were conducted per applicable regulatory and standards requirements.
To show compliance with the IEC standard, we need to meet three things: SIL capability, architectural constraints, and probability of failure. SIL capability is basically the strength against systematic failures. To show SIL capability, show that the equipment has been assessed against IEC 61508 or that you have justified it as proven in use. Probability of failure provides strength against random failures, and you prove it with a PFD calculation. Architectural constraints are also a protection against random failures, but I would really describe them as a strength against undetected failures: depending on the ratio of dangerous undetected failures to all of your failures (or, with Route 2H, on the confidence in the data that goes into the analysis), you need more or less redundancy. The more uncertain the data that goes into the reliability calculation, the greater the likelihood that you will have to use redundancy.
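As a minimal sketch of how the three requirements combine, the snippet below treats the claimable SIL as the lowest of the three limits. The PFDavg bands are the standard low-demand bands; the function names and the example inputs are hypothetical, and the systematic capability and architectural limit are assumed to come from a separate assessment.

```python
def sil_from_pfd_avg(pfd_avg: float) -> int:
    """Map a low-demand average PFD to its SIL band (0 means below SIL 1)."""
    if pfd_avg <= 0:
        raise ValueError("pfd_avg must be positive")
    if pfd_avg < 1e-5:
        return 4  # better than the SIL 4 band; SIL 4 is the highest level
    bands = [(1e-5, 1e-4, 4), (1e-4, 1e-3, 3), (1e-3, 1e-2, 2), (1e-2, 1e-1, 1)]
    for low, high, sil in bands:
        if low <= pfd_avg < high:
            return sil
    return 0  # PFDavg of 0.1 or worse does not reach SIL 1


def achieved_sil(systematic_capability: int,
                 architectural_limit: int,
                 pfd_avg: float) -> int:
    """The claimable SIL is limited by the weakest of the three requirements."""
    return min(systematic_capability, architectural_limit, sil_from_pfd_avg(pfd_avg))


if __name__ == "__main__":
    # Hypothetical SIF: SIL 3 capable equipment, architecture limited to SIL 2,
    # calculated PFDavg of 5e-4 (which on its own falls in the SIL 3 band).
    print(achieved_sil(systematic_capability=3, architectural_limit=2, pfd_avg=5e-4))
```

In this hypothetical case the architectural limit, not the PFD calculation, sets the result, which is why a good PFD number alone does not demonstrate compliance.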
We are seeing more and more of a trend of assessing turbines and even whole plants. There are a few typical review points that we see. The first is called functional safety assessment 1 (FSA 1), which occurs after the safety requirements specification (SRS) is complete. The second, FSA 2, occurs after the factory acceptance test (FAT) is complete. At that point, a design basis certification can be issued for the system. The final one is FSA 3, which occurs after the turbine is installed, integrated on site, and tested. At that point, you can issue a type certificate for the turbine.
A type certificate would basically show that the process used in targeting, designing, and implementing the SIFs, as well as the safety manual, complies with IEC 61511, that the equipment was selected properly, and that all the steps were done properly.
If you are looking to build a compliant system, the steps for effective implementation include a benchmark study, a gap resolution plan, developing a project functional safety management plan, a system design, implementation, and operation and maintenance.
The benchmark study focuses on the procedures and processes the company has in place and compares them to the requirements of the standard. It covers:
Safety Requirements Specification
Safety Instrumented System Design
Safety Integrity Level Verification
SIS Software Design
SIS Software Verification
SIS Factory Acceptance Test
SIS Installation and Commissioning
SIS Operation and Maintenance
SIS Modification and Decommissioning
When we do the benchmark study, we have a tool that walks you through the standard and writes out the requirements in plain English, with a reference to the relevant part of the standard. If there are any gaps, those are identified; if an item is compliant, we insert the evidence as to why that is. The thought is that, with these standards and with the safety lifecycle, we have to evaluate each part to ensure you've got complete coverage.
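The record-keeping behind such a study can be sketched as a simple checklist in which every requirement carries either evidence or an open gap. This is only an illustration and not the tool described above: the class, the field names, and the example entries are hypothetical, and the clause references are deliberately left as placeholders.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkItem:
    area: str           # lifecycle area, e.g. "SIS Factory Acceptance Test"
    requirement: str    # plain-English restatement of the requirement
    clause: str         # reference into the standard (placeholder here)
    evidence: str = ""  # procedure or record that demonstrates compliance

    @property
    def is_gap(self) -> bool:
        # An item with no recorded evidence is treated as an open gap.
        return not self.evidence.strip()


def open_gaps(items):
    """Return the items that still need a gap resolution action."""
    return [item for item in items if item.is_gap]


if __name__ == "__main__":
    # Hypothetical entries for illustration only.
    items = [
        BenchmarkItem("Safety Requirements Specification",
                      "Each SIF has a documented target SIL",
                      "<clause ref>", evidence="SRS template, section 4"),
        BenchmarkItem("SIS Factory Acceptance Test",
                      "FAT results are recorded and signed off",
                      "<clause ref>"),
    ]
    for gap in open_gaps(items):
        print(f"GAP: {gap.area}: {gap.requirement}")
```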
One typical gap is the lack of a structured process. Normally an organization is going to have some process defined, but it may not cover all of the things required by the lifecycle. The second is no agreed-upon tolerable risk. If you are supplying a system and multiple organizations are working on it, everybody needs to be working off the same playbook and at least have the same guidance. There could also be poor communication across the organizations, missing or incomplete documentation, use of non-SIL-rated equipment, components left out of an analysis, unrealistic modeling assumptions, or incorrectly modeled shared equipment. We'll take a more detailed look at each of these.
Establish a Process
The first thing is establishing a process. I recommend creating a graphic. If you have multiple pieces from multiple vendors, on the y-axis you can show the different steps of the safety lifecycle, then map out which processes are common across all parts of the plant and which are unique to different vendors, and make sure everybody involved in the project is in agreement and understands their role. Anybody doing part of the project is going to have to understand the V-model as it relates to designing and implementing a safety instrumented system. You have to write the safety requirements specification, move that to a conceptual design, decompose that into hardware and software requirements, and then do the hardware and software design and configuration. Then, as we come back up out of the V-model, we have integration and testing to be compliant. We've had cases where equipment was coming from multiple vendors and the duct burner showed up with no documentation for the application programming or the FAT. That was not acceptable, and the work had to be redone. This is a point where upfront communication and coordination not only make the end product safer but also make the project flow much more smoothly.
Typical functional safety documents include a functional safety management plan, which includes detailed top-level requirements, addresses all phases of the safety lifecycle, has a clear description of handoffs between phases and groups, and has appendixes containing information that is needed throughout a project. The documents also include group procedures, which have process descriptions for the relevant phase, the inputs required, and the outputs delivered. Also included is the project plan, which is a tracking document for each project: who, what, when, where, how, and sign-offs. The functional safety management plan gives the top-level view, the group procedures give the detail necessary to execute it, and the project plan shows how it is done throughout the project.
Another issue is non-SIL-rated equipment. Per IEC 61511, all equipment must be assessed per IEC 61508 or justified based on proven in use. A proven-in-use assessment provides measures of protection against random hardware failures and “systematic” design failures. For example, with something that has software in it, you really need to understand the development process to say it is suitable for use. This is just a cautionary note to make sure that this is addressed early on so it is not a problem later in a project.
Proven-in-use requirements are reasonable to achieve with lower SIL targets and with Type A, mechanical, or simple electronic equipment. It becomes much more difficult when software is involved, because people often don't have statistically meaningful data. Another big issue is to be careful of people claiming SIL 3 systems. I've seen this in particular in the turbine world, where different suppliers include different things in their claims. Some suppliers only claim the logic solver; other suppliers include the sensors and, for example, a 2-out-of-3 trip block. I have not really seen anyone offer a standard off-the-shelf SIL 3 system that includes the final elements.
Another big concern is unrealistic modeling assumptions. Some people use 100% proof test coverage without really understanding what that means. They also either don't include the mission time or don't include common cause between redundant elements. If we claim 100% proof test coverage, that means that after every proof test the device is back to like-new condition: there are no dangerous undetected failures left, and you are resetting the clock every time. If you do a leak check every time you test your valve, you will be at 98-99%. So if you do a timed trip to show that the speed of response is fast enough and you also do a leak test, you are going to be approaching 100% coverage. If, however, you are just doing a full valve stroke test, you are probably only getting 70% proof test coverage. The table below shows the difference. If you do a leak test once a year, you'd have a risk reduction of about 184. If you did just a full valve stroke test once a year with 70% proof test coverage, which is pretty good, you would only achieve a risk reduction of 28. Thus the testing regime, the testing frequency, and the mission time really matter, because they drive dramatically different achieved reliability.
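The effect of proof test coverage can be sketched with a commonly used simplified approximation for a single device: the failures that the proof test can find accumulate over the test interval, while the failures it cannot find accumulate over the whole mission time. The failure rate, test interval, and mission time below are hypothetical values chosen for illustration; they are not the inputs behind the 184 and 28 figures quoted above, and the approximation ignores diagnostics, repair time, and common cause.

```python
def pfd_avg_1oo1(lambda_du: float,
                 test_interval_yr: float,
                 mission_time_yr: float,
                 proof_test_coverage: float) -> float:
    """Simplified average PFD for one device with imperfect proof testing.

    lambda_du is the dangerous undetected failure rate per year. The
    approximation is only reasonable when lambda_du * time is much less than 1.
    """
    if not 0.0 <= proof_test_coverage <= 1.0:
        raise ValueError("proof_test_coverage must be between 0 and 1")
    # Failures revealed by the proof test: the clock resets every test interval.
    tested = proof_test_coverage * lambda_du * test_interval_yr / 2
    # Failures the proof test misses: they persist until the end of mission time.
    untested = (1 - proof_test_coverage) * lambda_du * mission_time_yr / 2
    return tested + untested


if __name__ == "__main__":
    LAMBDA_DU = 0.01     # per year, hypothetical
    TEST_INTERVAL = 1.0  # years
    MISSION_TIME = 15.0  # years, hypothetical
    for coverage in (1.0, 0.7):
        pfd = pfd_avg_1oo1(LAMBDA_DU, TEST_INTERVAL, MISSION_TIME, coverage)
        print(f"coverage {coverage:.0%}: PFDavg {pfd:.4f}, risk reduction {1 / pfd:.0f}")
```

With these assumed inputs, most of the PFDavg at 70% coverage comes from the untested term, which is why mission time matters as soon as coverage drops below 100%.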
Finally, something that is often missed is accounting for shared equipment. This is equipment that could be part of the control loop as well as part of the safety loop. It is very typical on equipment, especially turbines, to trip both the control valve and the stop valve with a hydraulic trip. That is a good design, but you have to realize as you are modeling it that it can't be modeled as a completely independent design. In some cases there will be a single point of failure involving the control valve. So what you want to do is say that, for the initiating events, you essentially do have redundant valves, and you can model that as a safety instrumented function with 1oo2 valves. You are then going to get an output and an intermediate event frequency.
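One way to sketch the 1oo2 valve model is with the widely used simplified beta-factor equation, where a fraction of the failures is assumed to hit both valves at once (for example through the shared hydraulic trip). This is an assumption made for illustration, not necessarily the modeling approach used by the author, and all numerical values are hypothetical.

```python
def pfd_avg_1oo2(lambda_du: float, test_interval_yr: float, beta: float) -> float:
    """Simplified average PFD for two redundant valves with common cause.

    beta is the fraction of dangerous undetected failures assumed to be
    common to both valves; perfect proof testing is assumed for brevity.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be between 0 and 1")
    # Both valves fail independently within the same test interval.
    independent = ((1 - beta) * lambda_du * test_interval_yr) ** 2 / 3
    # A shared cause defeats both valves; this part behaves like a single device.
    common_cause = beta * lambda_du * test_interval_yr / 2
    return independent + common_cause


def intermediate_event_frequency(initiating_freq_per_yr: float, pfd_avg: float) -> float:
    """Frequency of the event getting past the SIF (initiating frequency x PFDavg)."""
    return initiating_freq_per_yr * pfd_avg


if __name__ == "__main__":
    # Hypothetical inputs: lambda_du = 0.01/yr, yearly proof test, beta = 10%.
    pfd = pfd_avg_1oo2(lambda_du=0.01, test_interval_yr=1.0, beta=0.10)
    freq = intermediate_event_frequency(initiating_freq_per_yr=0.1, pfd_avg=pfd)
    print(f"PFDavg {pfd:.2e}, intermediate event frequency {freq:.2e} per year")
```

With these inputs the common cause term dominates the result, which illustrates why two valves driven by shared equipment cannot be credited as fully independent.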
When designing a system, you are going to want to select SIL-certified equipment when possible. You should also make provisions for automatic testing, diagnostics, and proof testing. You want to consider the impact of turbine refurbishment on mission time; for example, if the valves are rebuilt, you want to reset the clock. Also, you should use tools that correctly model all of the variables.
exida has a tool called exSILentia that allows you to address all of the variables we have been talking about, including mission time, startup time, demand modes, redundancies, and common cause. It is a good tool for doing that modeling.
When implementing systems, ensure that all parties clearly understand their roles and responsibilities. For example, if you have the system integrator do the software, do they have the specification and the validation plan? Are they configuring the safety PLC per the safety manual from the OEM? Is the delivered PLC code exactly the same as the FAT code? Then, perform a pre-startup functional safety assessment against the standard.
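One simple way to check that the delivered code is identical to the code that went through FAT is to compare cryptographic hashes of the two exports. The sketch below assumes the application program can be exported to a file; the function names and file names are hypothetical, and many safety PLC engineering tools provide their own program signature or checksum that should be preferred where available.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_fat_code(fat_export: str, delivered_export: str) -> bool:
    """True only if the delivered export is byte-for-byte identical to the FAT export."""
    return sha256_of_file(fat_export) == sha256_of_file(delivered_export)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python compare.py fat_export.bin site_export.bin
    if len(sys.argv) == 3:
        same = matches_fat_code(sys.argv[1], sys.argv[2])
        print("identical" if same else "DIFFERENT: investigate before startup")
```

Recording the FAT hash in the FAT report gives the pre-startup assessment a fixed reference to compare against.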
Key things to consider for operation and maintenance include controlling access to the safety PLC configuration, performing and documenting all required tests, confirming that rebuilds occur as planned, and identifying and correcting any systemic component issues.
exida recently completed a global survey to see what is going on in the turbine world trend-wise. We asked both industrial and power turbine users whether their organization has a functional safety management plan. About 45% said that they did have one, about 18% said they include functional safety as part of their overall safety plan, about the same share (18%) said they have implemented a semi-formal functional safety plan, and about 20% responded that they have not specifically addressed functional safety. So about 80% were addressing functional safety in some way. We also asked which standards each organization has adopted. IEC 61511 was the most prevalent, followed by IEC 61508 and then API 670. Just under 20% said they have not adopted any standards. The next question was whether they require equipment suppliers to adhere to particular standards (IEC 61508/61511/62061, ISO, API, or none). We found that more organizations required IEC 61508 to be followed than IEC 61511, and a smaller percentage again chose API 670. We also found that almost all respondents had a program in place to address alarm management, and about half had one for process control system cybersecurity.