White Papers

Alarm Management:


Poor alarm management is one of the leading causes of unplanned downtime, contributing to over $20B in lost production every year, and of major industrial incidents such as the one in Texas City. Developing good alarm management practices is not a discrete activity but a continuous process (i.e., it is more of a journey than a destination). This paper describes the new ISA-18.2 standard, “Management of Alarm Systems for the Process Industries”. The standard provides a framework and methodology for the successful design, implementation, operation and management of alarm systems, and allows end-users to address one of the fundamental conclusions of Bransby and Jenkinson: “Poor performance costs money in lost production and plant damage and weakens a very important line of defense against hazards to people.” Following a lifecycle model helps users systematically address all phases of the journey to good alarm management. The paper gives an overview of the standard and the key activities in each step of the lifecycle.

More Information   

Get a Life(cycle)! Connecting Alarm Management and Safety Instrumented Systems

Alarms and operator response are one of the first layers of defense in preventing a plant upset from escalating into an abnormal situation. The new ISA 18.2 standard on alarm management recommends following a lifecycle approach similar to the existing ISA84/IEC 61511 standard on functional safety. This paper will highlight where these lifecycles interact and overlap, as well as how to address them holistically. Specific examples within ISA 18 will illustrate where the output of one lifecycle is used as input to the other, such as when alarms identified as safeguards during a process hazards analysis (PHA) are used as an input to alarm identification and rationalization. The paper will also provide recommendations on how to integrate the safety and alarm management lifecycles.

More Information   

Implement an Effective Alarm Management Program

Apply the ISA-18.2 Standard on Alarm Management to design, implement, and maintain an effective alarm system.

More Information   

Make some Alarming Moves

Tackle distractions that impair operator performance and process efficiency.

More Information   

Managing Alarms to Support Operational Discipline

Process alarms, coupled with operator action, are frequently cited as a safeguard in a Process Hazard Analysis (PHA) and as an Independent Protection Layer (IPL) in a Layer of Protection Analysis (LOPA), but does the alarm management system really support the safeguard/IPL?

According to ISA-18.2 / IEC 62682 an alarm must indicate an equipment malfunction, process deviation, or abnormal condition that requires a timely operator action. If no action is taken, then the alarm is either invalid or the operator is not doing their job. Both scenarios represent a breakdown in operational discipline for alarm management as does the presence of nuisance alarms and alarm floods. This breakdown in operational discipline for alarms has been cited as a contributing factor in many significant safety incidents, some of which will be analyzed in this paper. If operational discipline for alarms is lacking, then it is very possible that the desired risk reduction for a process alarm used as an IPL will not be achieved and the probability of an ineffective operator response will increase.

As systems have evolved from hardwired to computer control, alarms have become easier and less expensive to implement, leading to more alarms, many of them less purposeful. Operators must contend with multiple alarms at one time with only their experience to determine priority. Alarms may be added to or removed from a control system without proper management of change. Systems may include alarms for which there is no possible action, or inadequate action time. What can an organization do to take control of its process alarms and improve operational discipline?

More Information   

Maximizing the Reliability of Operator Response to Alarms

According to James Reason [1], layers of protection for abnormal event management can be modeled as slices of Swiss cheese. An operator’s response to an alarm is one of the first layers of protection to prevent a hazard from escalating to an incident. This paper will present best practices for maximizing the operator’s reliability in understanding and responding to abnormal situations, as adapted from the alarm management standards ANSI/ISA-18.2-2016 and IEC 62682. Examples include alarm rationalization to ensure all alarms are meaningful and to capture “tribal knowledge”, prioritization to help operators determine which alarms are most critical, and creation of alarm response procedures. The treatment of safety alarms, which are those that are deemed critical to process safety or to the protection of human life or the environment, will be specifically highlighted.

The paper will also discuss key human factors considerations for maximizing operator situation awareness (SA) by preventing SA “demons” such as developing an errant mental model of the process, attention tunneling, data overload, and misplaced salience. Accordingly, the resolution of issues which inhibit operator performance, such as nuisance alarms and alarm floods, will also be discussed.

More Information   

Saved by the Bell: Using Alarm Management to make Your Plant Safer

Recent industrial accidents at Texas City, Buncefield (UK) and Institute, WV have highlighted the connection between poor alarm management and process safety incidents. At Texas City, key level alarms failed to notify the operator of the unsafe and abnormal conditions that existed within the tower and blowdown drum. The resulting explosion and fire killed 15 people and injured 180 more [1]. The tank overflow and resultant fire at the Buncefield Oil Depot resulted in a £1 billion (1.6 billion USD) loss. It could have been prevented if the tank’s high level safety switch, per design, had notified the operator of the high level condition or had automatically shut off the incoming flow [2]. At the Bayer facility (Institute, WV) improper procedures, worker fatigue, and lack of operator training on a new control system caused the residue treater to be overcharged with Methomyl, leading to an explosion and chemical release.

More Information   

Tips for Starting an Alarm Management Program

Using the ISA-18.2 standard can help process engineers understand, simplify, and implement a sustainable alarm management program.

Congratulations. You’ve been assigned the task of establishing an alarm management program for your facility. So where and how do you begin? This article presents four practical tips for starting an effective and sustainable alarm management program that conforms to the tenets of a relatively new process industry standard for alarm management published by ISA.

More Information   

Using Alarms as a Layer of Protection

Alarms and operator response to them are one of the first layers of protection in preventing a plant upset from escalating into a hazardous event. This paper discusses how to evaluate and maximize the risk reduction (or minimize the probability of failure on demand) of this layer when it is considered as part of a layer of protection analysis (LOPA).

The characteristics of a valid layer of protection (Specific, Auditable, Independent and Dependable) will be reviewed to examine how each applies to alarms and operator response. Considerations for how to assign probability of failure on demand (PFD) will be discussed, including the key factors that contribute to it (e.g., operator’s time to respond, training, human factors, and the reliability of the alarm annunciation / system response). The effect of alarm system performance issues (such as nuisance alarms and alarm floods) on operator dependability (and probability of failure on demand) will be reviewed. Key recommendations will be drawn from the ISA-18.2 standard “Management of Alarm Systems for the Process Industries”.
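
The relationship between PFD and risk reduction mentioned above can be sketched numerically. The following is a minimal illustration, not the paper's method: it assumes the usual LOPA convention that the mitigated event frequency is the initiating event frequency multiplied by the PFD of each credited independent layer. The function names and all scenario values are hypothetical.

```python
from math import prod


def mitigated_frequency(initiating_per_year, layer_pfds):
    """Initiating event frequency multiplied by the PFD of each credited layer."""
    return initiating_per_year * prod(layer_pfds)


def risk_reduction_factor(pfd):
    """Risk reduction is the reciprocal of the probability of failure on demand."""
    return 1.0 / pfd


# Hypothetical scenario values, for illustration only.
initiating_per_year = 0.1   # one process upset every ten years
alarm_pfd = 0.1             # alarm plus operator response
second_layer_pfd = 0.01     # a second, independent protection layer

frequency = mitigated_frequency(initiating_per_year, [alarm_pfd, second_layer_pfd])
print(f"Alarm layer risk reduction factor: {risk_reduction_factor(alarm_pfd):.0f}")
print(f"Mitigated event frequency: {frequency:.1e} per year")
```

If nuisance alarms or alarm floods make the operator less dependable, the alarm layer's PFD rises and the mitigated frequency rises in direct proportion.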

More Information   

When Good Alarms Go Bad: Learnings from Incidents

Some of the most significant process industry incidents, including BP Texas City and Buncefield, were caused by overflowing vessels. In many overflow incidents, alarms were designed to signal the need for operator intervention. These alarms may have been identified as safeguards or layers of protection, but they did not succeed in preventing the incident. This paper reviews several overflow incidents to consider the alarm management and human factors elements of the failures.

More Information   

You Asked: Alarm Management

Setting a new Standard for Performance, Safety, and Reliability with ISA-18.2

Alarm Management affects both the bottom line and plant safety. A well-functioning alarm system can help a process run closer to its ideal operating point, leading to higher yields, reduced production costs, increased throughput, and higher quality, all of which add up to higher profits. Poor alarm management, on the other hand, is one of the leading causes of unplanned downtime and has been a major contributor to some of the worst industrial safety accidents on record.

More Information   

Cybersecurity Lifecycle:

Integrating Cybersecurity Risk Assessments Into the Process Safety Management Work Process

Cybersecurity is rapidly becoming something that process safety can no longer ignore. It is part of the Chemical Facility Anti-Terrorism Standards (CFATS). In addition, the President’s Executive Order 13636, “Improving Critical Infrastructure Cybersecurity,” has drawn attention to the need to address cybersecurity in our plants, as it has been demonstrated that cyber events are now a potential source of process safety incidents.

IEC 61508[2], “Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems (E/E/PE, or E/E/PES)” now has a requirement to address cybersecurity in safety instrumented systems and ANSI/ISA 84.00.01, “Functional Safety: Safety Instrumented Systems for the Process Industry Sector” is looking to include this requirement in the next revision. Currently the industry is playing catch up as there tends to be a gap in understanding between information technologists, traditionally responsible for cybersecurity, and the process automation and process safety engineers responsible for keeping our plants safe with help from automated controls and safety instrumented systems. As a result, guidance is being developed, but much of it continues to be a work in progress.

More Information   

The 7 Steps to ICS and SCADA System Security

The past two years have been a wake-up call for the industrial automation industry. It has been the target of sophisticated cyber attacks like Stuxnet, Night Dragon and Duqu. An unprecedented number of security vulnerabilities have been exposed in industrial control products, and regulatory agencies are demanding compliance with complex and confusing regulations. Cyber security has quickly become a serious issue for professionals in the process and critical infrastructure industries.

If you are a process control engineer, an IT professional in a company with an automation division, or a business manager responsible for safety or security, you may be wondering how your organization can get moving on more robust cyber security practices. This white paper will give you the information you need to get started. It won’t make you a security expert, but it will put you on the right path in far less time than it would take if you were to begin on your own.

We began by condensing the material from numerous industry standards and best practice documents. Then we combined our experience in assessing the security of dozens of industrial control systems. The result is an easy-to-follow 7-step process:

Step 1 – Assess Existing Systems
Step 2 – Document Policies & Procedures
Step 3 – Train Personnel & Contractors
Step 4 – Segment the Control System Network
Step 5 – Control Access to the System
Step 6 – Harden the Components of the System
Step 7 – Monitor & Maintain System Security

The remainder of this white paper will walk through each of these steps, explaining the importance of each step and best practices for implementing it. We will also provide ample references for additional information.

More Information   

The ICS Cybersecurity Lifecycle

With the ever changing threats posed by cyber events of any nature, it has become critical to recognize these emerging threats, malicious or not, and identify the consequences these threats may have on the operation of an industrial control system (ICS). Cyber-attacks over time have the ability to take on many forms and threaten not only industrial but also national security.

Saudi Aramco, the world’s largest exporter of crude oil, serves as a perfect example depicting how devastating a cyber-attack can truly be on an industrial manufacturer. In August 2012, Saudi Aramco (SA) had 30,000 personal computers on its network infected by a malware attack better known as the “Shamoon” virus. According to InformationWeek Security this was roughly 75 percent of the company’s workstations and took 10 days to complete clean-up efforts.

The seriousness of cyber-attacks with regard to national security was addressed by former United States Secretary of Defense Leon W. Panetta in a speech in October 2012. Panetta issued a strong warning to business executives about cybersecurity as it relates to national security. “A cyber-attack perpetrated by nation states [and] violent extremists groups could be as destructive as the terrorist attack on 9/11. Such a destructive cyber-terrorist attack could virtually paralyze the nation,” he stated. “For example, we know that foreign cyber actors are probing America’s critical infrastructure networks. They are targeting the computer control systems that operate chemical, electricity and water plants and those that guide transportation throughout this country.”

In addition to Panetta’s address, the U.S. Department of Homeland Security has issued several alerts about coordinated attacks on gas pipeline operators, according to a May 2012 report by ABC News.

This whitepaper will focus on the significance of cyber-attacks on industrial control systems (ICS) and how these attacks can be prevented by proper practice of the ICS Cybersecurity lifecycle.

More Information   

Failure Rate Data:

Accurate Failure Metrics For Mechanical Instruments

Probabilistic calculations done to verify the integrity of a Safety Instrumented Function design require failure rate and failure mode data for all equipment, including the mechanical devices. For many devices, such data is available only in industry databases that present failure rates alone. Failure mode information is rare, if available at all. Many analysts give up and simply assume 50% safe and 50% dangerous, thinking this is conservative. In some cases this is not a conservative assumption. In other cases it can be overkill.

More Information   

Combining field failure data with new instrument design margins to predict failure rates

Performance based functional safety standards like IEC 61511 offer many advantages including the opportunity to optimize and upgrade Safety Instrumented System (SIS) designs. But performance calculation depends upon realistic failure data for instruments used. A predictive analysis technique called Failure Modes Effects and Diagnostic Analysis (FMEDA) has been developed along with a component failure rate database that can predict failure rates of instruments based on their design strength and the expected stress environment. This method has been calibrated with over 150 billion unit operating hours of field failure data over the last 15 years.

More Information   

Comparing FMEDA Predicted Failure Rates to OREDA

Estimated Failure Rates for Sensor and Valve Assemblies 

Failure rates predicted by Failure Modes Effects and Diagnostic Analysis (FMEDA) are compared to failure rates estimated from the Offshore Reliability Data (OREDA) project for sensor and valve assemblies. Because the two methods of data analysis are fundamentally different in nature, it may be surprising that, when appropriately compared, the results from the two methods are generally quite similar. The nature of the published data for FMEDA and OREDA is explored. The relative merits of each method are discussed. 

More Information   

Development of a Mechanical Component Failure Database

In this paper, we present a methodology to derive component failure rate and failure mode data for mechanical components used in automation systems based on warranty and field failure data as well as expert opinion. We describe a process for incorporating new component information into the database as it becomes available. The method emphasizes random mechanical component failures of importance in the world of safety analysis as opposed to the wear-out and aging mechanical failures that have dominated mechanical reliability analysis. The method provides a level of accuracy significantly better than warranty failure data analysis alone. The derived database has the same form as that for electrical/electronics databases used in FMEDA analyses used to show compliance with international performance-based safety standards. Thus, the mechanical database can be used in conjunction with existing electrical/electronics databases to perform required probabilistic safety analysis on automation systems comprised of both electrical and mechanical components.

More Information   

Explaining the Differences in Mechanical Failure Rates: exida FMEDA Predictions & OREDA Estimations

This white paper describes the distinction between failure rate prediction and estimation methods in general and then gives an overview of the procedures used to obtain dangerous failure rates for certain mechanical equipment using exida FMEDA predictions and OREDA estimations. exida frequently compares field failure rate data from various sources to FMEDA results in order to validate the FMEDA component library. However, because OREDA and FMEDA methods are quite different, it is not possible to compare their results directly. A methodology is presented which creates predictions and estimations that are more comparable. The methodology is then applied to specific equipment combinations and the results are compared. When differences in the results exist between the two methods, plausible explanations for the differences are provided.

The comparisons show that the OREDA failure rates are well within the range of the exida FMEDA results. The comparisons also show that, with two exceptions, the average FMEDA predictions for dangerous failure rates are only slightly less than those of the OREDA estimations. In those two exceptions, FMEDA predictions are higher than OREDA. Therefore, it is reasonable to conclude that, when compared in an “apples-to-apples” fashion, for the equipment analyzed in this paper, the exida FMEDA predictions and OREDA estimations are quite comparable.

More Information   

Field Failure Rates - The Good, The Bad, The Ugly

A company gains many benefits from access to good field failure data, and most of those benefits amount to saving money. At the same time, most of the expenditure needed to get good failure data is already being made. These benefits could be achieved for the incremental cost of improving data collection quality and data analysis.

High-quality field failure data has often been described as the ultimate source of failure data. However, not all field failure studies are high quality. Some field studies simply do not have the needed information. Some field studies make unrealistic assumptions. The results can be quite different depending on methods and assumptions. Some methods produce optimistic results that can lead to bad designs and unsafe processes.

This paper presents some common field failure analysis techniques, shows some of the limitations of the methods and describes important attributes of a good field failure data collection system.

More Information   

Improving Reliability & Safety Performance of Solenoid Valves by Stroke Testing

Solenoid valves integrated into the design of emergency shutdown (ESD) valves used in industrial process systems tend to bind, i.e., to become stuck in one position, when not moved for long periods of time. This binding, also known as failure due to excessive stiction, has significant negative impacts on the valve’s reliability and safety performance. It is a serious and costly problem normally addressed by expensive and time-consuming manual proof tests, which typically require a process shutdown. This paper describes an effective alternative in-service testing protocol, known as valve stroke testing, which verifies whether or not the solenoid valve is stuck in position. It recommends a best practice procedure for implementing the valve stroke test. It provides a quantitative example of how valve stroke testing significantly improves safety performance when performed frequently (at intervals of one week or less) or even infrequently (at intervals of three to six months).
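
The effect of stroke test frequency on safety performance can be illustrated with a rough calculation. This is a sketch under stated assumptions, not the paper's quantitative example: it uses a common simplified approximation in which the fraction of dangerous undetected failures covered by the stroke test is exposed for half the stroke test interval, and the remainder for half the proof test interval. The failure rate, coverage, and intervals are hypothetical.

```python
def pfd_avg_with_stroke_test(lambda_du, proof_interval_h,
                             stroke_interval_h=None, stroke_coverage=0.0):
    """Simplified average PFD for a single device (approximation, assumed here)."""
    if stroke_interval_h is None:
        # No stroke testing: every dangerous undetected failure waits for the proof test.
        return lambda_du * proof_interval_h / 2
    found_by_stroke = stroke_coverage * lambda_du * stroke_interval_h / 2
    found_by_proof = (1 - stroke_coverage) * lambda_du * proof_interval_h / 2
    return found_by_stroke + found_by_proof


# Hypothetical inputs, for illustration only.
LAMBDA_DU = 1e-6        # dangerous undetected failures per hour
PROOF_INTERVAL = 8760   # annual proof test, in hours
COVERAGE = 0.7          # assumed fraction of failures a stroke test reveals

no_stroke = pfd_avg_with_stroke_test(LAMBDA_DU, PROOF_INTERVAL)
weekly = pfd_avg_with_stroke_test(LAMBDA_DU, PROOF_INTERVAL, 168, COVERAGE)
quarterly = pfd_avg_with_stroke_test(LAMBDA_DU, PROOF_INTERVAL, 2190, COVERAGE)

for label, value in [("none", no_stroke), ("weekly", weekly), ("quarterly", quarterly)]:
    print(f"Stroke test {label:>9}: PFDavg = {value:.2e}")
```

Under these assumptions both stroke test schedules lower the average PFD relative to proof testing alone, with the weekly schedule giving the larger improvement.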

More Information   

Mechanical Failure Rate Data for Low Demand Applications

The use of IEC 61508 [1] and IEC 61511 [2] has increased rapidly in the past several years. Along with the adoption of the standards has come an increase in the need for accurate reliability data for devices used in Safety Instrumented Systems (SIS), both electronic and mechanical. While the methodology of determining failure rates for electronic equipment is fairly well accepted and applied, the same cannot be said for mechanical equipment. Several methods are currently being utilized for generating failure rates for mechanical components. These methods vary in their approach and often lead to dramatically different failure rates, which can lead to significant differences when calculating the reliability of a safety instrumented function (SIF). Some methods can result in dangerously optimistic failure rate numbers.

This paper reviews the methods utilized to determine mechanical reliability for components utilized in safety systems and provides a recommendation for the most appropriate methodology.

More Information   

Random versus Systematic Failures – Issues and Solutions

Functional safety standards provide definitions of two different categories of failures: random failures and systematic failures. These were created during the standards committee discussions of failure types to be modeled in the probabilistic failure analysis. It was decided that random failures are counted in the probabilistic failure rate analysis and systematic failures are not counted.

Systematic failures were considered to be a direct result of some design or procedure problem. They occur when a set of circumstances happen to reveal the fault. The committee thinking was that systematic failures could be permanently “fixed” by a change in a design or a procedure. It was assumed that the fix would always be completely effective. After the fix, the failure would not happen again and therefore any such failures should not be counted.

Many companies establish programs to record and analyze failures. A failure rate analysis is performed to determine device failure rates. One problem observed while reviewing these studies is that many people have completely different interpretations of the definitions of random versus systematic failures. In some cases most failures are classified as systematic. This creates a dangerous bias in field failure rate analysis.

At some sites, those performing the analysis have realized that failures classified as systematic do prevent safety devices from performing their safety function and are therefore dangerous. These failures occur under conditions which seem to occur randomly and can be modeled with exactly the same probabilistic analysis. These failures impact the probability of dangerous failure and they certainly should be counted in any failure rate analysis.

This thinking is realistic as systematic failures may not be effectively corrected even when changes to the design or the procedures are made. If a systematic failure is effectively corrected then, in future data collection, the quantity of failure reports will decrease and will reflect the change. If the change was not effective the data will show that as well. Any updated field failure rate analysis will then reflect the improvement or not. So most engineers now understand that to improve safety and achieve realistic measurement of safety: 

  •   All failures must be counted in failure rate analysis and
  •   All failures must be reviewed to determine if the failure can be practically prevented in the future. 
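
The bias described above can be shown with a small calculation. This is a minimal sketch with invented records: it estimates a failure rate as the number of counted failures divided by cumulative unit operating hours, once with every failure counted and once with failures classified as systematic excluded. The device tags, classifications, and operating hours are hypothetical.

```python
# Hypothetical field failure reports: (device tag, classification given by the reviewer).
failure_reports = [
    ("PT-101", "random"), ("PT-114", "systematic"), ("PT-120", "systematic"),
    ("PT-133", "random"), ("PT-140", "systematic"), ("PT-152", "systematic"),
    ("PT-167", "random"), ("PT-171", "systematic"), ("PT-188", "systematic"),
    ("PT-190", "random"),
]
UNIT_OPERATING_HOURS = 5_000_000  # assumed cumulative hours for the device population


def failure_rate(reports, unit_hours, include_systematic=True):
    """Point estimate: counted failures divided by cumulative operating hours."""
    counted = [r for r in reports if include_systematic or r[1] == "random"]
    return len(counted) / unit_hours


rate_all = failure_rate(failure_reports, UNIT_OPERATING_HOURS)
rate_random_only = failure_rate(failure_reports, UNIT_OPERATING_HOURS,
                                include_systematic=False)

print(f"All failures counted:    {rate_all:.1e} per hour")
print(f"Systematic ones dropped: {rate_random_only:.1e} per hour")
print(f"Apparent rate understated by a factor of {rate_all / rate_random_only:.1f}")
```

If a design or procedure change really does remove a failure cause, later data collection will show fewer reports and the estimate will fall without anyone having to exclude records by hand.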

More Information   


FMEDA - Accurate Product Failure Metrics

The letters FMEDA form an acronym for “Failure Modes Effects and Diagnostic Analysis.” The name was given by one of the authors in 1994 to describe a systematic analysis technique that had been in development since 1988 to obtain subsystem / product level failure rates, failure modes and diagnostic capability (Figure 1).


Figure 1: FMEDA Inputs and Outputs.

The FMEDA technique considers:

  • All components of a design,
  • The functionality of each component,
  • The failure modes of each component,
  • The impact of each component failure mode on the product functionality,
  • The ability of any automatic diagnostics to detect the failure,
  • The design strength (de-rating, safety factors) and
  • The operational profile (environmental stress factors).
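
The roll-up implied by the list above can be sketched in a few lines. This is an illustrative simplification, not exida's tool or database: each component failure rate is split across its failure modes, each mode is classified by its effect on the product, and dangerous modes are divided into detected and undetected portions by an assumed diagnostic coverage. Design strength and operational profile adjustments are left out, and all component data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    fraction: float                    # share of the component failure rate
    effect: str                        # "safe" or "dangerous" at product level
    diagnostic_coverage: float = 0.0   # share detected by automatic diagnostics


@dataclass
class Component:
    name: str
    rate_fit: float                    # failures per 1e9 hours
    modes: list


def fmeda_rollup(components):
    """Sum mode-level rates into safe, dangerous detected, dangerous undetected."""
    totals = {"safe": 0.0, "dangerous_detected": 0.0, "dangerous_undetected": 0.0}
    for component in components:
        for mode in component.modes:
            rate = component.rate_fit * mode.fraction
            if mode.effect == "safe":
                totals["safe"] += rate
            else:
                totals["dangerous_detected"] += rate * mode.diagnostic_coverage
                totals["dangerous_undetected"] += rate * (1 - mode.diagnostic_coverage)
    return totals


# Hypothetical two-component design, for illustration only.
design = [
    Component("R1", 10.0, [FailureMode(0.5, "safe"),
                           FailureMode(0.5, "dangerous", diagnostic_coverage=0.8)]),
    Component("C1", 20.0, [FailureMode(0.25, "safe"),
                           FailureMode(0.75, "dangerous")]),
]

totals = fmeda_rollup(design)
for category, rate in totals.items():
    print(f"{category:>22}: {rate:.1f} FIT")
```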


More Information   

Safety Accuracy Is Dead; Long Live Safety Deviation!

Safety deviation is a term used in functional safety. Safety deviation (formerly safety accuracy) is the change in output due to (internal) component failures not analyzed in a Failure Modes, Effects, & Diagnostic Analysis (FMEDA). Safety accuracy is an input to the FMEDA analyst to advise the level of analysis detail for critical analog components. The term is defined, some of its history is described, the reasoning for its existence is given, and its application is presented.

More Information   

Functional Safety Certification:

3 Fatores Importantes na Avaliação de um dispositivo Certificado SIL

Today there is a growing trend for end users to choose manufacturers whose equipment is certified to IEC 61508 (SIL – Safety Integrity Level). This is an excellent trend for several reasons. The first is that, in order to obtain certification, manufacturers must provide the product's failure rates and failure modes. This information is normally obtained from a report known as a Failure Modes Effects and Diagnostic Analysis (FMEDA).

More Information   

3 Important Factors in Evaluating your SIL Certified Device

Today there is a growing trend by end-users to require equipment manufacturers to get their safety devices IEC 61508 (SIL) Certified. That is an excellent trend for a number of reasons. One reason is that, in order to get a device SIL Certified, a company must first determine the device’s failure rates and failure modes. This is usually done by having a Failure Modes Effects and Diagnostic Analysis (FMEDA) performed. Among other things, an FMEDA Report will detail the device’s Architectural Constraints and its λDU (Dangerous Undetected Failure Rate). Given values for the maintenance parameters (Test Interval, Test Coverage, and Repair Time), you can determine the device’s PFDavg (Average Probability of Failure on Demand). Both the Architectural Constraints and the PFDavg of a device, together with its IEC 61508 Certification, are critical in evaluating whether a given device may be suitable for use in a Safety Function with a given SIL requirement, and they are what concern a Safety Engineer in their evaluation.
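
As a rough illustration of how λDU and the maintenance parameters combine into PFDavg, the sketch below uses a simplified single-device approximation that is assumed here and is not taken from the paper. Failures revealed by the proof test are exposed for half the test interval plus the repair time, and failures the test misses are exposed for half of an assumed device lifetime. The lifetime parameter and all numeric values are hypothetical. The band lookup uses the low demand mode ranges of IEC 61508, where SIL n corresponds to a PFDavg from 10^-(n+1) up to 10^-n.

```python
def pfd_avg(lambda_du, test_interval_h, test_coverage, repair_time_h, lifetime_h):
    """Simplified single-device PFDavg (approximation assumed for illustration)."""
    revealed = test_coverage * lambda_du * (test_interval_h / 2 + repair_time_h)
    missed = (1 - test_coverage) * lambda_du * lifetime_h / 2
    return revealed + missed


def sil_band(pfd):
    """Low demand mode band whose PFDavg range contains the value, else None."""
    for sil in (4, 3, 2, 1):
        if 10 ** -(sil + 1) <= pfd < 10 ** -sil:
            return sil
    return None


# Hypothetical device and maintenance data, for illustration only.
value = pfd_avg(lambda_du=5e-7, test_interval_h=8760, test_coverage=0.9,
                repair_time_h=24, lifetime_h=87600)

print(f"PFDavg = {value:.2e}, which falls in the SIL {sil_band(value)} band")
```

The band only reflects the PFDavg criterion. As the abstract notes, the Architectural Constraints and the IEC 61508 Certification must also be satisfied before a device is considered suitable for a given SIL.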

More Information   

Comparing Certification under IEC 61508 1st Edition and 2nd Edition

An updated version of IEC 61508, Functional Safety of Electrical/Electronic and Programmable Electronic Systems was issued in September 2010.  This Second Edition is generally thought to clarify common interpretations of the first edition [N2] and add some refinements that had accumulated in the ten years since the first edition. The fundamental concepts and requirements did not change.

More Information   

Criteria for the Application of IEC 61508:2010 Route 2H

This paper explains how exida applies the requirements of IEC61508:2010 Route 2H to its process of certifying devices for use in safety applications. 

Rather than having specific designs and a long list of specific rules that become obsolete, the IEC 61508 standard allows any safety instrumented function (SIF) design to be implemented. The standard allows the design to use old products or new technology. The standard allows innovation and good engineering. However, any SIF design must be verified with documented performance metrics which must match risk reduction requirements in the form of safety integrity levels (SIL). In order to verify that a design meets the needed risk reduction, the designer must check three performance criteria.

This paper is devoted to one of those performance criteria, viz., minimal architectural constraints which, per IEC 61508, may be met in one of two ways, i.e., via Route 1H or Route 2H. Furthermore, this paper deals exclusively with Route 2H because, for practical purposes, Route 2H produces a realistic SIL level for a given design and does not impose artificial redundancy.

This paper

  • Describes the requirements of IEC 61508:2010 Route 2H,
  • Discusses how exida’s component failure rate and failure modes databases meet or exceed the data requirements of IEC 61508:2010 Route 2H,
  • Delineates the criteria exida uses in applying Route 2H to certify devices in a given environment,
  • Discusses the common situation of needing to certify a device, with a significant operational history that was previously certified in one environment, which will now be deployed in a new environment, and delineates the criteria exida uses to accomplish this certification. 

More Information   

Determining Software Safety: Understanding the Possibilities and Limitations of International Safety

This document is intended for readers who are familiar with the international safety standard IEC 61508 [Ref. 1] in general and with that document’s Part 7: Annex D [Ref. 2] in particular. As currently written, Annex D provides “initial guidelines on the use of a probabilistic approach to determining safety integrity for pre-developed software” (SW) included in safety instrumented functions. It further states that “the annex provides an indication of what is possible, but the techniques should be used only by those who are competent in statistical analysis.” If these guidelines are to be used effectively in the testing and certification of safety-related SW it is essential that individuals involved in testing and certifying such SW understand how to interpret these guidelines correctly. To this end, this document explains the possibilities and limitations inherent in the information contained in IEC61508-7 Annex D.

More Information   

Effects of Non‐Uniform Software Input Sampling on Confidence Levels Achieved per IEC 61508‐7 Annex D

International safety standard IEC 61508‐7 Annex D prescribes sampling sizes of safety critical software (SW) inputs needed to be consecutively processed correctly in order to ascertain that the SW meets a certain safety integrity level (SIL) with a certain statistical confidence level. The sample sizes in Annex D Table D.1 are derived from a Bernoulli sampling model which requires that the sampled inputs be uniformly distributed.

The simulations reported in this paper are intended to answer the question: If one uses the sample sizes prescribed in IEC 61508‐7 Annex D but the sampled safety critical SW inputs are not uniformly distributed, do the confidence levels in Table D.1 still hold? The answer is NO. When the sampled safety critical SW inputs are not uniformly distributed, the confidence levels attained depend not only on the sample size but also on the distribution of the sampled safety critical SW inputs and on the distribution of those safety critical SW inputs that will not be correctly processed. Since it is impossible to know the distribution of safety critical SW inputs that will not be correctly processed, it is impossible to know the confidence levels attained if the sampled safety critical inputs are not uniformly distributed. Consequently, SW cannot be safety certified according to Annex D unless the SW tester can demonstrate that the safety critical SW inputs used in the tests were uniformly distributed.
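The reasoning above can be illustrated with a minimal sketch, assuming the standard zero-failure Bernoulli argument: if each sampled input independently hits a failing input with probability p, then n consecutive correct results support the claim "failure probability is at most p" with confidence 1 - (1 - p)^n. The function names and numeric values below are illustrative assumptions. They are not taken from Table D.1 or from the paper's simulations.

```python
import math


def required_sample_size(max_failure_prob, confidence):
    """Smallest n such that 1 - (1 - p)^n >= confidence, with zero failures observed."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_failure_prob))


def achieved_confidence(n_tests, prob_test_hits_failing_input):
    """Probability that n tests expose at least one failure when each test
    independently hits a failing input with the given probability."""
    return 1.0 - (1.0 - prob_test_hits_failing_input) ** n_tests


# Hypothetical case: failing inputs make up 10% of the inputs seen in operation.
n = required_sample_size(max_failure_prob=0.1, confidence=0.95)

# Test inputs drawn like operational inputs: each test hits a failing input 10% of the time.
uniform = achieved_confidence(n, 0.1)

# Skewed test inputs (assumed): the test set hits failing inputs only 2% of the time.
skewed = achieved_confidence(n, 0.02)

print(n, round(uniform, 3), round(skewed, 3))
```

In this hypothetical case the same sample size gives far less confidence once the test inputs under-sample the region where the failures lie, and the tester has no way to know that region in advance.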

More Information   

Functional Safety and EMI

Electromagnetic Interference (EMI) is just one of the environmental stresses that can stop a system from performing its safety function. It is important for a functional safety system to be immune to the EMI levels that are likely to be present. Unlike other environmental stresses, such as temperature and vibration, EMI is more difficult to sense and is more likely to be transitory. Still, the effects can be catastrophic.

More Information   

Properly Assessing Diagnostic Credit in Safety Instrumented Functions Operating in High Demand Mode

According to the basic functional safety standard IEC 61508:2010 Part 2 [1], when assessing the safety performance of a safety instrumented function (SIF) operating in high demand mode, full credit can be given for the positive effects of automatic self‐diagnostics (ASD) in SIF devices provided the frequency of ASD execution is at least 100 times (100X) the demand rate on the SIF and the SIF is configured to convert dangerous failures into safe failures via an automatic shutdown. However, no credit may be given for the positive safety effects of ASD if the frequency of ASD execution is less than 100X the demand rate.

This paper shows that the 100X requirement is quite excessive and that significant positive safety effects accrue even when the ASD frequency is much smaller than the 100X stipulation. The theory, which provides reasonable justification for assigning some degree of partial diagnostic credit (PDC) for ASD based on the ratio of ASD frequency to demand rate, is developed under two different assumptions: Scenario 1 which is extremely conservative and Scenario 2 which is more realistic. 
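As a point of reference, the all-or-nothing rule summarized above can be sketched as a simple check. This is only the baseline rule as described in this abstract, not the partial diagnostic credit theory the paper develops. The function and parameter names are hypothetical.

```python
def asd_credit_allowed(asd_runs_per_year, demands_per_year, trips_on_detected_fault):
    """Return True when full diagnostic credit may be claimed under the 100X rule.

    Assumes both rates use the same time base and that the SIF either does or
    does not convert detected dangerous failures into an automatic shutdown.
    """
    if demands_per_year <= 0:
        raise ValueError("demand rate must be positive")
    ratio = asd_runs_per_year / demands_per_year
    return trips_on_detected_fault and ratio >= 100.0


# Hypothetical SIF with one demand per day:
print(asd_credit_allowed(36500, 365, True))  # diagnostics run 100 times per demand
print(asd_credit_allowed(8760, 365, True))   # hourly diagnostics, ratio of 24
```

Under this rule a ratio of 24 earns exactly the same credit as a ratio of zero, which is the step change the paper argues is excessive.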

More Information   

Real Time Operating Systems for IEC 61508

In today’s world many potentially dangerous pieces of equipment are controlled by embedded software. This equipment includes cars, trains, airplanes, oil refineries, chemical processing plants, nuclear power plants and medical devices. As embedded software becomes more pervasive so too do the risks associated with it. As a result, the issue of software safety has become a very hot topic in recent years. The leading international standard in this area is IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems. This standard is generic and not specific to any industry, but has already spun off a number of industry specific derived standards, and can be applied to any industry that does not have its own standard in place. Several industry specific standards, such as EN50128 (Railway), DO-178B (Aerospace), IEC 60880 (Nuclear) and IEC 601-1-4 (Medical Equipment), are already in place. Debra Herrmann (Herrmann, 1999) has found a total of 19 standards related to software safety and reliability that cut across industrial sectors and technologies. These standards’ popularity is on the rise, and more and more embedded products are being developed that conform to these standards. Since an increasing number of embedded products also use an embedded real time operating system (RTOS), it has become inevitable that products with an RTOS are being designed to conform to such standards. This creates an important question for designers: how is my RTOS going to affect my certification? This article explores the challenges and advantages of using an RTOS in products that will undergo certification.

More Information   

The exida 61508 / Cybersecurity Certification Program FAQ

The exida IEC 61508 Certification Program was established in 2005 in response to demand primarily from end users in the process industries and manufacturers of instrumentation products. There was a need to provide a higher quality of technical expertise with effective and responsive service.

exida is an accredited Certification Body (CB) authorized to perform product certification by the American National Standards Institute (ANSI) in the technical fields of functional safety and cyber-security. ANSI is the Accreditation Body (AB) for IEC standards in the United States. It is a member of the International Accreditation Forum (IAF). Most countries in the world have an AB which is a member of IAF (www.iaf.nu). IAF members have agreed to the Multilateral Recognition Agreement recognizing the equivalence of other members’ accreditations. Thus IAF member accreditations are valid in most countries of the world.

The exida IEC 61508 Certification Program offers the most comprehensive product review of any Certification Body (CB) resulting in products that are safer, more secure, easier to use, and more reliable. 

More Information   

Functional Safety Lifecycle:


Several issues of Revista InTech América do Sul have carried articles, by various authors, on Safety Instrumented Systems and the international standards that guide the best practices applied to such projects. Now it is time to talk about the Brazilian standards!

More Information   

Accurate Modeling of Shared Components in High Reliability Applications

More Information   

Assessing Safety Culture via the Site Safety Index

How can a company establish a baseline measurement of its safety culture against which to gauge improvement? The Site Safety Index™ (SSI) quantifies in part (on a scale of 0 – 4) the degree to which a company’s end-user practices support the attainment and retention of an appropriate safety culture for operations and maintenance. Further, the SSI can be used to appropriately adjust parameters that directly impact measures of safety such as probability of failure on demand (PFDavg). Thus, the impacts of safety culture can be further quantified and the effects of changes to safety culture can be assessed. 

More Information   

Functional Safety in the Life Science Industries

A company gains many benefits from access to good field failure data, and most of those benefits come down to saving money. At the same time, most of the expenditure needed to get good failure data is already being spent. For the incremental cost of improving data collection quality and data analysis, these benefits can be achieved.

Good high quality field failure data has often been described as the ultimate source of failure data. However, not all field failure studies are high quality. Some field studies simply do not have the needed information. Some field studies make unrealistic assumptions. The results can be quite different depending on methods and assumptions. Some methods produce optimistic results that can result in bad designs and unsafe processes.

This paper presents some common field failure analysis techniques, shows some of the limitations of the methods and describes important attributes of a good field failure data collection system.

More Information   

Integrated Safety for a Single BMS

Evaluation Based on Siemens Simatic PCS7 System 

Any industry that has a requirement for a heated medium, whether it is used for process, utilities or emissions, utilizes equipment that has combustion controls and combustion safeguards.

There has been an evolution in these controls from a traditional control that separates the DCS from either a relay-based system, PLC or Safety PLC for combustion safeguarding, to combined control for systems with less complexity as in a single burner BMS.

Functional safety standards like IEC 61511 do permit combined control and combustion safeguarding in one system. Other standards like the 2015 edition of NFPA 85 now explicitly allow combining combustion control and combustion safety in the same logic solver for certain applications. However, several design issues must be considered and properly addressed in order to maintain or improve safety performance.

A properly designed combination combustion control and combustion safeguarding system can enhance the Safety Lifecycle by reducing engineering, operations and maintenance errors and by improving combustion safety. 

More Information   

Position Paper on IEC 61508 2010 Definitions Regarding Minimum Hardware Fault Tolerance

The release of IEC 61508 2010 has led to several discussions on how certain new, updated, and unmodified definitions need to be interpreted. The controversy relates to the determination of the required minimum hardware fault tolerance / architectural constraints interpretation.

This position paper explains the position that exida has taken with regard to this issue. The position paper is structured in two parts: the position, and the rationale for the position, including counter arguments received over the last couple of months. The exida position is also implemented in the exida exSILentia safety lifecycle tool.

More Information   

Reducing Project Lifecycle Cost with exSILentia®

The international functional safety standard IEC 61511 provides the safety lifecycle as a steadfast guideline to assess and mitigate risk for manufacturing processes including refineries, chemical, petrochemical, pulp and paper, and power plants. To achieve a functionally safe system, it is essential to follow each requirement in the standard. However, consistent execution is difficult to achieve and often depends on the tools used to perform analysis and specification of the safety instrumented system. For the functional safety consultants at exida, the need for a consistent work process was fulfilled with the creation of the exSILentia software suite. exSILentia includes a module for each stage of the safety lifecycle. Use of the tool ensures quality assessment and execution of a safety instrumented system, as well as compliance to the safety standard. exSILentia also streamlines these tasks, easily transferring data from one module to another to save the user time and money.

In this paper, the benefit of using exSILentia rather than Excel spreadsheets or other in-house tools is quantified. The intent is to show how users of the software reduce the number of engineering hours, and therefore dollars spent, for each safety lifecycle task. It is assumed that all required information is available when needed. Through conservative estimates, this paper shows that it pays to use exSILentia to support your safety lifecycle tasks and to make safety a priority. 

More Information   

Roadblocks to Approving SIS Equipment by Prior Use

Prior Use (Proven In Use in IEC 61508) describes equipment for which a documented assessment has shown that there is appropriate evidence, based on the previous use of the component, that the component is suitable for use in a particular application of a safety instrumented system at a given integrity level. There are two dimensions to this issue: suitability of a component to meet application requirements, and sufficient integrity for safety critical functions.

Both international functional safety standards, IEC 61508 and ISA84.01-2004 (IEC 61511 MOD.), have clauses describing Prior Use requirements. One should also refer to TR84.00.04 for additional information on Prior Use.

This paper will discuss some of the particular roadblocks found at the plant site in trying to meet Prior Use guidelines in the ISA84.01-2004 (IEC 61511 MOD.) standard. This paper will not discuss the process for assessing the equipment.

More Information   

Setting the Standard

Dr Peter Clarke explains how process plants can benefit through proper and careful adoption of the IEC 61511 safety standard.

More Information   

The Key Variables Needed for PFDavg Calculation

In performance based functional safety standards, safety function designs are verified using specified metrics. A key metric for process industry designs is called average Probability of Failure on Demand (PFDavg). After several studies of many field failure and proof test reports, several variables have been identified as key to a realistic PFDavg calculation. Most simplified equations, including those in the informative section of IEC 61508, Part 6, do not include several key variables. It is shown that exclusion of these parameters may result in an optimistic metric calculation which may result in an unsafe design.

This paper identifies the key variables that need to be included in a PFDavg calculation and provides some simplified equations showing the impact of most variables. An example showing two sets of variables reveals a difference of an entire SIL in the PFDavg calculation results.
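To give a feel for how much an omitted variable can matter, here is a minimal sketch for a single (1oo1) device. It assumes one widely used simplified approximation in which failures missed by an imperfect proof test stay hidden for the whole mission time. The equation, variable names and numbers are illustrative assumptions and are not the paper's own equations or example.

```python
def pfdavg_1oo1(lambda_du, proof_test_interval_h, proof_test_coverage=1.0,
                mission_time_h=None):
    """Approximate PFDavg of one device with dangerous undetected rate lambda_du (per hour).

    Failures found by the proof test accumulate over the test interval; failures
    the proof test cannot find accumulate over the mission time.
    """
    if mission_time_h is None:
        mission_time_h = proof_test_interval_h
    found = proof_test_coverage * lambda_du * proof_test_interval_h / 2.0
    missed = (1.0 - proof_test_coverage) * lambda_du * mission_time_h / 2.0
    return found + missed


def sil_band(pfdavg):
    """Map a low demand PFDavg to its SIL band (0 means worse than SIL 1)."""
    if pfdavg >= 1e-1:
        return 0
    if pfdavg >= 1e-2:
        return 1
    if pfdavg >= 1e-3:
        return 2
    if pfdavg >= 1e-4:
        return 3
    return 4


HOURS_PER_YEAR = 8760

# Hypothetical device: 1e-6 dangerous undetected failures per hour, annual proof test.
ideal = pfdavg_1oo1(1e-6, HOURS_PER_YEAR)
# Same device with 60% proof test coverage over an assumed 15 year mission time.
realistic = pfdavg_1oo1(1e-6, HOURS_PER_YEAR, proof_test_coverage=0.6,
                        mission_time_h=15 * HOURS_PER_YEAR)

print(sil_band(ideal), sil_band(realistic))
```

With these assumed inputs, adding only proof test coverage and mission time moves the same device from one SIL band to the next lower one.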

More Information   

Using Simulation to Characterize Common Cause

Fault tolerant systems have been designed for safety critical applications including the protection of potentially dangerous industrial processes. These systems are typically evaluated and certified to functional safety standards such as IEC 61508 [1] by agencies like exida Certification or one of the TUV companies. Many factors are taken into account during the certification process including hardware diagnostic capability, level of hardware redundancy, design processes used, software diagnostics and general equipment strength. It has become clearly recognized that common cause failures can have a major negative impact on the safety and availability of a fault tolerant system [2]. The whole value of redundancy may be ruined. Common cause is recognized as an important factor but there is disagreement regarding how to account for common cause in the quantitative modeling. Part 7 of IEC 61508 at least provides a list of questions with a point scoring system [3].
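The remark that the whole value of redundancy may be ruined can be made concrete with a minimal sketch. It uses the beta factor model, which is one common way to account for common cause and is an assumption here, not necessarily the approach taken in the paper. The simplified 1oo2 approximation and all numbers are illustrative.

```python
def pfdavg_1oo2(lambda_du, proof_test_interval_h, beta):
    """Approximate PFDavg of a redundant (1oo2) pair under the beta factor model.

    beta is the assumed fraction of dangerous undetected failures that strike
    both channels at once.
    """
    independent = ((1.0 - beta) * lambda_du * proof_test_interval_h) ** 2 / 3.0
    common_cause = beta * lambda_du * proof_test_interval_h / 2.0
    return independent + common_cause


# Hypothetical channel: 1e-6 dangerous undetected failures per hour, annual proof test.
no_common_cause = pfdavg_1oo2(1e-6, 8760, beta=0.0)
with_common_cause = pfdavg_1oo2(1e-6, 8760, beta=0.1)

print(no_common_cause, with_common_cause, with_common_cause / no_common_cause)
```

With an assumed beta of 10%, the common cause term dominates the result, so the choice of common cause model has more influence on the answer than the redundancy itself.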

More Information   

What Does Proven-In-Use Imply?

The functional safety standards IEC 61508, IEC 61511, and ANSI/ISA 84.01 each specify the Safety Integrity Level performance parameter for Safety Instrumented Functions. For a Safety Instrumented Function to meet a specific Safety Integrity Level, the sum of the average Probability of Failure on Demand (PFDavg) values of all components that are part of that Safety Instrumented Function needs to fall within the PFDavg band associated with that Safety Integrity Level.
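The summation described above can be sketched as follows. The band limits are the usual low demand PFDavg ranges. The component names and PFDavg values are hypothetical, and the check only tests the upper limit of the band, on the assumption that a result better than the band still satisfies the target.

```python
# Low demand PFDavg bands: SIL -> (lower limit inclusive, upper limit exclusive)
SIL_BANDS = {1: (1e-2, 1e-1), 2: (1e-3, 1e-2), 3: (1e-4, 1e-3), 4: (1e-5, 1e-4)}


def sif_pfdavg(component_pfdavgs):
    """Sum the PFDavg contributions of all components in the SIF."""
    return sum(component_pfdavgs.values())


def meets_target_sil(component_pfdavgs, target_sil):
    """True when the summed PFDavg is below the upper limit of the target band."""
    _, upper = SIL_BANDS[target_sil]
    return sif_pfdavg(component_pfdavgs) < upper


# Hypothetical SIF made of a sensor, a logic solver and a final element.
sif = {"sensor": 1.5e-3, "logic_solver": 1.0e-4, "final_element": 6.0e-3}

print(sif_pfdavg(sif), meets_target_sil(sif, 2), meets_target_sil(sif, 3))
```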

More Information   

Operations & Maintenance:

Quantifying the Impacts of Human Factors on Functional Safety

It is not difficult to think of numerous ways in which human factors might impact functional safety. For example, imperfect repair, improper calibration, faulty installation, etc. can all contribute to decreases in functional safety performance. However, if actions are to be taken to mitigate these impacts then it is essential to have some quantitative measure of the impacts so that the actions’ effectiveness can be assessed. This paper describes how the Site Safety Index™ (SSI) is used to adjust safety metrics, computed under the assumptions that human factors play no part in safety system performance, to reflect the effects of human factors on safety system performance on a site by site basis. Examples of the application of the method showing the safety performance changes attributable to human factors are provided. 

More Information   

Results of Statistical Analysis of Pressure Relief Valve Proof Test Data

The purpose of this document is to report on our successful efforts to validate statistically certain random equipment failure rate data used in a mechanical parts failure rate and failure mode database and, by extension, to validate the techniques used to derive the data. To accomplish this, a Failure Modes, Effects, and Diagnostic Analysis (FMEDA) is initially used to predict the useful-life failure rate for the fail-to-open condition of a particular pressure relief valve (PRV) using the failure rates from the mechanical parts database. Next, this prediction is statistically tested against three independent data sets consisting of proof test data for PRV provided by Fortune 500 operating companies. The data sets all meet the intent of the quality assurance of proof test data as documented by the Center for Chemical Process Safety (CCPS) Process Equipment Reliability Database (PERD) initiative. By applying the quantal response method to the results of these PRV proof tests, it is demonstrated that the proof test data are consistent with the predictions of the FMEDA. Specifically, all of the data sets support the FMEDA result at a 95% confidence level. All analyses lead to a useful-life PRV failure rate between 10⁻⁸ and 10⁻⁷ failures/hour.

It is very important to note that the results of this study cannot be used to justify extension of proof test intervals beyond the useful life of the PRV. The small value of the failure rate derived from the FMEDA applies only to the useful life of the PRV which depends not only on the equipment’s specifications but also on other factors, such as the ambient and process environment in which the PRV is used and the levels and frequency of any on-line maintenance performed. Data analyses place useful life in the range of 4 to 5 years.

Finally, we note that the results of the statistical analyses of the three independent data sets predict an initial failure probability of approximately 1% – 1.6%. This initial failure probability is extremely significant as it accounts for the vast majority of failures observed in proof test. This emphasizes the value of careful installation and thorough commissioning procedures. When commissioning testing cannot be done after installation, as is the case with a PRV, both the initial probability of failing to open, as well as the PFD based upon the random failure rate must be taken into account in the risk analysis.
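A minimal sketch of how the two contributions might be combined in a risk analysis follows. The combination rule is an assumption made for illustration, not the method of the paper: the valve either fails to open from the start, or it works initially and then fails at a constant random rate during its useful life. The input values are picked from within the ranges quoted above.

```python
def prv_pfdavg(initial_failure_prob, failure_rate_per_h, proof_test_interval_h):
    """Approximate average probability that the PRV fails to open on demand.

    Combines a time-independent initial failure probability with the usual
    lambda * TI / 2 approximation for random failures during useful life.
    """
    random_part = failure_rate_per_h * proof_test_interval_h / 2.0
    return initial_failure_prob + (1.0 - initial_failure_prob) * random_part


HOURS_PER_YEAR = 8760

# Illustrative picks: 1% initial failure probability, 5e-8 failures/hour,
# and a 4 year proof test interval (within the quoted useful life).
total = prv_pfdavg(0.01, 5e-8, 4 * HOURS_PER_YEAR)
initial_share = 0.01 / total

print(total, initial_share)
```

With these inputs the initial failure probability contributes most of the total, which is consistent with the observation above that initial failures account for the vast majority of failures seen in proof test.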

More Information   

Statistical Signature Analysis

In order to assign a SIL to equipment in low demand applications, we must be able to compute PFDavg. To compute PFDavg, we must first have a model for λD(t), the failure rate of the equipment in the dangerous failure mode. A dangerous failure occurs when equipment designed for prevention or mitigation of an unsafe condition cannot properly respond to the unsafe condition, i.e., the equipment fails on demand. For example, consider a PRV, which, in normal operation, is closed. Should it fail in the “stuck-shut” mode, it would be in a state of dangerous failure as it would be unable to respond to an overpressure event if one occurred.
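The step from a failure rate model λD(t) to PFDavg can be sketched numerically. This sketch assumes the standard definitions PFD(t) = 1 - exp(-∫λD dt) and PFDavg as the time average of PFD(t) over the proof test interval, with the device known to be working at time zero. The function name, the step count and the example rate functions are illustrative assumptions.

```python
import math


def pfdavg_from_rate(lambda_d, interval_h, steps=10_000):
    """Time average of PFD(t) over [0, interval_h] for a failure rate function lambda_d(t).

    Both the cumulative hazard and the average are integrated with the trapezoid rule.
    """
    dt = interval_h / steps
    cumulative_hazard = 0.0
    pfd_integral = 0.0
    prev_rate = lambda_d(0.0)
    prev_pfd = 0.0  # device assumed working at t = 0
    for i in range(1, steps + 1):
        rate = lambda_d(i * dt)
        cumulative_hazard += 0.5 * (prev_rate + rate) * dt
        pfd = 1.0 - math.exp(-cumulative_hazard)
        pfd_integral += 0.5 * (prev_pfd + pfd) * dt
        prev_rate, prev_pfd = rate, pfd
    return pfd_integral / interval_h


# Constant rate: the result should be close to lambda * T / 2.
constant = pfdavg_from_rate(lambda t: 1e-6, 8760)

# Hypothetical rate that grows linearly with age, to show that the shape of lambda_D(t) matters.
rising = pfdavg_from_rate(lambda t: 1e-6 * (1.0 + t / 8760), 8760)

print(constant, rising)
```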

More Information   

Using Predictive Analytic failure rate models to validate field failure data collection processes

This paper introduces a benchmarking technique we call Predictive Analytics (PA). Using a large set of data from FMEDA results in combination with validated field failure data, upper and lower reasonability bounds for instrument device failure rates can be established. These bounds can be used as a benchmark to help validate any data collection process. This benchmark represents the constant total failure rate (λ) inherent in the device during its useful life. For any given set of field failure data (FFD) for a device, a λ of the device is estimated and compared to the benchmark. It is not uncommon for the benchmark λ and estimated λ to differ considerably. PA provides a procedure for exploring explanations of these differences and assessing the accuracy of the estimated device λ with respect to the benchmark λ of the device. PA can often determine the source of that portion of the estimated λ value not inherent to the device but likely due to random failures of infant mortality, wear out, or initial failures, to systematic failures, or to application or site specific issues. This site specific device λ is the portion of the estimated λ the end user needs to address to improve operational reliability and safety. PA can also assess the quality of FFD and can facilitate the discovery of previously unknown device failure modes.
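The comparison step described above can be sketched as follows. It assumes a constant failure rate estimated as failures divided by accumulated unit-hours. The bounds, the fleet size and the failure count are hypothetical numbers chosen for illustration and do not come from the paper.

```python
def estimate_failure_rate(failure_count, unit_hours):
    """Point estimate of a constant failure rate (failures per hour)."""
    if unit_hours <= 0:
        raise ValueError("unit_hours must be positive")
    return failure_count / unit_hours


def compare_to_benchmark(estimated_rate, lower_bound, upper_bound):
    """Classify an estimated rate against the benchmark reasonability bounds."""
    if estimated_rate < lower_bound:
        return "below benchmark"
    if estimated_rate > upper_bound:
        return "above benchmark"
    return "within benchmark"


# Hypothetical field data: 12 failures across 500 devices over 5 years.
unit_hours = 500 * 5 * 8760
estimate = estimate_failure_rate(12, unit_hours)

# Hypothetical benchmark bounds for this device type (failures per hour).
verdict = compare_to_benchmark(estimate, lower_bound=1e-7, upper_bound=4e-7)

print(estimate, verdict)
```

A result outside the bounds does not by itself say which number is wrong. It is a prompt to look into the data collection process and into site or application specific causes.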

More Information