Alarm Management:

Poor alarm management is one of the leading causes of unplanned downtime, contributing to over $20B in lost production every year, and of major industrial incidents such as the one in Texas City. Developing good alarm management practices is not a discrete activity but a continuous process (i.e., it is more of a journey than a destination). This paper will describe the new ISA-18.2 standard, “Management of Alarm Systems for the Process Industries.” This standard provides a framework and methodology for the successful design, implementation, operation and management of alarm systems and will allow end-users to address one of the fundamental conclusions of Bransby and Jenkinson that “Poor performance costs money in lost production and plant damage and weakens a very important line of defense against hazards to people.” Following a lifecycle model will help users systematically address all phases of the journey to good alarm management. This paper will provide an overview of the new standard and the key activities that are contained in each step of the lifecycle.

Download PDF   

Alarms and operator response are one of the first layers of defense in preventing a plant upset from escalating into an abnormal situation. The new ISA-18.2 standard on alarm management recommends following a lifecycle approach similar to the existing ISA84/IEC 61511 standard on functional safety. This paper will highlight where these lifecycles interact and overlap, as well as how to address them holistically. Specific examples within ISA 18 will illustrate where the output of one lifecycle is used as input to the other, such as when alarms identified as safeguards during a process hazards analysis (PHA) are used as an input to alarm identification and rationalization. The paper will also provide recommendations on how to integrate the safety and alarm management lifecycles.

Download PDF   

Apply the ISA-18.2 Standard on Alarm Management to design, implement, and maintain an effective alarm system.

Download PDF   

Tackle distractions that impair operator performance and process efficiency.

Download PDF   

Recent industrial accidents at Texas City, Buncefield (UK) and Institute, WV have highlighted the connection between poor alarm management and process safety incidents. At Texas City, key level alarms failed to notify the operator of the unsafe and abnormal conditions that existed within the tower and blowdown drum. The resulting explosion and fire killed 15 people and injured 180 more [1]. The tank overflow and resultant fire at the Buncefield Oil Depot resulted in a £1 billion (1.6 billion USD) loss. It could have been prevented if the tank’s high level safety switch had, per design, notified the operator of the high level condition or automatically shut off the incoming flow [2]. At the Bayer facility (Institute, WV), improper procedures, worker fatigue, and lack of operator training on a new control system caused the residue treater to be overcharged with methomyl, leading to an explosion and chemical release.

Download PDF   

Using the ISA-18.2 standard can help process engineers understand, simplify, and implement a sustainable alarm management program.

Congratulations. You’ve been assigned the task of establishing an alarm management program for your facility. So where and how do you begin? This article presents four practical tips for starting an effective and sustainable alarm management program that conforms to the tenets of a relatively new process industry standard for alarm management published by ISA.

Download PDF   

Alarms and operator response to them are one of the first layers of protection in preventing a plant upset from escalating into a hazardous event. This paper discusses how to evaluate and maximize the risk reduction (or minimize the probability of failure on demand) of this layer when it is considered as part of a layer of protection analysis (LOPA).

The characteristics of a valid layer of protection (Specific, Auditable, Independent and Dependable) will be reviewed to examine how each applies to alarms and operator response. Considerations for how to assign probability of failure on demand (PFD) will be discussed, including the key factors that contribute to it (e.g., operator’s time to respond, training, human factors, and the reliability of the alarm annunciation / system response). The effect of alarm system performance issues (such as nuisance alarms and alarm floods) on operator dependability (and probability of failure on demand) will be reviewed. Key recommendations will be drawn from the ISA-18.2 standard “Management of Alarm Systems for the Process Industries”.
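The arithmetic behind crediting an alarm in a LOPA can be sketched as follows, assuming the common simplification that the mitigated event frequency equals the initiating event frequency multiplied by the PFD of each independent protection layer. The function name, the layer names, and every numeric value are hypothetical and chosen only for illustration; they are not taken from ISA-18.2 or from the paper.

```python
from math import prod

def mitigated_frequency(initiating_freq_per_year, layer_pfds):
    """Multiply the initiating event frequency by the PFD of each layer.

    Assumes the layers are independent of each other and of the
    initiating event, which is the usual LOPA simplification.
    """
    for name, pfd in layer_pfds.items():
        if not 0.0 < pfd <= 1.0:
            raise ValueError(f"PFD for {name!r} must be in (0, 1]")
    return initiating_freq_per_year * prod(layer_pfds.values())

# Hypothetical scenario: one initiating event every 10 years.
layers = {"alarm + operator response": 0.1, "relief valve": 0.01}
credited = mitigated_frequency(0.1, layers)

# Same scenario when nuisance alarms or floods make the operator
# response undependable, so the alarm layer earns no credit (PFD = 1).
degraded = mitigated_frequency(0.1, {**layers, "alarm + operator response": 1.0})

print(f"alarm credited:     {credited:.1e} per year")
print(f"alarm not credited: {degraded:.1e} per year")
```

In this illustration, losing credit for the alarm layer raises the mitigated frequency by a factor of ten, which is why alarm system performance matters to the LOPA result.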

Download PDF   

Some of the most significant process industry incidents, including BP Texas City and Buncefield, were caused by overflowing vessels. In many overflow incidents, alarms were designed to signal the need for operator intervention. These alarms may have been identified as safeguards or layers of protection, but they did not succeed in preventing the incident. This paper reviews several overflow incidents to consider the alarm management and human factors elements of the failures.

Download PDF   

Setting a new Standard for Performance, Safety, and Reliability with ISA-18.2

Alarm Management affects both the bottom line and plant safety. A well-functioning alarm system can help a process run closer to its ideal operating point – leading to higher yields, reduced production costs, increased throughput, and higher quality, all of which add up to higher profits. Poor alarm management, on the other hand, is one of the leading causes of unplanned downtime and has been a major contributor to some of the worst industrial safety accidents on record.

Download PDF   

Cybersecurity Lifecycle:

Cybersecurity is rapidly becoming something that process safety can no longer ignore. It is part of the Chemical Facility Anti-Terrorism Standards (CFATS). In addition, the President’s Executive Order 13636, “Improving Critical Infrastructure Cybersecurity,” has drawn attention to the need to address cybersecurity in our plants, as it has been demonstrated that cyber events are now a potential source of process safety incidents.

IEC 61508 [2], “Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems (E/E/PE, or E/E/PES),” now has a requirement to address cybersecurity in safety instrumented systems, and ANSI/ISA 84.00.01, “Functional Safety: Safety Instrumented Systems for the Process Industry Sector,” is looking to include this requirement in the next revision. Currently the industry is playing catch-up, as there tends to be a gap in understanding between information technologists, traditionally responsible for cybersecurity, and the process automation and process safety engineers responsible for keeping our plants safe with help from automated controls and safety instrumented systems. As a result, guidance is being developed, but much of it continues to be a work in progress.

Download PDF   

The past two years have been a wake-up call for the industrial automation industry. It has been the target of sophisticated cyber attacks like Stuxnet, Night Dragon and Duqu. An unprecedented number of security vulnerabilities have been exposed in industrial control products, and regulatory agencies are demanding compliance with complex and confusing regulations. Cyber security has quickly become a serious issue for professionals in the process and critical infrastructure industries.

If you are a process control engineer, an IT professional in a company with an automation division, or a business manager responsible for safety or security, you may be wondering how your organization can get moving on more robust cyber security practices. This white paper will give you the information you need to get started. It won’t make you a security expert, but it will put you on the right path in far less time than it would take if you were to begin on your own.

We began by condensing the material from numerous industry standards and best practice documents. Then we combined our experience in assessing the security of dozens of industrial control systems. The result is an easy-to-follow 7-step process:

Step 1 – Assess Existing Systems
Step 2 – Document Policies & Procedures
Step 3 – Train Personnel & Contractors
Step 4 – Segment the Control System Network
Step 5 – Control Access to the System
Step 6 – Harden the Components of the System
Step 7 – Monitor & Maintain System Security

The remainder of this white paper will walk through each of these steps, explaining the importance of each step and best practices for implementing it. We will also provide ample references for additional information.

Download PDF   

With the ever changing threats posed by cyber events of any nature, it has become critical to recognize these emerging threats, malicious or not, and identify the consequences these threats may have on the operation of an industrial control system (ICS). Cyber-attacks over time have the ability to take on many forms and threaten not only industrial but also national security.

Saudi Aramco, the world’s largest exporter of crude oil, serves as a perfect example depicting how devastating a cyber-attack can truly be on an industrial manufacturer. In August 2012, Saudi Aramco (SA) had 30,000 personal computers on its network infected by a malware attack better known as the “Shamoon” virus. According to InformationWeek Security this was roughly 75 percent of the company’s workstations and took 10 days to complete clean-up efforts.

The seriousness of cyber-attacks with regard to national security was addressed by former United States Secretary of Defense Leon W. Panetta in a speech in October 2012. Panetta issued a strong warning to business executives about cybersecurity as it relates to national security. “A cyber-attack perpetrated by nation states [and] violent extremists groups could be as destructive as the terrorist attack on 9/11. Such a destructive cyber-terrorist attack could virtually paralyze the nation,” he stated. “For example, we know that foreign cyber actors are probing America’s critical infrastructure networks. They are targeting the computer control systems that operate chemical, electricity and water plants and those that guide transportation throughout this country.”

In addition to Panetta’s address, the U.S. Department of Homeland Security has issued several alerts about coordinated attacks on gas pipeline operators, according to a May 2012 report by ABC News.

This whitepaper will focus on the significance of cyber-attacks on industrial control systems (ICS) and how these attacks can be prevented by proper practice of the ICS Cybersecurity lifecycle.

Download PDF   

Failure Rate Data:

Probabilistic calculations done to verify the integrity of a Safety Instrumented Function design require failure rate and failure mode data for all equipment, including mechanical devices. For many devices, such data is available only in industry databases that present failure rates alone. Failure mode information is rare, if available at all. Many analysts give up and simply assume 50% safe and 50% dangerous, thinking this is conservative. In some cases this assumption is not conservative; in others it is overkill.
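To see why the 50/50 split can err in either direction, consider a rough sketch that uses the widely quoted single-device approximation PFDavg ≈ λD × TI / 2 with no automatic diagnostics. The total failure rate, the test interval, and the function name are assumptions made for illustration only; they do not come from the paper or from any database.

```python
HOURS_PER_YEAR = 8760

def pfd_avg_simple(total_failure_rate_per_hour, dangerous_fraction, test_interval_hours):
    """Simplified estimate: PFDavg ~ lambda_D * TI / 2 (no diagnostics)."""
    if not 0.0 <= dangerous_fraction <= 1.0:
        raise ValueError("dangerous_fraction must be between 0 and 1")
    lambda_dangerous = total_failure_rate_per_hour * dangerous_fraction
    return lambda_dangerous * test_interval_hours / 2

total_rate = 2.0e-6  # hypothetical total failure rate, failures per hour

# Compare the assumed 50% split with two hypothetical "true" splits.
for fraction in (0.2, 0.5, 0.8):
    value = pfd_avg_simple(total_rate, fraction, HOURS_PER_YEAR)
    print(f"dangerous fraction {fraction:.0%}: PFDavg ~ {value:.2e}")
```

If the device's real dangerous fraction were 80%, the 50% assumption would understate PFDavg (optimistic); if it were 20%, the assumption would overstate it (overkill).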

Download PDF   

Performance-based functional safety standards like IEC 61511 offer many advantages, including the opportunity to optimize and upgrade Safety Instrumented System (SIS) designs. But performance calculation depends upon realistic failure data for the instruments used. A predictive analysis technique called Failure Modes Effects and Diagnostic Analysis (FMEDA) has been developed along with a component failure rate database that can predict failure rates of instruments based on their design strength and the expected stress environment. This method has been calibrated with over 150 billion unit operating hours of field failure data over the last 15 years.

Download PDF   

Estimated Failure Rates for Sensor and Valve Assemblies 

Failure rates predicted by Failure Modes Effects and Diagnostic Analysis (FMEDA) are compared to failure rates estimated from the Offshore Reliability Data (OREDA) project for sensor and valve assemblies. Because the two methods of data analysis are fundamentally different in nature, it may be surprising that, when appropriately compared, the results from the two methods are generally quite similar. The nature of the published data for FMEDA and OREDA is explored. The relative merits of each method are discussed. 

Download PDF   

In this paper, we present a methodology to derive component failure rate and failure mode data for mechanical components used in automation systems based on warranty and field failure data as well as expert opinion. We describe a process for incorporating new component information into the database as it becomes available. The method emphasizes random mechanical component failures of importance in the world of safety analysis as opposed to the wear-out and aging mechanical failures that have dominated mechanical reliability analysis. The method provides a level of accuracy significantly better than warranty failure data analysis alone. The derived database has the same form as that for electrical/electronics databases used in FMEDA analyses used to show compliance with international performance-based safety standards. Thus, the mechanical database can be used in conjunction with existing electrical/electronics databases to perform required probabilistic safety analysis on automation systems comprised of both electrical and mechanical components.

Download PDF   

This white paper describes the distinction between failure rate prediction and estimation methods in general and then gives an overview of the procedures used to obtain dangerous failure rates for certain mechanical equipment using exida FMEDA predictions and OREDA estimations. exida frequently compares field failure rate data from various sources to FMEDA results in order to validate the FMEDA component library. However, because OREDA and FMEDA methods are quite different, it is not possible to compare their results directly. A methodology is presented which creates predictions and estimations that are more comparable. The methodology is then applied to specific equipment combinations and the results are compared. When differences in the results exist between the two methods, plausible explanations for the differences are provided.

The comparisons show that the OREDA failure rates are well within the range of the exida FMEDA results. The comparisons also show that, with two exceptions, the average FMEDA predictions for dangerous failure rates are only slightly less than those of the OREDA estimations. In those two exceptions, FMEDA predictions are higher than OREDA. Therefore, it is reasonable to conclude that, when compared in an “apples-to-apples” fashion, for the equipment analyzed in this paper, the exida FMEDA predictions and OREDA estimations are quite comparable.

Download PDF   

A company gains many benefits from access to good field failure data, and most of those benefits take the form of cost savings. At the same time, most of the expenditure needed to get good failure data is already being spent. For the incremental cost of improving data collection quality and data analysis, these benefits can be achieved.

High-quality field failure data has often been described as the ultimate source of failure data. However, not all field failure studies are high quality. Some field studies simply do not have the needed information. Some field studies make unrealistic assumptions. The results can be quite different depending on methods and assumptions. Some methods produce optimistic results that can result in bad designs and unsafe processes.

This paper presents some common field failure analysis techniques, shows some of the limitations of the methods and describes important attributes of a good field failure data collection system.

Download PDF   

The use of IEC 61508 [1] and IEC 61511 [2] has increased rapidly in the past several years. Along with the adoption of the standards has come an increase in the need for accurate reliability data for devices used in Safety Instrumented Systems (SIS), both electronic and mechanical. While the methodology for determining failure rates for electronic equipment is fairly well accepted and applied, the same cannot be said for mechanical equipment. Several methods are currently being utilized for generating failure rates for mechanical components. These methods vary in their approach and often lead to dramatically different failure rates, which can lead to significant differences when calculating the reliability of a safety instrumented function (SIF). Some methods can result in dangerously optimistic failure rate numbers.

This paper reviews the methods utilized to determine mechanical reliability for components utilized in safety systems and provides a recommendation for the most appropriate methodology.

Download PDF   

This paper highlights the similarities and differences between the safety parameters which result from OREDA and FMEDA data analysis methods. The statistical parameter estimations performed by OREDA (which are based on expert analysis of field failure data (FFD) from offshore and onshore topside oil exploration and production facilities as well as offshore subsea facilities) and the parameter predictions produced by FMEDA (which derives failure parameters based on validated failure rate and failure mode databases of components that comprise functional safety related devices) are described. Samples of the formats of published data resulting from each method are presented and explained. The capabilities of each method for dealing with application specific safety issues are considered; these application issues include the common scenarios of close on trip vs open on trip safety system configurations. The differences in the methods’ capabilities for analyzing the safety of a specific device vs. a generalized subsystem or system are reviewed. A table comparing these as well as other characteristics is presented. The relative advantages and disadvantages of each analysis method are discussed and attention is drawn to the complementary nature of the data produced by the two methods.

Download PDF   

FMEDA:

The letters FMEDA form an acronym for “Failure Modes Effects and Diagnostic Analysis.” The name was given by one of the authors in 1994 to describe a systematic analysis technique that had been in development since 1988 to obtain subsystem / product level failure rates, failure modes and diagnostic capability (Figure 1).

Figure 1: FMEDA Inputs and Outputs.

The FMEDA technique considers:

  • All components of a design,
  • The functionality of each component,
  • The failure modes of each component,
  • The impact of each component failure mode on the product functionality,
  • The ability of any automatic diagnostics to detect the failure,
  • The design strength (de-rating, safety factors) and
  • The operational profile (environmental stress factors).
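The bookkeeping implied by this list can be sketched as a small tally: each component failure mode carries a failure rate, a classification of its effect on the product's safety function, and the fraction caught by automatic diagnostics, and the totals roll up into safe/dangerous and detected/undetected rates. This is a minimal sketch under stated assumptions, not the authors' tool. The component names, rates (in FIT, failures per 10^9 hours), and coverage values are hypothetical, and the sketch leaves out the design strength and operational profile adjustments.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    component: str
    mode: str
    rate_fit: float             # failures per 1e9 hours (hypothetical)
    dangerous: bool             # True if the mode defeats the safety function
    diagnostic_coverage: float  # fraction detected by automatic diagnostics

def summarize(modes):
    """Roll failure modes up into SD, SU, DD, and DU rate totals (FIT)."""
    totals = {"SD": 0.0, "SU": 0.0, "DD": 0.0, "DU": 0.0}
    for m in modes:
        prefix = "D" if m.dangerous else "S"
        detected = m.rate_fit * m.diagnostic_coverage
        totals[prefix + "D"] += detected
        totals[prefix + "U"] += m.rate_fit - detected
    return totals

# Hypothetical three-row worksheet.
modes = [
    FailureMode("resistor R1", "open", 10.0, True, 0.9),
    FailureMode("resistor R1", "short", 2.0, False, 0.0),
    FailureMode("capacitor C1", "short", 5.0, True, 0.6),
]
totals = summarize(modes)
for category, rate in totals.items():
    print(f"lambda_{category}: {rate:.1f} FIT")
```

The dangerous undetected total (λDU) is the figure that typically feeds a PFDavg calculation.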

 

Download PDF   

Safety deviation is a term used in functional safety. Safety deviation (formerly safety accuracy) is the change in output due to (internal) component failures not analyzed in a Failure Modes, Effects, & Diagnostic Analysis (FMEDA). Safety accuracy is an input to the FMEDA analyst to advise the level of analysis detail for critical analog components. The term is defined, some of its history is described, the reasoning for its existence is given, and its application is presented.

Download PDF   

Functional Safety Certification:

Today there is a growing trend for end users to choose manufacturers whose equipment is certified to the IEC 61508 standard (SIL – Safety Integrity Level). This is an excellent trend for several reasons. The first is that, in order to obtain certification, manufacturers must provide the product's failure rates and failure modes. This information is normally obtained from a report known as a Failure Modes Effects and Diagnostic Analysis (FMEDA).

Download PDF   

Today there is a growing trend by end-users to require equipment manufacturers to get their safety devices IEC 61508 (SIL) Certified. That is an excellent trend for a number of reasons. One reason is because in order to get a device SIL Certified, a company must first determine the device’s failure rates and failure modes. This is usually done by having a Failure Modes Effects and Diagnostic Analysis (FMEDA) performed. Among other things, an FMEDA Report will detail the device’s Architectural Constraints and its λDU (Dangerous Undetected Failure Rate). With any given values for maintenance parameters (Test Interval, Test Coverage, and Repair Time), you can determine the device’s PFDavg (Average Probability of Failure on Demand). Both the Architectural Constraints and the PFDavg of a device, together with its IEC 61508 Certification, are critical in evaluating whether or not a given device may be suitable for use in a Safety Function with a given SIL requirement. And both of these characteristics, together with IEC 61508 Certification, are what concern a Safety Engineer in his evaluation.
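As a rough illustration of how λDU and the maintenance parameters combine, the sketch below uses a commonly quoted simplified approximation for a single device: failures that the proof test can reveal are exposed for half a test interval plus the repair time, while failures the test cannot reveal are exposed for half the device lifetime. The formula choice, the function name, the lifetime parameter, and all numbers are assumptions for illustration; an actual FMEDA report or safety lifecycle tool may use a more detailed model.

```python
def pfd_avg(lambda_du, test_interval_h, test_coverage, repair_time_h, lifetime_h):
    """Simplified single-device PFDavg with imperfect proof testing.

    lambda_du       dangerous undetected failure rate, per hour
    test_interval_h proof test interval, hours
    test_coverage   fraction of DU failures the proof test reveals (0..1)
    repair_time_h   time to restore the device after a revealed failure, hours
    lifetime_h      period after which the device is replaced, hours (assumed)
    """
    if not 0.0 <= test_coverage <= 1.0:
        raise ValueError("test_coverage must be between 0 and 1")
    found_by_test = test_coverage * lambda_du * (test_interval_h / 2 + repair_time_h)
    never_found = (1 - test_coverage) * lambda_du * lifetime_h / 2
    return found_by_test + never_found

# Hypothetical device: 5e-7 /h, yearly test, 24 h repair, 10-year lifetime.
perfect = pfd_avg(5e-7, 8760, 1.0, 24, 87600)
realistic = pfd_avg(5e-7, 8760, 0.9, 24, 87600)
print(f"100% test coverage: {perfect:.3e}")
print(f" 90% test coverage: {realistic:.3e}")
```

With these hypothetical inputs, dropping test coverage from 100% to 90% roughly doubles the result, which shows why the maintenance parameters matter as much as λDU itself.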

Download PDF   

An updated version of IEC 61508, Functional Safety of Electrical/Electronic and Programmable Electronic Systems, was issued in September 2010. This Second Edition is generally thought to clarify common interpretations of the first edition and add some refinements that had accumulated in the ten years since the first edition. This white paper compares certification under the 1st and 2nd editions.

Download PDF   

This document is intended for readers who are familiar with the international safety standard IEC 61508 [Ref. 1] in general and with that document’s Part 7: Annex D [Ref. 2] in particular. As currently written, Annex D provides “initial guidelines on the use of a probabilistic approach to determining safety integrity for pre-developed software” (SW) included in safety instrumented functions. It further states that “the annex provides an indication of what is possible, but the techniques should be used only by those who are competent in statistical analysis.” If these guidelines are to be used effectively in the testing and certification of safety-related SW it is essential that individuals involved in testing and certifying such SW understand how to interpret these guidelines correctly. To this end, this document explains the possibilities and limitations inherent in the information contained in IEC61508-7 Annex D.

Download PDF   

International safety standard IEC 61508‐7 Annex D prescribes sampling sizes of safety critical software (SW) inputs needed to be consecutively processed correctly in order to ascertain that the SW meets a certain safety integrity level (SIL) with a certain statistical confidence level. The sample sizes in Annex D Table D.1 are derived from a Bernoulli sampling model which requires that the sampled inputs be uniformly distributed.

The simulations reported in this paper are intended to answer the question: If one uses the sample sizes as prescribed in IEC 61508‐7 Annex D but the sampled safety critical SW inputs are not uniformly distributed, do the confidence levels in Table D.1 still hold? The answer is NO. When the sampled safety critical SW inputs are not uniformly distributed, the confidence levels attained depend not only on the sample size but also on the distributions of both the sampled safety critical SW inputs and those safety critical SW inputs that will not be correctly processed. Since it is impossible to know the distribution of safety critical SW inputs that will not be correctly processed, it is impossible to know the confidence levels attained if the sampled safety critical inputs are not uniformly distributed. Consequently, SW cannot be safety certified according to Annex D unless the SW tester can demonstrate that the safety critical SW inputs used in the tests were uniformly distributed.
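The sample-size logic of a Bernoulli model can be sketched as follows. If each independently and uniformly sampled input fails with probability p, the chance of n consecutive correct results is (1 − p)^n, so claiming "failure probability below p" with confidence C requires (1 − p)^n ≤ 1 − C. This is a generic textbook calculation with a hypothetical function name; it is not claimed to reproduce the exact entries of Table D.1, which may be rounded or approximated differently.

```python
import math

def required_samples(max_failure_probability, confidence):
    """Smallest n with (1 - p)**n <= 1 - confidence.

    Assumes independent test inputs, each failing with the same
    probability p, i.e. the Bernoulli model with uniform sampling.
    """
    if not 0.0 < max_failure_probability < 1.0:
        raise ValueError("max_failure_probability must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    return math.ceil(
        math.log(1.0 - confidence) / math.log(1.0 - max_failure_probability)
    )

# Failure-free runs needed to support p < 0.1 at two confidence levels.
print(required_samples(0.1, 0.95))
print(required_samples(0.1, 0.99))
```

The calculation depends only on p and C, which is exactly why it stops being valid once the uniform-sampling assumption behind the Bernoulli model is violated.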

Download PDF   

Electromagnetic Interference (EMI) is just one of the environmental stresses that can stop a system from performing its safety function. It is important for a functional safety system to be immune from the EMI levels that are likely to be present. Unlike other environmental stresses, like temperature and vibration, EMI is more difficult to sense and is more likely to be transitory. Still, the effects can be catastrophic.

Download PDF   

IEC 61508 has been in use for several years since the final parts were released in 2000. Although written from the perspective of a bespoke system, it is more commonly used to certify products for a given SIL level. Valid product certification schemes must involve the assessment of specific product design details as well as an assessment of the safety management system of the product manufacturer and the personnel competency of those professionals involved in the product creation.

A proper assessment of a product must completely cover all the requirements of the IEC 61508 standard including the safety management system and build a safety case. The safety case must list each requirement, an argument as to how the product design or its creation process meets the requirement and the necessary evidence to provide reasonable credibility for the argument. This safety case must be available for inspection. Although the safety case typically contains manufacturer proprietary information, those who wish to review the full safety case should be able to do so, perhaps under confidentiality agreement. In addition, an open IEC 61508 certification must include a public certification report that provides an overview of the assessment and the product limitations, if any.

This paper describes an assessment technique for product designs and the product development process that produces a full safety case as well as additional public documentation. This “open certification” method has been used in dozens of assessments of product designs and development processes. The assessment experiences to date show that most of the problems with conventional methods are solved or at least improved.

Download PDF   

In today’s world many potentially dangerous pieces of equipment are controlled by embedded software. This equipment includes cars, trains, airplanes, oil refineries, chemical processing plants, nuclear power plants and medical devices. As embedded software becomes more pervasive, so too do the risks associated with it. As a result, the issue of software safety has become a very hot topic in recent years. The leading international standard in this area is IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems. This standard is generic and not specific to any industry, but has already spun off a number of industry specific derived standards, and can be applied to any industry that does not have its own standard in place. Several industry specific standards, such as EN50128 (Railway), DO-178B (Aerospace), IEC 60880 (Nuclear) and IEC 601-1-4 (Medical Equipment), are already in place. Debra Herrmann (Herrmann, 1999) has found a total of 19 standards related to software safety and reliability that cut across industrial sectors and technologies. These standards’ popularity is on the rise, and more and more embedded products are being developed that conform to these standards. Since an increasing number of embedded products also use an embedded real time operating system (RTOS), it has become inevitable that products with an RTOS are being designed to conform to such standards. This creates an important question for designers: how is my RTOS going to affect my certification? This article will attempt to explore the challenges and advantages of using an RTOS in products that will undergo certification.

Download PDF   

Safety deviation is a term used in functional safety. Safety deviation (formerly safety accuracy) is the change in output due to (internal) component failures not analyzed in a Failure Modes, Effects, & Diagnostic Analysis (FMEDA). Safety accuracy is an input to the FMEDA analyst to advise the level of analysis detail for critical analog components. The term is defined, some of its history is described, the reasoning for its existence is given, and its application is presented.

Download PDF   

In performance-based functional safety standards, safety function designs are verified using specified metrics. A key metric for process industry designs is called average Probability of Failure on Demand (PFDavg). After several studies of many field failure and proof test reports, several variables have been identified as key to a realistic PFDavg calculation. Most simplified equations, including those in the informative section of IEC 61508, Part 6, do not include several key variables. It is shown that exclusion of these parameters may result in an optimistic metric calculation, which may result in an unsafe design.

This paper identifies the key variables that need to be included in a PFDavg calculation and provides some simplified equations showing the impact of most variables. An example showing two sets of variables reveals an entire SIL level difference in PFDavg calculation results.
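To make the "entire SIL level difference" concrete, the sketch below maps a PFDavg value onto the low demand mode SIL bands defined in IEC 61508 (each band spans one decade). The function name and the two example PFDavg values are hypothetical; they are not the paper's example and simply stand in for an optimistic and a more realistic calculation of the same design.

```python
def sil_from_pfd_avg(pfd_avg):
    """Return the low demand mode SIL band for a PFDavg, or None.

    Bands follow the IEC 61508 low demand table:
    SIL 4: >= 1e-5 to < 1e-4 ... SIL 1: >= 1e-2 to < 1e-1.
    Values outside the tabulated bands return None.
    """
    bands = [
        (1e-5, 1e-4, 4),
        (1e-4, 1e-3, 3),
        (1e-3, 1e-2, 2),
        (1e-2, 1e-1, 1),
    ]
    for low, high, sil in bands:
        if low <= pfd_avg < high:
            return sil
    return None

optimistic = 4.0e-3  # hypothetical result that omits some key variables
realistic = 2.0e-2   # hypothetical result with those variables included

print(f"optimistic PFDavg {optimistic:.1e} -> SIL {sil_from_pfd_avg(optimistic)}")
print(f"realistic  PFDavg {realistic:.1e} -> SIL {sil_from_pfd_avg(realistic)}")
```

Because each band is only a factor of ten wide, a modest change in the inputs to a PFDavg calculation can move a design across a band boundary.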

Download PDF   

Functional Safety Lifecycle:

Several issues of the magazine InTech América do Sul have published articles, by various authors, on Safety Instrumented Systems and the international standards that guide the best practices applied to such projects. Now it is time to talk about the Brazilian standards!

Download PDF   

Accurate Modeling of Shared Components in High Reliability Applications

Download PDF   

How can a company establish a baseline measurement of its safety culture against which to gauge improvement? The Site Safety Index™ (SSI) quantifies in part (on a scale of 0 – 4) the degree to which a company’s end-user practices support the attainment and retention of an appropriate safety culture for operations and maintenance. Further, the SSI can be used to appropriately adjust parameters that directly impact measures of safety such as probability of failure on demand (PFDavg). Thus, the impacts of safety culture can be further quantified and the effects of changes to safety culture can be assessed.

Download PDF   

A company gains many benefits from access to good field failure data, and most of those benefits take the form of cost savings. At the same time, most of the expenditure needed to get good failure data is already being spent. For the incremental cost of improving data collection quality and data analysis, these benefits can be achieved.

High-quality field failure data has often been described as the ultimate source of failure data. However, not all field failure studies are high quality. Some field studies simply do not have the needed information. Some field studies make unrealistic assumptions. The results can be quite different depending on methods and assumptions. Some methods produce optimistic results that can result in bad designs and unsafe processes.

This paper presents some common field failure analysis techniques, shows some of the limitations of the methods and describes important attributes of a good field failure data collection system.

Download PDF   

Alarms and operator response are one of the first layers of defense in preventing a plant upset from escalating into an abnormal situation. The new ISA-18.2 standard on alarm management recommends following a lifecycle approach similar to the existing ISA84/IEC 61511 standard on functional safety. This paper will highlight where these lifecycles interact and overlap, as well as how to address them holistically. Specific examples within ISA 18 will illustrate where the output of one lifecycle is used as input to the other, such as when alarms identified as safeguards during a process hazards analysis (PHA) are used as an input to alarm identification and rationalization. The paper will also provide recommendations on how to integrate the safety and alarm management lifecycles.

Download PDF   

The release of IEC 61508:2010 has led to several discussions on how certain new, updated, and unmodified definitions need to be interpreted. The controversy concerns how to interpret the requirements for determining the minimum hardware fault tolerance (architectural constraints).

This position paper explains the position that exida has taken on this issue. It is structured in two parts: the position itself, and the rationale for the position, including counterarguments received over the last couple of months. The exida position is also implemented in the exida exSILentia safety lifecycle tool.

Download PDF   

Prior Use (Proven In Use in IEC 61508) applies to equipment for which a documented assessment has shown that there is appropriate evidence, based on the previous use of the component, that the component is suitable for use in a particular application of a safety instrumented system at a given integrity level. There are two dimensions to this issue: the suitability of a component to meet application requirements, and sufficient integrity for safety critical functions.

Both international functional safety standards, IEC 61508 and ISA84.01-2004 (IEC 61511 MOD.), have clauses describing Prior Use requirements. One should also refer to TR84.00.04 for additional information on Prior Use.

This paper will discuss some of the particular roadblocks found at the plant site in trying to meet Prior Use guidelines in the ISA84.01-2004 (IEC 61511 MOD.) standard. This paper will not discuss the process for assessing the equipment.

Download PDF   

Dr Peter Clarke explains how process plants can benefit through proper and careful adoption of the IEC 61511 safety standard.

Download PDF   

Alarms and operator response to them are one of the first layers of protection in preventing a plant upset from escalating into a hazardous event. This paper discusses how to evaluate and maximize the risk reduction (or minimize the probability of failure on demand) of this layer when it is considered as part of a layer of protection analysis (LOPA).

The characteristics of a valid layer of protection (Specific, Auditable, Independent and Dependable) will be reviewed to examine how each applies to alarms and operator response. Considerations for how to assign probability of failure on demand (PFD) will be discussed, including the key factors that contribute to it (e.g., operator’s time to respond, training, human factors, and the reliability of the alarm annunciation / system response). The effect of alarm system performance issues (such as nuisance alarms and alarm floods) on operator dependability (and probability of failure on demand) will be reviewed. Key recommendations will be drawn from the ISA-18.2 standard “Management of Alarm Systems for the Process Industries”.
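As a rough illustration of how the PFD of the alarm and operator response layer enters a LOPA, the sketch below multiplies an initiating event frequency by the PFD of each independent layer. The function names and all numeric values are hypothetical and are not taken from the paper or from ISA-18.2.

```python
import math

def mitigated_frequency(initiating_frequency, layer_pfds):
    """Event frequency after all independent layers: f_init * product of layer PFDs."""
    return initiating_frequency * math.prod(layer_pfds)

def risk_reduction_factor(pfd):
    """Risk reduction factor of a single layer is the reciprocal of its PFD."""
    return 1.0 / pfd

# Hypothetical scenario: values chosen only for illustration.
initiating_frequency = 0.1          # initiating events per year
layers = {
    "alarm_and_operator_response": 0.1,
    "relief_valve": 0.01,
}

frequency = mitigated_frequency(initiating_frequency, layers.values())
alarm_rrf = risk_reduction_factor(layers["alarm_and_operator_response"])
print(f"Mitigated frequency: {frequency:.1e} per year, alarm layer RRF: {alarm_rrf:.0f}")
```

In this simple model, anything that degrades operator dependability, such as nuisance alarms or alarm floods, would appear as a larger PFD for the alarm layer and therefore a higher mitigated frequency.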

Download PDF   

Fault tolerant systems have been designed for safety critical applications including the protection of potentially dangerous industrial processes. These systems are typically evaluated and certified to functional safety standards such as IEC 61508 [1] by agencies like exida Certification or one of the TUV companies. Many factors are taken into account during the certification process including hardware diagnostic capability, level of hardware redundancy, design processes used, software diagnostics and general equipment strength. It has become clearly recognized that common cause failures can have a major negative impact on the safety and availability of a fault tolerant system [2]. The whole value of redundancy may be ruined. Common cause is recognized as an important factor, but there is disagreement regarding how to account for common cause in the quantitative modeling. Part 7 of IEC 61508 at least provides a list of questions with a point scoring system [3].
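One widely used way to account for common cause, though not necessarily the approach this paper takes, is the beta-factor model, in which a fraction β of dangerous undetected failures is assumed to affect both redundant channels at once. The sketch below uses the common simplified 1oo2 approximation; the failure rate, proof test interval, and β values are hypothetical.

```python
def pfdavg_1oo1(lam_du, interval_hours):
    """Simplified single-channel PFDavg: lambda_DU * TI / 2."""
    return lam_du * interval_hours / 2

def pfdavg_1oo2(lam_du, interval_hours, beta):
    """Simplified 1oo2 PFDavg with a beta-factor common cause term."""
    independent = ((1 - beta) * lam_du * interval_hours) ** 2 / 3
    common_cause = beta * lam_du * interval_hours / 2
    return independent + common_cause

# Hypothetical inputs, chosen only for illustration.
lam_du = 1e-6      # dangerous undetected failures per hour
interval = 8760    # proof test interval in hours (one year)

single = pfdavg_1oo1(lam_du, interval)
without_cc = pfdavg_1oo2(lam_du, interval, beta=0.0)
with_cc = pfdavg_1oo2(lam_du, interval, beta=0.1)
print(f"1oo1: {single:.2e}  1oo2 (beta=0): {without_cc:.2e}  1oo2 (beta=0.1): {with_cc:.2e}")
```

Under these assumed numbers, a 10% common cause fraction raises the redundant pair's PFDavg by more than an order of magnitude, which shows how much of the value of redundancy can be lost.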

Download PDF   

The functional safety standards IEC 61508, IEC 61511, and ANSI/ISA 84.01 each specify the Safety Integrity Level performance parameter for Safety Instrumented Functions. For a Safety Instrumented Function to meet a specific Safety Integrity Level, the sum of the average Probability of Failure on Demand (PFDavg) values of all components that make up that function must fall within the PFDavg band associated with that Safety Integrity Level.
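The check described above can be sketched as follows. The band limits are the low demand mode PFDavg ranges tabulated in IEC 61508, while the component names and values are hypothetical. This covers only the PFDavg criterion and none of the other requirements for claiming a SIL.

```python
# Low demand mode PFDavg bands: (SIL, lower bound inclusive, upper bound exclusive).
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]

def sif_pfdavg(component_pfdavgs):
    """Sum the component PFDavg values of one Safety Instrumented Function."""
    return sum(component_pfdavgs)

def sil_from_pfdavg(pfdavg):
    """Return the SIL whose band contains pfdavg, or None if it is outside all bands."""
    for sil, low, high in SIL_BANDS:
        if low <= pfdavg < high:
            return sil
    return None

# Hypothetical sensor, logic solver, and final element values.
components = {"sensor": 2.0e-3, "logic_solver": 1.0e-4, "final_element": 5.0e-3}
total = sif_pfdavg(components.values())
achieved_sil = sil_from_pfdavg(total)
print(f"PFDavg = {total:.2e}, SIL band = {achieved_sil}")
```

With these assumed values the final element contributes most of the total, so improving it would be the most direct way to move toward a higher band.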

Download PDF   

Operations & Maintenance:

A company gains many benefits from access to good field failure data, and most of those benefits amount to cost savings. At the same time, most of the expenditure needed to obtain good failure data is already being made. For the incremental cost of improving data collection quality and data analysis, these benefits can be achieved.

High-quality field failure data has often been described as the ultimate source of failure data. However, not all field failure studies are high quality. Some simply lack the needed information, and others make unrealistic assumptions. The results can differ considerably depending on the methods and assumptions used, and some methods produce optimistic results that can lead to bad designs and unsafe processes.

This paper presents some common field failure analysis techniques, shows some of the limitations of the methods and describes important attributes of a good field failure data collection system.

Download PDF   

It is not difficult to think of numerous ways in which human factors might impact functional safety. For example, imperfect repair, improper calibration, faulty installation, etc. can all contribute to decreases in functional safety performance. However, if actions are to be taken to mitigate these impacts then it is essential to have some quantitative measure of the impacts so that the actions’ effectiveness can be assessed. This paper describes how the Site Safety Index™ (SSI) is used to adjust safety metrics, computed under the assumptions that human factors play no part in safety system performance, to reflect the effects of human factors on safety system performance on a site-by-site basis. Examples of the application of the method showing the safety performance changes attributable to human factors are provided.

Download PDF   

The purpose of this document is to report on our successful efforts to validate
statistically certain random equipment failure rate data used in a mechanical
parts failure rate and failure mode database and, by extension, to validate the
techniques used to derive the data. To accomplish this, a Failure Modes,
Effects, and Diagnostic Analysis (FMEDA) is initially used to predict the useful-life
failure rate for the fail-to-open condition of a particular pressure relief valve
(PRV) using the failure rates from the mechanical parts database. Next, this
prediction is statistically tested against three independent data sets consisting of
proof test data for PRVs provided by Fortune 500 operating companies. The data
sets all meet the intent of the quality assurance of proof test data as documented
by the Center for Chemical Process Safety (CCPS) Process Equipment
Reliability Database (PERD) initiative. By applying the quantal response method
to the results of these PRV proof tests, it is demonstrated that the proof test data
are consistent with the predictions of the FMEDA. Specifically, all of the data
sets support the FMEDA result at a 95% confidence level. All analyses lead to a
useful-life PRV failure rate between 10⁻⁸ and 10⁻⁷ failures/hour.

It is very important to note that the results of this study cannot be used to
justify extension of proof test intervals beyond the useful life of the PRV. The
small value of the failure rate derived from the FMEDA applies only to the useful
life of the PRV which depends not only on the equipment’s specifications but also
on other factors, such as the ambient and process environment in which the PRV
is used and the levels and frequency of any on-line maintenance performed.
Data analyses place useful life in the range of 4 to 5 years.

Finally, we note that the results of the statistical analyses of the three
independent data sets predict an initial failure probability of approximately 1% –
1.6%. This initial failure probability is extremely significant as it accounts for the
vast majority of failures observed in proof test. This emphasizes the value of
careful installation and thorough commissioning procedures. When
commissioning testing cannot be done after installation, as is the case with a
PRV, both the initial probability of failing to open, as well as the PFD based upon
the random failure rate must be taken into account in the risk analysis.
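A minimal sketch of why both terms matter, assuming the simple model PFD(t) = p0 + (1 - p0)(1 - exp(-λt)) with a constant failure rate. This model and the specific inputs are assumptions for illustration: p0 is set at the lower end of the 1% to 1.6% range above, λ is an arbitrary value inside the 10⁻⁸ to 10⁻⁷ failures/hour range, and the interval is four years.

```python
import math

HOURS_PER_YEAR = 8760

def pfdavg_with_initial_failure(p0, lam, interval_hours):
    """Average PFD over one interval for PFD(t) = p0 + (1 - p0) * (1 - exp(-lam * t))."""
    x = lam * interval_hours
    if x == 0:
        avg_random = 0.0
    else:
        # Time average of 1 - exp(-lam * t) over the interval is 1 - (1 - exp(-x)) / x.
        avg_random = 1.0 + math.expm1(-x) / x
    return p0 + (1.0 - p0) * avg_random

# Assumed inputs for illustration only.
p0 = 0.01                      # initial probability of failing to open
lam = 5e-8                     # random failure rate, failures per hour
interval = 4 * HOURS_PER_YEAR  # interval between proof tests, in hours

total = pfdavg_with_initial_failure(p0, lam, interval)
initial_share = p0 / total
print(f"PFDavg = {total:.4f}, share due to initial failure = {initial_share:.0%}")
```

Under these assumptions the initial failure probability makes up most of the average PFD, so a risk analysis based on the random failure rate alone would understate it considerably.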

Download PDF   

In order to assign a SIL to equipment in low demand applications, we must be able to compute PFDavg. To compute PFDavg, we must first have a model for λD(t), the failure rate of the equipment in the dangerous failure mode. A dangerous failure occurs when equipment designed for prevention or mitigation of an unsafe condition cannot properly respond to the unsafe condition, i.e., the equipment fails on demand. For example, consider a PRV, which, in normal operation, is closed. Should it fail in the “stuck-shut” mode, it would be in a state of dangerous failure as it would be unable to respond to an overpressure event if one occurred.
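To make the link between λD(t) and PFDavg concrete, the sketch below numerically averages PFD(t) = 1 - exp(-∫λD dt) over one proof test interval. It assumes the equipment is known to be working at the start of the interval. The function names and the constant-rate input are hypothetical and do not represent the model developed in the paper.

```python
import math

def pfdavg(lambda_d, interval_hours, steps=10_000):
    """Numerically average PFD(t) over [0, interval_hours] for a failure rate function lambda_d(t)."""
    dt = interval_hours / steps
    hazard = 0.0      # running integral of lambda_d from 0 to t
    prev_pfd = 0.0    # PFD(0) = 0: assumed working right after a proof test
    area = 0.0
    for i in range(steps):
        hazard += lambda_d((i + 0.5) * dt) * dt   # midpoint rule for the hazard integral
        pfd = -math.expm1(-hazard)                # equals 1 - exp(-hazard)
        area += 0.5 * (prev_pfd + pfd) * dt       # trapezoid rule for the time average
        prev_pfd = pfd
    return area / interval_hours

# Hypothetical constant dangerous failure rate and a one-year proof test interval.
constant_rate = lambda t: 1e-6
interval = 8760

numeric = pfdavg(constant_rate, interval)
approx = 1e-6 * interval / 2   # familiar lambda * TI / 2 approximation
print(f"numeric PFDavg = {numeric:.4e}, lambda*TI/2 = {approx:.4e}")
```

For a constant rate the numeric result stays close to λ·TI/2. Passing a time-dependent function as `lambda_d` shows how a non-constant rate would change the average.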

Download PDF   

This paper introduces a benchmarking technique we call Predictive Analytics (PA). Using a large set of data from FMEDA results in combination with validated field failure data, upper and lower reasonability bounds for instrument device failure rates can be established. These bounds can be used as a benchmark to help validate any data collection process. This benchmark represents the constant total failure rate (λ) inherent in the device during its useful life. For any given set of field failure data (FFD) for a device, a λ of the device is estimated and compared to the benchmark. It is not uncommon for the benchmark λ and estimated λ to differ considerably. PA provides a procedure for exploring explanations of these differences and assessing the accuracy of the estimated device λ with respect to the benchmark λ of the device. PA can often determine the source of that portion of the estimated λ value not inherent to the device but likely due to random failures of infant mortality, wear out, or initial failures, to systematic failures, or to application or site specific issues. This site specific device λ is the portion of the estimated λ the end user needs to address to improve operational reliability and safety. PA can also assess the quality of FFD and can facilitate the discovery of previously unknown device failure modes.
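The comparison step can be sketched as below. This is not the PA procedure itself, only a point estimate of a constant λ from field failure data and a check against benchmark bounds. The function names, the bounds, and the field data are all hypothetical.

```python
def estimate_failure_rate(failures, unit_hours):
    """Point estimate of a constant failure rate: failures per cumulative operating hour."""
    if unit_hours <= 0:
        raise ValueError("unit_hours must be positive")
    return failures / unit_hours

def compare_to_benchmark(lam_est, lower, upper):
    """Classify an estimated rate as 'below', 'within', or 'above' the benchmark bounds."""
    if lam_est < lower:
        return "below"
    if lam_est > upper:
        return "above"
    return "within"

# Hypothetical field failure data: 500 devices observed for 5 years, 12 failures.
unit_hours = 500 * 5 * 8760
lam_est = estimate_failure_rate(12, unit_hours)

# Hypothetical benchmark bounds in failures per hour.
lower_bound, upper_bound = 1e-7, 4e-7
status = compare_to_benchmark(lam_est, lower_bound, upper_bound)
print(f"estimated lambda = {lam_est:.2e} per hour, {status} benchmark bounds")
```

An estimate above the upper bound would prompt a look at failures not inherent to the device, while one below the lower bound might point to incomplete failure reporting.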

Download PDF