Alarm Management:

Poor alarm management is one of the leading causes of unplanned downtime, contributing to over $20B in lost production every year, and of major industrial incidents such as the one in Texas City. Developing good alarm management practices is not a discrete activity but a continuous process (i.e., it is more of a journey than a destination). This paper describes the new ISA-18.2 standard, “Management of Alarm Systems for the Process Industries”. The standard provides a framework and methodology for the successful design, implementation, operation and management of alarm systems, and allows end users to address one of the fundamental conclusions of Bransby and Jenkinson: “Poor performance costs money in lost production and plant damage and weakens a very important line of defense against hazards to people.” Following a lifecycle model helps users systematically address all phases of the journey to good alarm management. The paper gives an overview of the new standard and the key activities in each step of the lifecycle.

Download PDF   

Alarms and operator response are one of the first layers of defense in preventing a plant upset from escalating into an abnormal situation. The new ISA 18.2 standard on alarm management recommends following a lifecycle approach similar to the existing ISA84/IEC 61511 standard on functional safety. This paper will highlight where these lifecycles interact and overlap, as well as how to address them holistically. Specific examples within ISA 18 will illustrate where the output of one lifecycle is used as input to the other, such as when alarms identified as safeguards during a process hazards analysis (PHA) are used as an input to alarm identification and rationalization. The paper will also provide recommendations on how to integrate the safety and alarm management lifecycles.

Download PDF   

Using the ISA-18.2 standard can help process engineers understand, simplify, and implement a sustainable alarm management program.

Congratulations. You’ve been assigned the task of establishing an alarm management program for your facility. So where and how do you begin? This article presents four practical tips for starting an effective and sustainable alarm management program that conforms to the tenets of a relatively new process industry standard for alarm management published by ISA.

Download PDF   

Alarms and operator response to them are one of the first layers of protection in preventing a plant upset from escalating into a hazardous event. This paper discusses how to evaluate and maximize the risk reduction (or minimize the probability of failure on demand) of this layer when it is considered as part of a layer of protection analysis (LOPA).

The characteristics of a valid layer of protection (Specific, Auditable, Independent and Dependable) will be reviewed to examine how each applies to alarms and operator response. Considerations for how to assign probability of failure on demand (PFD) will be discussed, including the key factors that contribute to it (e.g., operator’s time to respond, training, human factors, and the reliability of the alarm annunciation / system response). The effect of alarm system performance issues (such as nuisance alarms and alarm floods) on operator dependability (and probability of failure on demand) will be reviewed. Key recommendations will be drawn from the ISA-18.2 standard “Management of Alarm Systems for the Process Industries”.
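The role of the alarm layer's PFD in a LOPA can be illustrated with the usual LOPA arithmetic: the frequency of the hazardous event is the initiating event frequency multiplied by the PFD of each independent protection layer. The sketch below is only an illustration of that arithmetic, not the evaluation method of the paper; the function names and the numeric values in the usage note are hypothetical.

```python
from math import prod


def mitigated_frequency(initiating_freq_per_year, layer_pfds):
    """Hazardous event frequency after crediting each independent layer.

    Assumes the layers are independent of each other and of the
    initiating event, which is the basic LOPA premise.
    """
    for pfd in layer_pfds:
        if not 0.0 < pfd <= 1.0:
            raise ValueError("each PFD must be in the interval (0, 1]")
    return initiating_freq_per_year * prod(layer_pfds)


def risk_reduction_factor(pfd):
    """Risk reduction factor of a single layer (the inverse of its PFD)."""
    return 1.0 / pfd
```

For example, with an assumed initiating event frequency of 0.1 per year, an alarm-and-operator-response layer credited with a PFD of 0.1, and a second layer with a PFD of 0.01, `mitigated_frequency(0.1, [0.1, 0.01])` gives roughly 1e-4 per year. If nuisance alarms or alarm floods make the operator less dependable, the credited PFD for the alarm layer has to be raised, and the mitigated frequency rises in proportion.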

Download PDF   

Setting a new Standard for Performance, Safety, and Reliability with ISA-18.2

Alarm Management affects both the bottom line and plant safety. A well-functioning alarm system can help a process run closer to its ideal operating point – leading to higher yields, reduced production costs, increased throughput, and higher quality, all of which add up to higher profits. Poor alarm management, on the other hand, is one of the leading causes of unplanned downtime and has been a major contributor to some of the worst industrial safety accidents on record.

Download PDF   

Certification:

Today there is a growing trend for end users to choose manufacturers whose equipment is certified to IEC 61508 (SIL – Safety Integrity Level). This is an excellent trend for several reasons. The first is that, to obtain certification, manufacturers must provide the failure rates and failure modes of the product. This information is normally obtained from a report known as a Failure Modes, Effects and Diagnostic Analysis (FMEDA).

Download PDF   

Today there is a growing trend among end users to require equipment manufacturers to have their safety devices certified to IEC 61508 (SIL). That is an excellent trend for a number of reasons. One reason is that, to get a device SIL certified, a company must first determine the device’s failure rates and failure modes. This is usually done by having a Failure Modes, Effects and Diagnostic Analysis (FMEDA) performed. Among other things, an FMEDA report details the device’s Architectural Constraints and its λDU (Dangerous Undetected Failure Rate). Given values for the maintenance parameters (Test Interval, Test Coverage, and Repair Time), you can determine the device’s PFDavg (Average Probability of Failure on Demand). Both the Architectural Constraints and the PFDavg of a device, together with its IEC 61508 Certification, are critical in evaluating whether a given device is suitable for use in a Safety Function with a given SIL requirement, and they are what a Safety Engineer looks at in that evaluation.
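To show how λDU and the maintenance parameters combine into a PFDavg, here is a minimal sketch using one common simplified approximation for a single (1oo1) device. It is an assumption for illustration, not the calculation from the paper: it treats λDU as constant, assumes λDU times the time interval is small, and introduces a mission time parameter (not mentioned in the abstract) to account for failures the proof test never reveals. The function and parameter names are hypothetical.

```python
def pfd_avg_1oo1(lambda_du, test_interval_h, test_coverage=1.0,
                 mission_time_h=None, repair_time_h=0.0):
    """Simplified PFDavg approximation for a single device.

    lambda_du       dangerous undetected failure rate, failures per hour
    test_interval_h proof test interval, hours
    test_coverage   fraction of dangerous undetected failures the proof
                    test can reveal (0..1)
    mission_time_h  assumed time in service before replacement or
                    overhaul, hours (defaults to one test interval)
    repair_time_h   time to restore the device after a failure is found
    """
    if mission_time_h is None:
        mission_time_h = test_interval_h
    # Failures the proof test can find: on average they exist for half a
    # test interval, plus the time needed to repair them once found.
    revealed = test_coverage * lambda_du * (test_interval_h / 2.0 + repair_time_h)
    # Failures the proof test cannot find: they persist, on average, for
    # half of the assumed mission time.
    unrevealed = (1.0 - test_coverage) * lambda_du * mission_time_h / 2.0
    return revealed + unrevealed
```

With an assumed λDU of 1e-6 per hour and a yearly proof test (8760 h) with perfect coverage, the result is about 4.4e-3. Dropping the coverage to 90% over a ten-year mission time roughly doubles it, which is why test coverage matters as much as test interval.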

Download PDF   

Fault tolerant systems have been designed for safety critical applications including the protection of potentially dangerous industrial processes. These systems are typically evaluated and certified to functional safety standards such as IEC 61508 [1] by agencies like exida Certification or one of the TUV companies. Many factors are taken into account during the certification process including hardware diagnostic capability, level of hardware redundancy, design processes used, software diagnostics and general equipment strength. It has become clearly recognized that common cause failures can have a major negative impact on the safety and availability of a fault tolerant system [2]. The whole value of redundancy may be ruined. Common cause is recognized as an important factor but there is disagreement regarding how to account for common cause in the quantitative modeling. Part 7 of IEC 61508 at least provides a list of questions with a point scoring system [3].

Download PDF   

IEC 61508 has been in use for several years since the final parts were released in 2000. Although written from the perspective of a bespoke system, it is more commonly used to certify products for a given SIL level. Valid product certification schemes must involve the assessment of specific product design details as well as an assessment of the safety management system of the product manufacturer and the personnel competency of those professionals involved in the product creation.

A proper assessment of a product must completely cover all the requirements of the IEC 61508 standard including the safety management system and build a safety case. The safety case must list each requirement, an argument as to how the product design or its creation process meets the requirement and the necessary evidence to provide reasonable credibility for the argument. This safety case must be available for inspection. Although the safety case typically contains manufacturer proprietary information, those who wish to review the full safety case should be able to do so, perhaps under confidentiality agreement. In addition, an open IEC 61508 certification must include a public certification report that provides an overview of the assessment and the product limitations, if any.

This paper describes an assessment technique for product designs and the product development process that produces a full safety case as well as additional public documentation. This “open certification” method has been used in dozens of instances on product design process. The assessment experiences to date show that most of the problems with conventional methods are solved or at least improved.

Download PDF   

In today’s world many potentially dangerous pieces of equipment are controlled by embedded software. This equipment includes cars, trains, airplanes, oil refineries, chemical processing plants, nuclear power plants and medical devices. As embedded software becomes more pervasive so too do the risks associated with it. As a result, the issue of software safety has become a very hot topic in recent years. The leading international standard in this area is IEC 61508: Functional safety of electrical/electronic/programmable electronic safety-related systems. This standard is generic and not specific to any industry, but has already spun off a number of industry specific derived standards, and can be applied to any industry that does not have its own standard in place. Several industry specific standards such as EN50128 (Railway), DO-178B (Aerospace), IEC 60880 (Nuclear) and IEC 601-1-4 (Medical Equipment), are already in place. Debra Herrmann (Herrmann, 1999) has found a total of 19 standards related to software safety and reliability that cut across industrial sectors and technologies. These standards’ popularity is on the rise, and more and more embedded products are being developed that conform to these standards. Since an increasing number of embedded products also use an embedded real time operating system (RTOS), it has become inevitable that products with an RTOS are being designed to conform to such standards. This creates an important question for designers: how is my RTOS going to affect my certification? This article will attempt to explore the challenges and advantages of using an RTOS in products that will undergo certification.

Download PDF   

Prior Use (Proven In Use in IEC 61508) applies to equipment for which a documented assessment has shown that there is appropriate evidence, based on the previous use of the component, that the component is suitable for use in a particular application of a safety instrumented system at a given integrity level. There are two dimensions to this issue: suitability of a component to meet application requirements, and sufficient integrity for safety critical functions.

Both international functional safety standards, IEC 61508 and ISA84.01-2004 (IEC 61511 MOD.) have clauses describing Prior Use requirements. One should also refer to TR84.00.04 for additional information on Prior Use.

This paper will discuss some of the particular roadblocks found at the plant site in trying to meet Prior Use guidelines in the ISA84.01-2004 (IEC 61511 MOD.) standard. This paper will not discuss the process for assessing the equipment.

Download PDF   

The exida IEC 61508 Certification Program was established in 2005 in response to demand primarily from end users in the process industries and manufacturers of instrumentation products. There was a need to provide a higher quality of technical expertise with effective and responsive service for these manufacturers.

The exida IEC 61508 Certification Program offers the most comprehensive product review of any certification agency resulting in products that are safer, more secure and more reliable.

Download PDF   

Cybersecurity:

The past two years have been a wakeup call for the industrial automation industry. It has been the target of sophisticated cyber attacks like Stuxnet, Night Dragon and Duqu. An unprecedented number of security vulnerabilities have been exposed in industrial control products and regulatory agencies are demanding compliance to complex and confusing regulations. Cyber security has quickly become a serious issue for professionals in the process and critical infrastructure industries.

If you are a process control engineer, an IT professional in a company with an automation division, or a business manager responsible for safety or security, you may be wondering how your organization can get moving on more robust cyber security practices. This white paper will give you the information you need to get started. It won’t make you a security expert, but it will put you on the right path in far less time than it would take if you were to begin on your own.

We began by condensing the material from numerous industry standards and best practice documents. Then we combined our experience in assessing the security of dozens of industrial control systems. The result is an easy-to-follow 7-step process:

Step 1 – Assess Existing Systems
Step 2 – Document Policies & Procedures
Step 3 – Train Personnel & Contractors
Step 4 – Segment the Control System Network
Step 5 – Control Access to the System
Step 6 – Harden the Components of the System
Step 7 – Monitor & Maintain System Security

The remainder of this white paper will walk through each of these steps, explaining the importance of each step and best practices for implementing it. We will also provide ample references for additional information.

Download PDF   

With the ever changing threats posed by cyber events of any nature, it has become critical to recognize these emerging threats, malicious or not, and identify the consequences these threats may have on the operation of an industrial control system (ICS). Cyber-attacks over time have the ability to take on many forms and threaten not only industrial but also national security.

Saudi Aramco, the world’s largest exporter of crude oil, serves as a perfect example depicting how devastating a cyber-attack can truly be on an industrial manufacturer. In August 2012, Saudi Aramco (SA) had 30,000 personal computers on its network infected by a malware attack better known as the “Shamoon” virus. According to InformationWeek Security this was roughly 75 percent of the company’s workstations and took 10 days to complete clean-up efforts.

The seriousness of cyber-attacks with regard to national security was addressed by former United States Secretary of Defense Leon W. Panetta in a speech in October 2012. Panetta issued a strong warning to business executives about cybersecurity as it relates to national security. “A cyber-attack perpetrated by nation states [and] violent extremists groups could be as destructive as the terrorist attack on 9/11. Such a destructive cyber-terrorist attack could virtually paralyze the nation,” he stated. “For example, we know that foreign cyber actors are probing America’s critical infrastructure networks. They are targeting the computer control systems that operate chemical, electricity and water plants and those that guide transportation throughout this country.”

In addition to Panetta’s address, the U.S. Department of Homeland Security has issued several alerts about coordinated attacks on gas pipeline operators, according to a May 2012 report by ABC News.

This whitepaper will focus on the significance of cyber-attacks on industrial control systems (ICS) and how these attacks can be prevented by proper practice of the ICS Cybersecurity lifecycle.

Download PDF   

Functional Safety:

The typical first reaction from the process operations side of the table when confronted with a new standard is, “How much will this cost and how much extra paperwork will it involve?” Depending on the organisation, the answers to these questions can vary dramatically. Unfortunately, the further question, “How can this save money?” is rarely asked, if ever. Even if it is asked, the hope of implementing a new regulation and actually saving money immediately is dismissed as an impossible dream. IEC/AS 61508 and 61511, the standards covering the design and use of safety instrumented systems to reduce process plant accidents, are no exception to this initial reaction.

Download PDF   

Several issues of the magazine InTech América do Sul have carried articles, by various authors, on Safety Instrumented Systems and the international standards that guide the best practices applied to such projects. Now it is time to talk about the Brazilian standards!

Download PDF   

Probabilistic calculations that are done to verify the integrity of a Safety Instrumented Function design require failure rate and failure mode data for all equipment, including the mechanical devices. For many devices, such data is only available in industry databases where only failure rates are presented. The failure mode information is rare, if available at all. Many give up and simply assume 50% safe and 50% dangerous, thinking this is conservative. In some cases this is not a conservative assumption. In other cases it can be overkill.

Download PDF   

Accurate Modeling of Shared Components in High Reliability Applications

Download PDF   

In this paper, we present a methodology to derive component failure rate and failure mode data for mechanical components used in automation systems based on warranty and field failure data as well as expert opinion. We describe a process for incorporating new component information into the database as it becomes available. The method emphasizes random mechanical component failures of importance in the world of safety analysis as opposed to the wear-out and aging mechanical failures that have dominated mechanical reliability analysis. The method provides a level of accuracy significantly better than warranty failure data analysis alone. The derived database has the same form as that for electrical/electronics databases used in FMEDA analyses used to show compliance with international performance-based safety standards. Thus, the mechanical database can be used in conjunction with existing electrical/electronics databases to perform required probabilistic safety analysis on automation systems comprised of both electrical and mechanical components.

Download PDF   

There are many benefits to a company when it has access to good field failure data. Most of the benefits can be categorized as saving money. At the same time, most of the expenditure needed to get good failure data is already being spent. For the incremental cost of improving data collection quality and data analysis, these benefits can be achieved.

High-quality field failure data has often been described as the ultimate source of failure data. However, not all field failure studies are high quality. Some field studies simply do not have the needed information. Some field studies make unrealistic assumptions. The results can be quite different depending on methods and assumptions. Some methods produce optimistic results that can result in bad designs and unsafe processes.

This paper presents some common field failure analysis techniques, shows some of the limitations of the methods and describes important attributes of a good field failure data collection system.

Download PDF   

The letters FMEDA form an acronym for “Failure Modes Effects and Diagnostic Analysis.” The name was given by one of the authors in 1994 to describe a systematic analysis technique that had been in development since 1988 to obtain subsystem / product level failure rates, failure modes and diagnostic capability (Figure 1).

Download PDF   

Electromagnetic Interference (EMI) is just one of the environmental stresses that can stop a system from performing its safety function. It is important for a functional safety system to be immune from the EMI levels that are likely to be present. Unlike other environmental stresses, like temperature and vibration, EMI is more difficult to sense and is more likely to be transitory. Still, the effects can be catastrophic.

Download PDF   

IEC 61508 is an international standard for the “functional safety” of electrical, electronic, and programmable electronic equipment. This standard started in the mid 1980s when the International Electrotechnical Commission Advisory Committee on Safety (IEC ACOS) set up a task force to consider standardization issues raised by the use of programmable electronic systems (PES). At that time, many regulatory bodies forbade the use of any software-based equipment in safety critical applications. Work began within IEC SC65A/Working Group 10 on a standard for PES used in safety-related systems. This group merged with Working Group 9 where a standard on software safety was in progress. The combined group treated safety as a system issue.

Download PDF   

IEC 61508 and its process-specific companion IEC 61511 are providing new codification to safety-instrumented systems and their application to the process industry. Setting the structure of the safety lifecycle and clarifying the safety integrity level generally enable companies to follow good engineering practice more readily. Experience shows that this clarification can uncover potential stumbling blocks in obtaining accurate failure rate data and ensuring the competence of personnel involved throughout the safety lifecycle. Fortunately, solutions and guidance through these issues with improved databases, training, and official qualifications are becoming available via certification bodies and specialty consultants.

Download PDF   

The use of IEC 61508 [1] and IEC 61511 [2] has increased rapidly in the past several years. Along with the adoption of the standards has come an increase in the need for accurate reliability data for devices used in Safety Instrumented Systems (SIS), both electronic and mechanical. While the methodology of determining failure rates for electronic equipment is fairly well accepted and applied, the same cannot be said for mechanical equipment. Several methods are currently being utilized for generating failure rates for mechanical components. These methods vary in their approach and often lead to dramatically different failure rates, which can lead to significant differences when calculating the reliability of a safety instrumented function (SIF). Some methods can result in dangerously optimistic failure rate numbers.

This paper reviews the methods utilized to determine mechanical reliability for components utilized in safety systems and provides a recommendation for the most appropriate methodology.

Download PDF   

The release of IEC 61508 2010 has led to several discussions on how certain new, updated, and unmodified definitions need to be interpreted. The controversy relates to the determination of the required minimum hardware fault tolerance / architectural constraints interpretation.

This position paper explains the position that exida has taken with regard to this issue. The position paper is structured in two parts: the position, and the rationale for the position, including counter arguments received over the last couple of months. The exida position is also implemented in the exida exSILentia safety lifecycle tool.

Download PDF   

In the past two decades only a small group of system vendors serving the nuclear, avionics, medical, railroad and process industries came in contact with requirements for Functional Safety of computerized systems. Now, within the relatively short period of three years, the user requirements sections of many requests for bids require engineering contractors and system suppliers world-wide to comply with the Functional Safety requirements of the international standard IEC 61508.

IEC 61508 has requirements for systems using complex electronics and programmable electronics whose failure could have an impact on the safety of persons and/or the environment. It describes methods to classify risk and specifies requirements on how to avoid, detect and control systematic design faults, particularly in software development, random hardware faults and common cause failures, and to a lesser extent operating and maintenance errors.

Download PDF   

Considering the components used in the current control systems, hardware failure causes have been widely studied. Software failure causes, on the other hand, are rarely studied or understood. In the field studies that have been done, some of the rules for software failure causes have been theorized but even those are not widely known by software engineers or followed. Few practitioners know the rules of software reliability or take the time to study how to create reliable software. Why? This is in part because it appears deceptively easy to create software. Software tool manufacturers work hard to promote this.

We cannot, however, ignore the importance of software reliability. As control systems grow in functionality and complexity, we depend on an increasing amount of software. This paper addresses these issues and includes examples of software failures, the “root causes” of those failures, some rules for avoiding those causes and some guidance in evaluating software reliability in control system products.

Download PDF   

The purpose of this document is to report on our successful efforts to validate statistically certain random equipment failure rate data used in a mechanical parts failure rate and failure mode database and, by extension, to validate the techniques used to derive the data. To accomplish this, a Failure Modes, Effects, and Diagnostic Analysis (FMEDA) is initially used to predict the useful-life failure rate for the fail-to-open condition of a particular pressure relief valve (PRV) using the failure rates from the mechanical parts database. Next, this prediction is statistically tested against three independent data sets consisting of proof test data for PRV provided by Fortune 500 operating companies. The data sets all meet the intent of the quality assurance of proof test data as documented by the Center for Chemical Process Safety (CCPS) Process Equipment Reliability Database (PERD) initiative. By applying the quantal response method to the results of these PRV proof tests, it is demonstrated that the proof test data are consistent with the predictions of the FMEDA. Specifically, all of the data sets support the FMEDA result at a 95% confidence level. All analyses lead to a useful-life PRV failure rate between 10⁻⁸ and 10⁻⁷ failures/hour.

It is very important to note that the results of this study cannot be used to justify extension of proof test intervals beyond the useful life of the PRV. The small value of the failure rate derived from the FMEDA applies only to the useful life of the PRV which depends not only on the equipment’s specifications but also on other factors, such as the ambient and process environment in which the PRV is used and the levels and frequency of any on-line maintenance performed. Data analyses place useful life in the range of 4 to 5 years.

Finally, we note that the results of the statistical analyses of the three independent data sets predict an initial failure probability of approximately 1% to 1.6%. This initial failure probability is extremely significant as it accounts for the vast majority of failures observed in proof test. This emphasizes the value of careful installation and thorough commissioning procedures. When commissioning testing cannot be done after installation, as is the case with a PRV, both the initial probability of failing to open and the PFD based upon the random failure rate must be taken into account in the risk analysis.

Download PDF   

Dr Peter Clarke explains how process plants can benefit through proper and careful adoption of the IEC 61511 safety standard.

Download PDF   

The past few years have brought significant changes to the control safety field in both technology (e.g., fieldbus) and regulation (e.g., IEC 61508). Globalisation has further compounded the increased challenges of keeping pace and the consequences of falling short. Experience in this environment has shown the value of recently developed Web-accessible database and analysis tools to assess and verify the Safety Integrity Level (SIL) of existing and proposed systems. Similarly, improved communications technology has allowed process industry firms to tap into a broader range of specialized expertise for both identifying and addressing critical risk management issues.

Download PDF   

In order to assign a SIL to equipment in low demand applications, we must be able to compute PFDavg. To compute PFDavg, we must first have a model for λD(t), the failure rate of the equipment in the dangerous failure mode. A dangerous failure occurs when equipment designed for prevention or mitigation of an unsafe condition cannot properly respond to the unsafe condition, i.e., the equipment fails on demand. For example, consider a PRV, which, in normal operation, is closed. Should it fail in the “stuck-shut” mode, it would be in a state of dangerous failure as it would be unable to respond to an overpressure event if one occurred.
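The link between λD(t) and PFDavg can be written out as follows. This is the standard textbook relationship under the assumptions of a single device, no repair within the interval, and a proof test (or replacement) at time T; it is offered as a sketch, not as the specific model developed in the paper. The symbols PFD(t) and T are introduced here for illustration.

```latex
% Probability that the device has failed dangerously by time t
\mathrm{PFD}(t) = 1 - \exp\!\left(-\int_0^t \lambda_D(s)\,ds\right)

% Time average over the interval [0, T]
\mathrm{PFD}_{avg} = \frac{1}{T}\int_0^T \mathrm{PFD}(t)\,dt

% Special case of a constant failure rate \lambda_D
\mathrm{PFD}_{avg} = 1 - \frac{1 - e^{-\lambda_D T}}{\lambda_D T}
\;\approx\; \frac{\lambda_D T}{2} \qquad (\lambda_D T \ll 1)
```

For the stuck-shut PRV, PFD(t) is the probability that the valve is already stuck at time t, and PFDavg is the chance that an overpressure event arriving at a random moment in the interval finds it in that state.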

Download PDF   

The functional safety standards IEC 61508, IEC 61511, and ANSI/ISA 84.01 each specify the Safety Integrity Level performance parameter for Safety Instrumented Functions. For a Safety Instrumented Function to meet a specific Safety Integrity Level, the sum of the average Probability of Failure on Demand (PFDavg) of all components that are part of that Safety Instrumented Function needs to fall within the PFDavg band associated with that Safety Integrity Level.
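The band check described above can be sketched in a few lines. The PFDavg bands are the low demand mode bands from IEC 61508; the function names and the component values in the usage note are hypothetical. Summing component PFDavg values is itself an approximation for components in series. The sketch covers only the PFDavg criterion and ignores the architectural constraints and systematic capability requirements that also apply.

```python
# Low demand mode PFDavg bands: (SIL, lower bound inclusive, upper bound exclusive)
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sif_pfdavg(component_pfds):
    """Approximate PFDavg of a function as the sum of its components' PFDavg."""
    return sum(component_pfds)


def sil_from_pfdavg(pfd_avg):
    """Return the SIL whose band contains pfd_avg, or None if outside all bands."""
    for sil, low, high in SIL_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None
```

For example, a function with an assumed sensor PFDavg of 2e-3, logic solver PFDavg of 1e-4 and final element PFDavg of 3e-3 sums to 5.1e-3, which falls in the SIL 2 band. If the final element were 8e-3 and the sensor 5e-3, the sum of 1.31e-2 would fall in the SIL 1 band.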

Download PDF