Managing Risk: GMPs are not Enough

Risk-based asset management offers a way to prioritize assets and reduce overall risk

Sept. 12, 2012


Risk-based asset management (RBAM) is a method of implementing an asset-management strategy based on the asset-related risks to the value stream. Good Manufacturing Practices (GMPs) cover all aspects of the manufacturing process, including the validated steps used to create the product, the facilities, and the transportation and storage of product, along with the required training and quality programs documented in standard operating procedures. Finally, to close the plan-do-check-act loop, GMPs specify requirements for traceability, record keeping, and the ability to recall product and to investigate deficiencies and complaints. Both RBAM and GMPs are needed to ensure that all risks are identified and evaluated based on their impact on the value stream.

RBAM is a logical way to visualize the assets’ contribution to the process flow, create the proper taxonomy, prioritize assets, evaluate risk, develop risk controls and then measure the effectiveness of these controls.

Figure 1 Risk-based Asset Management Implementation Model

Classify
The first step in implementing the Risk-based Asset Management model is to develop the process flow diagram for manufacturing the product. An example of a process flow diagram for manufacturing blood vaccines can be found in Chapter 45 of the Food and Drug Administration's Compliance Program Guidance Manual, "Biological Drug Products." The diagram lets us visualize the manufacturing process, as with the process flow diagram in Figure 2.

Figure 2 Process Flow Diagram

Once process flow diagrams have been developed for the various products, a value stream map can be built by adding the number of operators, material flow, information flow and general icons. An example of material flow information is the data box under each process, which contains information about manufacturing the product, such as flow time and percent yield. Flow time combines manual time, automatic time and changeover time.
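A data box can be captured as a small record. The sketch below assumes flow time is the simple sum of its three components; the class name, field names and the split of hours are hypothetical, chosen only so the total matches the 3.25-hour filling step used in this article.

```python
from dataclasses import dataclass


@dataclass
class DataBox:
    """One value-stream-map data box (hypothetical field names)."""
    process: str
    manual_hours: float      # operator-attended time
    auto_hours: float        # machine-paced time
    changeover_hours: float  # setup/changeover between batches
    percent_yield: float     # good output as a percentage of input

    @property
    def flow_time_hours(self) -> float:
        # Assumption: flow time is the sum of manual, auto and changeover time.
        return self.manual_hours + self.auto_hours + self.changeover_hours


# Hypothetical split of the filling step; figures are illustrative only.
box = DataBox("Aseptic filling", manual_hours=0.5, auto_hours=2.5,
              changeover_hours=0.25, percent_yield=98.0)
print(f"{box.process}: flow time {box.flow_time_hours:.2f} h, yield {box.percent_yield}%")
```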

Figure 3

In the example in Figure 3, aseptic filling takes 3.25 hours: for this product line, the batch process is designed to fill 10,000 5cc vials in 3.25 hours, a rate of one vial every 1.17 seconds. In a liquid fill, this process is also the constraint. These parameters are important in the final phase of our implementation model, "Measure," because we need to document the design or best demonstrated rates of the process to trend performance and to support loss elimination and continuous improvement activities. Transparency in these measures is also important because we can use them in our visual management system.
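The rate arithmetic for the filling step can be checked in a few lines. This is a minimal sketch; the function name `fill_rates` is illustrative, not part of any tool.

```python
def fill_rates(units: int, hours: float) -> tuple[float, float]:
    """Return (units per hour, seconds per unit) for a batch step."""
    units_per_hour = units / hours
    seconds_per_unit = hours * 3600 / units
    return units_per_hour, seconds_per_unit


per_hour, per_unit = fill_rates(10_000, 3.25)
print(f"{per_hour:,.0f} vials/h, one vial every {per_unit:.2f} s")
# 3,077 vials/h, one vial every 1.17 s
```

The hourly figure (about 3,077 vials per hour) is the design rate that the rate component of OEE and AU is later measured against.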

Once the process flow diagram is complete, we can model the process by assigning the equipment used in the process and mapping its relationships to distributive systems such as electrical power and steam supply. Reliability block diagrams also allow the process or system to be modeled to identify single-point failures and redundancy. Once the modeling is complete, asset types must be defined so that assets with like attributes can be grouped, which streamlines analysis. A corporate policy on naming and description conventions should be developed so that an autoclave in one process or facility is specified the same way, and at the same level of the hierarchy, as the next.
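A reliability block diagram can be evaluated with two standard rules: blocks in series multiply their reliabilities, and redundant blocks in parallel fail only if all of them fail. The sketch below assumes a simple series-of-parallel-stages structure; the stage names and reliability values are hypothetical. Any stage with a single block is flagged as a single point of failure.

```python
from math import prod

# Hypothetical autoclave support system: stages are in series; blocks inside
# a stage are redundant (parallel). Reliabilities are illustrative only.
stages = {
    "electrical feed": {"utility feed": 0.99},
    "steam supply": {"boiler A": 0.95, "boiler B": 0.95},
    "autoclave": {"autoclave": 0.97},
}


def stage_reliability(blocks: dict[str, float]) -> float:
    # Parallel blocks: the stage fails only if every block fails.
    return 1 - prod(1 - r for r in blocks.values())


def system_reliability(stages: dict[str, dict[str, float]]) -> float:
    # Series stages: the system works only if every stage works.
    return prod(stage_reliability(b) for b in stages.values())


def single_points_of_failure(stages: dict[str, dict[str, float]]) -> list[str]:
    # A stage with only one block has no redundancy.
    return [name for name, b in stages.items() if len(b) == 1]


print(f"System reliability: {system_reliability(stages):.4f}")
print("Single points of failure:", single_points_of_failure(stages))
```

Real diagrams may include k-out-of-n or bridge structures that this series/parallel sketch does not cover.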

Hierarchy development is the final component and one that is rarely done correctly. Most organizations’ hierarchical structures were developed when their financial accounting software was implemented. These structures are more aligned to general ledger and balance sheets linking to cost centers, not to the lowest level of maintainable component, which is considered best practice from an asset management perspective. Figure 4 below is an excerpt from ISO 14224:2006, Petroleum, petrochemical and natural gas industries – Collection and exchange of reliability and maintenance data for equipment. Even though this international standard is not specifically designed for the pharmaceutical industry, a significant amount of this information is relevant.

Figure 4 Collection and exchange of reliability and maintenance data for equipment

Most organizations make the mistake of developing their hierarchy only to level 4 or 5 instead of level 7, so work orders are written to the system or process level and material is also mapped to that location. This makes identifying repair parts or evaluating bad actors almost impossible. Instrumentation and control is another common weakness in the way the hierarchical structure is developed. If an autoclave has door switches and pressure instruments that require calibration, do they show up as children of the autoclave, or is their parent the room in which they are located? The latter is typical because of the way the calibration program is set up, but it creates significant risk because critical instrumentation is no longer linked to the equipment it serves.
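The sketch below illustrates why the depth of the hierarchy matters. It assumes a simple child-to-parent map with instruments placed under the autoclave they serve; all tags and work orders are hypothetical. Because the work orders name the maintainable item, bad actors can be counted directly, and the same orders still roll up to the equipment level.

```python
from collections import Counter

# Hypothetical hierarchy: child -> parent. Instruments sit under the
# autoclave they serve, not under the room they are mounted in.
parent = {
    "AC-01": "Sterilization",
    "AC-01-DOOR": "AC-01",
    "AC-01-DOOR-GASKET": "AC-01-DOOR",
    "AC-01-DOOR-SWITCH": "AC-01-DOOR",
    "AC-01-PT-101": "AC-01",        # chamber pressure transmitter
    "AC-01-CHAMBER": "AC-01",
}

# Hypothetical corrective work orders written to the maintainable item.
work_orders = ["AC-01-DOOR-GASKET", "AC-01-PT-101", "AC-01-DOOR-GASKET",
               "AC-01-DOOR-SWITCH", "AC-01-DOOR-GASKET"]


def ancestors(tag: str) -> list[str]:
    """Walk up the hierarchy from a tag to the top-level location."""
    chain = []
    while tag in parent:
        tag = parent[tag]
        chain.append(tag)
    return chain


# Bad actors are visible only because work orders name the component.
bad_actors = Counter(work_orders).most_common()
# The same orders still roll up to the equipment for system-level reporting.
rollup = Counter(a for wo in work_orders for a in ancestors(wo))

print(bad_actors[0])        # ('AC-01-DOOR-GASKET', 3)
print(rollup["AC-01"])      # 5
```

Had the same five orders been written to "AC-01" directly, the roll-up would be identical but the door gasket would be invisible as the bad actor.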

Analyze
The next phase of risk-based asset management is to analyze the assets and develop a prioritization of how those assets impact the value stream and how corporate resources will be allocated. This is a significant component of the RBAM methodology because identifying risk, and then developing control strategies to mitigate or eliminate it, are the keys to success.

Performing a criticality analysis on the equipment seems like a daunting task, but without it, how can prioritization occur? A good understanding of the value the products create, and of which processes are required to manufacture them, is an important first step. From there, you can evaluate the equipment's impact on the value stream by looking at how its failures would affect environment, health, safety, reputation and production. Then develop a scale from 1 to 10, rate the equipment against each of these variables, and calculate the overall criticality of the equipment. Figure 5 is an example of what the criticality analysis would look like.
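The article does not prescribe how the five 1-to-10 ratings are combined into an overall criticality. The sketch below assumes a weighted sum; the weights, asset names and ratings are all hypothetical and would be set by each organization.

```python
# Hypothetical weights; the combination rule (weighted sum) is an assumption.
WEIGHTS = {"environment": 1.0, "health": 1.0, "safety": 1.5,
           "reputation": 1.0, "production": 1.5}


def criticality(ratings: dict[str, int]) -> float:
    """Combine 1-10 consequence ratings into one criticality score."""
    for factor, value in ratings.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{factor} rating {value} is outside 1-10")
    return sum(WEIGHTS[f] * ratings[f] for f in WEIGHTS)


# Illustrative ratings only.
equipment = {
    "Autoclave AC-01": {"environment": 3, "health": 8, "safety": 9,
                        "reputation": 8, "production": 9},
    "Label printer LP-02": {"environment": 1, "health": 1, "safety": 2,
                            "reputation": 4, "production": 5},
}
for name, r in sorted(equipment.items(), key=lambda kv: -criticality(kv[1])):
    print(f"{name}: {criticality(r):.1f}")
```

A weighted sum keeps the scoring transparent; some organizations instead use the maximum rating or a consequence-times-likelihood product.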

Figure 5 Example of a criticality analysis spreadsheet

When done correctly, equipment criticality analysis should separate the equipment into five to ten groups. Each group should equate to a numerical value, with the largest number representing the most critical equipment. This would also be the number used in the information management system, such as SAP, Infor or Maximo, in a field that might be called ABC Indicator, Priority or something similar. This allows the system to show the importance of the asset, which helps prioritize the work order backlog and decide where to focus from a continuous improvement perspective.
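Overall scores can then be binned into numbered groups, with the highest number for the most critical equipment. This sketch assumes equal-width bands between the lowest and highest score; the asset tags and scores are hypothetical, and other banding rules (fixed cut-offs, quantiles) are equally valid.

```python
def criticality_groups(scores: dict[str, float], n_groups: int = 5) -> dict[str, int]:
    """Map each asset to a group 1..n_groups; n_groups is the most critical.

    Assumes equal-width bands between the lowest and highest score.
    """
    if not 5 <= n_groups <= 10:
        raise ValueError("use between five and ten groups")
    lo, hi = min(scores.values()), max(scores.values())
    width = (hi - lo) / n_groups or 1.0  # avoid zero width if all scores tie
    return {
        asset: min(int((s - lo) / width) + 1, n_groups)
        for asset, s in scores.items()
    }


# Illustrative scores, e.g. from a weighted criticality calculation.
scores = {"AC-01": 46.0, "WFI-still": 41.0, "HVAC-AHU-3": 30.0,
          "Capper-2": 22.0, "LP-02": 16.5}
print(criticality_groups(scores))
```

The resulting integer is the kind of value that would be loaded into the priority field of the maintenance system.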

Once the equipment criticality analysis is complete, a failure mode and effects analysis can be conducted on the most critical equipment, that is, the equipment that poses the greatest risk or creates the most value in the value stream. IEC 60812:2006, Analysis techniques for system reliability - Procedure for failure mode and effects analysis (FMEA), is the international standard for this method and a good resource for conducting this level of analysis. To define the boundaries of the analysis, a functional block diagram must first be developed.

The primary purpose of the functional block diagram is to ensure that all of the functions provided by and within the asset are documented so that functional failures can be determined and analyzed. These block diagrams include major components, interfaces to distributive systems, interfaces between subsystems, power, data and structural interfaces. Figure 6 is an example of a block diagram for an autoclave.

Figure 6 Functional block diagram for an autoclave

The next step is to assign the maintainable components in each functional block since the analysis must be done at that level. Once this is completed and functions of the components have been determined, we must identify functional failures, failure modes, effects, causes, and then the control strategy currently in place to mitigate or eliminate the failure. For example, the chamber of an autoclave is a vessel whose function is to maintain mechanical integrity. A functional failure occurs when this mechanical integrity is violated. Common failure modes that could occur include brittle fracture, stress corrosion cracking, fatigue, welding problems, erosion, creep, stress rupture and hydrogen embrittlement. Causes of these failures could include design errors, fabrication errors, corrosion, and improper operation and maintenance. The effects of any of these could be the loss of sterility, the loss of equipment, or even loss of life.

Examples of current control strategies that may be in place are safety devices, tags linked to process intelligence, or weekly operational checks. With all of the information collected about failures, causes and control strategies, a risk priority number (RPN) can be calculated to support risk analysis and ranking and to evaluate whether the current control strategy is sufficient. The risk priority number is the product of three dimensionless rankings: severity, occurrence and detection. Table 1 is an example of the severity table.
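One FMEA worksheet row, recorded at the cause level, can be modeled as a record whose RPN is derived from the three rankings. The field names are illustrative; the sample row uses the autoclave chamber example and the 8/5/6 rankings discussed in this article.

```python
from dataclasses import dataclass


@dataclass
class FmeaLine:
    """One cause-level row of an FMEA worksheet (field names are illustrative)."""
    component: str
    function: str
    failure_mode: str
    cause: str
    effect: str
    current_control: str
    severity: int     # 1-10, from the severity table
    occurrence: int   # 1-10, likelihood of the cause occurring
    detection: int    # 1-10, 10 = least likely to be detected

    def __post_init__(self) -> None:
        for name in ("severity", "occurrence", "detection"):
            if not 1 <= getattr(self, name) <= 10:
                raise ValueError(f"{name} must be between 1 and 10")

    @property
    def rpn(self) -> int:
        # Risk priority number: product of the three rankings (1 to 1000).
        return self.severity * self.occurrence * self.detection


row = FmeaLine("Chamber", "Maintain mechanical integrity",
               "Stress corrosion cracking", "Corrosion", "Loss of sterility",
               "Weekly operational check", severity=8, occurrence=5, detection=6)
print(row.rpn)  # 240
```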

Severity Evaluation Criteria
Effect | Criteria: Severity of Effect | Rank
Catastrophic - without warning | Very high severity ranking when a potential failure mode affects safe operation, may cause death or injury and/or involves noncompliance with government regulation without warning. Extended repair outages. | 10
Hazardous - with warning | Very high severity ranking when a potential failure mode affects safe operation, may cause death or injury and/or involves noncompliance with government regulation with warning. Extended repair outages. | 9
Very High | Item inoperable, with loss of primary function. | 8
High | Item operable, but at reduced level of performance. Customer dissatisfied. | 7
Moderate | Item operable, but comfort/convenience item(s) inoperable. Customer experiences discomfort. | 6
Low | Item operable, but comfort/convenience item(s) operable at reduced level of performance. Customer experiences some dissatisfaction. | 5
Very Low | Marginal system degradation. | 4
Minor | Annoying. No system degradation. | 3
Very Minor | Hardly any effect. Qualified personnel are able to realize a failure has occurred. | 2
None | No noticeable effect. Unable to realize a failure has occurred. | 1

Table 1 Severity table

The higher the number, the higher the risk. On a scale of 1 to 10, the most severe effect, the greatest chance of occurrence and the least likelihood of detection would each be rated 10, giving the maximum RPN of 1,000 (10 × 10 × 10). There are several ways to perform the analysis once these numbers have been assigned to each cause of a failure mode. One method is to establish an RPN threshold which, when exceeded, requires an evaluation of alternative control strategies to lower the detection ranking. The other two variables, severity and occurrence, are not affected initially when new controls are put in place.

Another method is to evaluate the severity ranking and the RPN concurrently. This method is preferred because it does not rely on the RPN alone. Consider a threshold of 250, which is exceeded by a severity of 7, an occurrence of 6 and a detection of 6. The resulting RPN of 252 justifies reevaluating the current detection method to bring the RPN below 250. Now take a severity of 8, an occurrence of 5 and a detection of 6. The resulting RPN is 240, below the threshold, yet evaluating the numbers independently may still lead you to reevaluate your current control strategy: the failure has roughly a 50-50 chance of occurring, a less-than-even chance of being detected, and an effect that is significantly severe. For this reason, setting thresholds of 250 for RPN and 7 for severity may be more appropriate. Table 2 is an example of a completed risk assessment with recommended improvements.
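The two review rules can be compared side by side. This sketch treats "exceeding" a threshold as strictly greater than it, which matches the 252-versus-250 example; the constant and function names are illustrative.

```python
RPN_THRESHOLD = 250
SEVERITY_THRESHOLD = 7


def needs_review(severity: int, occurrence: int, detection: int,
                 rpn_limit: int = RPN_THRESHOLD,
                 severity_limit: int = SEVERITY_THRESHOLD) -> bool:
    """Flag a cause for control-strategy review.

    Review is triggered when the RPN exceeds its threshold or the severity
    exceeds its own threshold, so a severe effect cannot hide behind a
    moderate RPN.
    """
    rpn = severity * occurrence * detection
    return rpn > rpn_limit or severity > severity_limit


for s, o, d in [(7, 6, 6), (8, 5, 6)]:
    rpn_only = s * o * d > RPN_THRESHOLD
    print(f"S={s} O={o} D={d} RPN={s * o * d}: RPN-only={rpn_only}, "
          f"RPN+severity={needs_review(s, o, d)}")
# S=7 O=6 D=6 RPN=252: RPN-only=True, RPN+severity=True
# S=8 O=5 D=6 RPN=240: RPN-only=False, RPN+severity=True
```

The second case is exactly the one an RPN-only rule would miss.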

Table 2 Risk assessment with recommended improvements

Control
The next phase of risk-based asset management is control. This phase uses the results of the analyze phase to develop control strategies that mitigate or eliminate risk. Two important functions of this phase are determining the information to be collected during breakdowns and developing the task list of control strategies that address the risks. Equipment history is an important source of information about the severity and probability of failure. For this information to support asset-management decisions, it must contain data relevant to reliability and financial analysis. If corrective work orders do not capture effect codes and cause codes specific to the asset type, pertinent data will be missing from the analysis.

Consider the autoclave; it is made up of maintainable components that, when they fail, have an effect on the process and also have a cause. If this information is collected, along with the duration of autoclave outages and the monetized impact on production, the risk to the value stream can be calculated in dollars. This activity is crucial for the measurement phase to help identify where the losses are occurring in the value stream.
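Once corrective work orders carry the failed component, effect and cause codes, and outage duration, monetizing the impact is a simple aggregation. In this sketch the codes, outage hours and the value of lost production per hour are hypothetical placeholders; in practice the hourly value would come from the value stream map and financial data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CorrectiveWorkOrder:
    asset: str
    component: str      # maintainable component that failed
    effect_code: str    # asset-type-specific effect code (hypothetical codes)
    cause_code: str     # asset-type-specific cause code (hypothetical codes)
    outage_hours: float


# Assumed value of lost production per autoclave outage hour (illustrative).
LOST_VALUE_PER_HOUR = 12_000.0

history = [
    CorrectiveWorkOrder("AC-01", "Door gasket", "LEAK", "WEAR", 6.0),
    CorrectiveWorkOrder("AC-01", "Door switch", "NO-START", "MISALIGN", 2.0),
    CorrectiveWorkOrder("AC-01", "Door gasket", "LEAK", "INSTALL", 4.0),
]


def loss_by_component(orders: list[CorrectiveWorkOrder]) -> dict[str, float]:
    """Sum monetized outage impact per maintainable component, largest first."""
    totals: dict[str, float] = defaultdict(float)
    for wo in orders:
        totals[wo.component] += wo.outage_hours * LOST_VALUE_PER_HOUR
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


print(loss_by_component(history))
# {'Door gasket': 120000.0, 'Door switch': 24000.0}
```

The same loop grouped by `cause_code` instead of `component` would show which causes drive the losses.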

In the prior example, compromised mechanical integrity was being evaluated by a weekly operational check. A review of the data determined that a different control strategy was needed to identify pending failure from this predominant failure mode. Ultrasonic testing was selected and implemented to lower the detection ranking in the RPN, thus reducing risk. This task was then added to the controls in place for the autoclave.

Some other examples of control strategies are operating procedures to ensure operating parameters are not exceeded; critical spares, such as door hinges; preventive maintenance, such as door gasket checks and black light tests; and the predictive maintenance that is now in place. For preventive maintenance tasks to be effective, the following critical success factors must be met:
1) the task adds value
2) the task is failure mode based
3) the task is comprehensive
4) the task is well organized
5) the task is repeatable
6) the task is accomplished at the right frequency
7) the task reflects an accurate estimate of the time needed to accomplish it.

All of these elements are important to ensure that the activities being planned and scheduled create value and reduce risk.

Common mistakes in developing these tasks include a lack of acceptance criteria, no feedback mechanism, no required skill or trade level, and references to documents that are not available. The key to a dynamic program is to ensure that deficiencies are identified and that the control strategy mitigates or eliminates the root cause, not just to have a place to charge time, which creates little or no value. It is also imperative that these controls be subject to audit and surveillance to validate their effectiveness.
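The common mistakes listed above lend themselves to an automated check of task definitions before they are loaded into the maintenance system. The record fields and the `review_task` helper below are hypothetical; they simply encode those mistakes plus the failure-mode, frequency and duration factors from the numbered list.

```python
from dataclasses import dataclass, field


@dataclass
class PmTask:
    """A preventive maintenance task definition (hypothetical fields)."""
    description: str
    failure_mode: str = ""
    acceptance_criteria: str = ""
    feedback_required: bool = False   # e.g. as-found/as-left readings recorded
    trade: str = ""                   # skill or trade level required
    frequency_days: int = 0
    duration_hours: float = 0.0
    references: list[str] = field(default_factory=list)


def review_task(task: PmTask, available_docs: set[str]) -> list[str]:
    """Return the common defects found in a task definition."""
    issues = []
    if not task.failure_mode:
        issues.append("not tied to a failure mode")
    if not task.acceptance_criteria:
        issues.append("no acceptance criteria")
    if not task.feedback_required:
        issues.append("no feedback mechanism")
    if not task.trade:
        issues.append("no skill or trade level")
    if task.frequency_days <= 0:
        issues.append("no frequency")
    if task.duration_hours <= 0:
        issues.append("no duration estimate")
    missing = [r for r in task.references if r not in available_docs]
    if missing:
        issues.append(f"references unavailable documents: {', '.join(missing)}")
    return issues


task = PmTask("Inspect door gasket", failure_mode="Gasket wear causing leak",
              trade="Mechanic", frequency_days=30, duration_hours=0.5,
              references=["SOP-123"])
print(review_task(task, available_docs={"SOP-456"}))
```

A check like this catches structural gaps only; whether a task adds value still requires engineering judgment.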

Measure
The final phase of the model, measure, is fundamental to ensuring continuous improvement in risk-based asset management. Metrics such as mean time between failures and mean time to repair provide information about the reliability and maintainability of the assets, respectively. From a process standpoint, better metrics are overall equipment effectiveness and asset utilization. Asset utilization (AU) is defined as the product of availability, rate and quality. Overall equipment effectiveness (OEE) is defined as the product of uptime, rate and quality. Rate and quality are calculated the same way for both: rate is the average rate divided by the best demonstrated or design rate, and quality is the first-pass yield divided by the total yield. In the value stream map example, filling 10,000 5cc vials in 3.25 hours, about 3,077 vials an hour, would give a rate of 100%. If we ran at that rate and all 10,000 vials met customer requirements, quality would also be 100%. Figure 7 gives an example of AU and OEE and the contributing loss components.
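The definitions above translate directly into small functions. This is a minimal sketch that interprets quality as first-pass good units divided by total units; the function names are illustrative.

```python
def rate_factor(actual_units_per_hour: float, best_units_per_hour: float) -> float:
    # Average rate over the best demonstrated (or design) rate.
    return actual_units_per_hour / best_units_per_hour


def quality_factor(first_pass_good: int, total_units: int) -> float:
    # First-pass yield over total yield.
    return first_pass_good / total_units


def oee(uptime: float, rate: float, quality: float) -> float:
    return uptime * rate * quality


def asset_utilization(availability: float, rate: float, quality: float) -> float:
    return availability * rate * quality


best = 10_000 / 3.25                      # design rate, about 3,077 vials/h
rate = rate_factor(10_000 / 3.25, best)   # ran at the design rate
quality = quality_factor(10_000, 10_000)  # every vial met requirements
print(rate, quality)  # 1.0 1.0
```

Only uptime and availability differ between the two metrics, which is why the same rate and quality values feed both.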

Figure 7 AU and OEE and the contributing loss components

The difference between the two is a function of schedule versus capacity. If production schedules three eight-hour shifts, seven days a week, the schedule uses all of the capacity. With one eight-hour shift, five days a week, uptime is based on 40 hours of scheduled capacity. If the equipment is up all 40 hours, uptime is 100 percent, while availability is only about 24% (40 of the 168 hours in a week). Given the rate and quality above, the result is an OEE of 100% and an AU of 24%. Using both metrics shows where the losses are coming from and how much of our capacity we are using to create value. If demand for a product exceeds our market plan and the patent expiration is still a few years off, leveraging more of the available capacity may be the right decision.
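The schedule-versus-capacity example can be reproduced by measuring uptime against scheduled hours and availability against all 168 hours in a week. The function name is illustrative, and rate and quality default to the 100% values from the filling example.

```python
HOURS_PER_WEEK = 24 * 7  # total calendar capacity


def weekly_metrics(scheduled_hours: float, running_hours: float,
                   rate: float = 1.0, quality: float = 1.0) -> tuple[float, float]:
    """Return (OEE, AU) for one week.

    Uptime is measured against scheduled hours; availability against all
    168 calendar hours.
    """
    uptime = running_hours / scheduled_hours
    availability = running_hours / HOURS_PER_WEEK
    return uptime * rate * quality, availability * rate * quality


oee, au = weekly_metrics(scheduled_hours=40, running_hours=40)
print(f"OEE {oee:.0%}, AU {au:.0%}")        # OEE 100%, AU 24%
oee, au = weekly_metrics(scheduled_hours=168, running_hours=168)
print(f"OEE {oee:.0%}, AU {au:.0%}")        # OEE 100%, AU 100%
```

The gap between the two outputs is the unscheduled capacity that could be brought into use if demand justifies it.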

The four phases of the RBAM model are interdependent and form a plan-do-check-act continuous improvement cycle. Without investing the time to develop risk-based asset management, a significant part of the value stream may be at risk. In an industry under significant regulatory scrutiny, intense competition and legal exposure, it is no surprise that pharmaceutical manufacturing is trending toward risk-based asset management.

Why Implement a Risk-based Asset Management Strategy?
Fundamentally, in a risk-based asset management system, you collect relevant information based on its importance to the value stream and use it to make fiscally responsible decisions that in turn create greater value for the organization. The four phases of the risk-based asset management model are critical to the success of this strategy. When you couple this strategy with business processes that support best practice and are seamlessly integrated to leverage critical information for decision making, backed by a corporate culture committed to the relentless pursuit of continuous improvement, you can achieve results like these:

Personnel have recognized the value of continuous improvement and have demonstrated their belief through their actions.
Limiting factors have been identified and reduced by orders of magnitude.
Substantial capital investments have been avoided by improving capacity and availability.
Cost of products sold has been significantly reduced.

These benefits result in significantly improved operational stability along with substantial financial improvement.

Mike Poland, CMRP, is the Director of Life Cycle Engineering's Asset Management Services group. With more than 25 years of engineering and maintenance experience, Mike specializes in reliability processes and systems engineering, with an emphasis on defect detection and elimination through root cause analysis and risk-based inspections. His approach to risk-based asset management and to eliminating limiting factors provides clients with greatly enhanced asset utilization at a much lower total cost of ownership.