Risk-based asset management (RBAM) is a method of implementing an asset-management strategy based on the asset-related risks to the value stream. Together with current Good Manufacturing Practices (GMPs), RBAM ensures that all risks are identified and evaluated based on their impact on the value stream.
GMPs cover all aspects of the manufacturing process, including the validated steps used to create the product, the facilities, the transportation and storage of the product, and the required training and quality programs documented in standard operating procedures. To close the “plan, do, check, act” loop, GMPs also require traceability, record keeping, and the ability to recall product and to investigate deficiencies and complaints.
Both RBAM and GMPs are needed to ensure that all risks are identified and evaluated based on their impact on the value stream. RBAM provides a logical way to visualize the assets’ contribution to the process flow, create the proper taxonomy, prioritize assets, evaluate risk, develop risk controls and then measure the effectiveness of these controls (Figure 1).
The first step in implementing the risk-based asset management model is to develop a process flow diagram for manufacturing the product. An example of a process flow diagram for manufacturing blood vaccines can be found in Chapter 45 of the Food and Drug Administration’s Compliance Program Guidance Manual, “Biological Drug Products.” A process flow diagram such as the one in Figure 2 allows us to visualize the manufacturing process.
Once process flow diagrams have been developed for the various products, a value stream map can be built by adding the number of operators, material flow, information flow and general icons. An example of material flow information is the data box under each process, which contains information about manufacturing the product, such as the process parameters flow time and percent yield. Flow time is a combination of manual time, auto time and changeover (Figure 3).
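As an illustration, the sketch below models a single value stream map data box in Python. The class name, field names and minute values are hypothetical and are not taken from Figure 3; the text only says that flow time combines manual time, auto time and changeover, so treating the combination as a simple sum is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DataBox:
    """Hypothetical value stream map data box for one process step."""
    step: str
    operators: int
    manual_min: float      # hands-on operator time, minutes
    auto_min: float        # unattended machine time, minutes
    changeover_min: float  # time to switch between products, minutes
    percent_yield: float   # good output as a percentage of input

    @property
    def flow_time_min(self) -> float:
        # Assumption: flow time is the simple sum of its three components.
        return self.manual_min + self.auto_min + self.changeover_min


# Illustrative values only.
fill = DataBox("Filling", operators=2, manual_min=30, auto_min=150,
               changeover_min=15, percent_yield=98.5)
print(fill.flow_time_min)  # 195
```

Keeping each data box as a record like this makes it straightforward to total flow time across the steps of a value stream.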
Once the process flow diagram is complete, we can model the process by assigning the equipment used in each step and mapping its relationships with distribution systems such as electrical power and steam supply. Reliability block diagrams also allow the process or system to be modeled to identify single points of failure and redundancy. Once the modeling is complete, asset types must be defined so that assets with like attributes can be grouped to streamline analysis. A corporate policy on naming and description conventions should be developed so that an autoclave in one process or facility is specified the same way, and at the same level of the hierarchy, as an autoclave in the next.
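A reliability block diagram can be evaluated numerically. The sketch below assumes independent failures and uses the standard series and parallel availability formulas: stages in series multiply, and redundant blocks within a stage fail only if all of them fail. The block names and availability figures are hypothetical.

```python
from math import prod

# Hypothetical reliability block diagram: stages are in series,
# and the blocks inside each stage are redundant (in parallel).
# Each block maps a name to an assumed availability.
rbd = {
    "Electrical power": {"Main feeder": 0.999},
    "Steam supply": {"Steam generator A": 0.98, "Steam generator B": 0.98},
    "Sterilization": {"Autoclave 1": 0.97},
}


def stage_availability(blocks: dict[str, float]) -> float:
    # A parallel stage is down only when every redundant block is down.
    return 1 - prod(1 - a for a in blocks.values())


def system_availability(diagram: dict[str, dict[str, float]]) -> float:
    # Series stages must all be up, so their availabilities multiply.
    return prod(stage_availability(b) for b in diagram.values())


def single_points_of_failure(diagram: dict[str, dict[str, float]]) -> list[str]:
    # A block with no redundant partner takes the whole system down when it fails.
    return [name for blocks in diagram.values() if len(blocks) == 1
            for name in blocks]


print(round(system_availability(rbd), 4))  # 0.9686
print(single_points_of_failure(rbd))       # ['Main feeder', 'Autoclave 1']
```

The redundant steam generators contribute far less unavailability than the single feeder or the single autoclave, which is exactly what the diagram is meant to expose.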
Hierarchy development is the final component, and it is rarely done correctly. Most organizations’ hierarchical structures were developed when their financial accounting software was implemented. These structures are aligned to the general ledger and balance sheet and link to cost centers. They do not extend down to the lowest level of maintainable component, which is considered best practice from an asset management perspective.
Figure 4 is an excerpt from ISO 14224:2006, “Petroleum, Petrochemical and Natural Gas Industries – Collection and Exchange of Reliability and Maintenance Data for Equipment.” Even though this international standard is not specifically designed for the pharmaceutical industry, much of its guidance is relevant.
Most organizations make the mistake of developing their hierarchy only to level 4 or 5 instead of level 7, so work orders are written to the system or process level, and materials are mapped to that location as well. This makes identifying repair parts or evaluating bad actors almost impossible. Instrumentation and control is usually another weak point in the hierarchical structure. If an autoclave has door switches and pressure instruments that require calibration, do they appear as children of the autoclave, or is their parent the room in which they are located? The latter approach is typical because of the way the calibration program is set up, but it adds significant risk because critical instrumentation is no longer aligned with the equipment it serves.
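To illustrate the difference, the sketch below builds a small hierarchy in which the calibrated instruments are children of the autoclave rather than of the room. The `Asset` class and all asset names are hypothetical; they do not come from ISO 14224 or from any particular maintenance system. Because work orders are written against the lowest maintainable level, counting them by location path surfaces the bad actor directly.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass(eq=False)
class Asset:
    """One node in a hypothetical asset hierarchy."""
    name: str
    parent: Asset | None = field(default=None, repr=False)
    children: list[Asset] = field(default_factory=list, repr=False)

    def add(self, name: str) -> Asset:
        """Create a child node beneath this one and return it."""
        child = Asset(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Full location path, built by walking up to the root."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))


site = Asset("Site")
room = site.add("Building 2").add("Sterilization room")
autoclave = room.add("Autoclave AC-01")
# Calibrated instruments are children of the autoclave they serve,
# not of the room they happen to sit in.
door_switch = autoclave.add("Door switch")
pressure_tx = autoclave.add("Chamber pressure transmitter")

print(pressure_tx.path())
# Site / Building 2 / Sterilization room / Autoclave AC-01 / Chamber pressure transmitter

# Corrective work orders written against the lowest maintainable level
# (hypothetical history) can be counted to surface bad actors.
work_orders = [door_switch, door_switch, pressure_tx]
bad_actors = Counter(wo.path() for wo in work_orders)
print(bad_actors.most_common(1)[0])
```

Had the same work orders been written against the room or the autoclave, all three would collapse into one location and the door switch would never stand out.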
Performing a criticality analysis on the equipment seems like a daunting task, but without it, how can prioritization occur? A good understanding of the value that the products create, and of which processes are required to manufacture them, is an important first step. From there, you can evaluate each piece of equipment on its impact on the value stream by examining how its failure would affect environment, health, safety, reputation and production. Then develop a scale from 1 to 10, rate the equipment against each of these variables, and calculate the overall criticality of the equipment. Figure 5 is an example of what the criticality analysis would look like.
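The text does not say how the five ratings combine into an overall criticality, so the sketch below assumes a weighted average; Figure 5 may use a different rule. The weights and the example ratings are hypothetical and would be set by each organization.

```python
# Hypothetical weights for the five impact categories named in the text;
# an organization would set its own. They sum to 1.0.
WEIGHTS = {
    "environment": 0.15,
    "health": 0.20,
    "safety": 0.25,
    "reputation": 0.15,
    "production": 0.25,
}


def criticality(ratings: dict[str, int]) -> float:
    """Weighted average of 1-10 ratings (an assumed scoring rule)."""
    for category, score in ratings.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{category} rating {score} is outside 1-10")
    return sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)


autoclave = {"environment": 3, "health": 8, "safety": 7,
             "reputation": 8, "production": 9}
print(round(criticality(autoclave), 2))  # 7.25
```

Because the weights sum to 1.0, the overall score stays on the same 1 to 10 scale as the individual ratings.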
When done correctly, equipment criticality analysis should separate the equipment into five to 10 groups. Each group should equate to a numerical value, with the largest number representing the most critical equipment. This number should also be recorded in the maintenance information management system, such as SAP, Infor or Maximo, in a field that might be called ABC Indicator, Priority or similar. The system can then show the importance of each asset, which helps prioritize the work order backlog and identify where to focus continuous improvement efforts.
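One way to turn continuous criticality scores into the five to 10 groups described above is equal-width binning, sketched below. The function name, the equal-width rule and the example assets are assumptions for illustration; the resulting group number is what would go into the system's priority field.

```python
import math


def criticality_group(score: float, groups: int = 5,
                      low: float = 1.0, high: float = 10.0) -> int:
    """Map a score on [low, high] into equal-width groups 1..groups.

    The highest group number marks the most critical equipment.
    Equal-width bins are an assumption, not a prescribed method.
    """
    if not 5 <= groups <= 10:
        raise ValueError("use five to 10 groups")
    if not low <= score <= high:
        raise ValueError("score out of range")
    width = (high - low) / groups
    # min() keeps a score equal to `high` inside the top group.
    return min(groups, math.floor((score - low) / width) + 1)


# Hypothetical overall criticality scores.
scores = {"Autoclave AC-01": 7.25, "Label printer": 2.0, "WFI still": 9.4}
for asset, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(asset, criticality_group(score))
```

Sorting the backlog by this group number, highest first, gives the prioritization the text describes.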
Once the equipment criticality analysis is complete, a failure mode and effects analysis (FMEA) can be conducted on the most critical equipment – the equipment that poses the greatest risk or creates the most value in the value stream. IEC 60812:2006, “Analysis Techniques for System Reliability – Procedure for Failure Mode and Effects Analysis (FMEA),” is the international standard and a good resource for conducting this level of analysis. To define the boundaries of this analysis, a functional block diagram must first be developed.
The primary purpose of the functional block diagram (Figure 6 is for an autoclave) is to ensure that all of the functions provided by and within the asset are documented so that functional failures can be determined and analyzed.
The next step is to assign the maintainable components in each functional block, since the analysis must be done at that level. Once this is completed and functions of the components have been determined, we must identify functional failures, failure modes, effects, causes, and then the control strategy currently in place to mitigate or eliminate the failure. For example, the chamber of an autoclave is a vessel whose function is to maintain mechanical integrity. A functional failure occurs when this mechanical integrity is violated. Common failure modes that could occur include brittle fracture, stress corrosion cracking, fatigue, welding problems, erosion, creep, stress rupture and hydrogen embrittlement. Causes of these failures could include design errors, fabrication errors, corrosion, and improper operation and maintenance. The effects of any of these could be the loss of sterility, the loss of equipment, or even loss of life.
Examples of current control strategies include safety devices, tags linked to process intelligence, and weekly operational checks. With the information collected about failures, causes and control strategies, a risk priority number (RPN) can be calculated to support risk analysis and risk ranking and to evaluate whether the current control strategy is sufficient. The RPN is the product of three dimensionless ratings: severity, occurrence and detection. Table 1 provides an example of a severity table. Higher numbers indicate higher risk, so on a scale of 1 to 10, the most severe effect, the greatest likelihood of occurrence and the lowest likelihood of detection would each be rated 10, giving the maximum RPN of 1,000.
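The RPN calculation can be attached to each cause row of an FMEA worksheet, as in this minimal sketch. The record structure is hypothetical, and the scale anchors in the comments are assumptions; Table 1 defines the actual severity scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureCause:
    """One cause row from an FMEA worksheet (hypothetical structure)."""
    failure_mode: str
    cause: str
    severity: int    # 1 = negligible effect, 10 = most severe
    occurrence: int  # 1 = remote, 10 = almost certain
    detection: int   # 1 = almost certain to detect, 10 = unlikely to detect

    def __post_init__(self) -> None:
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be 1-10, got {value}")

    @property
    def rpn(self) -> int:
        # Risk priority number: product of the three ratings, 1 to 1,000.
        return self.severity * self.occurrence * self.detection


worst = FailureCause("Loss of mechanical integrity", "Stress corrosion cracking",
                     severity=10, occurrence=10, detection=10)
print(worst.rpn)  # 1000
```

Validating the ratings on construction keeps an out-of-scale entry from silently inflating or deflating the ranking.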
There are several ways to perform the analysis once ratings have been assigned to each cause of a failure mode. One method is to establish an RPN threshold which, when exceeded, triggers an evaluation of alternate control strategies to lower the detection rating. The other two variables, severity and occurrence, are not initially affected when new controls are put in place.
Another method, preferred because it does not rely on the RPN alone, is to evaluate the severity rating and the RPN concurrently. Consider a threshold of 250, which is exceeded by a severity of 7, an occurrence of 6 and a detection of 6. The resulting RPN of 252 may justify evaluating the current detection method to bring the RPN below 250. Now take a severity of 8, an occurrence of 5 and a detection of 6. The resulting RPN of 240 falls below the threshold, yet evaluating the individual numbers would lead you to question your current control strategy: the failure has roughly a 50-50 chance of occurring, an even lower chance of being detected, and a significantly severe effect. For this reason, setting thresholds of 250 for RPN and 7 for severity may be more appropriate.
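The two screening methods can be compared side by side using the two worked examples above. This sketch assumes that a cause is flagged when its RPN exceeds 250 or its severity is 7 or higher; the text gives the thresholds but not whether the severity comparison is inclusive, so that detail is an assumption.

```python
RPN_THRESHOLD = 250
SEVERITY_THRESHOLD = 7  # assumption: severity of 7 or higher triggers review


def rpn_only(s: int, o: int, d: int) -> bool:
    # Method 1: flag a cause only when its RPN exceeds the threshold.
    return s * o * d > RPN_THRESHOLD


def rpn_and_severity(s: int, o: int, d: int) -> bool:
    # Method 2: also flag any cause whose severity reaches the severity
    # threshold, regardless of its RPN.
    return rpn_only(s, o, d) or s >= SEVERITY_THRESHOLD


for s, o, d in [(7, 6, 6), (8, 5, 6)]:
    print(s * o * d, rpn_only(s, o, d), rpn_and_severity(s, o, d))
# 252 True True
# 240 False True
```

The second row is the case the text warns about: the RPN-only screen misses a highly severe cause that the combined screen catches.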
Table 2 is an example of a completed risk assessment with recommended improvements.
The next phase of risk-based asset management is control. Two important functions of this phase are determining the information to be collected during breakdowns and developing the task list of control strategies that address the risks. Equipment history is an important source of information on the severity and probability of failure. For this information to support asset-management decisions, it must contain the data needed for reliability and financial analysis. If corrective work orders do not carry effect codes and cause codes defined by asset type, pertinent data will be missing from the analysis.
Consider the autoclave; it is made up of maintainable components that, when they fail, have an effect on the process and also have a cause. If this information is collected, along with the duration of autoclave outages and the monetized impact on production, the risk to the value stream can be calculated in dollars. This activity is crucial for the measurement phase to help identify where the losses are occurring in the value stream.
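The sketch below shows how coded work order history could be rolled up into dollars at risk. The work order fields, the code values and the flat cost per outage hour are hypothetical; in practice the monetized impact per hour would depend on the product and schedule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CorrectiveWorkOrder:
    """Hypothetical corrective work order with asset-type-based codes."""
    asset: str
    component: str
    effect_code: str
    cause_code: str
    outage_hours: float


# Assumed monetized production impact per hour of autoclave outage.
LOST_MARGIN_PER_HOUR = 12_000.0

history = [
    CorrectiveWorkOrder("AC-01", "Door gasket", "LOSS_OF_SEAL", "WEAR", 4.0),
    CorrectiveWorkOrder("AC-01", "Door gasket", "LOSS_OF_SEAL", "WEAR", 6.0),
    CorrectiveWorkOrder("AC-01", "Steam trap", "SLOW_HEATUP", "FOULING", 2.5),
]

# Total dollars lost for each (component, cause) pair.
loss_by_cause: dict[tuple[str, str], float] = defaultdict(float)
for wo in history:
    loss_by_cause[(wo.component, wo.cause_code)] += wo.outage_hours * LOST_MARGIN_PER_HOUR

for (component, cause), dollars in sorted(loss_by_cause.items(), key=lambda kv: -kv[1]):
    print(f"{component:12} {cause:8} ${dollars:,.0f}")
```

Without the component and cause codes, the same history would show only total autoclave downtime, with no indication of where the losses originate.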
In the prior example, loss of mechanical integrity was monitored by a weekly operational check. Review of the data showed that a different control strategy was needed to detect pending failure from this predominant failure mode. Ultrasonic testing was selected and implemented to lower the detection rating in the RPN, thus reducing risk. This task was then added to the controls in place for the autoclave.
For preventive maintenance tasks to be effective, they must add value, be based on failure modes, be repeatable, be performed at the right frequency, and reflect an accurate estimate of the time needed to complete them. Common mistakes in developing these tasks include a lack of acceptance criteria, no feedback mechanism, no specified skill or trade level, and references to other documents that are not available.
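Some of these requirements can be checked mechanically when a task is written. The sketch below reviews a hypothetical task record for the common mistakes listed above; whether a task adds value or is truly repeatable still needs human judgment. The field names, the frequency, and the document ID `SOP-123` are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PMTask:
    """Hypothetical preventive maintenance task record."""
    description: str
    failure_mode: str | None          # the failure mode this task addresses
    frequency_days: int | None
    duration_hours: float | None
    acceptance_criteria: str | None = None
    feedback_required: bool = False   # does the technician record findings?
    trade: str | None = None          # skill or trade level required
    references: list[str] = field(default_factory=list)


def review(task: PMTask, available_docs: set[str]) -> list[str]:
    """Return the common task-writing mistakes found in a task."""
    problems = []
    if not task.failure_mode:
        problems.append("not tied to a failure mode")
    if not task.frequency_days:
        problems.append("no frequency")
    if not task.duration_hours:
        problems.append("no duration estimate")
    if not task.acceptance_criteria:
        problems.append("no acceptance criteria")
    if not task.feedback_required:
        problems.append("no feedback mechanism")
    if not task.trade:
        problems.append("no skill or trade level")
    missing = [r for r in task.references if r not in available_docs]
    if missing:
        problems.append("references unavailable documents: " + ", ".join(missing))
    return problems


ut = PMTask("Ultrasonic thickness test of chamber shell",
            failure_mode="Loss of mechanical integrity",
            frequency_days=365, duration_hours=3.0,
            acceptance_criteria="Wall thickness at or above minimum design thickness",
            feedback_required=True, trade="Certified UT technician",
            references=["SOP-123"])
print(review(ut, available_docs={"SOP-123"}))  # []
```

Running a check like this before a task is released catches the gaps that otherwise surface only when a technician tries to perform it.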
The final phase of the model, measurement, is fundamental to continuous improvement in risk-based asset management. Metrics such as mean time between failures (MTBF) and mean time to repair (MTTR) provide information on the reliability and maintainability of the assets. From a process standpoint, overall equipment effectiveness and asset utilization are more informative metrics.
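Using the common definitions, MTBF divides operating time by the number of failures and MTTR averages the duration of corrective repairs. The sketch below applies those definitions to a hypothetical one-year autoclave history; the hours and repair durations are illustrative.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating time per failure."""
    if failures == 0:
        raise ValueError("no failures recorded; MTBF is undefined")
    return operating_hours / failures


def mttr(repair_hours: list[float]) -> float:
    """Mean time to repair: average duration of corrective repairs."""
    if not repair_hours:
        raise ValueError("no repairs recorded")
    return sum(repair_hours) / len(repair_hours)


# Hypothetical one-year history for an autoclave.
repairs = [4.0, 6.0, 2.5, 3.5]
print(mtbf(operating_hours=8_000, failures=len(repairs)))  # 2000.0
print(mttr(repairs))                                        # 4.0
```

Both values come straight from corrective work order history, which is another reason the failure data described in the control phase needs to be complete.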
Asset utilization (AU) is defined as the product of availability, rate and quality. Overall equipment effectiveness (OEE) is defined as the product of uptime, rate and quality. Rate and quality are calculated the same way in both. Rate is the average rate divided by the best demonstrated rate or the design rate. Quality is the first-pass yield divided by the total yield. In the example used in the value stream map, producing 10,000 5cc vials in 3.25 hours (about 3,077 vials per hour) at the best demonstrated rate would give a rate of 100%. If we ran at that rate and all 10,000 vials met customer requirements, quality would also be 100%.
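The definitions above translate directly into a few small functions. This sketch reproduces the vial example, treating 10,000 vials in 3.25 hours as the best demonstrated rate, as the example does; the function names are illustrative.

```python
def rate(average_rate: float, best_rate: float) -> float:
    """Average production rate divided by the best demonstrated (or design) rate."""
    return average_rate / best_rate


def quality(first_pass_yield: float, total_yield: float) -> float:
    """First-pass yield divided by total yield."""
    return first_pass_yield / total_yield


def asset_utilization(availability: float, r: float, q: float) -> float:
    """AU: availability x rate x quality."""
    return availability * r * q


def oee(uptime: float, r: float, q: float) -> float:
    """OEE: uptime x rate x quality."""
    return uptime * r * q


# Vial example: 10,000 5cc vials in 3.25 hours, taken as the best demonstrated rate.
best_rate = 10_000 / 3.25
print(round(best_rate))  # 3077

r = rate(average_rate=10_000 / 3.25, best_rate=best_rate)
q = quality(first_pass_yield=10_000, total_yield=10_000)
print(r, q)  # 1.0 1.0
```

Because rate and quality are shared, any difference between AU and OEE comes entirely from availability versus uptime.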
Figure 7 gives an example of AU and OEE and the contributing loss components.
The difference between the two is a function of schedule versus capacity. If production is scheduled for three eight-hour shifts, seven days a week, the schedule uses all 168 hours of weekly capacity. On a schedule of one eight-hour shift, five days a week, uptime is based on 40 scheduled hours. If the equipment was up for all 40 hours, uptime is 100 percent, but availability is only about 24 percent (40 of 168 hours). Given the quality and rate above, the result is an OEE of 100% and an AU of 24%.
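The schedule-versus-capacity example can be checked with a few lines. This sketch assumes that uptime is measured against scheduled hours and availability against all 168 calendar hours in a week, which is consistent with the AU of 24% in the text; rate and quality are both 100%, as in the vial example.

```python
HOURS_PER_WEEK = 24 * 7  # total calendar capacity: 168 hours


def uptime(up_hours: float, scheduled_hours: float) -> float:
    # Uptime is measured against the production schedule.
    return up_hours / scheduled_hours


def availability(up_hours: float, calendar_hours: float = HOURS_PER_WEEK) -> float:
    # Availability is measured against all calendar hours (full capacity).
    return up_hours / calendar_hours


scheduled = 1 * 8 * 5               # one eight-hour shift, five days a week
up = 40.0                           # equipment ran for every scheduled hour
rate_factor = quality_factor = 1.0  # from the vial example

oee = uptime(up, scheduled) * rate_factor * quality_factor
au = availability(up) * rate_factor * quality_factor
print(f"OEE {oee:.0%}, AU {au:.0%}")  # OEE 100%, AU 24%
```

With three shifts seven days a week, scheduled hours would equal 168, so uptime and availability, and therefore OEE and AU, would coincide.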
Using both metrics allows you to determine where losses are coming from and how much of your capacity you are using to create value. If demand for a product exceeds the market plan and its patent expiration is still a few years off, using more of the available capacity may be the right decision.
The four phases of the RBAM model are interdependent and create a plan-do-check-act continuous improvement cycle. Without investing the time to develop risk-based asset management, a significant part of the value stream may be at risk. In an industry facing significant regulatory scrutiny, intense competition and legal exposure, it is no surprise that pharmaceutical manufacturing is moving toward risk-based asset management.
Why Implement a Risk-based Asset Management Strategy?
Fundamentally, in a risk-based asset management system, you collect relevant information based on its importance to the value stream and use it to make fiscally responsible decisions that in turn create greater value for the organization. The four phases of the risk-based asset management model are critical to the success of this strategy. When you couple this strategy with business processes that support best practice and are integrated to put critical information behind decisions, and with a corporate culture driven by the relentless pursuit of continuous improvement, you can achieve results like these:
- Personnel have recognized the value of continuous improvement and have demonstrated their belief through their actions.
- Limiting factors have been identified and reduced by orders of magnitude.
- Substantial capital investments have been avoided by improving capacity and availability.
- Cost of products sold has been significantly reduced.
These benefits result in significantly improved operational stability along with substantial financial improvement.
About the Author
Mike Poland, CMRP, is the Director of Life Cycle Engineering’s Asset Management Services group. With more than 25 years of engineering and maintenance experience, Mike specializes in reliability processes and systems engineering, with an emphasis on defect detection and elimination through root cause analysis and risk-based inspections.