Failure Coding at Merck: A Model for Maintenance Success

May 12, 2010
Ensuring effective asset management requires a clear and systematic equipment failure coding system.

Editor's Note: A Q&A with the author follows this article.

In 2007, Merck began implementation of SAP across the company, both domestically and internationally. This decision was viewed by the company's dedicated plant maintenance module users as an opportunity to align the Merck maintenance strategy with a more proactive maintenance model. One of the major areas identified for improvement was failure coding.

Failure codes for the previous maintenance module had been developed by the company's IT department. Consequently, the codes were often too basic and hindered meaningful data analysis. Yet a key aspect of the proactive model being introduced was a heightened emphasis on data analysis. Prior experience demonstrated that without proper failure coding, it would be hard to optimize system data for asset performance management.

A team of representatives from the SAP Plant Maintenance module, Reliability, and Maintenance identified goals for a new failure coding model, along with its potential advantages and the challenges to overcome.

Advantages included:

  • Gaining a full understanding of the importance of hierarchies that incorporated elements of ISO 14224, which governs the collection and exchange of reliability and maintenance data
  • Creating a well-defined list of equipment types (equipment technical object types in SAP)
  • Establishing well-defined rules for system boundaries
  • Gaining a partial understanding of what was not known

Challenges included a lack of SAP knowledge. The team members had only been on board a few months, and although a test database was available, the configuration was subject to change and had limited relevance as a training tool. Another challenge was how to create a product that would provide data useful to the engineer for analysis, yet streamlined enough to encourage adoption by field personnel.

No sooner had the project begun than the timeline was suddenly moved up, meaning the new failure coding system would need to be ready within a few months. To expedite the process, we enlisted the support of asset performance management firm Meridium, which helped to manage the project through to its completion. Meridium’s first suggestion was a workshop that outlined the project’s immediate next steps: to ascertain the status of current failure coding efforts, identify gaps, and construct a plan forward for the development of a failure coding taxonomy supporting Merck's overall asset management strategy.

Moving Forward: The Object Lists

Although Merck was in the early stages of an initiative to develop a reliability culture, general equipment reliability goals and concepts were fairly well defined. The hierarchical structure of the data was already aligned with recognized best practices; however, it was determined that there were gaps and redundancies in the equipment types that had previously been selected.

The gap analysis determined that the list of maintainable items (also known as object parts) was seriously inadequate, and that the proposed damage codes were too confusing and complex. Additionally, since the project team was not very familiar with the inner workings of SAP, some initial concepts were found to be impractical to implement. To leverage what SAP knowledge the team did have, a simple diagram of how failure coding flowed within SAP was designed, and information was developed on activity coding and on how to enable better reporting and analysis capabilities when using SAP.

Once gaps were identified, basic coding development began. The first step was the identification of object parts. It was decided that separate lists of the parts for mechanical, instrument, and electrical would be created, and then specific equipment types would undergo further granularity. Equipment types such as pumps that were felt to have a large enough population, or those such as robots that were extremely complex, would get object part lists of their own.
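The lookup rule described above (a dedicated object part list where one exists, otherwise the generic list for the discipline) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the part names, the dictionary names, and the function `object_parts_for` are hypothetical, and nothing here reflects how the lists were actually configured in SAP.

```python
from __future__ import annotations

# Hypothetical entries: the article does not publish the actual list contents.
GENERIC_OBJECT_PARTS = {
    "mechanical": ["Bearing", "Coupling", "Gasket", "Shaft"],
    "instrument": ["Sensor", "Transmitter", "Tubing"],
    "electrical": ["Breaker", "Cable", "Contactor"],
}

# Equipment types with a large population (pumps) or high complexity (robots)
# get their own lists; the parts shown are invented for illustration.
SPECIFIC_OBJECT_PARTS = {
    "pump": ["Casing", "Impeller", "Mechanical seal", "Shaft"],
    "robot": ["Controller", "Gripper", "Servo drive"],
}


def object_parts_for(equipment_type: str, discipline: str) -> list[str]:
    """Return the dedicated list for an equipment type, else the generic one.

    An unknown discipline raises KeyError, which is acceptable for a sketch.
    """
    key = equipment_type.lower()
    if key in SPECIFIC_OBJECT_PARTS:
        return SPECIFIC_OBJECT_PARTS[key]
    return GENERIC_OBJECT_PARTS[discipline.lower()]
```

With these sample lists, a pump resolves to its own list, while an equipment type with no dedicated list (for instance an agitator) falls back to the generic mechanical list.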

One of the challenges in developing failure codes is meeting the needs and expectations of a variety of groups. A primary concern was that if the failure coding became too complex, it would discourage use by field personnel; at the same time, if the codes were too vague, as in the past, they would not support the level of data analysis desired. Similarly, there was an overall consensus to minimize the use of "Other" on all of the lists in order to streamline data analysis; however, the need for the codes to remain usable by field personnel was also recognized. Since SAP supported text comments for any entry, almost all the lists included "Other (Explain in detail)".
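One way to keep an "Other (Explain in detail)" entry from eroding data quality is to reject it when no comment accompanies it. The Python sketch below illustrates that idea only. The article does not say whether such a check was enforced in SAP, and the names `CodeEntry`, `validate_entry`, and the sample damage list are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

OTHER = "Other (Explain in detail)"

# Hypothetical damage list, built from terms mentioned in this article.
SAMPLE_DAMAGE_CODES = ["Corrosion", "Erosion", "Fouling", OTHER]


@dataclass
class CodeEntry:
    code: str          # the value picked from the pull-down list
    comment: str = ""  # free-text comment attached to the entry


def validate_entry(entry: CodeEntry, allowed: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the entry is acceptable."""
    problems = []
    if entry.code not in allowed:
        problems.append(f"'{entry.code}' is not on the list")
    elif entry.code == OTHER and not entry.comment.strip():
        problems.append("'Other' requires an explanatory comment")
    return problems
```

An entry of "Other" with a blank comment comes back with one problem, while the same entry with a short explanation passes, so the catch-all stays available to field personnel without leaving the engineer an empty record.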

The first object list was created by brainstorming the various parts of any electrical equipment. The resulting list was very long, and the next stage was to consolidate similar or redundant items. With the goal that most lists would easily display on a single computer screen when a pull-down menu was used, the list was quickly consolidated down to approximately 25 entries.

This process was repeated for instrumentation. After a short time, it became obvious that there were many overlaps in the electrical and instrumentation lists, and the decision was made to consolidate them into one list. With some discussion and effort, this list was eventually reduced to 31 items.

The final generic object list was for mechanical equipment (see Table 1) and followed the same procedure as the previous ones. It was at this time that the group decided that the equipment types requiring their own specific parts list would be done by individuals, and then brought back to the team and vetted. This leveraged the specific knowledge of the various team members without bogging down the overall development process. The mechanical list ended up 42 items long. While longer than desired, the team felt it would be hard to narrow it down much more and still maintain the ability to do acceptable levels of analysis.

Damage and Activities Lists

Next, as a complement to the failure coding exercise, lists were created to help track and analyze damaged equipment. This damage list was developed in much the same way as the object list. For example, electrical and instrumentation were combined. After intense discussion, and after scrapping the initial draft, the group settled on a damage list of 12 items for electrical and instruments and 24 for mechanical equipment.

The final list to be developed was the activities list. With the experience gained from the development of the object and damage lists, it took only a short time for the team to come up with 16 generic activities that covered most maintenance functions.

Finishing Touches

With the object, damage, and activities lists complete, individual team members worked over the next few days to come up with object parts for equipment that was deemed common or important enough to have dedicated lists. These included equipment types such as hoists/cranes, pumps, compressors, turbines, valves, boilers, motors, and several types of ventilation and refrigeration equipment. The final lists for these were resolved via email and teleconferences.

Finishing touches were then put on the entire failure coding project. Since Merck is a multinational company, the lists had to be translated into the primary languages spoken at the first sites going live in SAP. Also, it was determined that some equipment types would not be tracked for failure analysis because, for example, the equipment had limited failure modes or its maintenance was outsourced. Finally, a set of codes was put in place to capture when a PPM (predictive/preventive maintenance) activity could not be performed.

Once the first group of sites actively began using SAP, we turned our attention to lessons learned. Since the failure coding system was designed to support a more proactive maintenance model, predictive maintenance (PdM) techniques were eventually going to be very important. It was soon apparent that a few minor additions to the lists were needed.

Another challenge was to spell out in greater detail what the various failure code selections meant, so that field personnel could be trained to use them correctly. The difference between “erosion” and “corrosion”, or “dirty” vs. “fouling”, had to be described, as did what was considered “normal wear and tear”. The education in this area continues.

The last and most important aspect of a successful fault coding effort is getting the end users—the field personnel doing the work—to enter the failure codes consistently and correctly. In reality, this continues to be the greatest challenge—managing the cultural changes required to make the tools effective.

Secrets of Success: An Interview with Ray Kastle

PhM: In a nutshell, what is Merck’s overall maintenance strategy right now, and what role does failure coding play?

R.K.: The use of failure coding is one aspect of the overall reliability excellence goal Merck is establishing. That goal is still evolving as we come to better understand the various aspects that it encompasses. The consistent use of the failure codes gives a site and a company the ability to effectively identify trends and problems according to the manufacturer, maintenance practices, and equipment types.

PhM: In your opinion, is failure coding underestimated or overlooked as a critical means of gaining an analytical understanding of equipment performance and maintenance needs?

R.K.: I would have to say for the most part, yes. My civilian work started in the nuclear power industry, and it was not until we were faced with demonstrating equipment condition and tracking of it for license extension that we started using failure coding there. Within a couple of years, we had detected certain trends that previously had been accepted as “just the way things were.”

Similarly, I believe that Merck will start to find trends not only with equipment, but with work practices that cause the failures. Prior to the use of fault coding, determining any sort of trend was a labor-intensive process of searching work orders and/or making several educated guesses.

PhM: For your previous maintenance model, codes had been developed by the company’s IT department. What was inherently flawed with this approach?

R.K.: We had two different models for the previous implementation, based on geographic locations, and hence, two different approaches. One was so basic that it provided no information (repaired, replaced, adjusted, calibrated, left as is), and the other listed just about every possible failure, with no logic. The second one was not used, since it was just too time consuming to figure out what should be the proper code from a single list of around 100 codes.

The basic flaw in either approach was that personnel versed in maintenance were not involved, and so there was not any buy-in to the codes. Additionally, since maintenance was not involved, there was not any training on the use of the codes except a high-level introduction.

PhM: Which parties should be involved in developing codes, and who should they be developing them for (i.e., both engineers and field personnel)?

R.K.: Fault coding development really requires an integrated approach with people who have done (or are doing) the field work, along with the engineers who want to get the data they need to understand what is occurring in the field, especially if there is a desire to do some sort of wide ranging analysis. IT and other groups have to be there to allow an understanding of what is practical, but ultimately, the codes have to allow the field personnel a relatively painless way to provide the information that the engineers need. Unless that occurs, the end product will not be used.

PhM: A dilemma you encountered was making codes simple enough to be understood by field personnel, but complex enough to provide for detailed data analysis. How do you balance these seemingly contradictory needs?

R.K.: We were fortunate in that a couple of us had a background coming from the field. This gave us the ability to say, "What does that mean?" when the codes were being developed. We also bounced them off some of our people at a couple of the sites to make sure that they were easily understood. From the engineering side, it started as a series of lists. We had to work them down to a manageable size both in terms of usability and in detail. Eventually, it came down to asking ourselves what level of detail we needed that made sense for most of our sites. An essential element was that the lists were not going to be cast in stone. As we learn and develop, we plan on revising them based on the results from their usage.

PhM: What’s the role of a damage list, and what are some best practices in developing damage codes?

R.K.: The damage list is one part of the fault coding. We found that most types of damage were covered under general mechanical or general instrument/electrical type faults. Since there is a wide variety of equipment, we focused on the common denominators associated with those two areas of concern. Recognizing that the field operator was not going to be doing an in-depth root cause analysis, we wanted to at least have the basic types of damage captured as a way to categorize the information. If an in-depth analysis is needed later on, these at least provide a starting point.

PhM: Are you realizing the fruits of your labors in terms of the data analysis and preventive capabilities you had sought?

R.K.: We currently have only a few sites live in the system, but even with that we have had some good success. Two of our sites that are in relatively close proximity to each other managed to identify a common issue that was leading to repeated mechanical seal failures, and are currently in the process of implementing corrective actions. We are also evaluating upgrading the fault coding in our legacy CMMS, since it will be several years until all the sites are in SAP.
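Editor's illustration: once codes are entered consistently, spotting an issue shared by several sites reduces to a simple tally. The Python sketch below shows the idea under stated assumptions. The record layout, the function `repeated_failures`, and every sample record are invented for illustration and are not Merck data or an SAP report.

```python
from collections import Counter
from typing import NamedTuple


class FailureRecord(NamedTuple):
    site: str
    equipment_type: str
    object_part: str
    damage: str


# Invented sample data, for illustration only.
SAMPLE_RECORDS = [
    FailureRecord("Site A", "Pump", "Mechanical seal", "Leaking"),
    FailureRecord("Site A", "Pump", "Mechanical seal", "Leaking"),
    FailureRecord("Site B", "Pump", "Mechanical seal", "Worn"),
    FailureRecord("Site A", "Motor", "Bearing", "Worn"),
]


def repeated_failures(records, min_sites=2):
    """Count failures per (equipment type, object part) pair and keep only
    the pairs reported by at least `min_sites` distinct sites."""
    counts = Counter()
    sites = {}
    for record in records:
        key = (record.equipment_type, record.object_part)
        counts[key] += 1
        sites.setdefault(key, set()).add(record.site)
    return {key: counts[key] for key in counts if len(sites[key]) >= min_sites}


# repeated_failures(SAMPLE_RECORDS) -> {("Pump", "Mechanical seal"): 3}
```

The same tally run against free-text work orders would require reading each one, which is the labor-intensive search described earlier in the interview.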

About the Author
Ray Kastle, CMRP, is a reliability and business process engineer for Merck & Co., Inc. He began his career in the Navy, working in the engine room on nuclear submarines. He has nearly 30 years of experience in maintenance, reliability, and projects in the military, commercial nuclear power, housing rehab, and the pharmaceutical industry. Ray has several years of experience in the SAP PM module and has been heavily involved in its implementation and subsequent business process for the last three years.
