Your product finally made it to market, but a few months later, a recall is needed. Do you know which product line the material came from or what dates it was manufactured? How much product needs to be recalled? From a financial standpoint, you don’t want to recall more than is required.
Ultimately, data integrity is good record-keeping for your entire product life cycle.
From raw material input, through processing, through shipment of the final product, you need to know, with certainty, the who, what, where, and when.
Key tenets
Nine key tenets for data integrity in the pharmaceutical industry were established by the International Society for Pharmaceutical Engineering (ISPE).
These key tenets state that data must be attributable, legible, contemporaneous, original, accurate, complete, consistent, enduring, and available.
While it may seem daunting, many of these tenets will naturally emerge from your process if you have a properly automated system, including your process control system, your data historian, and often, a manufacturing execution system (MES) to consolidate data outside of the control system environment.
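Several of these tenets map naturally onto how an automated system records an event. The sketch below is a minimal, hypothetical Python example — the record fields and names are illustrative, not taken from any particular historian or MES. Each entry is attributable (an operator ID), contemporaneous (timestamped at the moment of capture), and original and enduring (frozen and only ever appended to the log, never edited).

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative, append-only process record covering several of the tenets:
# attributable (operator_id), contemporaneous (UTC timestamp at capture),
# original/enduring (frozen record in an append-only log), and accurate
# (the raw value stored together with its engineering units).
@dataclass(frozen=True)
class ProcessRecord:
    tag: str           # instrument or parameter name
    value: float       # raw reading, stored unmodified
    units: str
    operator_id: str   # who entered or acknowledged the value
    timestamp: str     # ISO 8601, recorded when the event occurs

def record_event(log, tag, value, units, operator_id):
    """Append an immutable, timestamped record to the audit log."""
    rec = ProcessRecord(tag, value, units, operator_id,
                        datetime.now(timezone.utc).isoformat())
    log.append(rec)    # records are only ever appended, never edited in place
    return rec

log = []
record_event(log, "TT-101", 78.4, "degC", "op_jsmith")
```

Because the record class is frozen, any attempt to rewrite a stored value after the fact raises an error, which is the code-level analogue of an unalterable paper logbook entry.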
Properly automated systems
Properly automated systems consist of more than just a PLC/DCS control system. You must follow current best practices for the pharmaceutical industry, such as adhering to ISA-88 (S88) batch programming standards, following ISA99 cybersecurity principles, and observing ISO 9001 quality management practices in operations.
A well-automated system still depends on the human operators and production personnel who interact with it, so they need to be able to properly interpret and command the control system to carry out tasks. If there are problems, operations personnel need to be trained to resolve them. A well-automated system improves the repeatability of your processes, eliminates personnel exposure to hazardous activities, and reduces menial tasks, but it doesn’t replace your operations personnel.
Redundancy and availability
Your plant’s operation is done according to a predetermined, validated script. Your control system records its inputs and outputs in real time. All that data is historized and stored securely in a process historian, either from your control system vendor or a third-party provider. If your data is not readily available at any given time, traceability is compromised.
As part of good automation practices, you should consider redundancy and availability both in initial system design and as an ongoing operational focus — in communication, control, data collection, and data storage.
How much online redundancy do you require? How long can data be unavailable while the process is allowed to continue running? Can data be buffered, or will that data be missing after normal operation resumes?
When designing an automation system, each link in the data creation and storage chain must be considered. You must also consider the operational aspects, and how you will handle various failure scenarios. The answers to these questions will depend on your process, company policies, customer requirements, and regulatory requirements. These needs may also change over time, so periodic re-evaluation should be anticipated.
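One common answer to the buffering question above is a store-and-forward collector: readings are queued locally while the link to the historian is down and flushed, in order, when it returns. The Python sketch below is a minimal illustration under that assumption, not any specific vendor’s implementation; the class name and bounds are invented for the example.

```python
from collections import deque

class StoreAndForward:
    """Buffer samples locally while the historian link is down; flush them
    in original order once it returns. maxlen bounds local memory: the
    oldest samples are silently dropped if an outage outlasts the buffer,
    which is exactly the kind of data-loss trade-off to decide up front."""

    def __init__(self, send, maxlen=10000):
        self.send = send                   # callable that writes one sample upstream
        self.buffer = deque(maxlen=maxlen)

    def collect(self, sample, link_up):
        """Forward a sample immediately if the link is up, else buffer it."""
        if link_up:
            self.flush()                   # drain any backlog first, preserving order
            self.send(sample)
        else:
            self.buffer.append(sample)

    def flush(self):
        """Send all buffered samples, oldest first."""
        while self.buffer:
            self.send(self.buffer.popleft())
```

In use, samples collected during an outage arrive at the historian after the live ones that preceded the outage, but still in chronological order, so the record stays complete and consistent.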
Sources of data
The data you care about when producing pharmaceutical products is more than the data generated by your control system. Offline laboratory testing data used to confirm product quality is typically entered into a separate system, often called a LIMS (Laboratory Information Management System), and needs to be merged with process data to connect it to a specific batch, product run, or time slice. Data from your production processes, control system, and laboratory system needs to be linked in a common dashboard or platform.
In addition, your maintenance records, kept in a separate CMMS (Computerized Maintenance Management System), have relevant information. Knowing when things broke, when they were serviced, and what equipment changes were made to a process can be vital contextual information to explain and timestamp process deviations.
With the use of an MES solution, you can access data from multiple systems including your control system, which is typically a secure environment with only limited access by authorized personnel. The MES can pull important data out of your lab system, your maintenance system, and even your enterprise resource planning (ERP) system and into a common dashboard to view and correlate all your data. Your MES is also often a platform that your production personnel in the field will use to enter data from their operator rounds or manual processes.
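As a simplified illustration of this kind of consolidation, the sketch below joins hypothetical historian and LIMS records on a shared batch identifier. All identifiers and values are invented for the example; a real MES would perform this join against live systems rather than in-memory dictionaries.

```python
# Hypothetical batch summaries from the process historian and lab results
# from the LIMS, keyed on a shared batch identifier (values are invented).
historian = {
    "B-1042": {"start": "2024-03-01T06:00Z", "peak_temp_degC": 82.1},
    "B-1043": {"start": "2024-03-02T06:10Z", "peak_temp_degC": 79.8},
}
lims = {
    "B-1042": {"assay_pct": 99.2, "analyst": "lab_mchen"},
    "B-1043": {"assay_pct": 98.7, "analyst": "lab_mchen"},
}

def batch_view(batch_id):
    """Merge process and lab data for one batch into a single record,
    the kind of consolidated view an MES dashboard would present."""
    return {"batch_id": batch_id, **historian[batch_id], **lims[batch_id]}
```

The point of the example is the join key: without a batch identifier shared across systems, the process reading and the lab result describing the same material cannot be reliably connected.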
Challenges ahead
Cybersecurity — We must treat cybersecurity more seriously than in the past due to today’s substantially more challenging threat environment.
Ransomware attacks have already hit many manufacturers, and those that have not been hit yet can expect an attempt at some point. While ransomware may be the most prevalent cybersecurity attack, it is not the only risk. Data theft, or cyber espionage (the theft of proprietary information), is also common. And if attackers can access your data, they may also be able to alter it, which is another threat to your data integrity.
Are you keeping good backups? The first step is to do everything you can to defend against and mitigate threats. But, as a last resort, if the worst-case scenario happens, you should ensure you have established a good recovery plan. This is necessary for malicious threats, accidental mishaps, and system failures.
How do you get data back if you have a catastrophic failure? How long will you be down? What’s your restoration procedure? All of these considerations should determine how you meet the outlined tenets and should be established ahead of time. The time to devise or test a restoration procedure is not after the failure.
A good recovery plan is based on the premise that you never keep only one copy of your data. Unforeseen things happen. The best practice is to keep a copy of that data in another physical location, such as your corporate cloud inside your IT infrastructure, a third-party cloud, or even an offline backup.
The backup methods you choose will be based on the importance of your data and how quickly you will need to recover it. Other considerations include the frequency of your backups and the outcome of your last test restoration.
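A backup is only useful if it can be restored intact, so verifying copies belongs in the routine alongside making them. The Python sketch below is a minimal, hypothetical copy-and-verify step using checksums; the paths and function names are illustrative, and a production scheme would add rotation, off-site transfer, and periodic test restores.

```python
import hashlib
import shutil
from pathlib import Path

def checksum(path):
    """SHA-256 of a file, read in chunks so large archives fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def backup_and_verify(src, dest_dir):
    """Copy src into dest_dir, then confirm the copy is byte-identical.
    Raises if verification fails, so a corrupt backup never goes unnoticed."""
    dest = Path(dest_dir) / Path(src).name
    shutil.copy2(src, dest)                     # preserves timestamps/metadata
    if checksum(src) != checksum(dest):
        raise IOError(f"backup of {src} failed verification")
    return dest
```

Storing the checksum alongside the backup also lets a later test restoration confirm that the archive has not silently degraded in storage.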
Traceability — You want to ensure you have good data and know where it comes from. Which instruments generated that data? Which people generated it? Who had access to it after it was received? You need to be able to trace anything that could have altered the data on its journey to its final destination.
Years ago, operator commands and inputs were printed in real time on formfeed printer paper. When an incident occurred, it was not uncommon for there to be an “issue” with the printer – out of paper or a paper jam. Thankfully, a backup electronic log usually mirrored the output to the printer, so you could determine whether a setpoint was entered incorrectly, a restart command was given, and so on. Still, you often could not narrow it down to a particular person, only a console, because operators typically shared the same account. Traceability has improved over the years with individual logins and authorizations, yet shared accounts for control room personnel remain common practice in many industries.
People make mistakes, so when that happens, you need the ability to determine which operator entered the wrong setpoint or command. This is not for disciplinary purposes, but to investigate the cause and improve training procedures. Operators hold much responsibility, so they should be trained well and given the opportunity to learn from mistakes.
Traceability also includes the instrumentation and equipment generating process and laboratory testing data. Was an equipment change made recently corresponding to a change in product characteristics? Which temperature transmitter was used to determine when the reaction was complete? Were the VFD settings on the agitator motor changed?
How are you tracking your raw material inputs? Do you get deliveries via rail car or 10 lb sacks? Are your raw materials kept segregated or combined in a large holding tank or feed hopper? What are your raw material testing and sampling procedures? How long do you keep raw material samples for additional testing? When product quality issues occur and nothing has changed in the process, unknown changes in your input materials may be to blame. Can you pinpoint when specific lots of raw materials entered the process?
If you follow proper protocols related to traceability, these questions should be verifiable. While some answers may be in your automated control system, others will be found in your MES or other connected systems.
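As a simplified illustration of raw-material traceability, the sketch below records which lots fed each batch and answers the recall-scoping question from the opening of this article in both directions. The genealogy data and identifiers are invented for the example; in practice this lives in your MES or ERP.

```python
# Hypothetical batch genealogy: which raw-material lots fed each batch.
# Identifiers are invented for illustration.
batch_inputs = {
    "B-1042": ["RM-API-771", "RM-EXC-203"],
    "B-1043": ["RM-API-772", "RM-EXC-203"],
}

def batches_using_lot(lot_id):
    """All batches that consumed a given raw-material lot: the scope of
    a recall if that lot is later found to be defective."""
    return sorted(b for b, lots in batch_inputs.items() if lot_id in lots)

def lots_in_batch(batch_id):
    """The reverse query: which lots to investigate when one batch fails."""
    return batch_inputs.get(batch_id, [])
```

If materials are combined in a shared holding tank or feed hopper, the mapping from lots to batches blurs, which is exactly why the article’s questions about segregation matter: the cleaner the genealogy, the narrower the recall.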
Accuracy — It’s often taken for granted that, looking at a reading in the process historian, the value you see is the actual value. However, that data reading comes from physical instrumentation in the real world, and instrumentation is only as good as its design and maintenance. You can have an excellent instrument for that process, installation, or location, but if you aren’t maintaining it well, you can’t trust that instrument.
At the same time, you can maintain the instrument well, but if it’s the wrong instrument or mounted in the wrong location, it can give you bad readings. Readings may be consistently off, low or high, skewing your data, or they may fluctuate erratically. If a temperature probe is mounted in the wrong location, it may read higher or lower than the reaction mass inside. If a flow meter is installed in the wrong location in a pipe, air bubbles can rise through it, throwing the reading off.
Data is not always trustworthy. Are there other complementary or redundant signals to help verify the readings? Are redundant temperature transmitters giving similar readings? It’s common practice in safety systems to program voting arrangements to determine how a decision will be made when multiple sensors don’t agree. From a quality standpoint, if a process sensor reading is critically important, you want some redundancy in your readings. If you can’t trust the readings coming off your instruments, then you’re putting inaccurate or unreliable data into your process control system and into your historian.
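The voting arrangement mentioned above can be sketched simply. The function below is an illustrative median-select with an agreement check for three redundant sensors; it is not any particular safety system’s logic, and real two-out-of-three (2oo3) implementations add diagnostics, failure annunciation, and degraded-mode behavior.

```python
def vote_2oo3(a, b, c, tolerance):
    """Illustrative 2oo3 voter for three redundant sensor readings.
    Returns (value, healthy): value is the median reading, and healthy is
    True only if at least two readings agree within tolerance, so a single
    drifting or failed sensor cannot corrupt the voted value."""
    median = sorted([a, b, c])[1]
    agreeing = sum(1 for r in (a, b, c) if abs(r - median) <= tolerance)
    return median, agreeing >= 2
```

With two transmitters near 80 °C and one stuck at 95 °C, the voter returns the plausible value and still reports healthy; when all three disagree beyond tolerance, it flags the measurement as untrustworthy so the reading can be quarantined rather than historized as fact.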
AI’s impact on data integrity
AI depends on the key tenets of data integrity as a foundation. If you have accurate, complete, and consistent data, you can implement AI to help you make better decisions, run your systems more efficiently, predict reliability events, or make subsequent predictions about your process. If you feed it faulty data, AI will either be too confused to give you a prediction, or it will give you a false prediction because it assumes the faulty data is accurate. So, the more you adopt these tenets and meet data integrity standards, the easier it will be to adopt AI.
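A simple pre-flight screen illustrates the point: before data reaches a model, check it for the completeness and consistency the tenets call for. The sketch below is illustrative only; the gap heuristic, parameter names, and thresholds are assumptions, and real pipelines use far richer validation.

```python
def quality_gate(samples, expected_interval_s, max_gap_factor=3):
    """Screen a time-ordered series of (epoch_seconds, value) samples
    before it feeds a model: reject runs with missing stretches
    (completeness) or non-numeric values (consistency).
    The gap threshold (max_gap_factor * expected interval) is an
    illustrative heuristic, not a standard."""
    for (t0, _), (t1, _) in zip(samples, samples[1:]):
        if t1 - t0 > max_gap_factor * expected_interval_s:
            return False   # a gap in the record: data is incomplete
    return all(isinstance(v, (int, float)) for _, v in samples)
```

Data that fails the gate is better withheld from the model than imputed silently; the gap itself is information worth investigating.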
Conversely, AI can impact data integrity. Since AI-generated responses can be hard to trace back or replicate, the data AI generates is often at odds with good data integrity principles. This means that even if AI feedback and recommendations are useful to your process or operations, more traditional data should be maintained to document the results.
So, it’s fine if AI recommends that it is time to service a pump based on a mysterious combination of process readings. However, it is somewhat more complicated to let AI call a batch reaction complete based on a multi-variate assessment with a statistical probability score. In the end, it may be acceptable if traditional laboratory analyses of the finished product are sufficient to determine that the product meets all customer specifications.
Good data integrity depends on many things, but it all centers on a properly automated system, where most of the data is generated and stored. That automated system consists of not just the PLC/DCS control system, but the process historian, the MES, and multiple other platforms that feed into it.
It also depends on the people who interact with that system, their training, and their procedures. It can be challenging to bring all those components together effectively, but good automation practices can help you achieve your goal.