PAT Success and Speed Hinge on Data Mining

June 1, 2005
Both data mining and metadata must be considered closely by a cross-functional team representing all users of that data.
When implementing process optimization procedures, you need to review the techniques and methods you have tried in the past to determine which processes and technologies will give the best results. To avoid “reinventing the wheel” where process analytical technologies (PAT) are concerned, your organization must have a data warehouse in place and an efficient method for data mining.
On the manufacturing side, the shift toward PAT is gaining momentum. This effort will allow companies to monitor their manufacturing processes continuously and automatically in real time, rather than intermittently and retrospectively via samples and post-manufacturing quality controls.
The strategies that you create for data warehousing and mining will determine how quickly you can make PAT data available, proactively, to scientists, engineers and management.
From this perspective, an integrated team of professionals from different fields, including computer scientists, chemists, manufacturing staff, computer engineers, biologists and pharmacists, should jointly develop the “best practices” required for mining data. They should also develop a strategy for establishing and maintaining the system’s 21 CFR Part 11 compliance to facilitate PAT success.
Data mining has become very important because of the need to extract useful information from extremely large sets of data points and their associated metadata, the data that describe data. Independent of any one database, data mining finds patterns within the data, leveraging analytical and computing technologies and other methodologies to find relationships that were previously unknown.
In many areas of process measurement, one can simultaneously acquire very large quantities of measurement data associated with many variables as a function of both space and time. Extreme care should be taken when choosing data management and storage systems, including databases and data warehouses.
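As a rough illustration of what “finding relationships” across many process variables can mean at its simplest, the sketch below scans every pair of variables for a strong linear correlation. It is a minimal example, not a recommended mining method: the variable names, the readings and the 0.9 threshold are all hypothetical, invented for illustration, and real data mining tools go well beyond pairwise correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal shows no linear relationship
    return cov / sqrt(var_x * var_y)


def strongest_relationships(series, threshold=0.9):
    """Return (name_a, name_b, r) for variable pairs with |r| >= threshold."""
    hits = []
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if abs(r) >= threshold:
            hits.append((a, b, round(r, 3)))
    # Strongest relationships first, regardless of sign.
    return sorted(hits, key=lambda hit: -abs(hit[2]))


# Hypothetical readings, invented for illustration only.
readings = {
    "inlet_temp_C": [60.0, 61.0, 62.0, 63.0, 64.0],
    "moisture_pct": [5.0, 4.6, 4.1, 3.7, 3.2],
    "fan_speed_rpm": [900.0, 905.0, 898.0, 902.0, 899.0],
}

for name_a, name_b, r in strongest_relationships(readings):
    print(f"{name_a} vs {name_b}: r = {r}")
```

With a handful of variables this runs instantly, but the number of pairs grows quadratically with the number of variables, which is one reason the choice of storage and data management systems matters at PAT scale.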
For process measurement data, one should consider how the data will be analyzed using chemometric and/or statistical software as well as data mining software.
It is very important to understand the difference between online analytical processing (OLAP), a method commonly confused with proactive analysis, and data mining: OLAP provides a very good view of what is happening, but cannot predict what will happen in the future or explain why it is happening. Data mining techniques and software tools can support a successful implementation of PAT.
The Cross Industry Standard Process for Data Mining (CRISP-DM) was launched in September 1996 to create standard processes for data mining. Funded by the European Commission and with over 200 members worldwide, it is a nonproprietary, application- and industry-neutral, tool-neutral standard that focuses on technical analysis by establishing a framework for guidance.
Also, the use of the Internet, and more specifically of the tools used to mine and analyze data from the global network, is leading to a new generation of data and text mining tools that will enable pharmaceutical companies to quickly and efficiently draw meaning from huge quantities of research data. Web mining will help the industry conduct research, select potential targets for further study, identify trends, perform more active pharmacovigilance, anticipate potential crises and gain better patient insights, all while integrating the quality management chain.
Metadata will also be important. Examples include the time and date when a measurement was made, what variable was measured and in what units, where and from what the substance was sampled, under what conditions the sample was taken, and the time delays and time lags in the sampling system.
Reliable recording of metadata associated with measurements made using process instrumentation, whether for calibration or process monitoring, is essential.
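One way to picture the metadata described above is as a record that travels with every measured value. The following Python sketch is only an illustration under stated assumptions: the class names, field names and sample values are hypothetical, not taken from any PAT system or standard, and a production system would also need audit trails and access controls for compliance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MeasurementMetadata:
    """Hypothetical metadata fields, mirroring the examples in the text."""
    measured_at: datetime    # time and date the measurement was made
    variable: str            # what was measured
    units: str               # units of the measured variable
    sample_point: str        # where the substance was sampled
    material: str            # what the substance was sampled from
    conditions: str          # conditions under which the sample was taken
    sampling_lag: timedelta  # delay introduced by the sampling system


@dataclass(frozen=True)
class Measurement:
    """A value that cannot be created without its metadata."""
    value: float
    meta: MeasurementMetadata

    def process_time(self) -> datetime:
        """Estimate when the material was at the sample point."""
        return self.meta.measured_at - self.meta.sampling_lag


def to_record(m: Measurement) -> dict:
    """Flatten value and metadata into one row so they are stored together."""
    meta = m.meta
    return {
        "value": m.value,
        "variable": meta.variable,
        "units": meta.units,
        "measured_at": meta.measured_at.isoformat(),
        "sample_point": meta.sample_point,
        "material": meta.material,
        "conditions": meta.conditions,
        "sampling_lag_s": meta.sampling_lag.total_seconds(),
    }


# Hypothetical reading, invented for illustration only.
reading = Measurement(
    value=3.7,
    meta=MeasurementMetadata(
        measured_at=datetime(2005, 6, 1, 14, 30, 0, tzinfo=timezone.utc),
        variable="moisture",
        units="% w/w",
        sample_point="dryer outlet",
        material="granulate",
        conditions="inlet air 60 C",
        sampling_lag=timedelta(seconds=45),
    ),
)

print(to_record(reading))
```

Making both classes immutable (`frozen=True`) and requiring the metadata at construction time is a design choice in this sketch: a value can never exist in the program without its context, and neither can be altered silently afterward.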
It is also crucially important that the links between measurement data and associated metadata are preserved intact through all data collection, data management and data analysis processes. Otherwise, any interpretation of the measurements has little or no value for process monitoring, control or feedback adjustments.
The use of chemometric methods for the design of data collection protocols, data validation methods, quality management, and data analysis and interpretation is an integrated subject that must be strategically planned if the implementation is to succeed.
In conclusion, pharmaceutical companies devote most of their IT resources to technologies that cut costs, such as supply chain, transaction processing and support services, many of which are increasingly outsourced to external providers. New information technologies integrated with PAT will enable the drug industry to fundamentally change the way it does business.
About the Author

Julio Pamias | CEO