Multivariate Data Analysis for Biotechnology, Bio-processing

March 11, 2014
Powerful MVA and DoE methods are giving biotech companies greater insights from complex data

The modern biopharmaceutical/biotechnology manufacturing facility contains many sophisticated control, data logging and data archiving systems. Massive amounts of data are collected from sources such as raw materials analysis, process outputs and final quality assessments, which are stored in data warehouses.

The sheer volume of data contained in these warehouses makes it nearly impossible to extract the information using simple charting and univariate methods of analysis. Such complex data requires methods of analysis that can cope with multiple variables simultaneously and that reveal not only the influential variables, but also the relationships those variables have with each other. This is where Multivariate Analysis (MVA) is finding a much greater role in the analysis of complex bioprocess data.

Much more effort is being put into the discovery and development of biotherapies and personalized medicines. Biopharmaceutical and biotechnology companies are looking for ways to accelerate drug discovery and, through quality and compliance initiatives such as the Food and Drug Administration’s (FDA) current Good Manufacturing Practice (cGMP) and Quality by Design (QbD) principles and Data Driven Knowledge Discovery, to reduce regulatory approval time and be first to market. This means that data collected throughout the entire product lifecycle must be analyzed and interpreted in order to gain extensive product and process understanding. This, in turn, leads to improved quality, greater confidence in the market for a company’s products and ultimately market capitalization.

It is estimated that the time it takes to bring a new drug or therapy to market is approximately 12 years. This usually involves three phases: discovery, clinical trials and registration.

Coupled with these phases is the development of a suitable manufacturing process that can consistently produce the highest quality product and be compliant with FDA current Good Manufacturing Practice (cGMP) guidance. This includes the development of a formulation that is robust under processing conditions, scale up considerations and technology transfer from facility to facility or even between different types of manufacturing equipment. Each of these phases can be improved and accelerated through the use of the tools of MVA and Design of Experiments (DoE).

Even before data is analyzed, one of the biggest challenges facing the industry is getting this data into a format that is amenable to MVA. Many data collection and agglomeration systems are commercially available for compiling various forms of data, and these can be seamlessly integrated into MVA packages so that the vast array of graphical and analytical approaches can be applied to reveal the information the data contains. As the saying goes, data is only data until the information is extracted from it, and from there, information leads to knowledge.

Unlike small molecule drug product development, biotherapies are fundamentally more complex in terms of structure and application and suffer greatly from natural biological variability. For example, isolating and selecting cell cultures or bacterial strains to further develop into future products is aided greatly by the tools of MVA, including the monitoring of the processes (e.g., fermentation reactions) used to produce them. From there, the tools of DoE can be used to devise formulations that stabilize the active component(s) during manufacture and are also useful in product scale-up studies.

Once the candidate therapy (cell cultures, antibody, virus strain, etc.) has been formulated into a stable matrix, MVA can be used to assist in the interpretation of clinical trial data and can even lead to accelerating the lengthy process through a much more comprehensive and overall approach to data analysis, especially when combined with the principles of adaptive designs and the Critical Path Initiative endorsed by FDA.

When the candidate therapy has been approved for market release, the tools of MVA are useful for assessing the success of technology transfer from R&D to production, or from one manufacturing facility to another. In the production environment, MVA is useful for assessing incoming or internally produced raw material quality and characteristics. Combined with rapid spectroscopic (or other) characterization methods, control strategies for the real-time monitoring and adjustment of processes within the so-called “design space” can be devised so that proactive quality control can be realized. DoE and MVA are then used in developing robust analytical methods for stability studies and other post-production analyses.

Data collected over time from a manufacturing facility can be modeled to assess batch-to-batch consistency and facilitate continuous improvement (CI) and corrective and preventive action (CAPA) programs. The entire process is summarized in Figure 1.

Candidate Therapy Discovery: During the initial development of new therapies, there is usually much information available on candidate cultures, antibodies, etc., in respect to their chemical, biological and toxicological properties. Combined with information from origin and other background information, the method of Principal Component Analysis (PCA) provides a key data mining tool for the development scientist to not only classify candidates of similar properties and characteristics, but also to discover unique classes that may be better suited to the treatment of specific conditions.

PCA provides a visual map of the sample groupings, allowing for the more efficient selection of real candidate therapies, but it also provides a map of the input variables and their relationships that cause the samples to group the way they do. Figure 2 provides an example of the outputs of a PCA in the form of the scores and loadings plots. The scores provide a map of the samples and the loadings provide a map of the input variables.
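As a rough illustration of how scores and loadings arise, the sketch below computes a PCA by singular value decomposition of autoscaled data. It is a minimal example under stated assumptions, not the workflow of any particular MVA package: the variable names, the three "sources" and all numbers are made up for demonstration, and autoscaling is one common preprocessing choice among several.

```python
import numpy as np

def pca_scores_loadings(X, n_components=2):
    """Return (scores, loadings, explained variance ratio) for a samples-by-variables array."""
    X = np.asarray(X, dtype=float)
    # Autoscale: center each variable and divide by its sample standard deviation
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]    # map of the samples
    loadings = Vt[:n_components].T                     # map of the variables
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Hypothetical candidate data: 10 samples from each of three sources (synthetic values)
rng = np.random.default_rng(0)
variables = ["cell_count", "viability", "impurities", "titer"]
source_means = {
    "Source 1": [5.0, 90.0, 3.0, 1.0],
    "Source 2": [6.0, 85.0, 1.0, 1.5],
    "Source 3": [9.0, 92.0, 1.2, 1.2],
}
X = np.vstack([
    rng.normal(loc=mean, scale=[0.5, 2.0, 0.2, 0.1], size=(10, 4))
    for mean in source_means.values()
])

scores, loadings, explained = pca_scores_loadings(X, n_components=2)
print("Explained variance ratio:", np.round(explained, 3))
for name, row in zip(variables, loadings):
    print(f"{name:>12}: PC1 loading {row[0]: .2f}, PC2 loading {row[1]: .2f}")
```

Plotting the two columns of `scores` against each other gives the sample map, and plotting the two columns of `loadings` gives the variable map that explains why the samples group as they do.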

PCA (or more generally MVA) applied to this kind of data is sometimes referred to as Quantitative Structure Activity Relationships (QSAR) and has helped some companies to significantly reduce the time and effort required to isolate suitable candidates for further development.

Formulation of suitable products: Stabilizing the candidate into a suitable matrix for manufacturing and delivery is best approached using DoE and, in particular, excipient screening and mixture designs. Excipient screening designs allow the formulation scientist to select the best components that will preserve the nature of the candidate, while mixture designs allow for the development of the best combination that will not only stabilize the candidate, but also protect it during subsequent manufacturing processes.
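To make the idea of a mixture design concrete, the following sketch enumerates a simplex-lattice design, one standard family of mixture designs in which every blend's proportions sum to one. The excipient names are hypothetical placeholders, and a real study would typically add constraints (for example, minimum and maximum levels per component) that are not shown here.

```python
from fractions import Fraction
from itertools import product

def simplex_lattice(n_components, degree):
    """All blends whose proportions are multiples of 1/degree and sum to 1."""
    points = []
    for combo in product(range(degree + 1), repeat=n_components):
        if sum(combo) == degree:
            points.append(tuple(Fraction(c, degree) for c in combo))
    return points

# Hypothetical excipients, chosen only for illustration
excipients = ["sucrose", "mannitol", "glycine"]
design = simplex_lattice(len(excipients), degree=2)

for blend in design:
    print({name: float(p) for name, p in zip(excipients, blend)})
```

Each printed row is one candidate formulation to prepare and test; the measured responses (such as recovery of activity) would then be fitted with a mixture model to find the best blend.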

Scores and Loadings plots for a candidate selection study

Figure 2: In this example, Source 1 samples have high amounts of impurities whereas Source 3 samples have the highest cell count. As a rule of thumb, variables located outside the inner ellipse are regarded as being important in interpretation of clusters in the Scores plot.

Clinical trials have traditionally been the domain of univariate statistical approaches (in particular clinical statistics) where statistical significance is assessed for parameters such as efficacy and major side effects. The tools of MVA can be used to complement the findings generated by clinical trial statistics to further confirm and accelerate key findings through this phase of product development.

The ability to incorporate demographic data, age, sex and patient history into predictive or exploratory models is a unique feature of the MVA method, and approaches such as the L-PLS model can provide an overall picture of the patient groups, disease markers and the candidate properties to better assess the effect of the therapy on specific patient groups. Figure 3 provides an example of the L-PLS model structure and an example output.

Through the use of MVA tools for monitoring and controlling bioprocesses, manufacturers worldwide have realized significant cost savings through proactive quality control. During the scale up and technology transfer of a process from R&D to full scale manufacturing, the use of DoE is a critical strategy for assessing the effect of changing process and equipment variables. This allows the definition of the Design Space, which defines the most effective control strategy for the process. Multivariate Statistical Process Control (MSPC) uses multivariate exploratory and predictive models and integrates them into the entire data collection and process control system.

This allows manufacturers to be more innovative in their approach to quality, combining in-line process analytics into single or holistic process models that assess the quality of production better than single measurements in isolation. Two particular processes that are commonly used in biotherapy manufacture are fermentation and lyophilization. Some applications of MVA to these are discussed in the following sections.

The L-PLS model and its potential for clinical trial data analysis.

Figure 3: In this example, the variables in green describe the background information of the patients, the variables in blue are the side effects of the formulations (the actual formulations in light blue) and the red dots indicate patient groups. This combined plot is the most informative way of displaying the relationship between the three data tables depicted in the frame above.

For many years manufacturers have been challenged with the development of suitable models for monitoring the progress of batch processes, fermentation being one such process. These batch models aim to establish a process trajectory and associated limits around the trajectory that define the bounds of acceptable product quality.
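The notion of a trajectory with limits can be sketched very simply for a single variable. The example below is a deliberately naive illustration, not one of the batch modeling methods discussed in this article: it assumes the reference batches are already aligned on a common time grid (which is exactly the hard part in practice), uses synthetic data, and sets limits at the mean plus or minus three standard deviations.

```python
import numpy as np

def trajectory_limits(reference_batches, n_sigma=3.0):
    """Mean trajectory and +/- n_sigma limits from aligned good batches (rows = batches)."""
    ref = np.asarray(reference_batches, dtype=float)
    center = ref.mean(axis=0)
    spread = ref.std(axis=0, ddof=1)
    return center, center - n_sigma * spread, center + n_sigma * spread

def out_of_limits(batch, lower, upper):
    """Indices of the time points where a batch leaves the acceptable band."""
    batch = np.asarray(batch, dtype=float)
    return np.flatnonzero((batch < lower) | (batch > upper))

# Synthetic example: 20 good batches following an S-shaped growth curve
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
ideal = 1.0 / (1.0 + np.exp(-10.0 * (t - 0.5)))
reference = ideal + rng.normal(0.0, 0.02, size=(20, t.size))
center, lower, upper = trajectory_limits(reference)

# A hypothetical faulty batch that drops below the trajectory from time point 30 onward
faulty = ideal.copy()
faulty[30:] -= 0.3
flagged = out_of_limits(faulty, lower, upper)
print("First out-of-limit time point:", flagged[0] if flagged.size else None)
```

Multivariate batch models apply the same idea to scores computed from many variables at once rather than to one measured signal.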

Methods exist that unfold batch data and use so-called “maturity indices” to model the process. However, the major drawback of these methods is that they assume linear relationships in the process, an assumption that is fundamentally incorrect, and so they have only partially solved the batch problem. Other approaches use time warping to distort the time scale and align batch trajectories. These approaches also suffer fundamentally, as they distort the chemistry or biology of the system and hence do not describe the true state of the process.

Relative Time Mapping (RTM) addresses the shortcomings of the previously defined methods by keeping the chemistry/biology of the system intact, while at the same time providing the usual batch trajectory plots and associated diagnostics that have become synonymous with this type of analysis. Figure 5 provides some typical outputs from an RTM Batch Modeling process.

Whether batch models or traditional Statistical Process Control (SPC) charts are used to assess the progress of a bioprocess, there are many diagnostics available in multivariate models that can be used to determine the onset of process failure.

The term Early Event Detection (EED) is being increasingly used to describe the application of Multivariate Statistical Process Control for the detection of process faults. The diagnostics from these models can be fed back into the manufacturing control systems using protocols such as OPC to automate process adjustments and therefore maximize the quality of the final product.
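Two diagnostics commonly used for this kind of fault detection are Hotelling's T² (distance within the model) and the Q residual, also called squared prediction error (distance from the model). The sketch below computes both from a PCA model of normal operation. It is an illustration under assumptions: the class name `PCAMonitor`, the synthetic five-variable process and the fault are all hypothetical, and the control limits are taken as empirical 99th percentiles of the reference data as a simplification of the distribution-based limits usually applied in practice.

```python
import numpy as np

class PCAMonitor:
    """PCA-based monitor giving Hotelling T^2 and Q residuals (illustrative sketch)."""

    def __init__(self, reference, n_components=2, percentile=99.0):
        X = np.asarray(reference, dtype=float)
        self.mean = X.mean(axis=0)
        self.std = X.std(axis=0, ddof=1)
        Z = (X - self.mean) / self.std
        _, s, Vt = np.linalg.svd(Z, full_matrices=False)
        self.loadings = Vt[:n_components].T
        # Variance of each score vector over the reference data
        self.score_var = s[:n_components] ** 2 / (X.shape[0] - 1)
        t2, q = self.diagnostics(X)
        # Simplification: empirical limits rather than F- or chi-square-based limits
        self.t2_limit = np.percentile(t2, percentile)
        self.q_limit = np.percentile(q, percentile)

    def diagnostics(self, X):
        Z = (np.atleast_2d(np.asarray(X, dtype=float)) - self.mean) / self.std
        T = Z @ self.loadings
        t2 = np.sum(T ** 2 / self.score_var, axis=1)
        residual = Z - T @ self.loadings.T
        q = np.sum(residual ** 2, axis=1)
        return t2, q

    def is_fault(self, X):
        t2, q = self.diagnostics(X)
        return (t2 > self.t2_limit) | (q > self.q_limit)

# Synthetic normal operation: five correlated variables driven by two latent factors
rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 1.0, 1.0, 0.5, 0.0],
                   [0.0, 0.5, 1.0, 1.0, 1.0]])
normal_ops = latent @ mixing + rng.normal(scale=0.1, size=(200, 5))
monitor = PCAMonitor(normal_ops, n_components=2)

# Hypothetical fault: one sensor drifts away from its usual correlation with the others
new_obs = normal_ops[0].copy()
new_obs[2] += 5.0
print("Fault detected:", bool(monitor.is_fault(new_obs)[0]))
```

In a live system the T² and Q values would be recomputed for every new observation and passed to the control layer, which is where a protocol such as OPC comes in.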

An extension of MSPC is the use of Hierarchical Models (HM). These models provide an excellent way of classifying the state of discrete phases of processes such as fermentation and adapt to changing conditions as they occur. HMs can be set up as Classification — Classification, Classification — Prediction and Projection — Prediction models which can be adapted to applications such as analysis of raw materials, process monitoring and quality-control applications.

Near Infrared (NIR) spectroscopy has been used for many years with multivariate predictive and exploratory models for the rapid, non-destructive assessment of product quality. One common application of the NIR method is the quantitative analysis of residual moisture in lyophilized products.

Lyophilization is a common method used in the manufacture of biopharmaceutical products as it uses low temperatures to remove residual moisture, thus preserving the structure of the active components and allowing their storage at room temperature. The traditional method of analysis for residual moisture in lyophilized product is Karl Fischer (KF) titration, which is a destructive test and can only be applied to a small number of samples.

Replacement of the KF method with NIR not only results in non-destructive testing, but also allows for 100% inspection systems to be put in place. These systems use MVA predictive models to transform the NIR spectrum into a single value for residual moisture (or other properties) and are used to accept and reject product as it is being manufactured.
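The step of turning a spectrum into a single moisture value can be sketched with a small single-response PLS regression (PLS1, fitted with the NIPALS algorithm). Everything in the example is an assumption made for illustration: the spectra are simulated, with a "water" band placed near 1940 nm and an interfering "matrix" band near 1700 nm, and the reference moisture values stand in for KF results. A validated method would also need proper preprocessing, component selection and independent validation.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS; returns (x_mean, y_mean, regression coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight vector
        t = E @ w                       # scores
        tt = t @ t
        p = E.T @ t / tt                # X loadings
        c = f @ t / tt                  # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - c * t                   # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean

# Simulated NIR spectra (all values synthetic)
rng = np.random.default_rng(3)
wavelengths = np.linspace(1100.0, 2500.0, 120)
water_band = np.exp(-0.5 * ((wavelengths - 1940.0) / 40.0) ** 2)
matrix_band = np.exp(-0.5 * ((wavelengths - 1700.0) / 80.0) ** 2)

def simulate(n):
    moisture = rng.uniform(0.5, 3.0, size=n)     # % residual moisture
    matrix = rng.uniform(0.8, 1.2, size=n)       # interfering matrix signal
    spectra = (0.1 * np.outer(moisture, water_band)
               + np.outer(matrix, matrix_band)
               + rng.normal(scale=0.001, size=(n, wavelengths.size)))
    return spectra, moisture

X_cal, y_cal = simulate(60)      # calibration set with reference values
X_new, y_new = simulate(20)      # new vials to be predicted

model = pls1_fit(X_cal, y_cal, n_components=2)
predicted = pls1_predict(model, X_new)
rmse = float(np.sqrt(np.mean((predicted - y_new) ** 2)))
print(f"RMSE of prediction: {rmse:.4f} % moisture")
```

Once calibrated, `pls1_predict` reduces each new spectrum to one number that can be compared against a release limit, which is what makes 100% inspection feasible.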

In one case, a biopharma manufacturer saved about $1 million by using the NIR method combined with PCA to validate the performance of a new freeze dryer. They also developed a quantitative Partial Least Squares Regression (PLSR) model to replace the KF method in the laboratory. This method saves them $1,000 per sample and provides more confidence when releasing the batch to market.

Although initiatives such as Process Analytical Technology (PAT) have been used by many manufacturers globally to assess product and process quality at the point of manufacture, not every process measurement can be replaced at the point of manufacture. Quality Control (QC) operations are still vital in the final release stage of some, if not all, products.

Due to the high variability in many biological assays, DoE and MVA can be used to design and refine the analytical methods used in the QC laboratory and have been successfully applied to the optimization of chromatographic methods, the refinement of sampling procedures and the analysis of complex data produced by mass spectrometers.

Another advantage of combining spectroscopic analysis with MVA methods is in stability studies. Since the NIR method is non-destructive and is sensitive to changes in the product and its matrix, the same sample can be assessed over the entire timeframe of the study. Where applicable, this avoids the destruction of product, and the results are completely representative as the same sample is being assessed each time.

MVA and DoE are fast becoming essential tools for all process development and monitoring applications. Bioprocesses provide an excellent but challenging application area. Modern manufacturing execution systems and control platforms produce a massive amount of data that requires the tools of MVA to fully “data mine” the most important information and make real-time quality decisions.

From raw material analysis to final product release, MVA models can be integrated into the total Quality Management System (QMS), allowing manufacturers to realize the benefits of the Quality by Design (QbD) initiative.

Multivariate data analysis and DoE are powerful tools ideally suited for understanding the complex behavior and relationships in biological systems. These methods can be used across the full biotech product lifecycle, from discovery and development, to scale up, production and quality control. Today’s leading MVA and DoE solutions can be seamlessly integrated with other systems including process equipment, laboratory and spectroscopy instruments, enabling faster and more informed decision making.

Leading biotechnology companies that implement and exploit the power of MVA and DoE can realize substantial benefits including lower development and production costs, improved product quality and compliance, technology transfer, faster time to market and ultimately increased business value.

About the Author

Brad Swarbrick | vice president business development