Multivariate Data Analysis for Biotechnology, Bio-processing

Powerful MVA and DoE methods are giving biotech companies greater insights from complex data

By Brad Swarbrick, vice president business development, CAMO Software


The modern biopharmaceutical/biotechnology manufacturing facility contains many sophisticated control, data logging and data archiving systems. Massive amounts of data are collected from sources such as raw materials analysis, process outputs and final quality assessments, and are then stored in data warehouses.

The sheer volume of data contained in these warehouses makes it nearly impossible to extract the information using simple charting and univariate methods of analysis. Such complex data requires methods of analysis that can cope with multiple variables simultaneously, revealing not only which variables are influential but also how those variables relate to each other. This is where Multivariate Analysis (MVA) is finding a much greater role in the analysis of complex bioprocess data.

Much more effort is being put into the discovery and development of biotherapies and personalized medicines. Biopharmaceutical and biotechnology companies are looking for ways to accelerate drug discovery, reduce regulatory approval time and be first to market. They are pursuing these goals through quality and compliance initiatives such as the Food and Drug Administration’s (FDA) current Good Manufacturing Practice (cGMP) and Quality by Design (QbD) principles, and through Data Driven Knowledge Discovery. This means that data collected throughout the entire product lifecycle must be analyzed and interpreted in order to gain extensive product and process understanding. This, in turn, leads to improved quality, greater confidence in the market for a company’s products and, ultimately, increased market capitalization.

It is estimated that the time it takes to bring a new drug or therapy to market is approximately 12 years. This usually involves three phases: discovery, clinical trials and registration.

Coupled with these phases is the development of a suitable manufacturing process that can consistently produce the highest quality product and comply with FDA cGMP guidance. This includes the development of a formulation that is robust under processing conditions, scale-up considerations, and technology transfer from facility to facility or even between different types of manufacturing equipment. Each of these phases can be improved and accelerated through the use of the tools of MVA and Design of Experiments (DoE).

Even before data is analyzed, one of the biggest challenges facing the industry is getting this data into a format that is amenable to MVA. Many data collection and agglomeration systems are commercially available for compiling various forms of data, and these can be seamlessly integrated into MVA packages so that the vast array of graphical and analytical approaches can be applied to reveal the information the data contains. As the saying goes, data is only data until the information is extracted from it; from there, information leads to knowledge.

Unlike small-molecule drug products, biotherapies are fundamentally more complex in terms of structure and application and suffer greatly from natural biological variability. For example, the tools of MVA greatly aid the isolation and selection of cell cultures or bacterial strains for further development into future products, as well as the monitoring of the processes (e.g., fermentation reactions) used to produce them. From there, the tools of DoE can be used to devise formulations that stabilize the active component(s) during manufacture; they are also useful in product scale-up studies.
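As a minimal sketch of the DoE idea mentioned above, the following Python example builds a two-level full factorial design for a formulation study and estimates the main effect of each factor. The factor names (pH, sucrose_pct, temp_C), their levels and the response function are all hypothetical and chosen purely for illustration; the response is generated from a made-up formula, not from measured data.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of run dicts."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in product(*(factors[n] for n in names))]


def main_effect(runs, responses, factor, low, high):
    """Mean response at the high level minus mean response at the low level."""
    hi = [y for r, y in zip(runs, responses) if r[factor] == high]
    lo = [y for r, y in zip(runs, responses) if r[factor] == low]
    return sum(hi) / len(hi) - sum(lo) / len(lo)


# Hypothetical formulation factors, each at a (low, high) level.
factors = {
    "pH": (6.0, 7.5),
    "sucrose_pct": (2.0, 8.0),
    "temp_C": (4.0, 25.0),
}
design = full_factorial(factors)  # 2**3 = 8 runs


def synthetic_stability(run):
    """Made-up response standing in for a measured stability result."""
    return (80.0
            + 2.0 * (run["pH"] == 7.5)
            + 6.0 * (run["sucrose_pct"] == 8.0)
            - 10.0 * (run["temp_C"] == 25.0))


responses = [synthetic_stability(run) for run in design]
effects = {name: main_effect(design, responses, name, low, high)
           for name, (low, high) in factors.items()}

for name, effect in effects.items():
    print(f"{name}: main effect = {effect:+.1f}")
```

In a real study the responses would come from the laboratory, and interaction effects and replicate runs would normally be considered as well; this sketch only shows how a balanced design lets each factor's effect be estimated from the same small set of runs.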

Once the candidate therapy (cell culture, antibody, virus strain, etc.) has been formulated into a stable matrix, MVA can be used to assist in the interpretation of clinical trial data. It can even help accelerate this lengthy process through a more comprehensive, holistic approach to data analysis, especially when combined with the principles of adaptive designs and the Critical Path Initiative endorsed by FDA.

When the candidate therapy has been approved for market release, the tools of MVA are useful for assessing the success of technology transfer from R&D to production, or from one manufacturing facility to another. In the production environment, MVA is useful for assessing the quality and characteristics of incoming or internally produced raw materials. When MVA is combined with rapid spectroscopic (or other) characterization methods, control strategies for the real-time monitoring and adjustment of processes within the so-called “design space” can be devised so that proactive quality control can be realized. DoE and MVA are then used in developing robust analytical methods for stability studies and other post-production analyses.

Data collected over time from a manufacturing facility can be modeled to assess batch-to-batch consistency and to support continuous improvement (CI), preventive maintenance, and corrective and preventive action (CAPA) programs. The entire process is summarized in Figure 1.
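One common way to assess batch-to-batch consistency, offered here as an illustrative sketch rather than the method used in this article, is to fit a PCA model to a set of reference batches and then compute Hotelling's T² for each new batch. A large T² means the batch lies far from the reference population in the model space. The batch data below are randomly generated, the four process variables are hypothetical, and the statistical control limit (usually derived from the F-distribution) is deliberately left out.

```python
import numpy as np


def fit_reference_model(X_ref, n_components=2):
    """Fit an autoscaled PCA model to reference (known good) batches."""
    X_ref = np.asarray(X_ref, dtype=float)
    mean = X_ref.mean(axis=0)
    std = X_ref.std(axis=0, ddof=1)
    Z = (X_ref - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T
    # Variance of each score vector across the reference batches.
    score_var = s[:n_components] ** 2 / (X_ref.shape[0] - 1)
    return mean, std, loadings, score_var


def hotelling_t2(x, model):
    """Hotelling's T^2 of one batch (1-D) or many batches (2-D)."""
    mean, std, loadings, score_var = model
    z = (np.asarray(x, dtype=float) - mean) / std
    t = z @ loadings  # project onto the model's principal components
    return np.sum(t ** 2 / score_var, axis=-1)


# Synthetic reference data: 20 batches x 4 hypothetical process variables.
rng = np.random.default_rng(42)
X_ref = rng.normal(loc=[7.0, 37.0, 1.2, 50.0],
                   scale=[0.1, 0.5, 0.05, 2.0],
                   size=(20, 4))
model = fit_reference_model(X_ref)

t2_ref = hotelling_t2(X_ref, model)            # reference batches
t2_typical = hotelling_t2(model[0], model)     # batch at the reference mean
t2_shifted = hotelling_t2(model[0] + 4 * model[1], model)  # all variables +4 SD

print("mean T2 of reference batches:", round(float(t2_ref.mean()), 3))
print("T2 of typical batch:", float(t2_typical))
print("T2 of shifted batch:", float(t2_shifted))
```

A batch sitting exactly at the reference mean has a T² of zero, while a batch shifted on every variable scores higher; in practice the value would be compared with a control limit and complemented by a residual statistic to catch variation the model does not describe.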


Candidate Therapy Discovery: During the initial development of new therapies, there is usually much information available on candidate cultures, antibodies, etc., with respect to their chemical, biological and toxicological properties. When this is combined with information on origin and other background information, Principal Component Analysis (PCA) provides a key data mining tool for the development scientist, not only to classify candidates with similar properties and characteristics, but also to discover unique classes that may be better suited to the treatment of specific conditions.

PCA not only provides a visual map of the sample groupings, allowing for more efficient selection of viable candidate therapies, but also provides a map of the input variables and the relationships among them that cause the samples to group the way they do. Figure 2 provides an example of the outputs of a PCA in the form of the scores and loadings plots. The scores provide a map of the samples and the loadings provide a map of the input variables.
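To make the distinction between scores and loadings concrete, here is a minimal sketch of PCA computed with a singular value decomposition in NumPy. It is not the software used to produce the article's figures. The data are randomly generated to mimic two groups of candidates measured on four hypothetical properties, and autoscaling (mean-centering and scaling to unit variance) is an assumed preprocessing choice.

```python
import numpy as np


def pca_scores_loadings(X, n_components=2):
    """Return (scores, loadings, explained variance ratio) for data matrix X.

    Rows of X are samples (candidates); columns are measured variables.
    """
    X = np.asarray(X, dtype=float)
    # Autoscale so that variables on different scales contribute equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # map of the samples
    loadings = Vt[:n_components].T                    # map of the variables
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Synthetic example: two groups of five candidates, four hypothetical
# properties. The groups differ on the first three properties only.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[1.0, 5.0, 0.2, 3.0], scale=0.1, size=(5, 4))
group_b = rng.normal(loc=[2.0, 3.0, 0.8, 3.0], scale=0.1, size=(5, 4))
X = np.vstack([group_a, group_b])

scores, loadings, explained = pca_scores_loadings(X)

print("PC1 scores, group A:", np.round(scores[:5, 0], 2))
print("PC1 scores, group B:", np.round(scores[5:, 0], 2))
print("PC1 loadings:", np.round(loadings[:, 0], 2))
print("explained variance ratio:", np.round(explained, 2))
```

Plotting the first column of the scores against the second gives the scores plot, and doing the same with the loadings gives the loadings plot. In this synthetic case the two groups fall on opposite sides of the first component, and the loadings show which properties drive that separation.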
