# What Your ICH Q8 Design Space Needs: A Multivariate Predictive Distribution

## Multivariate predictive distribution quantifies the level of QA in a design space. "Parametric bootstrapping" can help simplify early analysis and complement Bayesian methods.

The ICH Q8 core definition of design space is by now somewhat familiar: “The multidimensional combination and interaction of input variables (e.g., material attributes) and process parameters that have been demonstrated to provide assurance of quality” [1]. This definition is ripe for interpretation. The phrase “multidimensional combination and interaction” underscores the need to utilize multivariate analysis and factorial design of experiments (DoE), while the words “input variables (e.g., material attributes) and process parameters” remind us of the importance of measuring the right variables.

However, in presentations and articles discussing design space, not much focus has been given to the key phrase, “assurance of quality”. This does not seem justified, given that guidance documents such as ICH Q8, Q9, Q10, PAT, etc. are inundated with the words “risk” and “risk-based.” For any ICH Q8 design space constructed, surely the core definition of design space begs the question, “How much assurance?” [2]. How do we know if we have a “good” design space if we do not have a method for quantifying “How much assurance?” in a scientifically coherent manner?

# The Flaws of Classical MVA

Classical multivariate analysis and DoE methodology fall short of providing convenient tools to allow one to answer the question “How much assurance?” There are two reasons for this. One is that multivariate analysis and DoE have historically focused on making inferences primarily about response means. But simply knowing that the response means of a process meet quality specifications is not sufficient to allow one to conclude that the next batch of drug product will meet specifications. One reason for this is that we always have batch-to-batch variation. Building a design space based upon overlapping mean responses will result in one that is too large, harboring operating conditions with a low probability of meeting all of the quality specifications. Just because each of the process mean quality responses meet specification does not imply that the results of the next batch will do so; there will always be variation about these means.

However, if we can quantify the entire (multivariate) predictive distribution of the process quality responses as a function of “input variables (e.g., material attributes) and process parameters”, then we can compute the probability of a future batch meeting the quality specifications. The multivariate predictive distribution of the process quality responses incorporates all of the information about the means, variation, and correlation structure among the response types.

Another, more technically subtle reason, that classical multivariate analysis and DoE methodology do not provide straightforward tools to construct a proper design space is that, beyond inference about response means, they are oriented towards construction of prediction intervals or regions. For a process with multiple critical quality responses, it is awkward to try to use a classical 95% prediction region (which is not rectangular in shape) to compare against a (rectangular) set of specifications for multiple quality responses. On the other hand, using individual prediction intervals does not take into account the correlation structure among the quality responses. What is needed instead is a “multivariate predictive distribution” for the quality responses. The proportion of this predictive distribution that sits inside of the rectangular set of quality specifications is then simply a quantification of “how much assurance”.

# The Stochastic Nature of Our Processes

Complex manufacturing processes are inherently “stochastic processes”. This means that while such process may have underlying mechanistic model relationships between the input factors and process responses, nonetheless such relationships are embedded with random variation. Conceptually, for many complex manufacturing processes, even if “infinitely” accurate measurement devices could be used, such processes would still produce quality responses that vary from batch to batch and possibly within batch. This is why classical multivariate analysis and DoE methodology, which focuses on inference for response means, present an insufficient tool set. This is not to say that such methods are not necessary; indeed they are. However, we need to better understand the multivariate stochastic nature of complex manufacturing processes in order to quantify “how much assurance” relative to an ICH Q8 design space.

The concept of a multivariate distribution is useful for understanding complex manufacturing processes, with regard to both input variables and response variables. Figure 1 shows a hypothetical illustration of the relationships among various input variables and response variables. Notice that some of the input variables and the response variables are described by multivariate distributions. In Figure 1 we have a model which describes the relationships between the input variables, the common cause process variability and the quality responses. This relationship is captured by the multivariate mathematical function ** f**=(f

_{1},…,f

_{r}), which maps various multivariate input variables,

**to the multivariate quality response variable,**

*x, z, e, θ***=(Y**

*Y*_{1},…,Y

_{r}). Here,

**=(x**

*x*_{1},…,xk) lists the controllable process factors (e.g. pressure, temperature, etc.), while

**=(z**

*z*_{1},…,z

_{h}) lists process variables which are noisy, such as input raw materials, process set-point deviations, and possible ambient environmental conditions (e.g. humidity).

The variable ** e**=(e1,…,er) represents the common-cause random variability that is inherent to the process. It is natural that z and e be represented by (multivariate) distributions that capture the mean, variation about the mean, and correlation structure of these random variables. The parameter

**= (θ**

*θ*_{1},...θ

_{p}) represents the list of unknown model parameters. While such a list can be thought of as composed of fixed unknown values, for purposes of constructing a predictive distribution, it may be easier to describe the uncertainty associated with these unknown model parameters by a multivariate distribution. The function f then transmits the uncertainty in the variables

**to the response variables,**

*z, e, and θ***. In other words, the “input distributions” for**

*Y***combine to produce the predictive distribution for**

*z, e, and θ***through the process model function**

*Y***.**

*f*# Join the discussion

We welcome your thoughtful comments.

All comments will display your user name.

#### Comments

No one has commented on this page yet.

RSS feed for comments on this page | RSS feed for all comments