Reviewing the Basics of Stats
By Kate DeRoche Lusczakoski, Ph.D., and Aaron Spence, M.A., Aegis Analytical

To summarize, the bigger your sample size, the more likely you are to find statistically significant results, and the smaller your sample size, the less likely you are to find them. As a quick rule of thumb, a sample size should be somewhere between 30 and 500 observations. In manufacturing you often can’t change the sample size; however, you can change your alpha level and interpret your results in light of the sample size that you have. In the statistical world, weighing confidence levels and sample size when interpreting your results falls under the topic of statistical power.
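The link between sample size, alpha and power can be sketched numerically. The following is a minimal illustration, not the authors' method: it assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size, and the function name `two_sample_power` and the effect size of 0.3 are hypothetical choices made only for the example.

```python
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the standardized mean difference between the groups
    (an assumption of this sketch; it is not defined in the article).
    """
    z = NormalDist()
    # Critical value for the chosen alpha level (two-sided).
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic under the assumed effect.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of rejecting the null hypothesis in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Power grows with sample size for a fixed effect size and alpha.
for n in (10, 30, 100, 500):
    print(n, round(two_sample_power(0.3, n), 3))

# With a fixed sample size, relaxing alpha also raises power.
print(round(two_sample_power(0.3, 30, alpha=0.10), 3))
```

Under these assumptions, the printed values rise as the sample size grows, and loosening alpha from 0.05 to 0.10 raises the power for the same 30 observations per group.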
2. What sampling method was used?
Sampling techniques are a critical component of manufacturing analytics. There are two general categories of sampling methods: (1) random sampling (representative sampling) and (2) nonrandom sampling (purposeful sampling). Which technique to use depends on what you are examining and how you would like to generalize your statistical inferences. As a good consumer of statistics, you should inquire about the sampling method: was it random or nonrandom? If random sampling was used, how were the samples selected at random? Watch out for answers like “the operator randomly selected them,” because this may or may not be a “truly random” sample. If nonrandom sampling was used, then ask about the rationale for how the sample was collected. For example, was it collected from the end of the process because that was the focus of an investigation or because that was the only data available? Also, does the nonrandom sampling method fit the purpose of the analysis?
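One way to make a random selection auditable is to let a seeded random number generator choose the units rather than a person. The sketch below is only an illustration: the batch identifiers, the sample size of 30 and the seed value are hypothetical, and `draw_random_sample` is a helper name invented for this example.

```python
import random


def draw_random_sample(unit_ids, sample_size, seed):
    """Select units at random, without replacement, in a reproducible way."""
    # Recording the seed lets anyone re-create exactly the same selection.
    rng = random.Random(seed)
    return sorted(rng.sample(list(unit_ids), sample_size))


# Hypothetical population: 200 batch identifiers, B001 through B200.
batch_ids = [f"B{n:03d}" for n in range(1, 201)]

selected = draw_random_sample(batch_ids, sample_size=30, seed=2012)
print(selected)
```

Because the seed and the population list are recorded, the answer to “how were the samples selected at random?” can be checked by rerunning the selection.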
Regardless of what type of sampling method is applied, the sample used dictates the frame for the interpretation of the statistical analysis. If data was gathered in 2012, then the statistical interpretation should be restricted to only 2012. If data was only gathered from a single site, then the interpretation of the statistical results should be restricted to that site. While sampling methods can become extremely complex, understanding the rationale for the sampling method selection is critical to the proper use of statistics.
3. What statistical test did you use?
Determining the most appropriate inferential statistical test typically depends on three elements related to what you are asking the data to explain or predict. Before diving into the three elements, however, it is important to grasp the concepts of independent and dependent variables. An independent variable is one that affects an outcome (i.e., changes the dependent variable). A dependent variable is an outcome variable whose value depends on other variables. For example, if you believe large amounts of chocolate pudding increase happiness, the amount of chocolate pudding is your independent variable and happiness is your dependent variable. After identifying your independent and dependent variables, you can work through these three questions to determine the most appropriate statistical test:
• Are you interested in a difference in parameters or a relationship among parameters?
• What type of parameter is each dependent variable? (numeric or categorical)
• What type of parameter is each independent variable? (numeric or categorical)
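The three questions above can be sketched as a simple lookup. The mapping below is an assumption based on common textbook conventions, not a table taken from this article, and it covers only a few combinations; the function name `suggest_test` and its string labels are hypothetical.

```python
def suggest_test(goal, dependent, independent):
    """Suggest a commonly used test family for a given analysis question.

    goal: "difference" or "relationship"
    dependent, independent: "numeric" or "categorical"
    """
    # Conventional pairings (an assumption of this sketch, not exhaustive).
    table = {
        ("difference", "numeric", "categorical"):
            "t-test (two groups) or ANOVA (three or more groups)",
        ("relationship", "numeric", "numeric"):
            "correlation or linear regression",
        ("relationship", "categorical", "categorical"):
            "chi-square test of independence",
        ("relationship", "categorical", "numeric"):
            "logistic regression",
    }
    key = (goal, dependent, independent)
    return table.get(key, "no entry in this sketch; consult a statistician")


# Comparing a numeric outcome (yield) across categorical groups (sites).
print(suggest_test("difference", "numeric", "categorical"))
```

A lookup like this is only a starting point; the assumptions behind the suggested test still need to be checked against the data.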
4. Did you check the statistical assumptions?
Statistics rely on assumptions, and making incorrect assumptions about your data can lead to errors in conclusions due to incorrect interpretation of the statistics. Linearity, independent observations, normality and equal variance (LINE) are assumptions of commonly applied statistical tests (parametric statistics), and ignoring these assumptions can lead to misinterpretation of results.
For example, when comparing yield across three manufacturing sites with an ANOVA test, we assume that yield is normally distributed, that the observations are independent, and that the variance in yield is similar across all of the sites. If any of these ANOVA assumptions is violated, then the results may be incorrect.
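As a rough illustration of the three-site example, the sketch below computes a one-way ANOVA F statistic and a simple equal-variance check using only the Python standard library. The yield values are made up for the example, the helper names are hypothetical, and the cutoff of 4 for the variance ratio is a common rule of thumb rather than a formal test. Normality and independence are not checked here and would need separate review.

```python
from statistics import mean, variance


def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(values) for values in groups.values()]
    return max(variances) / min(variances)


def anova_f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    samples = list(groups.values())
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = mean([x for s in samples for x in s])
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical yield data (percent) for three sites; illustrative only.
yields = {
    "site_a": [91.2, 92.5, 90.8, 93.1, 91.9],
    "site_b": [89.4, 90.1, 88.7, 90.9, 89.8],
    "site_c": [92.8, 93.5, 91.9, 94.2, 93.0],
}

ratio = variance_ratio(yields)
# Rule-of-thumb check (an assumption): flag a variance ratio above 4.
print("variance ratio:", round(ratio, 2), "flagged:", ratio > 4)
print("F statistic:", round(anova_f_statistic(yields), 2))
```

If the variance ratio is flagged, the F statistic should be read with caution, since the equal-variance assumption behind it may not hold.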
Statistical errors will continue to run rampant in life sciences manufacturing with today’s point-and-click, data-rich environments making access to statistics much easier. Having a sound working knowledge of statistical best practices and understanding commonly misapplied areas of statistics (sampling, process capability, statistical process control and ANOVA) will better inform your organization’s decisions. Proper interpretation by a trained statistician is always most valuable, but — as a minimum standard — a smart data scientist offers a check-and-balance by asking important questions that help avoid inaccurate conclusions.
About the Authors
Kate DeRoche Lusczakoski, Ph.D., is the Manager of Business Consulting with Aegis Analytical Corp. in Lafayette, Colo., with a doctoral degree in applied statistics/research methods (klusczakoski@aegiscorp.com).
Aaron Spence, M.A., is an Analytics Specialist with Aegis Analytical Corp. in Lafayette, Colo., with a master’s degree in health psychology (aspence@aegiscorp.com).
PharmaManufacturing.com is the site for knowledge, news and analysis for manufacturing and other professionals working in the pharmaceutical, biopharmaceutical and biotech industries.