Pharma Wrestles With Data Integrity Issues; Some Insights From John Avellanet

July 23, 2007
Not to harp on this point, but the drug industry should thank certain people for bringing a certain legal complaint involving a Swiss multinational whose name starts with N to light, because the problems listed in that document (whatever happens with that particular case) are difficult ones, and a great many drug companies are wrestling with them today. IT consultant John Avellanet, head of Cerulean Associates, LLC, shared his insights on drug safety data management in pharma. I asked him specifically (and generically) about two allegations raised in David Olagunju's legal complaint: best practices for validating data to comply with 21 CFR Part 11, and whether hard-coding of randomization codes is really a problem. His responses are enlightening.

Mr. Avellanet is off on Wednesday to give an expert briefing on data integrity and 21 CFR Part 11 to FDA. We'll publish a more extensive interview and case history on our web site and in our next issue, and we're very pleased that John will be writing an article for an upcoming edition of Pharmaceutical Manufacturing.

Please note: John's comments are generic, describe some of what he has seen in pharma, and have absolutely no connection whatsoever with the legal complaint, the company involved in that complaint, or any of its affiliates. Here's a sound bite.

PM - What mistakes do pharma companies typically make when validating data to comply with Part 11?

JA - There are six core steps to ensuring Part 11 compliance for clinical data (preclinical toxicology reports come from the public literature). The mistakes run the gamut from simple oversights to multi-million-dollar snafus. The top three mistakes I see almost without fail:
  • Lack of clarity and accountability for data integrity. To date, in every company I've dealt with, "someone else" is always held accountable for data integrity. Records Management says it's an IT issue because IT holds the keys to the computer systems the data is stored on; IT says it's a Records Management issue because they support systems, not information; the scientists claim they're responsible for it, so they want to burn it to CD and stick it in a desk drawer because they just know IT won't be able to find it down the road when they might need it; management assumes that, just as Quality is accountable for the paper quality system files (SOPs and the like), Quality must be accountable for the electronic stuff too… and so on, and so on, and so on.
  • A division between R&D/preclinical data and clinical and production data. From the FDA's perspective, this makes no sense (think Quality by Design). From a cynical IT perspective, and from a big consulting firm's perspective, it makes things easy: either you deal in the R&D world or you don't, and if you don't deal with R&D/preclinical, it's not your problem.
There's a Big Pharma company right now that is just beginning to grasp this nightmare: it hired a big consulting firm to handle all of its data integrity issues globally, only to find out that the contract specifically excludes anything not already in clinical or in production (the big consulting firm doesn't deal in R&D or preclinical). You can just imagine where this is heading.
  • Inability to "translate" and achieve mutual understanding between functional units. More and more, I'm convinced this is one of the root causes of the problem. IT can't understand Compliance, who can't understand the Scientists, who can't understand Purchasing, who can't understand Senior Management… you get the picture. It's not that they all need to be in mutual agreement - they won't be - but they do need to figure out how to reach mutual understanding, all marching to the beat of the same drum.
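Before moving on, it's worth making "data integrity" concrete at the record level, since Part 11's technical core is the attributable, time-stamped, tamper-evident electronic record. Here's a minimal illustrative sketch of one way to get there, a hash-chained audit trail; the field names and the chaining scheme are my own assumptions, not language from the regulation and not anything from John's briefing:

```python
# Minimal sketch of a tamper-evident audit trail: attributable (who),
# time-stamped (when), and append-only. Field names and the SHA-256
# hash-chaining scheme are illustrative assumptions, not from 21 CFR Part 11.
import hashlib
import json
from datetime import datetime, timezone

class AuditTrail:
    def __init__(self):
        self._entries = []
        self._last_hash = "0" * 64  # genesis value for the hash chain

    def record(self, user_id: str, action: str, record_id: str, detail: str) -> dict:
        """Append one attributable, time-stamped entry, chained to the previous one."""
        entry = {
            "user": user_id,        # who made the change
            "action": action,       # what happened (create/modify/delete)
            "record": record_id,    # which record was touched
            "detail": detail,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self._last_hash,  # link to the prior entry
        }
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._entries.append(entry)
        self._last_hash = entry["hash"]
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or deleted entry breaks it."""
        prev = "0" * 64
        for e in self._entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if e["prev_hash"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

trail = AuditTrail()
trail.record("jsmith", "modify", "LOT-042", "corrected assay value 98.1 -> 99.2")
assert trail.verify()  # True until someone tampers with a stored entry
```

The point of the chaining is simply that no single entry can be quietly edited or dropped after the fact without the verification step failing, which is exactly the property the accountability arguments above keep circling around.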
PM - Is the "hard coding" of randomization codes a bad thing? What influence would this practice have on results? Are there instances where it is appropriate?

JA - The short answer? The risk of data manipulation rises, and the proof of patient safety and product efficacy declines.

At its core, automated randomization tests many different variables (patients, individual results, genetic makeup, etc.) to make comparisons and draw conclusions from. Because you are randomly assessing the whole group, when something you don't expect occurs, you know it is statistically important.

As a simple example, if you look at all the patients and 15% had heart attacks 30 minutes after taking the drug, but no one else had any heart attacks at all, then - assuming the patients were all healthy - you could say your drug has a 15% chance of causing a heart attack within 30 minutes. You'd then want to do further studies to find out why, what the risk is after 30 minutes, and so on.

Now, let's say I got really good results from an initial clinical trial where I randomized my results, but discovered afterward that if I give my new drug to people who wear glasses, they go blind two months after the trial is over. What would the data show if I went back and hard-coded my randomization codes to exclude people who wore glasses? The results would look great, and it would appear that all is well with the world.

Even if it's not so cynically after the fact, once you hard-code random sampling, it's no longer random, and a number of things happen that reduce the credibility of the results:

1. You could cherry-pick test cases, patients, or other variables to get the results you want (see the sketch after this list).
2. Even if you don't cherry-pick, you end up with a set of comparisons that, when the results differ, can't tell you whether the difference matters (was it due to the drug? some other variable?) because you only tested a set number. With randomization, you draw better correlations, analyses, and conclusions.
3. You introduce two human risks: human error (when someone is originally hard-coding) and human temptation (such as when a company's internal reviewer is under pressure to make sure the new drug has good results… his or her job might depend on it).
4. You also make it difficult for anyone to reproduce your results unless they use not only the same methods but now the same variables - for example, the same patient genetic makeup.
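To make John's cherry-picking point (No. 1 above) concrete, here's a toy sketch of my own. Everything in it is hypothetical: a made-up patient pool where, purely by assumption, only glasses-wearers suffer the adverse event, compared under a properly randomized cohort versus a "hard-coded" patient list that quietly excludes them. It has no connection to any real trial, system, or company.

```python
# Toy demo: proper randomization vs. a hard-coded cohort list.
# All numbers are invented; the adverse-event rule is a demo assumption.
import random

random.seed(7)  # fixed seed so the demo is reproducible

# Hypothetical pool: roughly 30% of 1,000 patients wear glasses.
patients = [{"id": i, "glasses": random.random() < 0.3} for i in range(1000)]

def has_adverse_event(patient):
    # Demo assumption: the event hits exactly the glasses-wearers.
    return patient["glasses"]

def event_rate(cohort):
    """Fraction of a cohort that experiences the adverse event."""
    return sum(has_adverse_event(p) for p in cohort) / len(cohort)

# Proper randomization: draw the cohort from the whole pool.
randomized = random.sample(patients, 200)

# "Hard-coded" randomization codes: a fixed list that quietly excludes
# the very patients who would show the problem.
hard_coded = [p for p in patients if not p["glasses"]][:200]

print(f"randomized cohort event rate: {event_rate(randomized):.1%}")  # ~30%
print(f"hard-coded cohort event rate: {event_rate(hard_coded):.1%}")  # 0.0%
```

Run it and the randomized cohort reports roughly the 30% event rate that actually exists in the pool, while the hard-coded cohort reports zero: exactly the "all is well with the world" picture John describes.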
Hope some readers find this information useful, and that it makes up for Sunday's silly Potter post. In the meantime, please write in and share your insights and experience. -AMS