There was a time when any business process improvement program demanded a data warehouse. These warehouses, which aim to put data “all in one place,” can be useful for identifying trends or causes of variability. However, they typically cannot accommodate critical process-related data types without severe sacrifices in performance and flexibility.
In practice, organizations that have embarked upon installing a warehouse for process-related data have wound up needing two separate warehouses—a classic warehouse for discrete data (typically maintained by the IT department), and a historian, managed by the engineering department, to house continuous (or “time series”) data. Users who required both types of data (for instance, to tie a batch failure to a specific equipment maintenance problem) still needed IT assistance to bring the two together.
Facing mandates for increased process knowledge, and the need to access data from any and all manufacturing sources, including online analyzers, a growing number of drug industry professionals find that traditional warehouses aren’t enough to enable enterprise manufacturing intelligence (EMI). In some cases, they may add cost and may actually slow data access.
Traditional data warehouses are costly to develop and maintain, and the extract/transform/load (ETL) processes they require add further expense.
Furthermore, traditional data warehouses leave it to the user to define the context of the information required. This can be a significant hurdle in biotech, for instance, where “splits” and “recombinations” lead to extremely variable genealogy conditions.
Getting a handle on batch-specific genealogy is essential to ensuring relevant analysis. However, incorporating genealogy considerations into data set-up can lead to datasets with over 50,000 rows of data. If this data must be manually or even “semi-automatically” organized, such as into a spreadsheet, it is almost impossible to guarantee integrity.
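To make the genealogy problem concrete, here is a minimal sketch of why splits and recombinations inflate row counts: material flow can be modeled as a directed graph, and every downstream batch inherits data from every upstream ancestor. The lot names and the graph itself are invented for illustration, not taken from any particular system.

```python
from collections import defaultdict

# Hypothetical genealogy: parent -> children. "FERM-01" splits into two
# filtration lots, which later recombine into one purification lot.
genealogy = {
    "FERM-01": ["FILT-01", "FILT-02"],   # split
    "FILT-01": ["PUR-01"],
    "FILT-02": ["PUR-01"],               # recombination
}

# Invert child -> parents so we can walk upstream.
parents = defaultdict(list)
for parent, children in genealogy.items():
    for child in children:
        parents[child].append(parent)

def upstream_lots(lot, parents):
    """Return every ancestor lot that contributed material to `lot`."""
    found = set()
    stack = [lot]
    while stack:
        current = stack.pop()
        for p in parents.get(current, []):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

ancestors = sorted(upstream_lots("PUR-01", parents))
# Every parameter of every ancestor becomes a candidate row in the
# analysis dataset, which is how genealogy-aware datasets grow so large.
```

Even in this three-step toy example, one downstream lot pulls in three upstream lots; across a real plant with dozens of steps and parameters per step, the multiplication quickly reaches the tens of thousands of rows described above.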
EMI software offers an alternative. Such systems are driven by a lightweight data access and aggregation platform, a dynamic mapping engine (DME), that gets individual users closer to the data they need for day-to-day analysis and reporting.
This lightweight platform is paired with sophisticated data modeling that creates more efficient data records. The result is near real-time access to data without the IT burden of designing, deploying and maintaining a physical warehouse. Thus, it has a much lower impact on the IT staff, freeing their time for other projects.
A lightweight platform makes it easier to move on to the next step of analysis—something users typically would not have had time to do within the constraints of a data warehouse. The right lightweight data access and aggregation platform should therefore be selected against three requirements that traditional data warehouse solutions fail to meet:
1) Data Context—capture all types of data (continuous, discrete, replicate)
2) Batch Context—provide the ability to compare batches
3) Batch Genealogy—quickly make upstream and downstream comparisons
A DME solution as part of a lightweight data platform can fuse any type of data from any data source. Users can, on their own, access manufacturing process data on demand and just point and click to create an organized virtual composite of discrete, replicate and continuous data. The data context is meaningful to user needs, organized by batch, data type, genealogy, or any other context defined by the user.
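The “virtual composite” idea can be sketched in a few lines: records for the same batch, held in different shapes by different sources, are merged into one view keyed by batch ID. All source names, fields, and values below are invented for illustration; a real DME would resolve these mappings dynamically rather than from hard-coded dictionaries.

```python
# Hypothetical per-batch records from three kinds of sources.
discrete = {"B-100": {"yield_kg": 42.5}}                 # e.g. single values from LIMS/ERP
replicate = {"B-100": {"potency": [98.1, 97.9, 98.4]}}   # repeated assay results
continuous = {"B-100": {"pH": [(0, 7.01), (60, 6.95)]}}  # (minute, value) pairs from a historian

def composite(batch_id, *sources):
    """Merge per-batch records from any number of sources into one view."""
    view = {"batch": batch_id}
    for source in sources:
        view.update(source.get(batch_id, {}))
    return view

row = composite("B-100", discrete, replicate, continuous)
```

The point of the sketch is the organizing principle: the batch ID supplies the context that lets discrete, replicate, and continuous data coexist in a single record without forcing them into one physical schema.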
Most importantly, a DME doesn’t require users to know SQL. Generic views can be combined at the time the user requests the data to provide multiple comparison contexts, including (but not limited to) product-to-product, equipment-to-equipment, process-to-process, site-to-site, etc., through the following steps:
Organize. To create a type of “data dictionary,” the DME presents available data elements within a hierarchical view displayed on a desktop; these views are based on manufacturing identifiers (IDs) determined by the user. IDs might be process steps, batch outputs, campaigns or sites, and they designate data type. The hierarchy might, for example, include fermentation, filtration and purification steps for a particular drug, drilling down to pH level, lot, quantity produced, mixer speeds and other valuable information.
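The hierarchical “data dictionary” described above can be pictured as a nested tree keyed by manufacturing IDs. The drug, step, and parameter names here are examples only, not a real product hierarchy.

```python
# Illustrative hierarchy: product -> process step -> available parameters.
hierarchy = {
    "DrugX": {
        "Fermentation": ["pH", "temperature", "mixer_speed"],
        "Filtration":   ["flow_rate", "pressure"],
        "Purification": ["lot", "quantity_produced"],
    }
}

def drill_down(tree, *path):
    """Follow a path of manufacturing IDs down the hierarchy."""
    node = tree
    for key in path:
        node = node[key]
    return node

params = drill_down(hierarchy, "DrugX", "Fermentation")
```

Drilling from the product level down to a step returns the parameters available at that step—the same gesture a user makes by expanding nodes in the desktop view.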
Identify. The DME creates a data mart, or a user-requested subset of data which is available via ad-hoc requests to the data source. Working with such a subset might, for example, allow the user to isolate and examine the fermentation and filtration steps for a particular batch.
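In essence, the data mart is a filtered view of the available steps. A minimal sketch, with invented step names and values:

```python
# Hypothetical per-step data for one batch.
all_steps = {
    "Fermentation": {"pH": 6.9},
    "Filtration":   {"flow_rate": 12.0},
    "Purification": {"yield_kg": 41.7},
}

def data_mart(steps, requested):
    """Return only the requested process steps as an ad-hoc subset."""
    return {name: steps[name] for name in requested if name in steps}

mart = data_mart(all_steps, ["Fermentation", "Filtration"])
```

The subset carries only the steps the analyst asked for, which is what keeps ad-hoc requests small and fast relative to querying the full source.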
Data Retrieval. Data retrieval begins with parameter set selection: the user selects date ranges, and the software generates SQL queries that return a list of batches corresponding to the defined parameter set. The user can then select batches from the retrieved list for each parameter in the identified data mart. The DME automatically generates SQL that takes into account four key contexts:
1) Data source—where the data is coming from
2) Data type—discrete, replicate, continuous or event data
3) Original data organization—batch, time, sample
4) Process genealogy—understanding the impact upstream parameters have on downstream outcomes
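A deliberately simplified sketch of the query-generation step: given a table, a parameter set, and a date range, build SQL that lists matching batches. The table and column names are invented, and only the date-range context is modeled; a real DME would also resolve data source, data type, original organization, and genealogy, and would bind values as query parameters rather than interpolating strings.

```python
def build_batch_query(table, parameters, start, end):
    """Generate a SQL string listing batches for the chosen parameters.

    Note: values are interpolated here only for readability; production
    code should use bound parameters to avoid SQL injection.
    """
    cols = ", ".join(["batch_id"] + parameters)
    return (
        f"SELECT {cols} FROM {table} "
        f"WHERE run_date BETWEEN '{start}' AND '{end}'"
    )

sql = build_batch_query("fermentation_runs", ["ph", "temperature"],
                        "2011-01-01", "2011-03-31")
```

The value of automatic generation is exactly this translation: the user picks parameters and a date range through the interface, and never sees (or writes) the SQL that results.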
Results are shown in a table view, with the retrieved multi-dimensional data presented in rows and columns similar to an Excel spreadsheet. With results captured by the DME, users can easily export data into analysis, reporting or dashboarding applications.
The end benefits of using a lightweight data platform include a reduced burden on the IT department, with substantial time and cost savings in initial system design as well as ongoing upkeep and monitoring. Day-to-day system management is more straightforward and less time consuming than with a traditional data warehouse. Ultimately, manufacturing benefits from having a platform designed to handle user-centric analytical and reporting needs that enables analysis across all types of data.
Such a solution can go beyond typical descriptive analysis, such as dashboard displays of KPIs, to enable investigational or discovery analysis. Thus, the organization can go beyond KPI reporting into discovery of the Critical Process Parameters (CPPs) and Critical Quality Attributes (CQAs) that must be specified and controlled in order to achieve high-performance, reliable manufacturing processes. Relevant-time process intelligence enables greater process understanding that can increase predictability, decrease variability and, therefore, increase yields and positive process outcomes.
A DME solution ultimately benefits both users and IT by:
- Automating SQL generation, which frees IT staff and eliminates the need for manufacturing professionals to learn SQL.
- Offering flexibility to change data for analysis and the ability to generate up to 1,000 ad-hoc data combinations for specific investigations.
- Returning data in seconds or minutes—versus the days or weeks it might take to compile in spreadsheets or define new data cubes in a data warehouse across multiple systems that tap into disparate data sources.
- Correlating upstream and downstream data to account for batch genealogy, leading to important information that can point to the root cause of a batch problem.
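The last benefit—correlating upstream and downstream data through genealogy—can be sketched as pairing an upstream parameter with a downstream outcome via known genealogy links, so a downstream deviation can be traced back. All batch IDs, links, and values below are illustrative.

```python
# Hypothetical genealogy links (downstream batch -> upstream batch) and data.
links = {"PUR-01": "FERM-01", "PUR-02": "FERM-02"}
upstream_ph = {"FERM-01": 6.8, "FERM-02": 7.1}
downstream_yield = {"PUR-01": 40.2, "PUR-02": 47.9}

# Join the upstream parameter to the downstream outcome through the links.
pairs = [(upstream_ph[links[batch]], yield_kg)
         for batch, yield_kg in sorted(downstream_yield.items())]
# Each (pH, yield) pair can now feed a correlation or regression tool,
# which is where root-cause hypotheses start to emerge.
```

Without the genealogy links, the two datasets have no shared key and the correlation simply cannot be computed—which is why genealogy-aware retrieval matters for root-cause analysis.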
About the Author
Joseph S. Rothman is vice president of professional services for Aegis Analytical Corp., where he manages the company’s Professional Services and Engineering departments. Rothman has over three decades of experience in the computer industry, including product design and development, product management, sales and sales management, project and program management, technology consulting and management consulting.