Drowning in Data?

Follow these five steps to quench your thirst for insights from your data

By Michael Risse, CMO and VP, Seeq

To paraphrase Samuel Taylor Coleridge from “The Rime of the Ancient Mariner,” many pharma manufacturing plants and facilities have data, data everywhere, but not the insights they need. These manufacturers are drowning in data — but asking them to create insights from data using a general-purpose tool like a spreadsheet is akin to throwing an anchor to a drowning man. What they instead need to quench their thirst for actionable insights is a structured, five-step approach using an advanced analytics application designed specifically for this task.

To quench its thirst for insights, pharma needs a structured, five-step approach using advanced analytics.

The first step is connecting to disparate sources of data. Rare is the pharma facility with all its process data housed in a single storage platform. Much more common is data siloed in process historians, asset management systems, laboratory information systems and other repositories. For best results, access to additional data is often needed as well, such as raw material prices and customer demand metrics. The right advanced analytics application will connect to all these data sources quickly and easily, and won’t require the data to be moved or manipulated at its source.

Once connections are established to all relevant sources of data, the second step is data preparation. It is common for data to be stored in its raw form, with warts and blemishes intact. For example, there may be gaps or outliers in the data, and these should be cleansed. Known and expected conditions, such as clean-in-place operations, may also temporarily affect data, and the affected periods should be cleansed as well. Once all data is prepared, context must be added in the third step.
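As a rough illustration of this preparation step, the sketch below drops gaps, excludes samples recorded during clean-in-place periods, and filters outliers. It is only a minimal example under stated assumptions: the function name cleanse, the (timestamp, value) sample layout, the cip_windows input and the median-absolute-deviation threshold are all hypothetical choices made for illustration, not features of any particular analytics product.

```python
from statistics import median


def cleanse(samples, cip_windows, max_dev=5.0):
    """Drop gaps, clean-in-place periods and outliers from (t, value) samples.

    samples     -- list of (timestamp_seconds, value or None); None marks a gap
    cip_windows -- list of (start, end) periods to exclude (hypothetical input)
    max_dev     -- outlier cutoff, in multiples of the median absolute deviation
    """
    # 1. Remove gaps and any sample taken during a clean-in-place window.
    kept = [
        (t, v)
        for t, v in samples
        if v is not None
        and not any(start <= t <= end for start, end in cip_windows)
    ]
    if not kept:
        return []

    # 2. Remove outliers: values far from the median, measured against the
    #    median absolute deviation (an assumed, simple robust rule).
    values = [v for _, v in kept]
    mid = median(values)
    mad = median(abs(v - mid) for v in values)
    if mad == 0:
        return kept  # no spread to judge outliers against
    return [(t, v) for t, v in kept if abs(v - mid) <= max_dev * mad]


raw = [(0, 37.0), (1, 37.1), (2, None), (3, 36.9),
       (4, 95.0), (5, 37.2), (6, 20.0), (7, 37.0)]
clean = cleanse(raw, cip_windows=[(6, 6)])
```

In this made-up data set, the gap at second 2, the spike at second 4 and the clean-in-place reading at second 6 are removed, and the remaining samples are kept for analysis. In practice, the SME would choose the cleansing rules to suit the process.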

Harking back to our friend Coleridge, giving someone data without context is like offering salt water to someone dying of thirst. What’s needed is a way to add context to the cleansed data.

Consider, for example, a time series data set containing sensor data collected once per second, or about 31.5 million samples per year in the form timestamp:value. Most likely, the user doesn’t want all 31.5 million samples for analysis; he or she instead wants to identify time periods of interest “within” the signal. This means examining the data in the context of, for example, an asset state, a query, a calculated value, a state of operation, or an external condition such as raw material pricing.

Even with this simple example of one year of data from one signal, there are an infinite number of ways to use the signal in analytics. But this can only be done at “analytics time,” when the user’s intent is defined, and the data is selected for use by defining the relevant time segments. And this example is just one signal: imagine production environments with 10,000 signals, or enterprise roll-ups of sensor data with counts in the millions of signals. Contextualization, at analytics time and in the hands of the subject matter expert, is what transforms process data from a squiggly line on a chart into information of interest for analysis.
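To make the idea of contextualization concrete, here is a minimal sketch of selecting time periods of interest at analytics time. It assumes two hypothetical signals stored as (timestamp, value) pairs: an asset state signal, where 1 means running, and a temperature signal. The helper names periods_where and within are invented for this example and do not refer to any specific product's API.

```python
def periods_where(samples, condition):
    """Return (start, end) timestamps for each run of consecutive samples
    whose value satisfies the condition."""
    periods = []
    start = prev = None
    for t, v in samples:
        if condition(v):
            if start is None:
                start = t  # a new period of interest begins
            prev = t
        elif start is not None:
            periods.append((start, prev))  # the current period has ended
            start = None
    if start is not None:
        periods.append((start, prev))  # close a period still open at the end
    return periods


def within(samples, periods):
    """Keep only the samples that fall inside one of the given periods."""
    return [(t, v) for t, v in samples
            if any(start <= t <= end for start, end in periods)]


# Hypothetical data: asset state (1 = running) and a temperature signal.
state = [(0, 0), (1, 1), (2, 1), (3, 0), (4, 1), (5, 1), (6, 1)]
temperature = [(t, 30 + t) for t in range(7)]

running = periods_where(state, lambda v: v == 1)
temperature_while_running = within(temperature, running)
```

The same temperature signal could just as easily be sliced by a different condition, such as a calculated value crossing a limit. That is the point: the periods of interest are defined by the user's intent at the time of analysis, not fixed when the data is stored.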

Once context is added, the SME can execute the fourth step, using his or her expertise to create insights. Buyer beware if a salesperson pushing an analytics solution claims insights can be automatically derived from data using algorithms without the need for expert guidance. This may work for repeatable and predictable analysis of common data types without complex interactions, such as with facial recognition, but it doesn’t work when analyzing process data, which is in a constant state of change and is affected by many other factors.

Once insights are created, the fifth and final step is sharing these insights within the facility and/or the entire organization so decisions can be made and actions implemented to improve operational outcomes. Therefore, the advanced analytics application should provide the SMEs with a way to easily and quickly publish and share results with others. The published insights should be immediately apparent, with a means provided to drill down and see just how they were derived.

Following these five steps will produce the desired insights, resulting in more throughput, less downtime and improved quality. The advantages of data collection are there, and they are closer than imagined when leveraging innovations in data science, analytics and open source offerings. Insights ho!