
More interesting than AI-based error prediction, for instance in industrial production, is understanding why errors occur in the first place or why a specific error occurred, and what actions should be taken to remedy it and prevent further errors.
The search for weaknesses or causes of errors is known as “root cause analysis.” The goal of such an analysis in practice is to:
- Reduce or avoid downtime in production
- Minimize quality defects in manufactured products
- Identify unknown causal relationships and subsequently use them to optimize production plants and processes
Given the ever-increasing demands for quality and optimization, root cause analysis is an ongoing task and part of a company's continuous improvement process.

Modern digital production facilities record vast amounts of data that can no longer be analyzed with simple means.
AI methods can help find clues to possible causes of errors in these large data sets. However, enough error cases must be documented in the data for causal relationships to be identified in a statistically valid way. The focus is therefore not on the causes of rare errors but on the reasons for everyday errors, such as production disruptions.
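To make the requirement of “sufficient error cases” concrete, the following minimal sketch filters a log of error events down to the error types that occur often enough to be analyzed. The function name, the example error labels, and the threshold of 30 cases are hypothetical: the article does not say how many cases are enough, and in practice that depends on the number of influencing factors and on the analysis method.

```python
from collections import Counter

# Hypothetical threshold: the article does not specify how many cases are "sufficient".
MIN_CASES = 30

def frequent_error_types(error_log, min_cases=MIN_CASES):
    """Return (error_type, count) pairs for error types with at least
    `min_cases` documented occurrences, most frequent first."""
    counts = Counter(error_log)
    return [(etype, n) for etype, n in counts.most_common() if n >= min_cases]

if __name__ == "__main__":
    # Hypothetical error labels: two everyday errors and one rare one
    log = ["jam"] * 120 + ["overheat"] * 45 + ["sensor_fault"] * 3
    print(frequent_error_types(log))  # [('jam', 120), ('overheat', 45)]
```

The rare error type is dropped not because it is unimportant, but because a handful of cases does not support statistically valid conclusions about its causes.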
It can’t be done without application experts
Of course, every company has experts who “keep things running”: they solve problems, identify weaknesses, and optimize production. Even though huge amounts of data are now available, we must not fall for the misconception that artificial intelligence could make the knowledge and experience of application experts obsolete. Production facilities and critical error situations are extremely complex, which makes the analysis correspondingly difficult (the simple errors have usually been resolved without AI already). This is also reflected in the data: there are usually many interrelated influencing factors but relatively few error cases. These and other factors make data analysis difficult, so involving application experts is essential for the best possible result. Here are just three important reasons, taken from a list at least twice as long:
- Practical heuristics for error correction can be used to reduce task complexity.
- Expertise that is not present in the data can be incorporated.
- Application experts must validate AI results to distinguish real causes from other correlated events (e.g., other error symptoms).
The goals and forms of collaboration between application experts and data scientists are extensively discussed in our whitepaper (in German).

The special challenge: time series
Production processes are physical operations that extend over multiple processing steps and a longer period of time. They are usually monitored at discrete time intervals, and the recorded data are aggregated into long time series that document the process and need to be analyzed. These time series are typically high-dimensional (e.g., they combine values from many sensors) and can span several hundred time steps, so they contain countless possible patterns. Finding out which of these patterns might be responsible for the occurrence of an error is like looking for a needle in a haystack.
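A rough count shows why this search is so hard. The sketch below counts only the positions where a candidate pattern could sit: every choice of a few channels out of all sensors, combined with every start position of a window of a given length. The sensor count, series length, and window lengths in the usage example are assumed values chosen to match the orders of magnitude mentioned above. The count ignores the shape of the pattern itself, so the real search space is far larger.

```python
from math import comb

def count_candidate_patterns(n_sensors, n_steps, n_channels, window_lengths):
    """Count placements of a pattern that covers `n_channels` of `n_sensors`
    channels and spans one of `window_lengths` time steps in a series of
    `n_steps` steps. Only positions are counted, not pattern shapes."""
    channel_subsets = comb(n_sensors, n_channels)
    total = 0
    for w in window_lengths:
        if w <= n_steps:
            total += channel_subsets * (n_steps - w + 1)  # start positions for this length
    return total

if __name__ == "__main__":
    # Assumed: 50 sensors, 500 time steps, patterns over 3 channels
    print(count_candidate_patterns(50, 500, 3, [20]))           # 9427600
    print(count_candidate_patterns(50, 500, 3, range(10, 51)))  # 378495600
```

Even with a single window length, there are already more than nine million places to look, before asking what the pattern at each place looks like.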

Based on recent research results and project experience, Fraunhofer IAIS has developed a new approach to AI-based root cause analysis on time series data.
This approach uses deep neural networks to find error cause prototypes in multidimensional time series. Such a prototype is a short time series that represents the temporal course of a few (e.g., 3 to 5) measurement points and that the neural network associates with a number of errors in which this course appears in a similar form.
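The article does not describe the network architecture, so the following is only a minimal sketch of the core idea of matching a prototype against a multidimensional time series. It slides a prototype covering a few channels over the series, takes the smallest squared Euclidean distance, and converts it into a similarity using the logarithmic activation from prototype networks such as ProtoPNet. Unlike a trained network, which usually compares learned prototypes in a latent space, this sketch compares them directly with the raw measurements; the function and parameter names are illustrative assumptions, not the Fraunhofer IAIS method.

```python
import numpy as np

def prototype_similarity(series, prototype, channels, eps=1e-4):
    """Best-match similarity between a prototype and a multivariate time series.

    series:    array of shape (n_channels_total, n_steps)
    prototype: array of shape (len(channels), window)
    channels:  indices of the measurement points the prototype covers
    Returns (similarity, start index of the best-matching window).
    """
    sub = series[channels]  # restrict the series to the prototype's channels
    window = prototype.shape[1]
    # All windows of length `window`: shape (len(channels), n_steps - window + 1, window)
    windows = np.lib.stride_tricks.sliding_window_view(sub, window, axis=1)
    # Squared Euclidean distance of every window to the prototype, over all channels
    dists = ((windows - prototype[:, None, :]) ** 2).sum(axis=(0, 2))
    best = int(dists.argmin())
    d = dists[best]
    # ProtoPNet-style activation: large for d near 0, close to 0 for large d
    return float(np.log((d + 1) / (d + eps))), best

if __name__ == "__main__":
    # Synthetic example: 5 channels, 100 steps, a pattern planted on channels 1 and 3
    pattern = np.array([[0, 1, 2, 1, 0], [0, -1, -2, -1, 0]], dtype=float)
    series = np.zeros((5, 100))
    series[[1, 3], 40:45] = pattern
    print(prototype_similarity(series, pattern, [1, 3]))  # high similarity at index 40
    print(prototype_similarity(series, pattern, [0, 2]))  # low similarity
```

In a prototype network, such similarity scores feed into a final classification layer; that is how a prototype can become associated with a particular error.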

[Figure: Example of an error cause prototype]
The neural network typically finds between 10 and 20 such error cause prototypes. These prototypes must then be discussed and interpreted with application experts using case examples in order to understand the underlying error cause. In the process, the complexity of each prototype (especially the number of measurement points and the length of the time series to be considered) is reduced. Based on the understanding gained, instructions can be drawn up on how to respond appropriately when this error occurs. In addition, constructive measures can be taken to minimize the occurrence of this error in the future.
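For the discussion with application experts, it helps to present each prototype together with the cases in which it matches best. The sketch below assumes that a similarity score has already been computed for every pair of case and prototype (for example, a best-match similarity as used in prototype networks) and returns the identifiers of the k best-matching cases per prototype. The case identifiers and the choice of k are hypothetical.

```python
import numpy as np

def top_cases_per_prototype(similarities, case_ids, k=3):
    """For each prototype, return the ids of the k cases it matches best.

    similarities: array of shape (n_cases, n_prototypes)
    case_ids:     list of n_cases identifiers (hypothetical, e.g. batch numbers)
    Returns a dict {prototype index: [case ids, best match first]}.
    """
    # Sort cases by descending similarity, separately for each prototype column
    order = np.argsort(-similarities, axis=0)[:k]  # shape (k, n_prototypes)
    return {p: [case_ids[i] for i in order[:, p]]
            for p in range(similarities.shape[1])}

if __name__ == "__main__":
    sims = np.array([[0.9, 0.1],
                     [0.2, 0.8],
                     [0.5, 0.7]])
    print(top_cases_per_prototype(sims, ["A", "B", "C"], k=2))
    # {0: ['A', 'C'], 1: ['B', 'C']}
```

Looking at a few concrete cases per prototype gives experts a basis for judging whether a pattern reflects a real cause or merely another symptom of the same error.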
Conclusion
Experience from various projects has shown that AI-based root cause analysis usually succeeds only when leading AI methods are combined with methods for integrating expert knowledge. For the time series data typically generated in production processes, Fraunhofer IAIS has developed a specialized analysis method. It makes it possible to use automatically recorded production data to analyze highly complex production processes and to identify causal relationships that would have remained hidden with conventional root cause analysis methods.