Prof. Dr. Jürgen Bajorath and his team show how machine learning models in drug research arrive at their predictions in a study published in the journal “Cell Reports Physical Science.”
Which chemical compounds have the potential to become effective drugs? This question is increasingly being answered with machine learning and artificial intelligence. A recent study by Jannik P. Roth and Prof. Dr. Jürgen Bajorath, a professor at the University of Bonn and at the Bonn-Aachen International Center for Information Technology (b-it), as well as the Lamarr Area Chair for Life Sciences, provides deeper insight into how these approaches work.
How do machine learning models make their predictions?
In the development of new drugs, machine learning models are of central importance. They predict whether certain chemical compounds will bind to pharmaceutical target proteins and thereby elicit a desired effect. However, these models are often difficult to understand, as their decisions are not directly interpretable. In their study, the researchers developed a machine learning model system that can formally explain and compare the predictions of an algorithm. This helps to identify the features of a drug molecule that are crucial for the prediction.
Key Findings and Challenges
The study showed that similar algorithmic models can make almost identical predictions while basing them on different assumptions. This complicates the interpretation of the results and reduces their practical utility. To address this challenge, the team developed a new method for calculating Shapley values, a concept from game theory. These values quantify the contribution of each individual molecular feature to the final prediction, which allows the explanations of different models to be analyzed and compared precisely.
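To make the idea of Shapley values concrete, the following minimal sketch computes them exactly by enumerating all feature subsets. It is the textbook definition applied to toy data, not the new calculation method developed in the study. The two models, the three binary features (standing in for the presence or absence of molecular substructures), and the all-zero baseline are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating all feature subsets.

    Features outside a subset are replaced by their baseline value
    (an assumption of this sketch; other choices are possible).
    """
    n = len(instance)

    def value(subset):
        # Model output when only the features in `subset` take the instance's values.
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return model(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a subset of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Two hypothetical toy models over three binary features.
def model_a(x):
    # Predicts "active" whenever feature 0 is present.
    return float(x[0])


def model_b(x):
    # Predicts "active" only when features 0 and 1 are both present.
    return float(x[0] and x[1])


instance = [1, 1, 0]   # hypothetical compound: features 0 and 1 present
baseline = [0, 0, 0]   # hypothetical reference: no feature present

pred_a, pred_b = model_a(instance), model_b(instance)
phi_a = shapley_values(model_a, instance, baseline)
phi_b = shapley_values(model_b, instance, baseline)

print("predictions:", pred_a, pred_b)
print("Shapley values, model A:", [round(v, 3) for v in phi_a])
print("Shapley values, model B:", [round(v, 3) for v in phi_b])
```

For this instance both toy models return the same prediction, but model A assigns the whole contribution to feature 0, while model B splits it evenly between features 0 and 1. In both cases the values sum to the difference between the prediction for the instance and the prediction for the baseline. Exact enumeration grows exponentially with the number of features, so it is only practical for very small examples like this one.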
Practical Applications and Future Perspectives
The new method for calculating Shapley values is not only relevant for pharmaceutical research but also applicable in other fields. The central finding of the study is that nearly identical predictions can be reached through different model pathways, so a prediction alone does not reveal how a model arrived at it. Future work will focus on improving the comparability and interpretability of machine learning models to further optimize their use in drug development.
Publication
Jannik P. Roth, Jürgen Bajorath. “Machine Learning Models with Distinct Shapley Value Explanations for Chemical Compound Predictions Decouple Feature Attribution and Interpretation,” Cell Reports Physical Science. DOI: 10.1016/j.xcrp.2024.102110.
Interdisciplinary Research Area Life Sciences
The interdisciplinary research area of Life Sciences at the Lamarr Institute aims to integrate machine learning (ML), explainable artificial intelligence (AI), and data science with life science disciplines such as drug and medical research. The concept of Triangular AI forms the basis of the Life Sciences area. ML and other AI methods are applied to heterogeneous bioscientific data, which provide different scientific contexts, and knowledge from various fields is leveraged to align the prediction models with experimental design.