Simple nearest neighbor analysis meets the accuracy of compound potency predictions using complex machine learning models

Compound potency prediction is a popular application of machine learning in drug discovery, for which increasingly complex models are employed. The general aim is the identification of new chemical entities that are highly potent against a given target. The relative performance of potency prediction models and their accuracy limitations continue to be debated in the field, and it remains unclear whether deep learning can further advance potency prediction. We have analysed and compared approaches of varying computational complexity for potency prediction and shown that simple nearest-neighbour analysis consistently meets or exceeds the accuracy of machine learning methods regarded as the state of the art in the field. Moreover, completely random predictions using different models were shown to reproduce experimental values within an order of magnitude, resulting from the potency value distributions in commonly used compound data sets. Taken together, these findings have important implications for typical benchmark calculations to evaluate machine learning performance. Simple controls such as nearest-neighbour analysis should generally be included in model evaluation. Furthermore, the narrow margin separating the best and completely random potency predictions is unrealistic and requires the consideration of alternative benchmark criteria, as discussed herein.
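The two controls named in the abstract, nearest-neighbour analysis and completely random prediction, can be illustrated with a minimal sketch. It is not the authors' protocol. It assumes that compounds are represented as fingerprints given as sets of on-bit indices, that similarity is measured with the Tanimoto coefficient, and that potencies are logarithmic values. All function names and the toy data are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(query_fp, train_fps, train_potencies, k=1):
    """Average the potency of the k training compounds most similar to the query."""
    ranked = sorted(
        zip(train_fps, train_potencies),
        key=lambda pair: tanimoto(query_fp, pair[0]),
        reverse=True,
    )
    return mean(potency for _, potency in ranked[:k])


def random_predict(train_potencies, rng):
    """Control: draw a training potency at random, ignoring chemical structure."""
    return rng.choice(train_potencies)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between observed and predicted potencies."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical toy data (not taken from the paper): fingerprints and log potencies.
train_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}, {7, 8, 10}]
train_pot = [6.0, 6.4, 8.0, 8.2]
test_fps = [{1, 2, 3, 4, 6}, {7, 8, 9, 11}]
test_pot = [6.2, 8.1]

rng = random.Random(0)  # fixed seed so the random control is repeatable
knn_pred = [knn_predict(fp, train_fps, train_pot, k=1) for fp in test_fps]
rand_pred = [random_predict(train_pot, rng) for _ in test_fps]

print("1-NN MAE:  ", mean_absolute_error(test_pot, knn_pred))
print("random MAE:", mean_absolute_error(test_pot, rand_pred))
```

In this sketch the error of the random control cannot exceed the spread of the potency values in the data. This is the sense in which a narrow potency distribution limits how far random predictions can fall from the experimental values.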

  • Published in:
    Nature Machine Intelligence
  • Type:
    Article
  • Authors:
    Janela, Tiago; Bajorath, Jürgen
  • Year:
    2022

Citation information

Janela, Tiago; Bajorath, Jürgen: Simple nearest neighbor analysis meets the accuracy of compound potency predictions using complex machine learning models, Nature Machine Intelligence, 2022, 4, 1246–1255, https://www.nature.com/articles/s42256-022-00581-6

Associated Lamarr Researchers

Prof. Dr. Jürgen Bajorath

Area Chair Life Sciences