DeepAS – Chemical language model for the extension of active analogue series

In medicinal chemistry, hit-to-lead and lead optimization efforts produce analogue series (ASs), the analysis of which is of central relevance for the exploration and exploitation of structure–activity relationships (SARs) and generation of candidate compounds. The key question in any chemical optimization effort is which analogue(s) to generate next, for which computational support is typically provided through QSAR analysis and compound potency predictions. In this study, we introduce a new chemical language model for analogue design via deep learning. For this purpose, ASs comprising active compounds are ordered according to increasing potency and the chemical language model predicts preferred R-groups for new analogues on the basis of ordered R-group sequences. Hence, consistent with the principles of deep models for natural language processing, analogues with new R-groups are predicted based upon conditional probabilities taking preceding groups into account. This implicitly accounts for the potency gradient captured by an AS and detectable SAR trends, providing a new concept for analogue design. Herein, we report the AS-based chemical language model, its initial evaluation, and exemplary applications.
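The idea of ordering a series by potency and predicting the next R-group from the preceding ones can be illustrated with a minimal sketch. This is not the DeepAS model itself: the deep chemical language model is replaced here by a simple count-based bigram model that conditions only on the immediately preceding R-group, and the series data, the R-group strings, and all function names are hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each analogue series is a list of
# (R-group as a SMILES-like string, potency as pIC50) pairs.
series_list = [
    [("[*]C", 5.1), ("[*]Cl", 6.0), ("[*]C(F)(F)F", 7.2)],
    [("[*]C", 4.8), ("[*]Cl", 5.9), ("[*]Br", 6.5)],
    [("[*]Cl", 6.1), ("[*]C", 5.0), ("[*]C(F)(F)F", 6.9)],
]


def ordered_r_groups(series):
    """Sort one series by increasing potency and return its R-group sequence."""
    return [r_group for r_group, _ in sorted(series, key=lambda pair: pair[1])]


def fit_bigram(all_series):
    """Count how often each R-group directly follows another in the ordered sequences."""
    counts = defaultdict(Counter)
    for series in all_series:
        sequence = ordered_r_groups(series)
        for previous, following in zip(sequence, sequence[1:]):
            counts[previous][following] += 1
    return counts


def next_r_group_probs(counts, last_group):
    """Estimate P(next R-group | last R-group) from the bigram counts."""
    following = counts.get(last_group, Counter())
    total = sum(following.values())
    if total == 0:
        return {}  # no information for an R-group never seen as a predecessor
    return {group: n / total for group, n in following.items()}


counts = fit_bigram(series_list)
probs = next_r_group_probs(counts, "[*]Cl")
for group, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{group}: {p:.2f}")
```

With this toy data, the R-group following "[*]Cl" is "[*]C(F)(F)F" in two of three ordered series and "[*]Br" in one, so the estimated probabilities are 2/3 and 1/3. A deep language model would instead condition on the full preceding sequence and could generalize to R-group orderings it has not seen.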

  • Published in:
    Bioorganic & Medicinal Chemistry
  • Type:
    Article
  • Authors:
    Yoshimori, Atsushi; Bajorath, Jürgen
  • Year:
    2022

Citation information

Yoshimori, Atsushi; Bajorath, Jürgen: DeepAS – Chemical language model for the extension of active analogue series, Bioorganic & Medicinal Chemistry, 2022, 66, 116808, https://www.sciencedirect.com/science/article/pii/S0968089622002000?via=ihub

Associated Lamarr Researchers

Prof. Dr. Jürgen Bajorath

Area Chair Life Sciences