Combining a Chemical Language Model and the Structure–Activity Relationship Matrix Formalism for Generative Design of Potent Compounds with Core Structure and Substituent Modifications
In medicinal chemistry, compound optimization relies on the generation of analogue series (AS) for exploring structure–activity relationships (SARs). Potency progression is a critical criterion for advancing AS. During optimization, a key question is which analogues to synthesize next. We introduce a new computational methodology, reported here for the first time, for extending AS with potent compounds that contain both core structure modifications and substituent modifications at multiple sites. The approach combines a transformer chemical language model (CLM) with a SAR matrix (SARM) methodology that identifies and organizes structurally related AS. To this end, the SARM approach was expanded to cover multisite AS. Consensus series extracted from SARMs representing a potency gradient served as input for CLM training to extend test AS with potent analogues. Different model variants were derived and investigated. Both general and fine-tuned models correctly predicted known potent analogues at high positions in probability-based compound rankings and chemically diversified AS through core structure modifications of the generated candidate compounds and substituent replacements at multiple sites.
- Published in: Journal of Chemical Information and Modeling
- Type: Article
- Authors: Chen, Hengwei; Bajorath, Jürgen
- Year: 2024
Citation information
Chen, Hengwei; Bajorath, Jürgen: Combining a Chemical Language Model and the Structure–Activity Relationship Matrix Formalism for Generative Design of Potent Compounds with Core Structure and Substituent Modifications, Journal of Chemical Information and Modeling, 2024, November, American Chemical Society, https://pubs.acs.org/doi/10.1021/acs.jcim.4c01781
@Article{Chen.Bajorath.2024a,
author={Chen, Hengwei and Bajorath, Jürgen},
title={Combining a Chemical Language Model and the Structure–Activity Relationship Matrix Formalism for Generative Design of Potent Compounds with Core Structure and Substituent Modifications},
journal={Journal of Chemical Information and Modeling},
month={November},
publisher={American Chemical Society},
url={https://pubs.acs.org/doi/10.1021/acs.jcim.4c01781},
year={2024},
abstract={In medicinal chemistry, compound optimization relies on the generation of analogue series ({AS}) for exploring structure–activity relationships ({SARs}). Potency progression is a critical criterion for advancing {AS}. During optimization, a key question is which analogues to synthesize next. We introduce a new computational methodology for the extension of {AS} with potent compounds containing...}}