Motif2Mol: Prediction of new active compounds based on sequence motifs of ligand binding sites in proteins using a biochemical language model
In drug design, the prediction of new active compounds from protein sequence data has only been attempted in a few studies thus far. This prediction task is principally challenging because global protein sequence similarity has strong evolutional and structural implications, but is often only vaguely related to ligand binding. Deep language models adapted from natural language processing offer new opportunities to attempt such predictions via machine translation by directly relating amino acid sequences and chemical structures to each other based on textual molecular representations. Herein, we introduce a biochemical language model with transformer architecture for the prediction of new active compounds from sequence motifs of ligand binding sites. In a proof-of-concept application on inhibitors of more than 200 human kinases, the Motif2Mol model revealed promising learning characteristics and an unprecedented ability to consistently reproduce known inhibitors of different kinases.
- Published in: Biomolecules
- Type: Article
- Authors: Yoshimori, Atsushi; Bajorath, Jürgen
- Year: 2023
Citation information
Yoshimori, Atsushi; Bajorath, Jürgen: Motif2Mol: Prediction of new active compounds based on sequence motifs of ligand binding sites in proteins using a biochemical language model, Biomolecules, 2023, 13, 5, 833, https://www.mdpi.com/2218-273X/13/5/833
@Article{Yoshimori.Bajorath.2023a,
author={Yoshimori, Atsushi and Bajorath, Jürgen},
title={Motif2Mol: Prediction of new active compounds based on sequence motifs of ligand binding sites in proteins using a biochemical language model},
journal={Biomolecules},
volume={13},
number={5},
pages={833},
url={https://www.mdpi.com/2218-273X/13/5/833},
year={2023},
abstract={In drug design, the prediction of new active compounds from protein sequence data has only been attempted in a few studies thus far. This prediction task is principally challenging because global protein sequence similarity has strong evolutional and structural implications, but is often only vaguely related to ligand binding. Deep language models adapted from natural language processing offer...}}