SolvBERT for solvation free energy and solubility prediction: a demonstration of an NLP model for predicting the properties of molecular complexes
Digital Discovery Pub Date: 2023-01-30 DOI: 10.1039/D2DD00107A
Abstract
Deep learning models based on natural language processing (NLP), mainly the Transformer family, have been successfully applied to many chemistry-related problems, but their applications are mostly limited to chemical reactions. Meanwhile, solvation, which describes the interaction of solutes and solvents, is an important concept in physical and organic chemistry. In this study, we introduce the SolvBERT model, which reads the solute and solvent through the SMILES representation of their combination. SolvBERT was pre-trained in an unsupervised fashion on a large database of computational solvation free energies. The pre-trained model can then be used to predict either the experimental solvation free energy or solubility, depending on the fine-tuning database. To the best of our knowledge, this multi-task prediction capability has not been observed in previously developed graph-based models for predicting the properties of molecular complexes. Furthermore, the performance of SolvBERT in predicting solvation free energy was comparable to that of the state-of-the-art graph-based model DMPNN, mainly due to the clustering feature of the pre-training phase of the model, as demonstrated using the TMAP visualization algorithm. Finally, SolvBERT outperformed the recently developed GNN–Transformer hybrid model, GROVER, in predicting a set of experimentally evaluated solubility data with out-of-sample solute–solvent combinations.
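To make the input format concrete, here is a minimal Python sketch of how a solute and a solvent might be merged into one SMILES string and split into tokens for a BERT-style model. The abstract only says the model reads "the SMILES representation of their combination", so several details below are assumptions: joining the two molecules with the standard SMILES `.` separator, placing the solute first, and using an atom-level regex tokenizer of the kind common in SMILES language models. The function names `combine_smiles` and `tokenize_smiles` are hypothetical and are not taken from the SolvBERT code.

```python
import re
from typing import List

# Atom-level SMILES tokenization pattern of the kind widely used in
# chemistry language models. ASSUMPTION: the abstract does not state
# which tokenizer SolvBERT uses; this is only an illustration.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def combine_smiles(solute: str, solvent: str) -> str:
    """Join two molecules into one SMILES string.

    ASSUMPTION: '.' is the standard SMILES separator for disconnected
    components; the solute-first ordering is chosen for illustration.
    """
    return f"{solute}.{solvent}"


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, failing on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES losslessly: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Ethanol (solute) dissolved in water (solvent).
    pair = combine_smiles("CCO", "O")
    print(pair)                   # CCO.O
    print(tokenize_smiles(pair))  # ['C', 'C', 'O', '.', 'O']
```

In a pipeline like the one the abstract describes, token sequences of this kind would be mapped to vocabulary IDs and passed to the pre-trained encoder, with a regression head fine-tuned on either solvation free energy or solubility labels.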
Journal Name: Digital Discovery