Data augmentation and transfer learning strategies for reaction prediction in low chemical data regimes
Organic Chemistry Frontiers Pub Date: 2021-02-03 DOI: 10.1039/D0QO01636E
Abstract
Effective and rapid deep learning methods for predicting chemical reactions contribute to research and development in organic chemistry and drug discovery. Despite the outstanding capability of deep learning in retrosynthesis and forward synthesis, predictions based on small chemical datasets generally have low accuracy because too few reaction examples are available. Here, we introduce a new state-of-the-art method that integrates transfer learning with the transformer model to predict the outcomes of the Baeyer–Villiger reaction, a representative reaction with a small dataset. The results demonstrate that a transfer learning strategy markedly improves top-1 accuracy, from 58.4% for the transformer-baseline model to 81.8% for the transformer-transfer learning model. Applying data augmentation to the input reaction SMILES further improves the accuracy of the transformer-transfer learning model to 86.7%. In summary, both transfer learning and data augmentation significantly improve the predictive performance of transformer models, making them powerful methods for overcoming the restriction of limited training data in chemistry.
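To make the idea of augmenting input reaction SMILES concrete, the following is a minimal sketch of one simple scheme: reordering the dot-separated reactants, which leaves the chemistry unchanged but gives the model a different input string. This is an illustrative assumption, not necessarily the augmentation used in the paper. The function name `augment_reaction_smiles`, the `max_variants` parameter, and the `reactants>>product` input format are all hypothetical choices for this example.

```python
from itertools import permutations


def augment_reaction_smiles(rxn_smiles, max_variants=4):
    """Return up to max_variants reorderings of the reactant side.

    Assumes the hypothetical input format 'reactants>>product', with
    individual reactants separated by '.'. The product is left untouched.
    """
    reactants, product = rxn_smiles.split(">>")
    fragments = reactants.split(".")

    variants = []
    # permutations() is lazy, and its first result is the original order.
    for order in permutations(fragments):
        candidate = ".".join(order) + ">>" + product
        if candidate not in variants:  # skip duplicates from repeated fragments
            variants.append(candidate)
        if len(variants) == max_variants:
            break
    return variants


# Illustrative Baeyer-Villiger example: cyclohexanone + peracetic acid
# giving caprolactone.
example = "O=C1CCCCC1.CC(=O)OO>>O=C1CCCCCO1"
augmented = augment_reaction_smiles(example)
for line in augmented:
    print(line)
```

Each variant would be paired with the same target product during training, so a small reaction set yields several input strings per reaction. Other schemes, such as enumerating non-canonical SMILES for each molecule with a cheminformatics toolkit, follow the same pattern.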