Ensemble Learning Approach with LASSO for Predicting Catalytic Reaction Rates
This work is partly based on results obtained from a project, JPNP16010, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
A machine learning approach is used to predict the initial reaction rate in the tungsten-catalyzed epoxidation of alkenes. The ensemble learning framework used in this study consists of random sampling with replacement from the training dataset, the construction of several predictive models (weak learners), and the combination of their outputs. This approach yields a reasonable prediction model that avoids overfitting, even when the dataset is small.
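The three steps named in the abstract (sampling with replacement, fitting several weak learners, combining their outputs) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the toy data, the penalty `lam`, the number of models, the coordinate-descent LASSO solver, and simple averaging as the combination rule are all choices made here for the example.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * sum((y - b0 - X.w)^2) + lam * sum(|w|)
    by cyclic coordinate descent. Returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # sum of squares of feature j
            for i in range(n):
                pred_wo_j = b0 + sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_wo_j)
                zj += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (zj / n) if zj > 0 else 0.0
        # The intercept is unpenalized: set it to the mean residual.
        b0 = sum(y[i] - sum(w[k] * X[i][k] for k in range(p)) for i in range(n)) / n
    return b0, w


def fit_bagged_lasso(X, y, lam, n_models=30, seed=0):
    """Fit one LASSO model per bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_lasso([X[i] for i in idx], [y[i] for i in idx], lam))
    return models


def predict_bagged(models, x):
    """Combine the weak learners by averaging their predictions (assumed rule)."""
    preds = [b0 + sum(wk * xk for wk, xk in zip(w, x)) for b0, w in models]
    return sum(preds) / len(preds)


# Synthetic toy data (hypothetical): 19 samples, 3 descriptors,
# with the third descriptor irrelevant to the response.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(19)]
y = [2.0 * r[0] - 1.0 * r[1] + data_rng.gauss(0, 0.1) for r in X]

models = fit_bagged_lasso(X, y, lam=0.05)
pred = predict_bagged(models, [0.5, -0.5, 0.0])  # noise-free target is 1.5
```

Averaging over bootstrap resamples reduces the variance of any single LASSO fit, which is the usual motivation for bagging on small datasets. In practice the penalty would be tuned, for example by cross-validation, rather than fixed as it is here.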
Key words: machine learning - catalytic reactions - reaction rates - ensemble learning - small datasets
Received: 31 July 2020
Accepted after revision: 05 November 2020
Published online: 05 November 2020
© 2020. Thieme. All rights reserved
Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany
References and Notes
- 1 Sato K, Aoki M, Ogawa M, Hashimoto T, Panyella D, Noyori R. Bull. Chem. Soc. Jpn. 1997; 70: 905
- 2a Toyao T, Maeno Z, Takakusagi S, Kamachi T, Takigawa I, Shimizu K.-I. ACS Catal. 2020; 10: 2260
- 2b Yang W, Fidelis TT, Sun W.-H. ACS Omega 2020; 5: 83
- 3a Ahneman DT, Estrada JG, Lin S, Dreher SD, Doyle AG. Science 2018; 360: 186
- 3b Estrada JG, Ahneman DT, Sheridan RP, Dreher SD, Doyle AG. Science 2018; 362: eaat8763
- 3c For the construction of a predictive model using Doyle’s catalytic reaction data, see: Sandfort F, Strieth-Kalthoff F, Kühnemund M, Beecks C, Glorius F. Chem 2020; 6: 1379
- 4 Granda J, Donina L, Dragone V, Long D.-L, Cronin L. Nature 2018; 559: 377
- 5 Perera D, Tucker JW, Brahmbhatt S, Helal CJ, Chong A, Farrell W, Richardson P, Sach NW. Science 2018; 359: 429
- 6 Yada A, Nagata K, Ando Y, Matsumura T, Ichinoseki S, Sato K. Chem. Lett. 2018; 47: 284
- 7a Burello E, Farrusseng D, Rothenberg G. Adv. Synth. Catal. 2004; 346: 1844
- 7b Landman IR, Paulson ER, Rheingold AL, Grotjahn DB, Rothenberg G. Catal. Sci. Technol. 2017; 7: 4842
- 8 For multivariate statistical modeling applied to olefin metathesis to understand the relationship between the initial TOF (TOFin) and the structural features of the ligand, see: Ferreira MA. B, Silva JD. J, Grosslight S, Fedorov A, Sigman MS, Copéret C. J. Am. Chem. Soc. 2019; 141: 10788
- 9 Zhou Z.-H. Ensemble Methods: Foundations and Algorithms. CRC Press; Boca Raton: 2012
- 10 Tibshirani R. J. R. Statist. Soc. B 1996; 58: 267
- 11 The data for the 19 reactions and the catalyst descriptors are available in ‘Summary_of_parameters_and_output.csv’ as Supporting Information. A detailed explanation is given in the Supporting Information. Our R code for prediction is unfortunately not available because of the confidential nature of this national project.
- 12 All RMSEtest and RMSEtrn values for the 3876 data partitions are available in ‘Summary_of_RMSE_for_3876_patterns.csv’ as Supporting Information.
- 13 Bach FR. Bolasso: Model Consistent Lasso Estimation through the Bootstrap. In ICML ’08: Proceedings of the 25th International Conference on Machine Learning. ACM; New York: 2008: 33–40. ISBN 978-1-60558-205-4; http://doi.acm.org/10.1145/1390156.1390161
- 14 Gunam Resul MF. M, López Fernández AM, Rehman A, Harvey AP. React. Chem. Eng. 2018; 3: 747
Machine learning prediction of turnover frequency (TOF) in catalytic reactions, see: