Synlett
DOI: 10.1055/a-1304-4878
cluster
Machine Learning and Artificial Intelligence in Chemical Synthesis and Catalysis

Ensemble Learning Approach with LASSO for Predicting Catalytic Reaction Rates

Akira Yada
a  Interdisciplinary Research Center for Catalytic Chemistry, National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Higashi, Tsukuba, Ibaraki 305-8565, Japan
,
b  Research Center for Computational Design of Advanced Functional Materials, National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
,
b  Research Center for Computational Design of Advanced Functional Materials, National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
,
c  Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science (NIMS), 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
,
Sakina Ichinoseki
a  Interdisciplinary Research Center for Catalytic Chemistry, National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Higashi, Tsukuba, Ibaraki 305-8565, Japan
,
Kazuhiko Sato
a  Interdisciplinary Research Center for Catalytic Chemistry, National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Higashi, Tsukuba, Ibaraki 305-8565, Japan
This work is partly based on results obtained from a project, JPNP16010, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).


Abstract

The prediction of the initial reaction rate in the tungsten-catalyzed epoxidation of alkenes by using a machine learning approach is demonstrated. The ensemble learning framework used in this study consists of random sampling with replacement from the training dataset, the construction of several predictive models (weak learners), and the combination of their outputs. This approach enables us to obtain a reasonable prediction model that avoids the problem of overfitting, even when analyzing a small dataset.
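The three steps named in the abstract (random sampling with replacement, fitting several weak learners, and combining their outputs) can be illustrated with a minimal sketch. It is not the authors' implementation: their R code is not available, so the coordinate-descent LASSO solver, the averaging rule used to combine outputs, the penalty `alpha`, the number of models, and the synthetic 19-row dataset are all assumptions made here for illustration only.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_lasso(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + alpha*||b||_1."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean           # centering absorbs the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:              # constant column in this sample
                continue
            # residual with the contribution of feature j added back
            r_j = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def bagged_lasso(X, y, alpha=0.1, n_models=50, seed=0):
    """Fit one LASSO weak learner per bootstrap resample of the training set."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)      # random sampling with replacement
        models.append(fit_lasso(X[idx], y[idx], alpha))
    return models


def predict(models, X):
    """Combine the weak learners by averaging (assumed combination rule)."""
    preds = np.array([b0 + X @ b for b0, b in models])
    return preds.mean(axis=0)


# Synthetic stand-in for a small dataset: 19 samples, 6 hypothetical descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(19, 6))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=19)

models = bagged_lasso(X, y)
y_hat = predict(models, X)
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

Averaging over bootstrap resamples tends to reduce the variance of the individual sparse models, which is the usual motivation for this kind of ensemble on small datasets. In practice `alpha` would be chosen by validation rather than fixed as it is here.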

Publication History

Received: 31 July 2020

Accepted after revision: 05 November 2020

Publication Date: 05 November 2020 (online)

© 2020. Thieme. All rights reserved

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 
  • References and Notes

  • 1 Sato K, Aoki M, Ogawa M, Hashimoto T, Panyella D, Noyori R. Bull. Chem. Soc. Jpn. 1997; 70: 905
    • 2a Toyao T, Maeno Z, Takakusagi S, Kamachi T, Takigawa I, Shimizu K.-I. ACS Catal. 2020; 10: 2260
    • 2b Yang W, Fidelis TT, Sun W.-H. ACS Omega 2020; 5: 83
    • 3a Ahneman DT, Estrada JG, Lin S, Dreher SD, Doyle AG. Science 2018; 360: 186
    • 3b Estrada JG, Ahneman DT, Sheridan RP, Dreher SD, Doyle AG. Science 2018; 362: eaat8763
    • 3c For the construction of a predictive model using Doyle’s catalytic reaction data, see: Sandfort F, Strieth-Kalthoff F, Kühnemund M, Beecks C, Glorius F. Chem 2020; 6: 1379
  • 4 Granda J, Donina L, Dragone V, Long D.-L, Cronin L. Nature 2018; 559: 377
  • 5 Perera D, Tucker JW, Brahmbhatt S, Helal CJ, Chong A, Farrell W, Richardson P, Sach NW. Science 2018; 359: 429
  • 6 Yada A, Nagata K, Ando Y, Matsumura T, Ichinoseki S, Sato K. Chem. Lett. 2018; 47: 284

    • Machine learning prediction of turnover frequency (TOF) in catalytic reactions, see:
    • 7a Burello E, Farrusseng D, Rothenberg G. Adv. Synth. Catal. 2004; 346: 1844
    • 7b Landman IR, Paulson ER, Rheingold AL, Grotjahn DB, Rothenberg G. Catal. Sci. Technol. 2017; 7: 4842
  • 8 For multivariate statistical modeling applied to olefin metathesis to understand the relationship between the initial TOF (TOFin) and the structural features of the ligand, see: Ferreira MAB, Silva JDJ, Grosslight S, Fedorov A, Sigman MS, Copéret C. J. Am. Chem. Soc. 2019; 141: 10788
  • 9 Zhou Z.-H. Ensemble Methods: Foundations and Algorithms. CRC Press; Boca Raton: 2012
  • 10 Tibshirani R. J. R. Statist. Soc. B 1996; 58: 267
  • 11 The data for the 19 reactions and the catalyst descriptors are available in ‘Summary_of_parameters_and_output.csv’ as Supporting Information, where a detailed explanation is also given. Our R code for prediction is unfortunately not available because of the confidential nature of this national project.
  • 12 All RMSEtest and RMSEtrn values for the 3876 data partitions are available in ‘Summary_of_RMSE_for_3876_patterns.csv’ as Supporting Information.
  • 13 Bach FR. Bolasso: Model Consistent Lasso Estimation through the Bootstrap. In: ICML ’08: Proceedings of the 25th International Conference on Machine Learning. ACM; New York: 2008: 33–40; ISBN 978-1-60558-205-4; http://doi.acm.org/10.1145/1390156.1390161
  • 14 Gunam Resul MFM, López Fernández AM, Rehman A, Harvey AP. React. Chem. Eng. 2018; 3: 747