Planta Med 2013; 79 - SL26
DOI: 10.1055/s-0033-1351852

Temporal characteristics of a natural products in-house database

TB Oliveira 1, DA Chagas-Paula 1, AL Rosa 1, L Gobbo-Neto 1, TJ Schmidt 2, FB Da Costa 1
  • 1University of São Paulo, School of Pharmaceutical Sciences of Ribeirão Preto, Ribeirão Preto-SP, Brazil
  • 2Westfälische Wilhelms-Universität Münster, Institut für Pharmazeutische Biologie und Phytochemie, Münster, Germany

In the era of Big Data, data must be transformed into knowledge for the information to gain value. In this context, our research group AsterBioChem aims to contribute to Big Data in the field of natural products. AsterDB (AsterBioChem Database) is a newly developed in-house database of plants from South American Asteraceae. It comprises diverse chemical structures and additional information on compounds isolated by AsterBioChem members, including taxonomic information, biological activities and structural information. The database can be used for chemosystematic studies, extract dereplication, and QSAR and QSRR studies. In this work we describe a QSRR study involving sesquiterpene lactones from AsterDB. The chromatographic information comprises the logarithm of the retention factor, obtained from experimental retention times on a reversed-phase analytical C-18 column under isocratic elution (MeOH-H2O 1:1 and MeCN-H2O 3:7). The 2D and 3D descriptors were calculated using the software packages PaDEL, ADRIANA.Code, Dragon and MOE. The descriptors were pre-processed with the caret package in R. Four procedures were used for descriptor selection: genetic algorithm (GA), forward selection, best first and greedy stepwise. Artificial neural networks (ANN) with the backpropagation algorithm and partial least squares (PLS) were used as modeling tools. More than 300 models were built using different combinations of training and test sets. Models showing overfitting and those with external validation Q2ext < 0.7 were discarded, and the best models were selected from the remainder. The best model for MeCN was obtained by PLS/GA using 2D descriptors (R2 = 0.96, Q2 = 0.92 and Q2ext = 0.91) and that for MeOH was obtained by ANN/GA using 3D descriptors (R2 = 0.91, Q2 = 0.88 and Q2ext = 0.80). This QSRR modeling showed that AsterDB is able to transform data into knowledge for future use in the dereplication of secondary metabolites from plant extracts.
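As an illustration of the modeled response and the external validation cutoff, the following minimal Python sketch computes the logarithm of the retention factor from a retention time and applies the Q2ext threshold of 0.7. The abstract does not state which formulas were used, so two assumptions are made here: the conventional definition k = (tR - t0)/t0 with a column dead time t0, and an external Q2 computed against the training-set mean. All function and parameter names are hypothetical.

```python
import math


def log_retention_factor(t_r, t_0):
    """Return log10(k), assuming k = (t_r - t_0) / t_0.

    t_r: retention time of the compound; t_0: column dead time
    (same units). The definition of k is an assumption, not taken
    from the abstract.
    """
    if t_0 <= 0 or t_r <= t_0:
        raise ValueError("expected t_r > t_0 > 0")
    return math.log10((t_r - t_0) / t_0)


def q2_ext(y_test, y_pred, y_train_mean):
    """External Q2, assumed here to be 1 - SS_res / SS_tot on the test set,
    with SS_tot taken around the training-set mean."""
    ss_res = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    ss_tot = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - ss_res / ss_tot


def passes_external_validation(q2, threshold=0.7):
    """Models with Q2ext below the threshold are discarded."""
    return q2 >= threshold
```

With these assumed definitions, the two reported best models (Q2ext = 0.91 and 0.80) pass the cutoff, whereas a model with Q2ext = 0.69 would be discarded.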

Acknowledgements: FAPESP, CAPES, CNPq