Planta Med 2014; 80(14): 1161-1170
DOI: 10.1055/s-0033-1360109
Reviews
Georg Thieme Verlag KG Stuttgart · New York

Natural Product Libraries: Assembly, Maintenance, and Screening.

Mark S. Butler
Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, The University of Queensland, St. Lucia, Brisbane, Australia
,
Frank Fontaine
Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, The University of Queensland, St. Lucia, Brisbane, Australia
,
Matthew A. Cooper
Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, The University of Queensland, St. Lucia, Brisbane, Australia

Correspondence

Dr. Mark S. Butler
Division of Chemistry and Structural Biology, Institute for Molecular Bioscience, The University of Queensland
St. Lucia, Brisbane, 4072
Australia
Phone: +61 7 33 46 29 92   
Fax: +61 7 33 46 20 90

Publication History

received 29 August 2013
revised 22 October 2013

accepted 30 October 2013

Publication Date:
05 December 2013 (online)

 

Abstract

This review discusses successful strategies for, and potential pitfalls in, assembling a natural product-based library suitable for high-throughput screening. Specific extraction methods for plants, microorganisms, and marine invertebrates are detailed, along with methods for generating a fractionated sub-library. The best methods to store, maintain, and prepare the library for screening are addressed, as well as recommendations on how to develop a robust high-throughput assay. Finally, the logistics of moving from an assay hit to a pure bioactive compound are discussed.


Introduction

Natural products often possess biological activity and are a valuable source of drug leads [1], [2], [3], [4]. Many natural products occupy unique areas of chemical space [5] and they often differ from synthetic compounds by the presence of multiple chiral centers, an abundance of O over N atoms, and more H-bond donors and acceptors [6], [7], [8], [9]. Despite these structural differences and rich history in drug development, natural product-based hit discovery is currently almost nonexistent in large pharmaceutical companies. This is mainly due to the perception that (i) the time between isolation and structure elucidation is too long, (ii) there are diminishing returns due to high dereplication rates, (iii) there is a resupply issue for hit-to-lead and preclinical studies, and (iv) natural products are not “drug-like” and make poor leads. These perceptions can be countered as follows. Advances in dereplication, isolation techniques, and structure elucidation have reduced the time from extract to pure compound to under one month on average in well equipped, organized, and experienced labs [1], [10], [11]. The resupply issue can be resolved by focusing on microorganisms, with other organism types such as marine invertebrates and plants only screened if there is well documented collection information and a resupply plan. Advances in synthesis have also solved some supply issues as demonstrated by the totally synthetic anticancer drug eribulin mesylate (trade name: Halaven®; launched 2010) that was derived from the complex, sponge-derived lead halichondrin B ([Fig. 1]) [12], [13]. Natural product anticancer drugs such as trabectedin (trade name: Yondelis®; launched in 2007) [14], [15] and homoharringtonine (trade name: Synribo®; launched 2012) [16] are produced semisynthetically for clinical use to overcome resupply issues ([Fig. 1]).

Fig. 1 Structures of the complex, sponge-derived anticancer lead halichondrin B and its synthetically inspired drug eribulin produced by total synthesis; ester hydrolysis of the naturally occurring ester mixture to give cephalotaxine, which is then esterified to give the anticancer drug homoharringtonine (omacetaxine mepesuccinate); the ascidian-derived anticancer drug trabectedin is semisynthetically produced from the bacterial-derived cyanosafracin B.

The concern that natural products are not “drug-like” is paradoxical given that natural products have been lead structures for many drugs [2], [4]. Christopher Lipinski, who introduced the “rule-of-five” for guiding drug-like properties required for oral absorption, suggested that these rules were not suitable for natural products due to the potential for active transport [17]. The disconnect between the rule-of-five and some natural products is demonstrated in Ganesanʼs analysis of the 24 natural product drug leads discovered from 1970–2006 that led to a drug approval [18]. In this study, it was found that half of these leads were inside the “Lipinski universe” (0 or 1 rule violations) chemical space, while the other half were in a natural product “parallel universe”. Revealingly, leads from the “parallel universe” had an equivalent chance to leads from the “Lipinski universe” of producing an oral drug; these leads were Lipinski-compliant in terms of clogP and H-bond donors, as well as potentially being able to access carrier mediated or active transport mechanisms [18].
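The “Lipinski universe” criterion (0 or 1 rule violations) can be made concrete with a short sketch. It assumes the commonly cited rule-of-five thresholds (molecular weight ≤ 500 Da, clogP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10), which are not spelled out in this review. The Compound class, the function names, and the two sets of descriptor values are hypothetical and do not correspond to any specific natural product.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical container for the four rule-of-five descriptors."""
    name: str
    mol_weight: float      # Da
    clogp: float           # calculated logP
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: Compound) -> int:
    """Count violations of the commonly cited rule-of-five thresholds."""
    checks = [
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


def in_lipinski_universe(c: Compound) -> bool:
    """'Lipinski universe' as defined in the text: 0 or 1 violations."""
    return rule_of_five_violations(c) <= 1


# Illustrative, made-up descriptor values (not real compounds).
small_lead = Compound("example A", 420.5, 3.1, 2, 6)
large_lead = Compound("example B", 830.0, 4.2, 4, 14)

for lead in (small_lead, large_lead):
    print(lead.name, rule_of_five_violations(lead), in_lipinski_universe(lead))
```

In this made-up case, “example B” falls outside the “Lipinski universe” on molecular weight and H-bond acceptors while remaining compliant in clogP and H-bond donors, which is the pattern described for the “parallel universe” leads.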

In the 1980s, advances in robotics and computing, coupled with enhanced access to biological reagents and proteins, enabled large pharmaceutical companies to develop high-throughput screening (HTS) capabilities [19], [20], [21]. Initially HTS was used to screen modest-sized synthetic compound libraries (thousands of compounds) and natural product extract libraries, but the increased screening capacity soon led to an explosion in the number of synthetic compounds added to screening libraries using combinatorial chemistry (hundreds of thousands of compounds). During this time, HTS has had an impact on drug discovery, leading to the identification of numerous new drug leads and at least 12 drugs on the market [22].

For over 20 years, HTS facilities have also been used in academic settings and smaller biotechnology companies to screen for new drug leads. As large pharmaceutical companies retreat further from natural product-based lead discovery (and basic research), there are now opportunities for university-based research groups to be at the forefront of lead discovery in natural products. It is also now feasible for these groups to use in-house resources and contract research organizations (CROs) to validate these drug leads and undertake preclinical drug development [23].

In this review, we discuss how to assemble a natural product-focused library that would consist of structurally diverse extracts, prefractionated extracts, and pure compounds. Practical methods to maintain, store, and efficiently screen these natural product-derived samples are also discussed, as are the logistics of moving from an assay hit to pure bioactive compounds.


Initiating a Natural Product Library

Extraction methods

The early years of natural product chemistry research were dominated by large scale plant-based studies that often used harsh extraction methods. These extraction techniques included the use of boiling solvents, acid-base extraction, and steam distillation. As isolation techniques and technology have improved, extraction conditions have become milder, and the amount of compound required for structure elucidation has dropped to well below 1 mg. Extraction methods [24], [25] developed to increase extraction efficiency include the use of ultrasound [26], [27], microwave [27], [28], pressure (accelerated solvent extraction) [29], supercritical fluid [30], [31], ionic liquids [32], and deep eutectic solvents [33], [34]. Although these newer methods can be useful, a simple method such as extraction with slightly aqueous MeOH using gentle agitation captures most of the drug-like molecules and is usually more than adequate for biological screening purposes. Another popular extraction solvent for screening libraries is EtOAc, which extracts less polar material compared to MeOH, resulting in cleaner extracts of mid-polarity compounds. Extraction conditions for the compounds of interest can be subsequently optimized upon re-extraction or scale-up studies.


Sources of natural products

Plants: Plants have been used as the basis of medicines for thousands of years and even today these traditional medicines are relied upon for health care in many parts of the world [35], [36]. Plant secondary metabolites are often stored inside cells, which need to be ruptured to increase the extractable yield. The plant material is usually dried, either at slightly elevated temperatures or using freeze drying, and ground to a fine powder before extraction. The choice of solvent depends upon the desired polarity range, with MeOH or EtOAc being excellent options for a single solvent-derived extract library as previously discussed. Extracts can also be generated by successively extracting with nonpolar to polar solvents such as heptane, CH2Cl2, MeOH, and H2O, while hot H2O can be used to mimic the administration of many traditional medicines as infused teas. Before screening, plant extracts can be passed through a polyamide column (eluent MeOH) or polyvinylpyrrolidone (PVP) to remove polyphenols that can interfere with various enzyme-based biological assays [37], [38], [39]. Alkaloids [40] can also be enriched from plant samples using an acid/base extraction protocol and detected using Dragendorffʼs reagent and/or (+)-electrospray (ESI)-MS [37], [41].

Microorganisms: Microbes have been the mainstay of industrially focused natural product research due to their propensity to produce novel molecules and the use of fermentation as a renewable source of compound resupply. Fungi and bacteria can be cultivated on both solid and liquid media under a variety of conditions; however, bacteria are predominantly grown in liquid media, while fungi are grown using both solid and liquid media. Important parameters for secondary metabolite production are the choice of media, temperature, aeration, and duration of the fermentation. On a small scale, liquid cultivations can be performed in test tubes, flasks, or specially designed microtiter plates [42], [43], [44], which can be freeze-dried and extracted with an appropriate solvent or directly extracted with an organic solvent that is immiscible with the broth [e.g., EtOAc, n-butanol (n-BuOH), methyl ethyl ketone (MEK), tert-butyl methyl ether (TBME)]. Centrifugal evaporation is particularly useful for organisms grown in high salt content media to minimize “bumping” that can occur during freeze drying. For larger scale liquid cultivations, multiple flasks or a bioreactor are used with the biomass usually separated from the broth using centrifugation. The wet biomass can then be extracted directly, or freeze dried and then extracted. Broth metabolites can be concentrated by elution through a resin column with increasing amounts of organic solvents (usually MeCN or MeOH) in H2O. Commonly used resins are the brominated-polystyrene Sepabeads SP207 (Mitsubishi Chemical Corp), the cross-linked polystyrene Diaion HP20 (Mitsubishi Chemical Corp), and the XAD (The Dow Chemical Company) ion exchange resins. These resins can also be added during cultivation to capture compounds, often in increased yield, and to help with downstream purification [45]. There have also been reports of larger scale solid fermentations [46], [47].

Altering media and cultivation conditions can considerably influence secondary metabolite profile production. This is exemplified by the OSMAC (one strain many compounds) approach [48], which is particularly compatible with the number of cultivations available using the microtiter plate format, and the use of additives to alter secondary metabolite profiles [49], [50], [51] or activate cryptic biosynthesis pathways [50], [52], [53].

Marine invertebrates: As a result of potential logistical difficulties associated with marine invertebrate recollection, gentle extraction methods are generally used to minimize compound degradation. A commonly used method is to cut the invertebrate into smaller pieces, place them in a plastic container, immerse them in EtOH, and refrigerate until required. EtOH is often used instead of MeOH to help distinguish between naturally occurring esters and artifactual esterification. The Crews Group have reported a method that they use in the field where specimens are immersed in an MeOH : H2O (1 : 1) solution and after approximately 24 h the liquid is decanted and discarded [54]. The specimen is placed in a Nalgene container, shipped back to the laboratory at ambient temperature, and then stored at 4 °C until further processed [54]. Marine invertebrate specimens can also be freeze dried, powdered, and extracted, but care needs to be taken as occasionally some metabolites can sublime into the cold trap.


Prefractionation and purified NP libraries

One way to reduce the complexity of an extract is to fractionate before biological testing using column-based [55], [56], [57], [58], [59], [60], [61], [62], [63] or liquid-liquid separation methods [64], [65], [66]. The production and the guiding principles behind the generation of prefractionated libraries have been recently reviewed [67], [68]. Prefractionated extracts are considerably less complex than crude extracts, which enables screening at a high concentration and simplifies dereplication. Although enriched extracts allow the detection of minor components during screening, it is not possible to remove all interfering or frequent hitting compounds that will also be enriched. Prefractionated extracts also facilitate the unmasking of active compounds from cytotoxic compounds and agonists from antagonists.

The ultimate prefractionated library is a pure natural product library that has a structurally diverse set of natural products with known structures and physicochemical properties and > 95 % purity. Pure natural product libraries are assembled in an iterative manner from in-house isolated compounds and natural products purchased from commercial sources. The largest commercial player in this area is AnalytiCon Discovery GmbH (Potsdam, Germany; http://www.ac-discovery.com/). Over the last 20 years, AnalytiCon have assembled a library of 25 000 pure natural products, based on about 2000 different chemotypes, that is available in a ready-for-screening format [69], [70]. The only other description [71] of a pure natural product library was a plant-based library from MolecularNature Ltd., which is now available via PhytoQuest Ltd. (Aberystwyth, United Kingdom; http://www.phytoquest.co.uk/). Screening of this type of library is especially attractive for groups not interested in bioassay-guided isolation that still want to access natural product diversity.

A few words of caution: although it may seem attractive to generate large numbers of pure natural products and prefractionated extracts for screening, there is considerable effort and monetary investment required to generate, store and undertake quality control analyses [1]. There is also an increase in screening costs for each extra screening point and, as a consequence, the optimal number of fractions needs to be carefully considered. There also can be a loss in biodiversity coverage when screening prefractions, especially if the screening capacity is limited. To achieve optimal metabolite coverage in the minimum number of screening points, organism taxonomy studies along with robust assessments of the extract quality and chemical diversity must be undertaken before generating fractions [72], [73], [74], [75].

Before assembling the library, a decision is required about whether samples will be tested at equivalent µg/mL concentrations, which requires the weighing of each extract or prefraction, or at equivalent doses relative to a broth volume or amount of material extracted. Both methods have their advantages and disadvantages.



Maintaining and Storing a Natural Product Library

Organism storage

Dried plant material is usually stored at room temperature as a finely ground powder to minimize storage space and enhance extraction. Plant samples, especially intact plant parts, need to be periodically inspected for microorganism growth to identify contamination. Microorganism stock cultures are usually stored frozen (liquid nitrogen and − 80 °C freezer) [76], [77] as glycerol stocks or freeze-dried [78]. It is prudent to duplicate the microorganism collection in two separate locations in case of a catastrophic event that would destroy the freezer contents. It is also prudent to keep multiple stock copies of each microorganism to give the best chance of revival. Marine invertebrate samples can be stored in a − 20 °C freezer with or without the extraction solvent.


Extract and pure natural product storage and stability

Extracts and pure natural products are usually best stored dried under a dry, inert atmosphere at − 20 °C or below; however, these solid materials are not easily manipulated for screening purposes. One way around this problem is to aliquot the material in solution at the desired screening concentrations, remove the solvent, and store the dried aliquots ready for screening. This dry plating method also has its disadvantages: the screening concentration is fixed, there could be issues with extract dissolution, and some assays require the test material to be added after the reagents and media. An alternative method is to dissolve the extracts and compounds in dry DMSO, which is a hygroscopic liquid that freezes at 18.5 °C, and dispense aliquots of this solution at the desired concentration. The dispensing is best performed using automated robots but can also be performed manually on a smaller scale. It has been shown that storing the DMSO stock solution as a solid can cause issues with compound stability and solubility that worsen with each freeze-thaw cycle [79], [80], [81]. A stability study of synthetic compound DMSO stocks stored at room temperature showed that the concentration of 8 % of the compounds had significantly decreased after 3 months, 17 % after 6 months, and 48 % after 1 year [82]. Interestingly, a recent paper has suggested that compound purity is the most important factor for stability of synthetic compounds [81]. At face value this implies that natural product extracts could be especially unstable, but in reality the presence of other compounds increases compound solubility and stability within the extracts. The presence of antioxidants also helps with compound stability in extracts.

It is almost impossible for the average laboratory to totally exclude H2O from DMSO solutions during the storage and dispensing phases. In this situation, DMSO stocks should be dispensed and stored for only a limited amount of time (say < 2 months) and the number of freeze-thaw cycles kept to a minimum. An example of a sophisticated compound and extract storage and dispensing facility is the Queensland Compound Library (http://www.griffith.edu.au/science-aviation/queensland-compound-library), which is housed at Griffith University (Brisbane, Australia). In this facility, compounds and extracts have a unique 2D barcode, which is used to track samples through the whole process from the automated weighing station to the plate dispensing for screening. The solid samples are stored in the dark under an inert and dry atmosphere at − 20 °C [83]. The screening samples are dissolved in dry DMSO and stored in liquid form under a dry N2 atmosphere for up to 5–6 years. Individual or sets of these screening samples are then assembled using the 2D barcode and transferred using an acoustic dispenser [84] in any format required for screening (e.g., 96-, 384-, or 1536-well microtiter plates). Periodic analyses of the screening samples using LC-MS are undertaken for quality control purposes.

Although state-of-the-art storage, dispensing, and analysis facilities increase the probability of identifying bioactive molecules, inexpensive and simple procedures like minimizing water in the DMSO stocks, limiting freeze-thaw cycles, and not using “expired DMSO stocks” go a long way to improving assay outcomes without significant monetary outlays.



Practicalities of Screening Natural Product Libraries

A natural product extract can contain hundreds to thousands of compounds that have wide molecular weight, polarity, and abundance ranges. Before bioassay-guided isolation can be used to identify the compounds responsible for the activity, medicinally relevant and robust assays need to be developed, which can be divided into in vitro biochemical and cell-based assays. The biochemical assays are grouped into two subtypes based on their complexity and troubleshooting difficulty: binary assays or direct binding assays involving two partners (ligand and an analyte or binding partner) and ternary assays (ligand, analyte, and a ligand-analyte modulator). The use of more complex, more biologically relevant cell-based assays has steadily increased over the past several years [85], and they can be further divided into reporter, morphometric, and homogeneous assays. Reporter assays use fluorescent and/or luminescent labels to monitor changes in protein expression or protein-protein clustering or interaction, while morphometric assays involve the measurement of changes in growth, cytotoxicity, and subcellular morphologic features using quantitative microscopy and label-free techniques. Finally, homogeneous assays include a lysis step before readout and could almost be considered as biochemical assays. Electrophysiology and flow cytometry assays stand on their own as they involve a cell sorting step and a cell-by-cell readout.

Developing representative screening sets to interrogate screen robustness

The concept of the natural product matrix effect: An extract is a complex, genus- and media-specific mixture or matrix that can interfere with the assay dynamic range, signal-to-noise ratio, and reproducibility. Although a solvent-vehicle can be used as a baseline control, an inactive extract representative of the libraryʼs “average” matrix can also be used. It is important that the inactive extract matrix be representative of the average matrices of the other extracts. For example, the “generic inactive extract” for a marine invertebrate n-BuOH extract library can be produced by pooling whole EtOH extracts from three to five randomly selected library specimens, drying the extracts, and partitioning between H2O and n-BuOH. The n-BuOH layer is dried, weighed, and adjusted to the desired w/v concentration required for the stock solution. This solution is left on the laboratory bench for one week under ambient laboratory fluorescent light to generate the final “generic inactive extract”.

Positive and negative controls: This “generic inactive extract” is then used as the negative control or baseline for the screening campaign. If an active small molecule has been previously identified, positive controls and/or standard curves should be generated using the same “generic inactive extract” spiked with the known active compound. Simple positive controls in the solvent should also be run in parallel to establish the “matrix effect” on the assay dynamic range, signal to noise ratio, and reproducibility.

For some biological targets, no active molecules have been identified. For homogeneous assays that use recombinantly expressed purified proteins or membrane preparations, chaotropic agents such as guanidine and SDS can be used as positive controls due to their denaturing effect. Care needs to be taken as these types of positive controls can be misleading when trying to establish assay tolerability to the vehicle-solvent concentration, as they can trigger the expected assay response, even at a high vehicle-solvent concentration, while displaying the hallmarks of a working assay (e.g., stable baseline, no precipitation, and a large dynamic range). However, the assay may no longer be able to detect an active compound under these conditions. Residual solvent in the samples, such as n-BuOH, can itself act as a chaotropic agent, disrupting the structure of, or worse, denaturing proteins [86]. A rule-of-thumb is to assume that there is no tolerance to the solvent in these assays until an active compound suitable for use as a positive control has been identified.

The solvent effect can be avoided if the vehicle-solvent used to transfer a library in the assay microtiter plate can be evaporated and the extract resuspended in assay buffer. In most cases, the resuspension of extracts/prefractions is straightforward except for a small number of extracts that can emulsify during the brisk agitation in the assay buffer. The effect of this transfer, drying, and resuspension cycle should also be evaluated on the positive control (generic inactive extract spiked with active compound).

Assay optimization and pilot study: The experimental conditions of each assay need to be optimized with regard to parameters such as reagent and extract concentrations, reagent addition times, endpoint window to account for different instrument reading times, replicate reproducibility, and DMSO tolerance. Once these parameters have been established, a pilot study is undertaken on a subset of representative extracts and prefractions, and the pure natural product library. The data from the representative extracts are then analyzed for assay robustness and, from these results, promising hits are moved to the dereplication stage. These data can also be extrapolated to give an estimated overall assay hit rate. The pure natural product library is used to identify compounds of interest and frequent hitters that are flagged for dereplication.

For an example of this process, consider the marine invertebrate n-BuOH extract library, which has already been formatted in a screening-ready format in deep 96-well plates. The extracts were formatted at several w/v concentrations (25, 2.5, and 0.25 mg/mL), relative to the initial concentration of the whole extract before n-BuOH/H2O partition. These concentrations offer flexibility and accuracy when plating sub-mg to mg amounts in assay wells, where incubation volumes range from 10 to 250 µL.

Cytotoxicity is not an issue for cell-based assay protocols that use short incubation times (30–45 minutes) or for homogeneous assays, and extract concentrations of several hundred µg/mL can be used in these assays. For cell-based assays that require longer incubation times, usually only concentrations less than 10 µg/mL can be used to avoid observing toxicity. Cell toxicity should be tested with a large concentration range of “generic inactive extract”, e.g., 0 to 500 µg/mL, using alamarBlue, cytosolic lactate dehydrogenase release, or similar readouts, in order to estimate the EC50 profile of the library genus matrix. The incubation duration depends on the assay type and can vary from 30 minutes for ion channel cell-based assays [87], [88] to 15–18 hours for a lipid droplet formation assay [89], [90], or up to four days for a stem cell commitment assay that was developed to identify molecules stimulating commitment of neural precursor cells to neuronal lineage [91], [92]. Care needs to be taken when using DMSO-containing libraries as most cell-based assays can only tolerate < 0.5 % DMSO.
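The interplay between stock concentration, well volume, and DMSO tolerance can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that the 25, 2.5, and 0.25 mg/mL stocks mentioned earlier are prepared in 100 % DMSO, that the target is 10 µg/mL extract in a final volume of 100 µL, and that the 0.5 % limit applies. The function and constant names are hypothetical.

```python
def stock_volume_ul(target_ug_per_ml: float, well_volume_ul: float,
                    stock_mg_per_ml: float) -> float:
    """Volume of stock (µL) giving the target extract concentration in the well."""
    mass_needed_ug = target_ug_per_ml * well_volume_ul / 1000.0  # µg in the well
    return mass_needed_ug / stock_mg_per_ml  # 1 mg/mL equals 1 µg/µL


def dmso_percent(stock_ul: float, well_volume_ul: float) -> float:
    """Final DMSO content (% v/v), assuming the stock is 100 % DMSO."""
    return 100.0 * stock_ul / well_volume_ul


DMSO_LIMIT_PERCENT = 0.5   # tolerance quoted in the text for most cell-based assays
TARGET_UG_PER_ML = 10.0    # assumed screening concentration
WELL_VOLUME_UL = 100.0     # assumed final incubation volume

for stock in (25.0, 2.5, 0.25):  # mg/mL
    vol = stock_volume_ul(TARGET_UG_PER_ML, WELL_VOLUME_UL, stock)
    pct = dmso_percent(vol, WELL_VOLUME_UL)
    verdict = "ok" if pct < DMSO_LIMIT_PERCENT else "exceeds limit"
    print(f"{stock:>5} mg/mL stock: {vol:.2f} µL, {pct:.2f} % DMSO ({verdict})")
```

Under these assumptions the 25 and 2.5 mg/mL stocks stay below the limit, whereas the 0.25 mg/mL stock would require 4 µL per well and give 4 % DMSO, so the more dilute stock is only usable if the solvent is evaporated before the assay or a lower test concentration is chosen.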

The next step is to decide whether the natural product library will be screened at a single concentration or at multiple concentrations in singlicate or duplicate experiments. The two extremes are libraries screened at a single concentration in singlicate and libraries screened using quantitative HTS (qHTS), in which samples are screened at multiple concentrations to generate concentration-response curves [93], [94]. The limited supply of some natural product extracts is also a consideration when deciding on screening numbers. We would recommend screening at two concentrations when practicable, usually 10- to 100-fold apart ([Fig. 2]). Besides providing concentration-dependence information, these data are useful for ranking hits in order of potency, as well as identifying extracts that are only active at the lower concentration. Extracts not active at the higher concentration could have their activity masked by a matrix effect or cell cytotoxicity or be false positives. Examples of screenings of a natural product library at low to very low concentrations are starting to appear in the literature [95]. We recommend that this strategy be used for cell-based assays, but it can also be used for biochemical assays.

Fig. 2 Screening at high concentrations (Zone 1) leads to higher hit rates (continuous line) that identify a predominance of compounds with low target affinity (dashed line). For example, screening a set of marine invertebrate extracts at 10 µg/mL w/v with compounds at a relative abundance of 1–10 % (average MW of 500) is equivalent to testing pure compounds in the 0.2 to 2 µM range, which often results in the identification of hits in the mid- to low µM affinity range. Although mid- to low nM actives can also be identified when screening in Zone 1, the extract matrix effect and compound interference can also mask the activity. Screening in Zone 1 for cell-based assays, where cytotoxicity often leads to “bell-shaped” dose-response curves, will miss relatively abundant mid- to low nM actives. Screening the extracts at 10- and 100-fold lower concentrations (Zone 2 and Zone 3) reduces the matrix and interference effects, leading to identification of extracts that would have been missed or not prioritized when screening in Zone 1. As a consequence, we recommend screening natural product extracts and prefractions, when practicable, using at least two concentrations, especially for cell-based screening.
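The conversion quoted in the Fig. 2 legend (10 µg/mL extract, 1–10 % abundance, MW 500, giving 0.2 to 2 µM) can be reproduced with a few lines of arithmetic. The following is a minimal sketch; the function name is hypothetical, and the 1 and 0.1 µg/mL values for Zones 2 and 3 are simply the 10- and 100-fold dilutions of the legendʼs example rather than figures taken from the source.

```python
def component_micromolar(extract_ug_per_ml: float, abundance_fraction: float,
                         mol_weight: float) -> float:
    """Molar concentration (µM) of one component within an extract.

    µg/mL is the same as mg/L; dividing by g/mol gives mmol/L,
    so the result is multiplied by 1000 to express it in µmol/L.
    """
    return extract_ug_per_ml * abundance_fraction / mol_weight * 1000.0


AVERAGE_MW = 500.0  # average molecular weight used in the figure legend

# Zone 1 is the legend's example; Zones 2 and 3 are 10- and 100-fold lower.
for zone, extract_conc in (("Zone 1", 10.0), ("Zone 2", 1.0), ("Zone 3", 0.1)):
    low = component_micromolar(extract_conc, 0.01, AVERAGE_MW)   # 1 % abundance
    high = component_micromolar(extract_conc, 0.10, AVERAGE_MW)  # 10 % abundance
    print(f"{zone}: {extract_conc} µg/mL extract -> {low:.3f} to {high:.3f} µM")
```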

Matrix components that lead to assay interference

Colored or fluorescent compounds: These types of compounds can interfere with colorimetric and fluorescent assays and result in both false positives and negatives depending upon the assay readout. For example, compounds with molecular weights in the range of 300 to 600 Da can be yellow to red in color, which can interfere with the colorimetric readouts, or absorb and fluoresce in a yellow-green spectrum range similar to fluorescein, which is commonly used in fluorescence polarization-based binding competition assays [96]. As these compounds are often present in the assay in the µM range, which is around 1000-fold higher than the assay probe present in the nM range, most of the energy of excitation will be absorbed by the compounds and probe-derived fluorescence will decrease accordingly. In fluorescence polarization-based assays, fluorescence interference will decrease the total fluorescence (parallel and perpendicular) and increase the calculated fluorescent polarization ratios. Fluorescence interference in these types of assays can be decreased by using far-red fluorescent probes [96].
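For readers less familiar with the readout, the quantities mentioned above can be written out explicitly. The sketch below uses the standard textbook definitions of fluorescence polarization (in millipolarization units, mP) and total intensity from the parallel and perpendicular channels. The G-factor default of 1.0, the function names, and the example intensities are assumptions for illustration and are not taken from the cited work.

```python
def polarization_mp(i_parallel: float, i_perpendicular: float,
                    g_factor: float = 1.0) -> float:
    """Fluorescence polarization in mP: 1000 * (I_par - G*I_perp) / (I_par + G*I_perp)."""
    corrected_perp = g_factor * i_perpendicular
    return 1000.0 * (i_parallel - corrected_perp) / (i_parallel + corrected_perp)


def total_intensity(i_parallel: float, i_perpendicular: float,
                    g_factor: float = 1.0) -> float:
    """Total fluorescence intensity: I_par + 2 * G * I_perp."""
    return i_parallel + 2.0 * g_factor * i_perpendicular


# Hypothetical raw channel readings for a single well (arbitrary units).
well_parallel, well_perpendicular = 300.0, 200.0
print(polarization_mp(well_parallel, well_perpendicular))
print(total_intensity(well_parallel, well_perpendicular))
```

Tracking the total intensity alongside the polarization value for each well is one simple way to flag wells in which an extract has altered the raw fluorescence rather than the binding equilibrium.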

Micelle formation and aggregation: Some extracts, prefractions, and pure compounds can form micelles in assay buffer that can destabilize a purified system (recombinant protein, membrane preparation) in a biochemical assay. The micelles can cause toxicity in cell-based assays, while in homogeneous assays they can trap fluorescent probes, fluorescent chemosensors, or fluorescence-tagged reagents leading to a high local concentration that can interfere with the fluorescence readout (quenching) or prevent functional chemosensing. Micelles are usually formed by detergent-like molecules such as fatty acids and sulfated compounds, but this behavior can also be observed with other types of molecules. Simple assays are available to measure critical micelle concentration (CMC) and can easily be implemented in 96- or 384-well plate format [97]. Detergent-like molecules present can also form colloidal aggregations that cause assay interference [98], [99], [100], [101], [102], which can usually be identified through the addition of detergent to disrupt aggregation [98], [103]. Finally, there have also been reports of detergent-like molecules being released from the plasticware that can cause assay interference [104], [105].

Media-derived interference: The media used to grow microorganisms are often a soup of diverse ingredients that include salts, amino acids, proteins, and complex biological products. Some assays are sensitive to these salts [106], such as FLIPR assays that detect Ca2+ flux [107] or assays involving metalloproteases, which contain cations in their active sites [108]. Other examples include the production of large amounts of aluminum dioxalate by some fungi grown using vermiculite-based solid media [109] and the bacterial biotransformation of soybean-derived glycosylated isoflavones to the biologically active aglycones genistein and daidzein.

Pan-active bioactive compounds: Kinases and proteases are pivotal modulators of cellular protein-protein interactions and cell signaling. An abundance of broad-spectrum kinase and protease inhibitors in some natural extracts may “paralyze” cells, without the usual hallmarks of cytotoxicity, and mask compounds of biological interest that need a “functional cell” to modulate a phenotype such as cell differentiation, secretion, or trafficking. Examples include bacteria and marine invertebrate extracts that contain staurosporine-like kinase inhibitors [110], and marine algae and venom extracts that contain protease inhibitors [111], [112].


Moving from the assay to bioactive compounds

Assay quality control: Z-factor (or Z′) analysis is a statistical method used to indicate whether an assay is suitable for HTS campaigns [113]. The Z-factor is an important quality control that should be calculated for every completed assay microtiter plate. The value should remain relatively constant during screening; if it varies, there could be issues with the screen performance. These variations can be caused by poor library quality, batch-to-batch cell and protein quality differences, incorrect assay optimization, or instrument issues. The Z-factor indicates poor (Z < 0), marginal (0 < Z < 0.5), or excellent (0.5 < Z < 1) assay quality, with poor values occurring when there is too much overlap between the negative and positive controls. In many instances, assays cannot be optimized beyond the marginal level, but this is often sufficient for screening provided that extra care is taken to analyze the data. For example, for fluorescence polarization assays, the signal-to-noise ratio is usually in the 1.5 to 4 range, leading to Z-factors of around 0.5, but this assay format is precise, robust, and reproducible and, in our opinion, the technology of choice for homogeneous-type assays to screen natural products [114].
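The per-plate calculation can be sketched in Python as follows. This is a minimal example that assumes the commonly used definition Z′ = 1 − 3(σp + σn)/|μp − μn|, computed from the positive and negative control wells of one plate; reference [113] should be consulted for the original definition. The function names and the control values are hypothetical, and assigning a value of exactly 0.5 to the "excellent" class is an arbitrary choice.

```python
from statistics import mean, stdev


def z_factor(positive, negative):
    """Z' from the control wells of a single plate.

    positive and negative are sequences of raw readouts from the positive
    and negative control wells (at least two wells each).
    """
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / separation


def classify(z):
    """Map a Z-factor to the quality classes used in the text."""
    if z < 0:
        return "poor"
    if z < 0.5:
        return "marginal"
    return "excellent"  # a value of exactly 0.5 is assigned here by choice


if __name__ == "__main__":
    # Hypothetical control readouts from one microtiter plate.
    pos = [100, 102, 98, 101, 99]
    neg = [10, 12, 8, 11, 9]
    z = z_factor(pos, neg)
    print(round(z, 3), classify(z))
```

Tracking this value plate by plate makes a drift in assay performance visible during the campaign rather than after it.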

The hit rate determined during the pilot study of the various library subclasses can also be extrapolated to the larger screening sets. If the hit rates diverge considerably between the pilot study and the screening campaign, then the hits warrant further analysis, as this could be a flag for poor assay performance or extract quality.

Prioritizing extracts/prefractions for evaluation and bioassay-guided isolation: During the screening campaign, it is recommended that carefully selected assay hits from the different screening libraries enter the dereplication [68], [70], [115], [116] stage to identify commonly occurring active compounds and to get a head start on identifying new bioactive molecules ([Fig. 3]). Once the screening campaign is completed, the overall hit rate of the screen can be calculated. The hit rate is usually between 0 % and 5 % depending upon the assay setup and hit selection cutoff, but for some screens, such as cell line cytotoxicity and gram-positive bacteria whole cell assays, the hit rate can be up to 15 % depending upon the library.

Fig. 3 Schematic of a typical dereplication process. Up to 1 mg of crude extract or prefraction is separated using HPLC (C18 column) and the eluting compounds are analyzed using a photodiode array detector (PDA). The eluent is then split, with the majority flowing into a microtiter plate and the remainder analyzed using ESI-MS and MS/MS. The microtiter plate contents are dried using centrifugal evaporation and DMSO is added to each well ready for screening. It is prudent to add the crude extract or originating fraction to the plate at two concentrations for comparative purposes. The (A) retention time (RT), (B) UV spectra, (C) MS, and (D) MS/MS of compounds present in the active fractions are analyzed and compared to in-house databases and commercial databases such as the Dictionary of Natural Products (Taylor & Francis Group), MarinLit (Royal Society of Chemistry), AntiBase (Wiley-VCH), and SciFinder (American Chemical Society). Work is discontinued on samples where the activity is accounted for by the identified compound or compounds, while like extracts are grouped for further evaluation.

First, let's consider a manageable hit rate of around 0.2 to 0.5 %, which would involve further evaluation of 200 to 500 samples from screening of a 100 000 member library. The active samples are first cherry-picked from the library and then retested in the screening assay to confirm activity. The next stage is to obtain a dose response over a 4-log concentration range on the retested positive samples and then test them in appropriate orthogonal and secondary assays. An orthogonal assay addresses the same target as the HTS assay but uses a different screening format, and it is used to help identify false positive results. Orthogonal assays are especially important when screening crude extracts due to the potential for a variety of interference compounds. Frequently run secondary assays include whole cell cytotoxicity, antifungal, gram-positive and gram-negative bacteria screening, and protease, kinase, and GPCR panel profiling to analyze for specificity. The retesting and secondary screening phases usually take at least 2 weeks depending upon the amount of secondary screening required. The number of samples for further evaluation should now have been reduced to below 50, which after dereplication will result in a reasonable number for bioassay-guided isolation.
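The arithmetic behind these numbers, together with one possible layout for a 4-log dose-response series, can be sketched as follows. The function names, the top concentration, and the choice of two points per log unit (half-log steps) are illustrative assumptions rather than recommendations from this review.

```python
def expected_hits(library_size, hit_rate_percent):
    """Number of samples entering follow-up for a given hit rate (in %)."""
    return round(library_size * hit_rate_percent / 100.0)


def dose_series(top_conc, logs=4, points_per_log=2):
    """Concentrations spanning `logs` orders of magnitude below top_conc.

    With points_per_log=2 this gives half-log steps; the unit of the
    result is whatever unit top_conc is given in.
    """
    n_points = logs * points_per_log + 1
    return [top_conc / 10 ** (i / points_per_log) for i in range(n_points)]


if __name__ == "__main__":
    for rate in (0.2, 0.5, 5.0, 10.0):
        print(rate, expected_hits(100_000, rate))
    # Hypothetical top concentration of 100 (e.g., µg/mL for an extract).
    print(dose_series(100.0))
```

For a 100 000 member library this reproduces the 200 to 500 samples quoted above for hit rates of 0.2 to 0.5 %.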

However, what do you do with a hit rate of around 5 to 10 %, which would involve further evaluation of 5000 to 10 000 samples from screening a 100 000 member library? An example of a screen with a potentially high hit rate is the screening of bacterial-derived extracts and prefractions against the gram-positive bacterium Staphylococcus aureus. The hit rate could be lowered by raising the cutoff threshold, but this is risky, as extracts or prefractions that contain lower abundance actives will not be selected and potential leads will not be identified. The only practical way forward with this number of extracts is to undertake considerable secondary screening or cross-referencing with existing screening data, as well as considerable dereplication, either through sheer force of numbers or by rapid LC-MS/MS analysis for commonly hitting compounds without sample collection. Extracts can be further clustered by their biological profiles, dereplication profiles, and taxonomy.

The next step is to undertake bioassay-guided isolation [24], [117], [118] to identify the bioactive compounds present in the extract. Guided by the dereplication profile, a chromatography step is undertaken, and the fractions are subjected to testing alongside the crude extract or originating fraction. This process is repeated until the compound or compounds that account for the activity of the extract or prefraction are identified. For biological evaluation, the compound needs to have > 95 % purity, especially for structure-activity studies. Always be on the lookout for samples where there could be a minor component present that is significantly adding to, or responsible for, the activity. For example, for µM active compounds, the presence of < 1 % of an nM active compound could account for the activity, and this minor component might not be detectable by NMR or even LC-MS. If this is a possibility, then isolate related compounds to see if they are active, or collect sub-fractions of the peak and look for fractions that are equally active.
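The minor component scenario can be checked with a short calculation. The sketch below assumes that the impurity has a molecular weight similar to that of the major compound, so that its mass fraction approximates its mole fraction; the function names and the example values are hypothetical.

```python
def impurity_concentration_nm(sample_conc_um, impurity_fraction):
    """Concentration (nM) of a minor component when the sample is tested
    at sample_conc_um (µM); impurity_fraction is a fraction, e.g. 0.01
    for 1 %. Assumes similar molecular weights for both components."""
    return sample_conc_um * impurity_fraction * 1000.0


def impurity_may_explain_activity(apparent_ic50_um, impurity_fraction,
                                  impurity_ic50_nm):
    """True if, at the sample's apparent IC50, the impurity is already
    present at or above its own IC50."""
    conc_nm = impurity_concentration_nm(apparent_ic50_um, impurity_fraction)
    return conc_nm >= impurity_ic50_nm


if __name__ == "__main__":
    # Hypothetical sample: apparent IC50 of 10 µM with a 1 % impurity
    # whose own IC50 is 50 nM.
    print(impurity_concentration_nm(10.0, 0.01))
    print(impurity_may_explain_activity(10.0, 0.01, 50.0))
```

In this hypothetical case the 1 % impurity is present at about 100 nM when the sample is tested at 10 µM, which is enough for a 50 nM active to account for the observed activity.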



Conclusion

The unique chemical structures and biological activities of natural products make them attractive candidates for drug lead discovery and as chemical probes. Most large pharmaceutical companies have abandoned natural product-based lead discovery, which has created great opportunities for university-based research groups and small biotechnology companies. Expertise is often available in these organizations to validate the drug leads and undertake preclinical drug development using in-house resources and CROs. However, if natural product-based libraries are going to be efficiently used to screen for new drug leads, they need to be of the highest quality. To start assembling this type of library, we first recommend that analysis methods (including dereplication) to assess metabolite diversity are implemented, all organisms used to generate the library are taxonomically identified, and a pure natural product library is assembled to aid in pilot screening studies and dereplication. The library should be a balanced mixture of crude extracts, prefractions, and pure compounds to optimize taxonomic and chemical space coverage, while minimizing the number of screening points. The library should be subdivided into smaller screening sets based on taxonomy, microorganism cultivation conditions, and sample preparation methods. The library should also be dynamic with extracts and prefractions added and removed from the library in response to chemical analyses and screening results. The next step in the process is to develop (or work out with collaborators) robust assays that are suitable for screening natural product samples. Orthogonal assays also need to be developed to help identify false positive results, as well as secondary assays to help select the best hits for further evaluation. Before full scale campaign screening is undertaken, a pilot study needs to be completed and the data analyzed. We also recommend that the library is screened at two concentrations when practicable, especially for cell-based assays. Active samples are dereplicated, and selected samples then undergo bioassay-guided isolation to identify the active compound(s) responsible for the activity. With luck, these biologically active compounds have the potential to be new drug leads and chemical probes.


Acknowledgements

This paper was prepared with the support of NHMRC grant AF511105. MSB is supported by a Wellcome Trust Seeding Drug Discovery Award (094 977/Z/10/Z).



Conflict of Interest

The authors declare no competing financial interest.


Fig. 1 Structures of the complex, sponge-derived anticancer lead halichondrin B and its synthetically inspired drug eribulin produced by total synthesis; ester hydrolysis of the naturally occurring ester mixture to give cephalotaxine, which is then esterified to give the anticancer drug homoharringtonine (omacetaxine mepesuccinate); the ascidian-derived anticancer drug trabectedin is semisynthetically produced from the bacterial-derived cyanosafarin B.
Fig. 2 Screening at high concentrations (Zone 1) leads to higher hit rates (continuous line) that identify a predominance of compounds with low target affinity (dashed line). For example, screening a set of marine invertebrate extracts at 10 µg/mL w/v with compounds at a relative abundance of 1–10 % (average MW of 500) is equivalent to testing pure compounds in the 0.2 to 2 µM range, which often results in the identification of hits in the mid- to low µM affinity range. Although mid- to low nM actives can also be identified when screening in Zone 1, the extract matrix effect and compound interference can also mask the activity. Screening in Zone 1 for cell-based assays where cytotoxicity often leads to “bell-shaped” dose-response curves will miss relatively abundant mid- to low nM actives. Screening the extracts at 10- and 100-fold lower concentrations (Zone 2 and Zone 3) reduces the matrix and interference effects, leading to identification of extracts that would have been missed or not prioritized when screening in Zone 1. As a consequence, we recommend screening natural product extracts and prefractions when practicable using at least two concentrations, especially for cell-based screening.
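The concentration equivalence quoted in the Fig. 2 legend can be reproduced with a short calculation. This is a sketch only; the function name is hypothetical and the default molecular weight of 500 is the average value assumed in the legend.

```python
def compound_conc_um(extract_ug_per_ml, abundance_percent, mw=500.0):
    """Approximate concentration (µM) of a single compound in an extract.

    extract_ug_per_ml: extract screening concentration in µg/mL
    abundance_percent: relative abundance of the compound in the extract (%)
    mw: molecular weight in g/mol (500 is the average assumed in the legend)
    """
    compound_ug_per_ml = extract_ug_per_ml * abundance_percent / 100.0
    # µg/mL divided by g/mol gives µmol/mL; multiply by 1000 for µmol/L (µM).
    return compound_ug_per_ml / mw * 1000.0


if __name__ == "__main__":
    # Extract at 10 µg/mL with compounds at 1 % and 10 % relative abundance.
    print(compound_conc_um(10.0, 1.0))
    print(compound_conc_um(10.0, 10.0))
```

For an extract screened at 10 µg/mL, abundances of 1 % and 10 % correspond to about 0.2 µM and 2 µM, matching the range given in the legend.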