Keywords
International Medical Informatics Association Yearbook - bioinformatics and translational
informatics - artificial intelligence
Introduction
Within the 2019 International Medical Informatics Association (IMIA) Yearbook, the
goal of the Bioinformatics and Translational Informatics (BTI) section is to provide
an overview of research trends in 2018 publications that demonstrated excellence in
various aspects of bioinformatics methods and techniques for advancing clinical care.
In 2008, the American Medical Informatics Association (AMIA) defined translational
bioinformatics as “the development of storage, analytic, and interpretive methods to
optimize the transformation of increasingly voluminous biomedical data into proactive,
predictive, preventative, and participatory health” [1]. Early priorities addressed
storage and retrieval, as well as focused analytics of high-throughput data, and
motivated numerous research and development studies over the last decade. Today,
the topic is clearly coming of age, with more ambitious objectives (such as pan-cancer
approaches, multi-omics analyses, and drug repurposing) that make use, among others,
of the most advanced computational methods such as Artificial Intelligence and Deep
Learning, this year’s special theme for the IMIA Yearbook.
Paper Selection Method
Following the method described in [2], a comprehensive review of articles published in 2018 and addressing various subtopics
for BTI was conducted. The selection was performed by querying MEDLINE via PubMed
(from NCBI, National Center for Biotechnology Information) with a set of predefined
Medical Subject Headings (MeSH) descriptors along with free terms: Translational informatics;
Translational bioinformatics; Bioinformatics; Computational molecular biology; Computing
Methodologies; Information storage and retrieval; Pattern recognition, Automated;
Medical informatics; Algorithms; Translational medical research; Genetics, Medical;
Precision Medicine; Personalized medicine; Molecular Medicine; Genomic medicine; Medical
genetics; Medical genomics; Clinical genomics; Genetics; Genomics; Next-generation
sequencing; High throughput sequencing; Transcriptome; Transcriptomics; Proteome;
Proteomics; Proteogenomics; Epigenomics; Metabolomics; Metagenomics; Large-scale datasets;
Big data; Omics; and Multi-omics. Bibliographic databases were searched on February
24th, 2019 for papers published in 2018, considering the electronic publication date.
The original set of 636 references was reviewed jointly by the two section editors
to select a consensual list. Based on the title and abstract of the papers, 42 and 45
references were selected by the first and second section editor, respectively. Of the
16 papers selected by both, three were excluded due to moderate interest and critical
length. Following the IMIA Yearbook process, the 13 candidate best papers were peer-reviewed
by the IMIA Yearbook editors and external reviewers (at least three reviewers per
paper). Four papers were finally selected as best papers (Table 1). A content summary of these best papers can be found in the appendix of this synopsis.
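The search described above can be approximated as a single Boolean PubMed query. The sketch below is an illustration under stated assumptions, not the query the section editors actually ran: every term is searched as a quoted phrase in all fields (the text does not say which terms were MeSH descriptors and which were free terms), "Medical informatics" and "Algorithms" are treated as two separate terms, and the helper name `build_pubmed_query` is hypothetical. PubMed's `[dp]` (Publication Date) tag matches both print and electronic publication dates, so it only approximates the electronic-date criterion.

```python
# Terms listed in the text. Assumption: "Medical informatics" and "Algorithms"
# are two separate terms; every term is searched as a quoted phrase.
TERMS = [
    "Translational informatics", "Translational bioinformatics", "Bioinformatics",
    "Computational molecular biology", "Computing Methodologies",
    "Information storage and retrieval", "Pattern recognition, Automated",
    "Medical informatics", "Algorithms", "Translational medical research",
    "Genetics, Medical", "Precision Medicine", "Personalized medicine",
    "Molecular Medicine", "Genomic medicine", "Medical genetics", "Medical genomics",
    "Clinical genomics", "Genetics", "Genomics", "Next-generation sequencing",
    "High throughput sequencing", "Transcriptome", "Transcriptomics", "Proteome",
    "Proteomics", "Proteogenomics", "Epigenomics", "Metabolomics", "Metagenomics",
    "Large-scale datasets", "Big data", "Omics", "Multi-omics",
]


def build_pubmed_query(terms, year):
    """Hypothetical helper: OR the quoted terms together and restrict to one year.

    [dp] covers both print and electronic publication dates in PubMed, so this
    only approximates the "electronic publication date" criterion of the text.
    """
    topic = " OR ".join(f'"{term}"' for term in terms)
    return f"({topic}) AND {year}[dp]"


query = build_pubmed_query(TERMS, 2018)
print(query)
```

The resulting string can be pasted into the PubMed search box. Restricting some terms with the `[MeSH Terms]` tag would be a natural refinement, but it requires knowing which terms were used as descriptors.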
Table 1
Best paper selection of articles for the IMIA Yearbook of Medical Informatics 2019
in the section ‘Bioinformatics and Translational Informatics’. The articles are listed
in alphabetical order of the first author’s surname.
Section: Bioinformatics and Translational Informatics
▪ Lee SI, Celik S, Logsdon BA, Lundberg SM, Martins TJ, Oehler VG, Estey EH, Miller
CP, Chien S, Dai J, Saxena A, Blau CA, Becker PS. A machine learning approach to integrate
big data for precision medicine in acute myeloid leukemia. Nat Commun 2018 Jan;9(1):42.
▪ Mobadersany P, Yousefi S, Amgad M, Gutman DA, Barnholtz-Sloan JS, Velazquez Vega
JE, Brat DJ, Cooper LAD. Predicting cancer outcomes from histology and genomics using
convolutional networks. Proc Natl Acad Sci U S A 2018;115(13):E2970-E2979.
▪ Sengupta S, Sun SQ, Huang KL, Oh C, Bailey MH, Varghese R, Wyczalkowski MA, Ning
J, Tripathi P, McMichael JF, Johnson KJ, Kandoth C, Welch J, Ma C, Wendl MC, Payne
SH, Fenyö D, Townsend RR, DiPersio JF, Chen F, Ding L. Integrative omics analyses
broaden treatment targets in human cancer. Genome Med 2018 Jul 27;10(1):60.
▪ Torshizi AD, Petzold LR. Graph-based semi-supervised learning with genomic data
integration using condition-responsive genes applied to phenotype classification.
J Am Med Inform Assoc 2018;25(1):99-108.
Description of Candidate Best Papers and Best Papers
Rapid content analysis of the 636 retrieved references revealed a large proportion
of papers dealing with the identification of disease-associated genetic variants and
their routine use in clinical settings. Through their choices, the section editors
wanted to shed light on three research trends and two emerging topics in the BTI field,
which are presented below [3]-[15].
Trend 1: Artificial Intelligence and Deep Learning at the service of translational informatics
While machine learning has been used in medical informatics for a few decades, the
year 2018 saw renewed interest in the field. New methods, including neural networks,
have been widely adopted by the medical community in virtually all areas of medicine.
In recent years, the most spectacular results have been obtained in image analysis
(especially photographic image analysis, e.g., digital pathology images and retinal
photographs), but numerous other fields are being investigated.
Three of the 13 candidate best papers directly use artificial intelligence methods.
Mobadersany et al. present a method to predict patient survival from digital pathology
images as well as genomic biomarkers [8]. Lee K et al. [5] used a deep neural network
and word embeddings built with word2vec to identify pharmacogenomic relations in the
literature. Finally, Lee SI et al. described the MERGE algorithm (see Trend 2 for further
details) to identify treatments for acute myeloid leukemia [6]. In their paper, selected
as a best paper, Mobadersany et al. describe a Survival Convolutional Neural Network (SCNN)
designed to predict the survival of patients with glioma from digitized pathology images
and genomic biomarkers. The networks were trained on public data from The Cancer Genome
Atlas (TCGA). To help interpret and understand the predictions, the authors use heat map
visualizations that highlight the structures the networks identified as important.
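Survival networks of this kind are commonly trained by minimizing the negative Cox partial log-likelihood, so that the network's scalar output acts as a log-hazard (risk) score for each patient. The NumPy sketch below illustrates that loss only, under simplifying assumptions: it is not the authors' code, it ignores tied event times (no Breslow or Efron correction), and the function name is hypothetical.

```python
import numpy as np


def cox_neg_log_partial_likelihood(risk, time, event):
    """Mean negative Cox partial log-likelihood over observed events (no tie handling).

    risk  : predicted log-hazard per patient (e.g., a network's scalar output)
    time  : follow-up time per patient
    event : 1 if death was observed, 0 if the patient was censored
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    if not event.any():
        raise ValueError("at least one observed event is required")

    losses = []
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]  # patients still under observation when i dies
        # Numerically stable log-sum-exp of the risk scores in the risk set.
        m = risk[at_risk].max()
        log_denominator = m + np.log(np.exp(risk[at_risk] - m).sum())
        losses.append(log_denominator - risk[i])
    return float(np.mean(losses))
```

Assigning higher risk to patients who die earlier lowers the loss, which is what pushes the network toward a useful ranking of patients. The explicit loop is quadratic in the number of patients and is kept for readability; sorting by time and using a cumulative log-sum-exp is the usual faster variant.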
Trend 2: Pan-cancer approach and integration of multi-omics data for more insightful analyses
The wide availability of high-throughput data, generated in biological and clinical
research but also during routine patient care, has enabled the development of new
approaches to classify diseases and to identify potentially better or new treatments
for known diseases. The best paper candidates highlight this trend in four contributions
using multi-omics datasets (gene expression, methylation, proteomics, next-generation
sequencing (NGS), etc.). These new approaches are generating a great deal of interest
and a large body of work, as reflected by the selection of three of the four best
papers of the BTI section.
Sengupta et al. adopted a pan-cancer approach to take advantage of multi-omics data for
drug repurposing [11]. Their goal was to identify drugs approved by the US Food and Drug
Administration (FDA) that could be used for cancer types not yet covered by the approval.
The authors rely on the Database of Evidence for Precision Oncology (DEPO), a tool they
built and presented in [18], to link druggability to genomic, transcriptomic, and
proteomic biomarkers. They used a pan-cancer cohort of more than 6,500 tumors and combined
DEPO (which integrates genomic, transcriptomic, and proteomic data, as well as clinical
data, over several types of cancer samples) with structural alignment tools to identify
tumors with potentially druggable biomarkers: drug-associated mutations, microRNA
expression outliers, and protein/phosphoprotein expression outliers.
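Calling an expression "outlier" means flagging tumors whose level of a given microRNA or (phospho)protein lies far from the rest of the cohort. Sengupta et al. define their own criteria [11]; the sketch below swaps in a generic, widely used rule, the median/MAD "modified z-score" with a 3.5 cut-off, purely to illustrate the idea. The function name and the threshold are assumptions.

```python
import numpy as np


def expression_outliers(expr, threshold=3.5):
    """Flag per-feature expression outliers with a robust (median/MAD) z-score.

    expr      : array of shape (n_tumors, n_features), e.g., microRNA or
                (phospho)protein abundances
    threshold : |modified z| above which a value is flagged (rule of thumb, assumed)
    Returns a boolean array of the same shape as expr.
    """
    expr = np.asarray(expr, dtype=float)
    median = np.median(expr, axis=0)
    mad = np.median(np.abs(expr - median), axis=0)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # for normally distributed data.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = 0.6745 * (expr - median) / mad
    # Features with zero MAD (e.g., constant values) are never flagged.
    z = np.where(mad > 0, z, 0.0)
    return np.abs(z) > threshold
```

Median and MAD are used instead of mean and standard deviation because a few extreme tumors would otherwise inflate the spread and mask themselves.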
As opposed to the two pan-cancer contributions, the next two candidates focus on
specific diseases: in [3] (a candidate best paper), the authors rely on a multi-omics
approach to define the disease stage and predict the treatment outcome of patients
suffering from Crohn’s disease. Lee SI et al. [6], selected as a best paper, present a
statistical method to identify molecular markers for targeted treatment of acute myeloid
leukemia using omics data (genome-wide gene expression profiles) and in vitro sensitivity
to 160 chemotherapy drugs. They describe the MERGE algorithm (standing for mutation,
expression hubs, known regulators, genomic copy number variation, and methylation), a
computational method to identify gene expression markers using multi-omics data. In a
nutshell, MERGE learns from the data how much each of five key features (e.g., mutations
associated with acute myeloid leukemia, hubness in a gene expression network) contributes
to a gene's potential to drive cancer progression. The work by Torshizi and Petzold [13],
detailed in the next trend section, also makes use of multi-omics data.
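The core idea of using per-gene features as prior knowledge can be illustrated with a much simpler model than MERGE. In the sketch below, a stand-in and not the MERGE algorithm, each gene's "driver score" is a weighted sum of its features, and genes with a higher score receive a smaller ridge penalty when predicting drug sensitivity from expression. The feature weights `theta` are fixed by hand here, whereas MERGE learns how much each feature contributes; all names are hypothetical.

```python
import numpy as np


def driver_weighted_ridge(X, y, gene_features, theta, base_penalty=1.0):
    """Ridge regression whose per-gene penalty shrinks as a 'driver score' grows.

    X             : (n_samples, n_genes) expression matrix
    y             : (n_samples,) drug sensitivity values
    gene_features : (n_genes, n_features) per-gene features (e.g., mutation,
                    hubness, known-regulator, copy-number, methylation scores)
    theta         : (n_features,) feature weights, fixed here (assumption;
                    MERGE learns them from the data)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    driver_score = np.asarray(gene_features, dtype=float) @ np.asarray(theta, dtype=float)
    # Higher driver score -> smaller penalty -> the gene's coefficient is shrunk less.
    penalties = base_penalty * np.exp(-driver_score)
    # Closed-form solution of min ||y - Xw||^2 + sum_g penalties[g] * w_g^2
    A = X.T @ X + np.diag(penalties)
    return np.linalg.solve(A, X.T @ y)
```

When two genes carry the same expression signal, the one with the higher driver score receives the larger coefficient, so prior biological evidence breaks ties between otherwise equivalent markers.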
Trend 3: Pathway-level versus gene-level analyses
A trend that is becoming established concerns the early integration of background
biomolecular knowledge in the BTI context, as initiated in previous publications [17].
As an alternative to single-gene analyses of patient genomic data, several interesting
papers published in 2018 work at the pathway level to identify relevant biomarkers or
to improve phenotype classification. These studies move from the assumption that the
patients of a cohort are homogeneous at the transcript level (deregulated expression of
a set of genes) to the same assumption at the pathway level (deregulated pathway
expression). This is especially relevant in complex diseases or heterogeneous patient
populations, where conventional data analytics have so far provided rather poor results
in terms of precision medicine [16].
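Moving from gene level to pathway level requires summarizing each pathway as a single activity value per patient. The sketch below follows the spirit of the condition-responsive genes (CORGs) idea referred to in [17]: genes are z-normalized, pathway members are ranked by a t-statistic in the direction of the pathway's overall change, and genes are added greedily while the t-statistic of the combined activity keeps improving. It is a simplified illustration under assumptions (Welch t-statistic, no handling of constant genes, hypothetical function names), not a reference implementation.

```python
import numpy as np


def t_statistic(a, b):
    """Welch two-sample t-statistic (group a minus group b)."""
    return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))


def pathway_activity(expr, labels, pathway_genes):
    """Infer a pathway activity score from a greedily selected subset of its genes.

    expr          : (n_samples, n_genes) expression matrix (no constant genes assumed)
    labels        : boolean array, True for samples with the phenotype of interest
    pathway_genes : column indices of the pathway's member genes
    Returns (selected gene indices, activity value per sample).
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)  # per-gene z-scores
    t_gene = {g: t_statistic(z[labels, g], z[~labels, g]) for g in pathway_genes}

    # Rank members in the direction of the pathway's average change.
    direction = np.sign(np.mean(list(t_gene.values()))) or 1.0
    ranked = sorted(pathway_genes, key=lambda g: -direction * t_gene[g])

    def activity(genes):
        return z[:, genes].sum(axis=1) / np.sqrt(len(genes))

    def score(genes):
        a = activity(genes)
        return abs(t_statistic(a[labels], a[~labels]))

    selected, best = [ranked[0]], score([ranked[0]])
    for g in ranked[1:]:
        s = score(selected + [g])
        if s <= best:
            break  # stop as soon as an extra gene no longer improves discrimination
        selected, best = selected + [g], s
    return selected, activity(selected)
```

The resulting activity vectors can then replace gene-level features in a classifier, which is the kind of pathway-level input the papers in this trend rely on.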
The best paper by Torshizi and Petzold presents a graph-based semi-supervised method
for phenotype classification [13]. Several graphs of labeled and unlabeled samples are
built on feature sets corresponding to distinct genomic levels, namely gene expression,
DNA methylation, and microRNA. Additional graphs add pathway knowledge for each
considered genomic level through the use of condition-responsive genes (CORGs) as
defined in [17]. The method was applied to ovarian cancer data (from The Cancer Genome
Atlas), and the comparative evaluation shows that classification accuracy in terms of
survival is effectively improved when transcriptomic, epigenetic, and pathway knowledge
are integrated. A noteworthy advantage of the proposed approach is its capacity to
address the sparsity of positive examples.
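Graph-based semi-supervised classification propagates the few available labels along edges between similar samples, so unlabeled samples inherit the class of their neighbors. Torshizi and Petzold build and combine their graphs in their own way [13]; the sketch below instead uses the generic label spreading scheme of Zhou et al. over an unweighted average of per-view Gaussian graphs, as a simplified stand-in. The kernel width, `alpha`, and all function names are assumptions.

```python
import numpy as np


def gaussian_affinity(features, sigma=1.0):
    """Fully connected sample graph with Gaussian edge weights and no self-loops."""
    features = np.asarray(features, dtype=float)
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def label_spreading(graphs, y, alpha=0.9):
    """Propagate labels over the average of several sample graphs.

    graphs : list of (n, n) affinity matrices, one per data view
             (e.g., expression, methylation, microRNA, pathway activity)
    y      : length-n array with +1 / -1 for labeled samples, 0 for unlabeled ones
    Returns a real-valued score per sample; its sign is the predicted class.
    """
    W = np.mean(graphs, axis=0)  # simplest integration: equal weight per view
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))  # symmetric normalization D^-1/2 W D^-1/2
    n = len(y)
    # Closed-form fixed point of F = alpha * S @ F + (1 - alpha) * Y
    return np.linalg.solve(np.eye(n) - alpha * S, (1 - alpha) * np.asarray(y, dtype=float))
```

Because each labeled sample influences all samples connected to it, a single labeled example per group can be enough to classify a tight cluster, which is why such methods suit settings where labeled (positive) examples are scarce.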
In a candidate best paper, Zaim et al. propose a statistical framework to revisit the
biomarker discovery process [15]. Through a proof-of-concept (POC) simulation, they show
how to discover common pathways with a single-subject approach, and the advantages this
brings over usual cohort-based approaches in specific scenarios.
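A single-subject analysis asks whether a pathway is deregulated in one patient, typically by comparing two samples from that same patient (e.g., tumor vs. normal tissue). Zaim et al. evaluate their own statistical framework [15]; the sketch below only illustrates the basic single-subject idea with a generic exact sign test: do the pathway's genes move in the same direction more often than chance would allow? The function name and the choice of test are assumptions.

```python
from math import comb


def single_subject_sign_test(condition_a, condition_b):
    """Two-sided exact sign test on paired expression values of one patient.

    condition_a, condition_b : expression of the same pathway genes measured in
        two conditions for a single subject (e.g., tumor vs. normal)
    Returns the p-value under the null 'the pathway is not deregulated'.
    Genes with identical values in both conditions (ties) are dropped.
    """
    diffs = [a - b for a, b in zip(condition_a, condition_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = max(sum(d > 0 for d in diffs), sum(d < 0 for d in diffs))
    # P(X >= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The sign test treats genes as independent, which real pathways violate; this is one reason dedicated single-subject frameworks, such as the one evaluated in [15], are needed.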
Emerging Topic 1: Towards clinician-friendly infrastructures for data integration and analyses
The growing complexity of biological and clinical datasets requires new tools to help
researchers manage and explore data. This year saw a number of contributions providing
experts with innovative ways to interact with multi-omics and clinical datasets. Among
the 13 candidate best papers, two addressed the issue of data exploration [4], [9], and
one described a large infrastructure hosting data and samples at the national level [7].
Moscatelli et al. present an infrastructure design to simplify the exploration of clinical
data, both structured (e.g., laboratory data) and unstructured (free-text clinical
reports) [9]. Their infrastructure relies on a NoSQL architecture and includes
anonymization and machine learning layers to assist researchers in mining the data.
Krempel et al. introduce CancerSysDB, a web-based application designed to host multi-omics
pan-cancer data and to simplify the querying process [4]. CancerSysDB is open-source and
can host public datasets, notably data from TCGA, as well as private datasets through a
self-hosted instance of the system. Workflows can be connected to the application and
uploaded to it dynamically.
Emerging Topic 2: Ethical and methodological issues raised in BTI practices
Even though they are not new, ethical questions are still raised by the merging of
research studies and clinical care practices. These questions seem to have become more
acute, possibly because of the development of private companies offering genetic services.
Among the 13 candidate best papers, three address ethical questions in different national
settings [10], [12], [14]. An additional paper covers the legal procedures for accessing
biobanks and electronic health records in Taiwan: the authors present the Taiwan Biobank,
involving biological samples from 200,000 participants (patients and citizens), and discuss
possible solutions for ensuring both broad access and privacy preservation [7]. The first
paper of the collection provides an interesting analysis of the results of a survey of
nine sites implementing translational genomics within the Clinical Sequencing
Evidence-Generating Research (CSER) consortium [14]. The results, although limited to CSER
consortium participants, are insightful regarding several issues raised at the interface
between sequencing-based research and clinical care, such as informed consent procedures,
clinician and researcher roles, disclosure of primary results and secondary findings,
storage of results in the medical record, payment for services, and the overall
characterization of the research-clinical interface. The second paper informs the current
debate around the disclosure of genomic secondary findings [10]. The reported opinions and
positions are those of UK participants in a genome sequencing program, namely the rare
disease genomic medicine multidisciplinary team involved in the 100,000 Genomes Project.
The paper answers some important questions facing multidisciplinary care boards. Regarding
the procedure for disclosing secondary findings, the conclusions of the UK experts agree
with the US CSER survey results [14]. In the third paper, Stoekle et al. discuss
interesting ethical issues arising from the evolution of tumor boards into molecular tumor
boards in the French medical system [12]. This evolution is a consequence of two important
changes: NGS techniques, which now generate whole-genome sequencing data at low cost, and
machine learning approaches, which open up huge perspectives for analyzing patient data.
The authors discuss how to improve patient confidence and trust in academic medical centers
in order to prevent private companies from exploiting sensitive genetic data for purely
commercial purposes. The suggested means include the use of information technology (IT)
for more efficient acquisition of informed consent from patients and better communication
with researchers and clinicians.
Conclusion and Outlook
A few interesting papers published in 2018 within the BTI scope matched the IMIA Yearbook
special topic “Artificial Intelligence”. These papers illustrate well the complexity of,
and the constraints induced by, the deployment of deep learning techniques, especially in
the context of multidisciplinary and personalized care (including the molecular
characterization of tumors). Further intelligent approaches are expected in the coming
years, combining semantic web languages with clinical omics data and biomolecular knowledge
to extract self-explanatory and actionable knowledge nuggets in clinical settings. It is
worth noting that many contributions continue to rely on public datasets (such as TCGA),
as well as on open tools and systems. In this context, the emergence of clinician-friendly
exploration and analysis platforms should bring the clinical, translational, and
bioinformatics communities closer together.