Yearb Med Inform 2017; 26(01): 110-119
DOI: 10.15265/IY-2017-041
Section 4: Sensor, Signal and Imaging Informatics
Survey
Georg Thieme Verlag KG Stuttgart

An Assessment of Imaging Informatics for Precision Medicine in Cancer

C. Chennubhotla (1), L. P. Clarke (2), A. Fedorov (3), D. Foran (4), G. Harris (5), E. Helton (6), R. Nordstrom (2), F. Prior (7), D. Rubin (8), J. H. Saltz (9), E. Shalley (2), A. Sharma (10)

1  Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA
2  Cancer Imaging Program, NCI, NIH, Bethesda, MD, USA
3  Department of Radiology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
4  Department of Pathology and Laboratory Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers, New Brunswick, NJ, USA
5  Harvard Medical School, Boston, MA, USA
6  Center for Biomedical Informatics and Information Technology, NCI, NIH, Bethesda, MD, USA
7  Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
8  Department of Radiology, Stanford University, Palo Alto, CA, USA
9  Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
10  Department of Biomedical Informatics, Emory University, Atlanta, GA, USA

Correspondence to:

Ashish Sharma
Woodruff Memorial Research Building
101 Woodruff Circle, #4105
Atlanta, GA 30322, USA
Phone: +1 (404) 654 0124   

Publication History

Publication Date:
11 September 2017 (online)

 

Summary

Objectives: Precision medicine requires the measurement, quantification, and cataloging of medical characteristics to identify the most effective medical intervention. However, the amount of available data exceeds our current capacity to extract meaningful information. We examine the informatics needs to achieve precision medicine from the perspective of quantitative imaging and oncology.

Methods: The National Cancer Institute (NCI) organized several workshops on the topic of medical imaging and precision medicine. The observations and recommendations are summarized herein.

Results: Recommendations include: use of standards in data collection and clinical correlates to promote interoperability; data sharing and validation of imaging tools; clinicians’ feedback in all phases of research and development; use of open-source architecture to encourage reproducibility and reusability; use of challenges which simulate real-world situations to incentivize innovation; partnership with industry to facilitate commercialization; and education in academic communities regarding the challenges involved with translation of technology from the research domain to clinical utility and the benefits of doing so.

Conclusions: This article provides a survey of the role and priorities for imaging informatics to help advance quantitative imaging in the era of precision medicine. While these recommendations were drawn from oncology, they are relevant and applicable to other clinical domains where imaging aids precision medicine.



Introduction

Precision medicine [[1]] requires the measurement, quantification, and cataloging of medical characteristics to identify the most effective medical intervention. The National Academy of Sciences defines precision medicine as “the tailoring of medical treatment to the individual characteristics of each patient. It does not literally mean the creation of drugs or medical devices that are unique to a patient, but rather the ability to classify individuals into subpopulations that differ in their susceptibility to a particular disease, in the biology and/or prognosis of those diseases they may develop, or in their response to a specific treatment” [[2]]. In other words, through precision medicine we can classify patients into cohorts that share characteristics such as diagnosis, prognosis, or response to a certain therapy. This requires ready access to networks of data that can be queried using many different types of search criteria across many different types of data. To create such classifiers, large quantities of diverse data must be accessed, analyzed, and reduced to actionable knowledge for patients and encounters.

Imaging, which includes radiology, radiation oncology, and pathology, complements clinical and molecular data and offers crucial insights that help stratify patients into cohorts and guide care using the principles of precision medicine [[3]–[7]]. In addition to diagnosis and treatment planning, imaging also has the potential to provide deep and novel insights by evaluating a patient’s response to therapy during treatment, as well as predicting outcome at an earlier time point [[6]–[8]]. Treatment response and early outcome prediction thus create opportunities for adaptive medicine. For example, in breast cancer patients with ER+, PR+, and HER2– invasive ductal carcinoma, MRI-based features (texture and morphological) could predict the likelihood of recurrence and magnitude of chemotherapy benefit [[9]]. Clustering morphological signatures extracted from digitized whole-slide pathology images of glioblastoma patients helped identify significant prognostic sub-classifications, in which clusters are correlated with transcriptional, genetic, and epigenetic events [[10]]. Imaging can also be used for surveillance in certain low-grade cancers and help avoid unnecessary biopsies. In such patients, imaging features can also be used to identify sub-populations that are likely to advance to higher grades, and would thus be candidates for specific treatments [[11], [12]]. These four examples, from breast cancer, glioblastoma, low-grade glioma, and prostate cancer, are illustrative of the role imaging can play in precision medicine for cancer. A more detailed review of imaging and its role in precision medicine was recently published by the Association of University Radiologists Radiology Research Alliance [[13]]. This article complements that review by providing a survey of challenges, and an assessment of the needs of imaging informatics, for the advancement of precision medicine in cancer.

Today, imaging is predominantly digital; however, image interpretation and its use in diagnosis and treatment assessment have remained largely qualitative. This has been changing steadily through initiatives such as the Quantitative Imaging Biomarker Alliance (QIBA) [[14]–[16]], as well as research programs such as the Quantitative Imaging Network (QIN) [[16], [17]]. This article surveys the landscape of quantitative imaging, its role in advancing precision medicine, and some of the informatics priorities and challenges, and presents some recommendations. It should be noted that while this article focuses on cancer imaging, the underlying needs and challenges are by no means unique to cancer. Domains such as neurology or cardiology have similar characteristics and requirements. For example, in neuroimaging, imaging data is closely linked with observational data or connectome data. These data types are not typically seen in oncology. However, both oncology and non-oncological conditions share the pattern where the integration of imaging and associated/derived non-imaging data with clinical and genomic data can be used to classify patient populations based on diagnosis and response to treatment [[6], [7]]. For the sake of readability, this article does not make a distinction between imaging and cancer imaging. The challenges, approaches, and recommendations surveyed here are applicable across the broad landscape of quantitative imaging and its application to precision medicine.

It is also worth emphasizing that while much of this work was predominantly geared in recent years towards radiology, there has been a steady increase in the research and development of similar techniques in pathology. Cancer diagnosis is primarily based on pathology; outcome prediction and treatment recommendations are highly dependent on pathologist observations. While digital pathology imaging has lagged behind radiology imaging due to the continuing use of glass slides in clinical diagnostic pathology, the advent of high-quality, high-throughput digital scanners has led to the widespread adoption of digital pathology in cancer research studies. DICOM (Digital Imaging and Communications in Medicine), the de facto standard for medical imaging, now includes specifications for digital pathology [[18], [19]]. Digital pathology data management, visualization, and analysis tools have been developed by both research groups and private companies. Work is rapidly progressing on the development of standards for pathology data management, annotation, and markup. Such advances, as well as this increased adoption, have led to digital pathology sharing with radiology many of the same imaging-based informatics design patterns for disease classification, patient stratification, response assessment, and outcome prediction.

The inclusion of information obtained from digital pathology is crucial to the success of efforts to improve precision of quantitative imaging-based predictions. Diagnostic and treatment guidelines call for quantitative measurements that are challenging for human observers (e.g., tumor infiltrating lymphocytes, mitoses, and immunohistochemistry (IHC) staining). There is also an increasing tendency to mandate detailed assessments of tumor heterogeneity across tumor types. A highly pertinent example is the WHO guidelines for non-small cell lung cancer (NSCLC) adenocarcinoma, which specify that, for each patient, pathologists break down sub-type composition in 5% increments. Digital pathology machine-learning methods also promise to reduce inter-observer variability arising from the sole reliance on human-generated pathology classifications.



Quantitative Imaging, and Informatics Methodologies for Precision Medicine

Quantitative imaging is the process of extracting measurable (numerical) information from images to determine the amount, extent, or severity of disease, where imaging devices behave as standard measurement instruments providing reliable and reproducible numerical results. It has benefitted from advances in image acquisition that have led to improvements in the quality and resolution of imaging and in the diversity of imaging modalities. Quantitative imaging, through advances in high performance computing and machine learning, has enabled the process of radiomics (extraction and mining of quantitative imaging features) [[18]–[20]] and radiogenomics (integrating radiomic features with clinical and molecular data). It is enabling optimized treatments, surveillance, and better prediction of response to treatment, and it offers great promise for precision medicine. Numerous groups have developed methodologies to extract rich collections of imaging features, linked them with clinical outcome and molecular characterizations, and studied their relevance in clinical research [[10], [18]–[87]].

Informatics is the practice of information processing and the engineering of information systems, focusing on the collection, classification, storage, retrieval, and dissemination of recorded knowledge. Quantitative imaging, therefore, benefits from the methodologies, tools, and capabilities that are offered by informatics to help convert the information contained in images into actionable knowledge. The combined information that is gathered can be used to enhance individual and population health outcomes, simplify patient care, and improve the quality of clinical workflow. One notable effort in this area is the National Cancer Institute (NCI) Quantitative Imaging Network (QIN). QIN has encouraged the development and validation of quantitative imaging methods for the measurement of tumor response to therapies in clinical trials and routine care.

It is imperative that we incorporate the science of informatics into medical imaging to create a powerful driver for precision medicine activities. Open and regular communication among scientists, clinicians, informatics specialists, and regulatory experts is needed. We discuss some of the highest priority informatics activities that will help bring quantitative imaging into the realm of clinical decision support. These priority activities were identified through a series of workshops, the first of which was convened in October 2015, to discuss the joint roles of quantitative imaging and informatics within the context of present and future precision medicine needs. The primary needs and challenges, as well as some representative efforts in these areas, are presented here.



N1 Curated Data Repositories

Access to well-curated image repositories with support for semantically integrated datasets and the ability to integrate information across type and scale are critical

Archiving vast amounts of existing data is a major informatics effort, not only because of the rapid growth in the volume of data to be stored, but also because of the challenges in accessing, retrieving, analyzing, and displaying the results. Imaging datasets require hundreds of terabytes to petabytes of storage. Imaging features, produced by a pipeline, can depend on a variety of parameters, leading to an explosion of post-processing feature data. Meaningful comparison and subsequent downstream use of the imaging features necessitate a standardized representation. Additionally, clinical information about each patient should be linked to the image data. The linked clinical data must be available when searching for sub-populations. It is therefore imperative that the research community stops reinventing the wheel in the context of imaging biomarker development and comes up with common ways to share tools and data to help improve interoperability. There are three types of data that researchers and practitioners of quantitative imaging share, namely:

  • Clinical Data: Clinical data includes demographic information, diagnosis, exposure, family history, treatment, and outcomes data. Clinical data must be harmonized against a common vocabulary. This is an active area of research. One possible direction is the use of DICOM to represent clinical data [[88]]. This is an attractive proposition given the near-universal acceptance of DICOM, especially in the clinical domain. However, the DICOM specification merely provides a data representation format. Work is needed to create an ontology that can be used to encode the data. An example of such an ontology that helps encode clinical data in DICOM has been developed for Head and Neck cancers [[88]]. Another option that is being explored is the use of the clinical data model used in The Cancer Genome Atlas (TCGA) [[89]]. The TCGA clinical data model includes site-specific terms, with mappings to the NCI Thesaurus, and would provide the ability to create image cohorts that span different imaging studies. The TCGA clinical data model has also been adopted by The Genomic Data Commons (GDC) [[90], [91]].

  • Images and Image Metadata: This includes the raw pixels as well as metadata that describe the image such as patient level information, acquisition data, etc. Image metadata is frequently stored in DICOM formats and follows the DICOM information hierarchy. Other formats such as NIfTI (Neuroimaging Informatics Technology Initiative) are also widely used [[92]]. DICOM is not widely used in digital pathology, since most digital pathology scanner vendors prefer their own formats. There are however open source libraries, such as OpenSlide [[93]], that allow researchers to interact with these images using a shared library and application programming interfaces (APIs).

  • Image Annotations and Features: These include human and machine generated annotations and features. QIN agreed to adopt DICOM as a standard for images and segmentation maps. The term “features” refers to the quantitative characteristics extracted from images; these are represented in various open formats. Since imaging features are at the core of quantitative imaging, a detailed description about their representation and storage is presented separately (see N3 & N4).

Data Curation: The data used in the development of imaging-based methodologies for precision medicine must be well curated to reduce any uncertainty in its history or content. The use of standards such as DICOM is therefore essential. In addition to standards, the data management system and the processes of data curation are very important. In recent years, several imaging repositories have come online [[92], [94]–[99]], with The Cancer Imaging Archive (TCIA) being an exemplar of a well curated, diverse imaging repository. Since its inception in 2011, TCIA has evolved into NCI’s primary resource for curating, managing, and distributing images. A significant component of TCIA operations and tools involves the curation and de-identification processes. With the adoption of data standards, as well as the deployment of easy to use tools and shared best practices, the process of data preparation and submission is greatly simplified. This reduces the burden of data sharing and leads to faster submission and quicker dissemination of data. It also encourages data sharing, since this burden is frequently cited as one of its common roadblocks.

De-identification: A key component of data curation is having well-documented processes and tools that facilitate the de-identification of data and the removal of any patient identifiers. It is often assumed that de-identification involves the scrubbing of protected health information (PHI) from DICOM headers. However, in practice, it has been observed that scanners frequently encode identifiable information in private DICOM tags. There are additional challenges with de-identification when dealing with time-series data. In such situations, the chosen heuristic for date de-identification must be cognizant of time elapsed between successive studies. This information allows users to run queries such as: “find all lung screening studies where 3 or more studies were performed, and each study was within 6 months of the prior study.” In addition to maintaining the elapsed time, users may want the ability to integrate these imaging studies with other non-imaging data. Therefore, enough metadata must be preserved to ensure compliance with appropriate rules and regulations, while ensuring that researchers can unambiguously locate imaging and associated non-imaging data [[100]].
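The date-handling requirement above can be illustrated with a minimal sketch. It assumes one simple heuristic, a fixed per-patient date shift derived from a keyed hash, which hides the true dates while keeping the elapsed time between successive studies intact. The function names, the secret, and the approximation of "6 months" as 183 days are all illustrative assumptions; this is not the procedure used by TCIA or any particular repository, and it is not a vetted de-identification method.

```python
import hashlib
import hmac
from datetime import date, timedelta


def patient_offset(patient_id, secret, max_days=365):
    """Derive a stable per-patient shift of 1..max_days days from a keyed hash."""
    digest = hmac.new(secret.encode(), patient_id.encode(), hashlib.sha256).hexdigest()
    return timedelta(days=int(digest, 16) % max_days + 1)


def shift_dates(patient_id, study_dates, secret):
    """Shift every study date of one patient by the same offset,
    so intervals between studies are unchanged."""
    offset = patient_offset(patient_id, secret)
    return [d - offset for d in study_dates]


def has_regular_followup(study_dates, min_studies=3, max_gap_days=183):
    """True if there are at least min_studies studies and each study
    falls within max_gap_days of the prior one."""
    ordered = sorted(study_dates)
    if len(ordered) < min_studies:
        return False
    return all((later - earlier).days <= max_gap_days
               for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical patient with three screening studies.
original = [date(2015, 1, 10), date(2015, 6, 1), date(2015, 11, 20)]
shifted = shift_dates("PATIENT-001", original, secret="site-specific-secret")
```

Because every date for a patient moves by the same amount, a query such as the lung screening example gives the same answer on the shifted dates as on the originals.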



N2 Data Exploration, Access, and Integration

Exploring and accessing images along with associated data is critical to research in the era of Precision Medicine

Data Exploration and Access: How data is managed is critical when it comes to versatility and ease of use of the data. While methods of storage are important for creating useful and minable image troves, efficient content-based methods of data retrieval may be even more essential to making these data accessible and usable. Search engines capable of returning image data along with all appropriate metadata are necessary assets in the era of large electronic datasets. It is necessary for data associated with the images (see N1) to be accessible and integrated with image data. A centralized meta repository that manages all data types is not practical, nor are we advocating for one. Rather, we need a mechanism that allows one to use some of the data types to create a cohort and then access images and relevant data directly from the individual repositories that manage that data.

Data Integration: An example of this would be an integration of TCIA with the Genomic Data Commons (GDC). It would give researchers the ability to create a cohort using genomic, clinical, and imaging attributes, and then access the images and genomic data for the identified cases. One popular and decentralized approach to achieving such integration is through the adoption of REST APIs. Our goal should be the adoption of APIs and, when appropriate, convergence on shared API specifications. The underlying API implementations are best left to the repositories, since they have minimal impact on integrated exploration and retrieval of data.

The development and adoption of an API economy has the added advantage of encouraging developers to directly and programmatically integrate with the various data repositories. Doing so allows, for example, an image analysis algorithm to directly retrieve images from TCIA without requiring that a user first download a dataset, then upload it to a local cluster, and finally launch the algorithm on this dataset. Integration via APIs allows one to run large, cloud-based pipelines that can exploit the cost and scale benefits of clouds (see N5). Similarly, research workstations like 3DSlicer [[101]] can directly integrate with, and utilize the search and retrieval capabilities of, image repositories, giving their users an optimized experience.
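The decentralized pattern described in this section, building a cohort from one repository's attributes and then requesting images directly from another, can be sketched as follows. The base URLs, query parameter names, and record fields are hypothetical placeholders chosen for illustration; they do not reproduce the actual TCIA or GDC API specifications, and no network call is made.

```python
from urllib.parse import urlencode

# Hypothetical endpoints; real repositories publish their own API roots and paths.
GENOMIC_API = "https://genomic-repo.example.org/api/cases"
IMAGING_API = "https://imaging-repo.example.org/api/series"


def build_query(base_url, **filters):
    """Compose a GET request URL from a base endpoint and filter parameters."""
    return f"{base_url}?{urlencode(sorted(filters.items()))}"


def select_cohort(case_records, **criteria):
    """Return the IDs of cases whose attributes match every criterion.
    case_records stands in for the parsed response of a clinical/genomic API."""
    return [record["case_id"] for record in case_records
            if all(record.get(key) == value for key, value in criteria.items())]


def image_requests(case_ids, modality):
    """Build one image-repository query per case in the cohort."""
    return [build_query(IMAGING_API, PatientID=case_id, Modality=modality)
            for case_id in case_ids]


# Illustrative records, as they might be returned by the genomic repository.
cases = [
    {"case_id": "C-01", "diagnosis": "GBM", "idh1": "mutant"},
    {"case_id": "C-02", "diagnosis": "GBM", "idh1": "wildtype"},
    {"case_id": "C-03", "diagnosis": "LGG", "idh1": "mutant"},
]
cohort = select_cohort(cases, diagnosis="GBM", idh1="mutant")
urls = image_requests(cohort, modality="MR")
```

The shared case identifier is the only coupling between the two repositories in this sketch, which is why agreeing on API specifications and identifiers matters more than how each repository implements them.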



N3 Algorithm Validation and Reproducibility

In addition to the reduction of hardware errors in data collection, quantitative imaging deals with the development and optimization of robust algorithms capable of extracting useful information from the collected images. The potential of quantitative imaging can only be realized if the algorithms are reproducible and validated. These algorithms are individually designed to serve specific functions in a chain of analysis that begins with the collected images and ends with the extracted quantitative information. These functions include segmenting suspicious regions in the image and then processing the information within those regions for information correlative with disease. Quantitative methods differ in how much information is extracted and used, and in how the information is assembled for dissemination. In addition, as there are many different imaging systems in use, the information extracted must be available in a standardized form that can be read and interpreted across multiple devices. Using this information, researchers can then generate new diagnostic and prognostic techniques. One notable mechanism for advancing these goals is the use of Grand Challenges, which have proven to be a successful means for promoting innovation in algorithm development [[102]].

Feature Generation: Imaging features cover the gamut of tumor segmentations, observations, and features captured by humans, as well as features that are computed by algorithms. They include qualitative, quantitative, and mixed features. The underlying objective is to capture a set of features that can act as numeric surrogates for an image and can then be used to explore correlations with clinical or genomic data. They could be used to train classifiers that can guide diagnosis, prognosis, or response to therapy. For example, Aerts et al. extract ∼400 features from CT and MRI images of lung cancer and head and neck cancer patients, and identify feature signatures that are strong predictors of outcome [[18]]. Features here included morphological features, tumor intensity, texture, and other higher-level features. In digital pathology, a similar process is followed, leading to the coinage of the term pathomics. An illustrative example of pathomics is the work done by Cooper et al., where they processed glioblastoma images and extracted 74 different features [[103]] that included morphological characterizations of nuclei, nuclear intensity, texture, and gradient statistics. These features were extracted from 200M nuclei and revealed three prognostically significant clusters with associations to genetic mutations and outcomes [[10]]. A similar study was done by Huang et al. on breast cancer images [[104]].
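To make the idea of features as numeric surrogates concrete, the following minimal sketch computes a handful of intensity and shape descriptors for a segmented region. The four features, their names, and the toy image are illustrative assumptions only; they are far simpler than, and not taken from, the feature sets of the studies cited above.

```python
from statistics import mean, pstdev


def roi_features(image, mask, pixel_area_mm2=1.0):
    """Compute simple intensity and shape features for the masked region.

    image and mask are equally sized 2D lists; mask entries are 0 or 1.
    """
    pixels = [(r, c, image[r][c])
              for r, row in enumerate(mask)
              for c, inside in enumerate(row) if inside]
    if not pixels:
        raise ValueError("mask is empty")
    rows = [r for r, _, _ in pixels]
    cols = [c for _, c, _ in pixels]
    values = [v for _, _, v in pixels]
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return {
        "area_mm2": len(pixels) * pixel_area_mm2,   # size of the region
        "mean_intensity": mean(values),             # first-order intensity
        "std_intensity": pstdev(values),            # intensity spread
        "extent": len(pixels) / bbox_area,          # fill of the bounding box
    }


# Toy 4x4 image with a three-pixel region of interest.
image = [[0, 0, 0, 0],
         [0, 10, 20, 0],
         [0, 30, 40, 0],
         [0, 0, 0, 0]]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
features = roi_features(image, mask, pixel_area_mm2=0.25)
```

The resulting dictionary is the kind of per-region feature vector that can then be correlated with clinical or genomic data, or used to train a classifier.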

Deep Learning and Medical Imaging: In recent years, there has been a strong interest in the application of neural networks and deep learning for quantitative imaging. These methodologies have been around for a long time, and as far back as the early 90s, during the early days of digital imaging, they were used in a variety of applications, such as the detection of lung nodules [[105], [106]] and the classification of regions of interest (ROIs) from mammograms as benign or malignant [[107]]. However, it was not until very recently that deep learning gained popularity and emerged as one of the most promising tools for image classification. This popularity is driven not only by advances in algorithms but also, in large part, by advances in high performance computing (HPC), including graphics processing units (GPUs), and by the fact that the cost of these HPC systems has come down significantly. Since these algorithms are easily parallelizable, they can take advantage of the inherent parallelization of GPUs. A comprehensive survey, as well as a series of articles, covering the use of deep learning and medical imaging can be found in an IEEE special issue on medical imaging and deep learning edited by Greenspan et al. [[108]]. There are, however, a set of challenges that have slowed down the success of deep learning, the biggest one being access to large quantities of annotated data sets (see N1). Data needs to be well annotated (N4), and researchers should have the ability to integrate the extracted ‘hidden’ features with clinical and/or genomic data (N2).

Algorithm Validation: The performance characteristics of the algorithms must eventually be tested and validated under a variety of clinically relevant conditions before they can be useful in clinical workflow. This often requires the use of large datasets of clinical images as an environment in which to test the performance and robustness of the algorithms. Ideally, metadata such as annotations, clinical information from patient history, and patient outcomes will be a part of the information included in algorithm validation. The need for informatics in this process is critical: informatics is integral not only to the function of the final quantitative algorithm as it performs in clinical workflow, but also to the testing and validation required to ensure algorithm performance before it reaches the clinical setting.

Role of Grand Challenges for Algorithm Validation and Reproducibility: Grand challenges have proven to be very successful in helping with the development and validation of novel and innovative algorithms such as brain tumor segmentation [[109]]. They are a good way to crowdsource the annotation of data. This results in an annotated data set, which is critical to the advancement of quantitative imaging through deep learning. Grand challenges explicitly encourage open science and open source, best-of-breed algorithms. They do so by presenting informaticians with specific problems, constraints, and incentives for innovation [[110]]. They also address reproducibility and integration by providing access to clinical and -omics data that are not always readily available and can help move a tool to readiness for the commercialization pipeline. Grand challenges are conducted in specially-designed environments that import specific data sets from selected archives for training and testing of the algorithms, and provide comparisons of results and performance. However, they need to be organized in a manner that simulates clinical workflows and other real-world constraints (e.g., computational pathology challenges must use whole slide images (WSIs) at 40x magnification or higher depending on the task, or test datasets that are noisy and need cleaning). They also need to be organized in a manner where the submissions are run on the test data by organizers, thereby fostering a culture of tool reproducibility (more details in N5).
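The organizer-run evaluation described above can be sketched as a small scoring harness. As an assumption for illustration, submissions are scored with the Dice overlap coefficient, a metric commonly used for segmentation tasks; the text does not prescribe a metric, and the submission names, thresholding "algorithms", and test case below are hypothetical.

```python
from statistics import mean


def dice(predicted, truth):
    """Dice overlap between two binary masks given as 2D lists of 0/1."""
    p = {(r, c) for r, row in enumerate(predicted) for c, v in enumerate(row) if v}
    t = {(r, c) for r, row in enumerate(truth) for c, v in enumerate(row) if v}
    if not p and not t:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2 * len(p & t) / (len(p) + len(t))


def rank_submissions(submissions, test_cases):
    """Run each submitted algorithm on held-back test data and rank by mean Dice.

    submissions: name -> callable(image) -> mask
    test_cases:  list of (image, ground_truth_mask) pairs kept by the organizers
    """
    scores = {name: mean([dice(algorithm(image), truth) for image, truth in test_cases])
              for name, algorithm in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def threshold_segmenter(cutoff):
    """Hypothetical stand-in for a submitted segmentation algorithm."""
    return lambda image: [[1 if value > cutoff else 0 for value in row] for row in image]


test_cases = [([[1, 9], [8, 7]], [[0, 1], [1, 1]])]
leaderboard = rank_submissions(
    {"team_a": threshold_segmenter(5), "team_b": threshold_segmenter(7.5)},
    test_cases,
)
```

Because the organizers call each submission themselves, a method only appears on the leaderboard if it actually runs on unseen data, which is the reproducibility property the paragraph above argues for.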



N4 Representing and Managing Quantitative Image Features

Effective curation of, and integration with, imaging features is critical to the realization of the potential of quantitative imaging. Non-proprietary annotation and markup will allow for cross-hardware compatibility, interoperability, and sharing of data from many sources and between institutions

Feature Representation: Integrating radiology, pathology, clinical, and -omics data requires that image annotations be stored in a standardized and interoperable manner. One example of image annotations is the segmentation of the image regions corresponding to the tissues or objects of interest. Such annotations can be displayed during image viewing, can be used to extract quantitative measures from the image (e.g., tumor volume, vessel permeability), and can capture aspects of key regions within images that are meaningful to the radiologist and oncologist. For example, image annotations can record the location and measurements of target lesions or point out non-target lesions. Frequently, the annotations, created on commercial image viewing workstations, are collected and stored in either proprietary formats or as DICOM presentation state objects, which are like graphical overlay objects. This enables rendering the information visually, but does not support search of, and access to the annotations, nor any computation on them. One is therefore forced to rely on vendor-specific implementations and software. Even if one were to use vendor-specific software, these software tools are often closed, and do not adopt standards for annotation, thus hindering interoperability. Consequently, all annotations currently must be created and maintained within siloed commercial applications, and there is no interoperability of image annotations across platforms and applications. To realize the potential value of integrative radiology-pathology-omics, it is vital that image annotations be stored in standardized interoperable formats such as the Annotation and Image Markup (AIM) standard or DICOM; a harmonization effort is underway to unify these two standards.

The goal of the AIM project [[111]] is to provide a standardized, interoperable mechanism for modeling, capturing, and serializing image annotation and markup data that would be adopted widely within the medical imaging community. Both human- and machine-readable artifacts are possible. The variability in methods of storing annotations with the image data is a concern that can be addressed by developing standard DICOM objects to store this information. DICOM Working Group 8 is working to harmonize and unify the AIM and DICOM standards and create a DICOM Structured Reporting object to store AIM image annotations. When adopted by commercial platforms, this will provide a standardized interoperable format for image annotations. Adopting this as the standard format to store image annotations will streamline software development and enable the work to focus on providing rich annotation features and functionality and on amassing a large collection of minable image data. Designing the tools to be compatible with other standards will enable a high degree of interoperability and the incorporation of the annotation standard into commercial, clinical, information systems.
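The difference between a display-only overlay and a structured, searchable annotation can be illustrated with a small sketch. The record below is a deliberately simplified, hypothetical structure; its field names are assumptions and it does not follow the AIM information model or the DICOM Structured Reporting schema. It shows only the general property at stake: once annotations are serialized in an open, structured form, they can be exchanged, searched, and computed on.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Annotation:
    """Simplified, hypothetical annotation record (not the AIM or DICOM SR schema)."""
    image_uid: str        # identifier of the annotated image
    finding: str          # e.g. "target lesion"
    measurement: str      # what was measured
    value: float
    unit: str
    polygon: list = field(default_factory=list)  # region outline as [x, y] pairs


def to_json(annotations):
    """Serialize annotations to a vendor-neutral text format."""
    return json.dumps([asdict(a) for a in annotations], sort_keys=True)


def from_json(text):
    """Rebuild annotation records from their serialized form."""
    return [Annotation(**record) for record in json.loads(text)]


def find_measurements(annotations, finding, measurement, min_value):
    """Search annotations by content, which an overlay-only format cannot support."""
    return [a for a in annotations
            if a.finding == finding and a.measurement == measurement
            and a.value >= min_value]


annotations = [
    Annotation("1.2.3", "target lesion", "longest diameter", 23.5, "mm",
               [[10, 12], [30, 12], [30, 28]]),
    Annotation("1.2.4", "non-target lesion", "longest diameter", 8.0, "mm"),
]
restored = from_json(to_json(annotations))
large_targets = find_measurements(restored, "target lesion", "longest diameter", 20.0)
```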

Data Visualization: Image viewing platforms that support AIM/DICOM-SR will permit consuming annotations from a variety of sources and linking them to other types of image data as well as non-image data. Moreover, large collections of image data will become “minable” to enable discovery from historical collections of radiology/pathology image data. Such activities will be particularly important in cooperative groups, which routinely collect and store large amounts of such image data and annotations during clinical trials.



N5 Scaling Quantitative Imaging via Container and Cloud Deployments

Novel container technologies will allow for portability and interoperability, critical to sharing algorithms in a distributed research environment. Increasing adoption of cloud environments will allow researchers to compute and process at significantly larger scales

Advances in systems software such as containers provide the ability to encapsulate algorithms and their implementations, thus enhancing reuse and portability [[89], [112]]. Containers, popularized by Docker, make it possible for researchers to share their algorithms and pipelines in a robust and self-contained fashion. These systems integrate well with modern distributed version control systems, thereby greatly simplifying the deployment of data processing code. Additionally, in instances where investigators are unable to share source code, containers give them the ability to create images that are equivalent to platform-agnostic, binary executables of their data processing code.
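To make the idea of a self-contained, shareable pipeline more concrete, the sketch below assembles a `docker run` invocation for a containerized analysis step. The image name `example/segmentation:1.0`, the host directories, and the `--input`/`--output` arguments of the packaged algorithm are hypothetical; only the Docker command-line options (`--rm` and `-v` with a `:ro` suffix) are real. The function builds the command without executing it.

```python
import shlex
from typing import List


def build_container_command(image: str, input_dir: str, output_dir: str,
                            args: List[str]) -> List[str]:
    """Return a `docker run` argument list; nothing is executed here."""
    return [
        "docker", "run",
        "--rm",                              # remove the container when it exits
        "-v", f"{input_dir}:/input:ro",      # mount the input data read-only
        "-v", f"{output_dir}:/output",       # mount a writable results directory
        image,
        *args,                               # arguments passed to the algorithm
    ]


cmd = build_container_command(
    image="example/segmentation:1.0",        # hypothetical image name and tag
    input_dir="/data/study01",               # hypothetical host paths
    output_dir="/results/study01",
    args=["--input", "/input", "--output", "/output"],
)
print(shlex.join(cmd))                       # shell-quoted form for logging
```

On a machine with Docker installed, the list could be passed to `subprocess.run(cmd, check=True)`. Because the algorithm and its dependencies travel inside the image, the same command should behave the same way on a laptop, an institutional cluster, or a cloud instance.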

In recent years, cloud computing has become much more popular within the research community. This increased interest has been spurred, in part, by the launch of the NCI-supported genomics cloud pilots [[113]]. These cloud pilots now serve as exemplars that allow researchers to perform genomic studies on the cloud without first having to download large quantities of genomic data and then upload them to institutional clusters for processing and analysis. The adoption of containers eases this migration by greatly reducing the complexity of deploying diverse code bases on a single cloud [[91], [114]].

The imaging community should consider these technologies as a means of sharing methods and tools. Some key issues in this area are still to be addressed. The cost of processing on the cloud is still high, though this is being addressed through the recent launch of the NCI Commons Credit Pilot [[115]].



N6 Open Standards and Open Source Architecture

Open standards and open source architectures enable flexible and more rapid technology development that is reproducible and more likely to see accelerated adoption in the marketplace.

Open source refers to software that is accompanied by its source code and is made available through a license that allows users to change and redistribute the software under the conditions stipulated by the license. Different flavors of open source licenses exist [[112]]. Examples include the GNU General Public License (GPL), a copyleft license that requires derivative works to be distributed under the same terms and can therefore constrain incorporation into proprietary commercial products, and the MIT and FreeBSD licenses, which do not limit modification and reuse of the source code by anyone and for any purpose, including commercialization. There are numerous examples of software tools developed within NCI-supported programs that are being made available as open source. One example is ePAD [[113], [115]], a quantitative imaging informatics platform that provides web-based access to AIM-compliant metadata and semantic image annotation on any platform and any image workstation. Another open source solution is LIBRA [[95], [116]], a software package developed at the University of Pennsylvania that provides fully automated breast density estimation, based on a published algorithm, for both raw and vendor post-processed digital mammography images. The DICOM Toolkit (DCMTK) is another example of openly available software [[117]]. DCMTK is a collection of libraries and applications that implement large components of the DICOM standard, including software for examining, constructing, and converting DICOM image files, handling offline media, and sending and receiving images over a network connection.
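As a small, self-contained illustration of what "examining" a DICOM file involves at the lowest level, the Python sketch below checks for the DICOM Part 10 file signature: a 128-byte preamble followed by the four characters "DICM". It does not use DCMTK or any other toolkit and does not parse data elements; the function name `looks_like_dicom` and the synthetic byte strings are made up for this example.

```python
import io
from typing import BinaryIO

PREAMBLE_LENGTH = 128      # bytes reserved at the start of a DICOM Part 10 file
MAGIC = b"DICM"            # four-byte prefix that follows the preamble


def looks_like_dicom(stream: BinaryIO) -> bool:
    """Return True if the stream starts with a preamble followed by 'DICM'."""
    header = stream.read(PREAMBLE_LENGTH + len(MAGIC))
    if len(header) < PREAMBLE_LENGTH + len(MAGIC):
        return False                     # too short to hold the signature
    return header[PREAMBLE_LENGTH:] == MAGIC


# Synthetic byte strings stand in for real files in this illustration.
fake_dicom = io.BytesIO(b"\x00" * PREAMBLE_LENGTH + MAGIC + b"rest of file")
not_dicom = io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"\x00" * 200)

is_dicom = looks_like_dicom(fake_dicom)
is_not_dicom = looks_like_dicom(not_dicom)
```

A real application would hand the file to a full implementation such as DCMTK once this check passes, since everything after the signature (transfer syntaxes, nested sequences, pixel data encodings) is where an open, shared toolkit saves the most effort.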

For the developer of quantitative imaging algorithms, whether for data collection or image analysis, the use of open source software as modules or components in the total algorithm package can be a shortcut to success. Open source development has seen significant growth and transformation with the release of git [[118]] (a distributed version control system) and github.com (a publicly accessible, centralized, hosted git service). What is required is a commitment on the part of the developer to use a modular, open source architecture and to encourage reuse, thereby introducing efficiencies in algorithm development. A significant development is the widespread use of containerization platforms such as Docker [[119], [120]] and related projects, which are enabling broader dissemination of methods through facile packaging and execution of algorithms. In other words, the inherent flexibility of open source programming permits the programmer to focus on building custom interfaces, creating new capabilities, and customizing the performance of the overall algorithm. It also allows for parallel development of independent components. Importantly, open source development is critical for community building and for a continuity of development that is more tolerant of interruptions in funding or fluctuations in personnel at individual academic labs.

Innovation is important to science, but we also need to balance that innovation with pragmatism, developing what researchers need today and what can facilitate progress in steps. We can learn from the success of communities such as DICOM and the Biomedical Research Integrated Domain Group (BRIDG) [[121]] to make sure that what we develop resonates with research communities. If we do not take this approach, reproducibility of research results and outputs, which is critical to scientific research, will never be a reality. Additionally, there is an urgent need to demand and reward the sharing of both imaging data and data analysis results in order to enable secondary analysis, support reproducibility of findings, and allow aggregation of standardized datasets. These datasets can include radiological images, digital pathology, immunohistochemistry, and data from other modalities that can be standardized and integrated for analysis. Efforts such as the Informatics Technology for Cancer Research (ITCR) Program [[101]], which funds the development of open source tools and algorithms, have been very successful in generating interest and engagement within the imaging community. Over a dozen tools that support visualization, storage, and analysis have been developed with ITCR funding.



Bringing Quantitative Imaging into the Clinical Workflow

For quantitative imaging to become a part of precision medicine, it is critical that images and features connect with other diagnostic approaches. Genomics, for example, is receiving a great deal of scientific focus for its ability to chart the progression of disease and to unlock the molecular basis for cancer. Radiomics and radiogenomics are creating a culture change in imaging and in the use of informatics to predict patient outcome. Showing the benefit of combining genomic information with subtle imaging results to gain greater insight into cancer progression is important for speeding the adoption of imaging methods and incorporating them into the clinical workflow.

Educating clinicians on the benefits of imaging methods in clinical practice is key to their adoption. For example, morphologic diagnosis is required in many cases, as genomic analysis alone is sometimes inadequate. Genomic analysis will not reveal carcinoma versus benign growth, and mutation analysis alone cannot provide a specific diagnosis. For example, in the case of leiomyoma (benign disease) vs. leiomyosarcoma (cancer), the genetic mutation is the same, but human cognition and the use of microscopes are required to accurately diagnose cancer versus benign growth, although novel deep learning techniques may aid differential diagnosis in the future. Spatial phenotypic heterogeneity is also not captured by genomic data, which offer no way of understanding the interactions between the various cell types in a tumor microenvironment (TME). If the cell composition is the same in two different TMEs but the interactions are different, genomics cannot tell them apart. Hence, the study of images, and of their spatial data, is crucial.
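The point about identical composition but different interactions can be illustrated with a toy calculation. In the Python sketch below, two small grids stand in for segmented tissue images, with `T` marking a tumor cell and `I` an immune cell. The grids, the cell labels, and the "heterotypic contact" count are illustrative assumptions rather than a published spatial metric; the sketch only shows that a composition summary cannot separate the two arrangements while a simple spatial measure can.

```python
from collections import Counter
from typing import List

Grid = List[str]   # each string is one row; 'T' = tumor cell, 'I' = immune cell


def composition(grid: Grid) -> Counter:
    """Count cells of each type, ignoring where they sit."""
    return Counter("".join(grid))


def heterotypic_contacts(grid: Grid) -> int:
    """Count horizontally or vertically adjacent pairs of different cell types."""
    contacts = 0
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            # Look only right and down so each neighboring pair is counted once.
            if c + 1 < len(row) and row[c + 1] != cell:
                contacts += 1
            if r + 1 < len(grid) and grid[r + 1][c] != cell:
                contacts += 1
    return contacts


# Two hypothetical microenvironments with the same cells, arranged differently.
segregated = ["TTTT",
              "TTTT",
              "IIII",
              "IIII"]
intermixed = ["TITI",
              "ITIT",
              "TITI",
              "ITIT"]

same_composition = composition(segregated) == composition(intermixed)
contacts_segregated = heterotypic_contacts(segregated)
contacts_intermixed = heterotypic_contacts(intermixed)
```

Both grids contain eight tumor and eight immune cells, so `same_composition` is true, yet the segregated layout has 4 tumor-immune contacts and the intermixed layout has 24. A bulk measurement that reports only cell-type proportions would see the two as identical.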

A Case Study - The Open Health Imaging Foundation at Dana Farber / Harvard Cancer Center: One of the fundamental drivers for an integrated cancer imaging informatics infrastructure is the ability to easily view and share images across sites and modalities, and provide a standards-based platform and plug-in architecture for developers. There are many proprietary commercial web-viewers on the market, which are not easily customized or open to collaborative development, and those systems that are open are not typically of a professional grade that would allow translation and collaboration between academics and industry.

The Tumor Imaging Metrics Core (TIMC) at Harvard created the Open Health Imaging Foundation (OHIF; http://ohif.org). OHIF supports open-source, web-based, imaging technologies, and is building a vendor-neutral, open source, extensible, zero-footprint web-viewer and supporting server for display and analysis of DICOM images. The platform is designed with a plug-in architecture to allow the group to integrate this web-viewer with oncology applications across the cancer research community.

One use case of this zero-footprint web-viewer is the replacement of the group’s existing thick-client with an open source image workstation from the Precision Imaging Metrics [[116]] clinical trials management system. The system was developed by the Dana Farber / Harvard Cancer Center (DF/HCC) TIMC and is presently in use across six NCI-designated Cancer Centers. To make the system broadly available to the oncology research community, the team is developing an interface to the TIMC’s Precision Imaging Metrics web-based application and implementing an interface that is compatible with annotation and overlay standards. The group has been actively working with investigators from several other NCI-funded projects to integrate their viewer with other oncology research platforms. The viewer will meet all the basic requirements for radiology tumor measurements specific to the needs of oncology clinical trials, yet will also be flexible enough to be configured for user preferences and extended via plug-ins to support varied research workflows as a shared research resource. To achieve these design goals, the viewer and all its functionality will be delivered to client machines exclusively through the web browser, with nothing to install on client computers or mobile devices. This greatly simplifies software deployment, reduces its cost and support requirements, and increases accessibility. The proposed viewer will enable researchers, imaging software developers, clinicians, and patients to access oncology clinical trial images in a freely available and openly extensible environment. This will facilitate remote image viewing and collaborative image consultations among a wide range of imaging professionals.



On-Going and Future Initiatives

There are numerous ongoing initiatives that help advance the integration of imaging into the clinical and research workflow. We summarize a few of these that cover a cross-section of research and clinical use cases, ranging from those that enable imaging-based epidemiologic studies, to others that advance the quality and reproducibility of imaging algorithms, to a few that enable the management and processing of imaging data at large scales. This is by no means a comprehensive list, but rather a sampling of informatics projects with a shared theme: one that includes a focus on quantitative imaging, informatics methodologies, and a specific facet of precision medicine.

Development of a Cancer Imaging Commons: The Blue Ribbon Panel working as part of the Vice President’s Cancer Initiative (the Moonshot) made several recommendations [[122]], including the creation of a National Cancer Data Ecosystem. This ecosystem will comprise several commons, like the Genomic Data Commons [[123]], and include an Imaging Commons. The concept of a commons includes the data, compute, and analytical tools residing in one place, presumably in the cloud, for easy access and computation by researchers. The task of getting TCIA data into the CGC Pilots is a first step towards the development of a Cancer Imaging Data Commons.

Virtual Tissue Repository (VTR): The NCI Surveillance, Epidemiology, and End Results (SEER) program is working with participating registries to create the VTR, which will allow researchers to select cases and request that the tumor registries gather tissue, slides, and images, and generate Pathology imaging features and/or additional information. A pilot VTR is using caMicroscope [[124]] for online viewing of digital pathology images and will employ ITCR tools to carry out analyses on Pathology imaging features along with an integrative query system.

Prototype Data Harmonization and Integration Project: Using an NCI Early Detection Research Network (EDRN) breast cancer study containing clinical trial data, in-vivo images, pathology images, and biomarker images, the aim is to build the informatics connections between imaging and clinical data based on ISO standards.

Imaging and Cloud Computing: The Cancer Genomics Cloud (CGC) Pilots, funded by NCI, were launched in 2016. The three platforms provide access to genomic data in combination with clinical data from The Cancer Genome Atlas. In 2017, these cloud pilots expanded their scope by incorporating proteomic data and imaging data to allow for cross-domain analysis. Such collocation of data simplifies data access (N1) and facilitates an integrated exploration of data (N3). All of the CGCs rely on containerized applications (N5 & N6). Thus, they meet many of the informatics needs that are outlined here. Recent NCI initiatives that encourage the use of these resources in research activities could provide a real-world assessment of the informatics needs identified in this paper and help develop a road map for the advancement of quantitative imaging in clinical and research settings.



Conclusion

This article provides a survey of the role of, and priorities for, imaging informatics in advancing quantitative imaging in the era of precision medicine. It came about from a series of workshops and dialogues between NCI staff and the academic and industrial scientists involved with imaging informatics. The community continues its work through various initiatives and ongoing dialogues on the subject, and is working to translate informatics developments into clinical utility as rapidly as possible. In addition to the six needs and challenges outlined above, there are some other broad recommendations, listed below:

  • Ensure buy-in from clinicians and make sure the tools developed will work in the clinical workflow. Educate clinicians on the value of imaging and its potential contribution to diagnosis, guiding treatment plans, and scientific research.

  • Incentivize and reward sharing of both the imaging data and the data analysis results to enable secondary analysis, support reproducibility of findings, and to allow aggregation of standardized datasets.

  • Create solutions that ensure data quality and veracity, for ease of retrieval and clinical utility.

  • Work towards a flexible, extensible, integrated framework but not a single, monolithic platform; encourage APIs, agile data management systems, and use of standards and semantics for interoperability.

  • Work within organizations to educate tech transfer and legal departments on the importance of industry partners and to set reasonable expectations for such partnerships.



Acknowledgements

The authors would like to acknowledge the participants of the various workshops and the community of informaticians, medical imaging researchers, and clinical collaborators who are helping advance the goal of quantitative imaging in precision medicine. We would especially like to acknowledge the late Dr. Larry Clarke, who provided the original impetus for this work, and Eve Shalley, for her efforts to capture, consolidate, edit, and wrangle the original whitepaper into existence.

