Summary
Objectives: Despite growing enthusiasm surrounding the utility of clinical informatics to improve
cancer outcomes, data availability remains a persistent bottleneck to progress. Difficulty
combining data that contain protected health information often limits our ability to aggregate
larger, more representative datasets for analysis. With the rise of machine learning
techniques that require increasing amounts of clinical data, these barriers have been magnified.
Here, we review recent efforts within clinical informatics to address issues related
to safely sharing cancer data.
Methods: We carried out a narrative review of clinical informatics studies related to sharing
protected health data within cancer studies published from 2018 to 2022, with a focus
on domains such as decentralized analytics, homomorphic encryption, and common data
models.
Results: Clinical informatics studies that investigated cancer data sharing were identified.
The search focused on, and yielded studies of, decentralized analytics, homomorphic
encryption, and common data models. Decentralized analytics has been prototyped across
genomic, imaging, and clinical data with the most advances in diagnostic image analysis.
Homomorphic encryption was most often employed on genomic data and less often on imaging
and clinical data. Common data models primarily involve clinical data from the electronic
health record. Although all three methods are supported by robust research, few studies
demonstrate wide-scale implementation.
Conclusions: Decentralized analytics, homomorphic encryption, and common data models represent
promising solutions to improve cancer data sharing. However, encouraging results thus far have
been limited to smaller settings. Future studies should focus on evaluating the
scalability and efficacy of these methods across clinical settings of varying resources
and expertise.
Keywords
Data sharing - federated learning - encryption - common data models - cancer