Appl Clin Inform 2016; 07(04): 1135-1153
DOI: 10.4338/ACI-2016-03-SOA-0035
State of the Art/Best Practice Paper
Schattauer GmbH

Preprocessing structured clinical data for predictive modeling and decision support

A roadmap to tackle the challenges
José Carlos Ferrão
1  Siemens Healthcare, Rua Irmãos Siemens 1, 2720–093 Amadora, Portugal
2  CEG-IST, Centre for Management Studies of Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049–001 Lisbon, Portugal
Mónica Duarte Oliveira
2  CEG-IST, Centre for Management Studies of Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049–001 Lisbon, Portugal
Filipe Janela
1  Siemens Healthcare, Rua Irmãos Siemens 1, 2720–093 Amadora, Portugal
Henrique M. G. Martins
3  Centre for Research and Creativity in Informatics, Hospital Prof. Doutor Fernando Fonseca, IC-19 Venteira, 2720–276 Amadora, Portugal
Funding The authors acknowledge the support of Fundação para a Ciência e a Tecnologia (grant SFRH/BDE/51605/2011), Siemens Healthcare and the Centre for Management Studies of Instituto Superior Técnico (CEG-IST, University of Lisbon).

Correspondence to:

José Carlos Ferrão
Rua Irmãos Siemens 1
Ed. 3 Piso 3
2720–093 Amadora
Portugal
Phone: (+351) 214 178 668   
Fax: (+351) 214 178 030

Publication History

received: 06 March 2016

accepted: 01 October 2016

Publication Date:
18 December 2017 (online)

Summary

Background Electronic health record (EHR) systems have high potential to improve healthcare delivery and management. Although structured EHR data provide information in machine-readable formats, their use for decision support still poses technical challenges for researchers, because the data must first be preprocessed and converted into a matrix format. During our research, we observed that the clinical informatics literature does not provide guidance for researchers on how to build this matrix while avoiding potential pitfalls.

Objectives This article aims to provide researchers with a roadmap of the main technical challenges of preprocessing structured EHR data and of possible strategies to overcome them.

Methods Following the standard data processing stages of the EDPAI framework (extracting database entries, defining features, processing data, assessing feature values and integrating data elements), we identified the main challenges faced by researchers and reflected on how to address them, drawing on lessons learned from our research experience and on best practices from the related literature. We highlight the main potential sources of error, present strategies to address these challenges and discuss the implications of these strategies.

Results Following the EDPAI framework, researchers face five key challenges: (1) gathering and integrating data, (2) identifying and handling different feature types, (3) combining features to handle redundancy and granularity, (4) addressing data missingness, and (5) handling multiple feature values. Strategies to address these challenges include: crosschecking identifiers for robust data retrieval and integration; applying clinical knowledge in identifying feature types, in addressing redundancy and granularity, and in accommodating multiple feature values; and adequately investigating patterns of missing data.
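
As an illustration only, three of the strategies above (crosschecking identifiers, accommodating multiple feature values, and inspecting missingness) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy records, the field names (`patient_id`, `feature`, `value`), the function names and the choice of the mean as aggregation rule are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: identifiers, feature names and values are illustrative only.
patients = ["p1", "p2", "p3"]
entries = [
    {"patient_id": "p1", "feature": "glucose", "value": 5.0},
    {"patient_id": "p1", "feature": "glucose", "value": 7.0},   # repeated measurement
    {"patient_id": "p2", "feature": "creatinine", "value": 1.0},
    {"patient_id": "p9", "feature": "glucose", "value": 6.0},   # id not in patient list
]
features = ["glucose", "creatinine"]


def crosscheck_ids(patient_ids, entries):
    """Report identifiers that appear on only one side of the integration."""
    known = set(patient_ids)
    seen = {e["patient_id"] for e in entries}
    return {
        "orphan_entries": seen - known,            # entries with no matching patient
        "patients_without_entries": known - seen,  # patients with no data at all
    }


def build_matrix(patient_ids, entries, features, aggregate=mean):
    """Build one row per patient and one column per feature.

    Multiple values for the same (patient, feature) pair are reduced with
    `aggregate` (the mean here is an assumption; the appropriate rule is a
    clinical decision). Absent values are kept as None, not imputed.
    """
    known = set(patient_ids)
    values = defaultdict(list)
    for e in entries:
        if e["patient_id"] in known:  # skip entries that failed the crosscheck
            values[(e["patient_id"], e["feature"])].append(e["value"])
    return [
        [aggregate(values[(pid, f)]) if (pid, f) in values else None for f in features]
        for pid in patient_ids
    ]


def missing_rate(matrix, features):
    """Fraction of patients with no value, per feature."""
    n = len(matrix)
    return {f: sum(row[j] is None for row in matrix) / n for j, f in enumerate(features)}


report = crosscheck_ids(patients, entries)
matrix = build_matrix(patients, entries, features)
rates = missing_rate(matrix, features)
```

Keeping absent values as `None` rather than filling them in at this stage leaves the missingness pattern visible, so that it can be investigated before any imputation strategy is chosen.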

Conclusions This article contributes to the literature by providing a roadmap to inform the preprocessing of structured EHR data. It may alert researchers to potential pitfalls and to the implications of methodological decisions in handling structured data, so as to avoid biases and help realize the benefits of the secondary use of EHR data.

Citation: Ferrão JC, Oliveira MD, Janela F, Martins HMG. Preprocessing structured clinical data for predictive modeling and decision support – a roadmap to tackle the challenges. Appl Clin Inform 2016; 07(04): 1135-1153.



Conflicts of Interest

The authors state that they have no conflicts of interest.

