Yearb Med Inform 2017; 26(01): 59-67
DOI: 10.15265/IY-2017-010
Special Section: Learning from Experience: Secondary Use of Patient Data
Working Group Contributions
Georg Thieme Verlag KG Stuttgart

Evaluation Considerations for Secondary Uses of Clinical Data: Principles for an Evidence-based Approach to Policy and Implementation of Secondary Analysis

A Position Paper from the IMIA Technology Assessment & Quality Development in Health Informatics Working Group
P. J. Scott (1), M. Rigby (2), E. Ammenwerth (3), J. Brender McNair (4), A. Georgiou (5), H. Hyppönen (6), N. de Keizer (7), F. Magrabi (5), P. Nykänen (8), W. T. Gude (7), W. Hackl (3)

1  University of Portsmouth, Centre for Healthcare Modelling and Informatics, Portsmouth, United Kingdom
2  Keele University, School of Social Science and Public Policy, Keele, United Kingdom
3  UMIT, University for Health Sciences, Medical Informatics and Technology, Institute of Medical Informatics, Hall in Tyrol, Austria
4  Aalborg University, Department of Health Science & Technology, Aalborg, Denmark
5  Macquarie University, Australian Institute of Health Innovation, Sydney, Australia
6  National Institute for Health and Welfare, Information Department, Helsinki, Finland
7  Academic Medical Center, Department of Medical Informatics, Amsterdam, The Netherlands
8  University of Tampere, School of Information Sciences, Tampere, Finland

Correspondence to:

Dr Philip J. Scott
Centre for Healthcare Modelling and Informatics
University of Portsmouth
Buckingham Building
Lion Terrace
Portsmouth PO1 3HE
United Kingdom
Phone: +44 23 9284 6378   

Publication History

Publication Date: 11 September 2017 (online)

Summary

Objectives: To set the scientific context and then suggest principles for an evidence-based approach to secondary uses of clinical data, covering both evaluation of the secondary uses of data and evaluation of health systems and services based upon secondary uses of data.

Method: Working Group review of selected literature and policy approaches.

Results: We present important considerations in the evaluation of secondary uses of clinical data from the angles of governance and trust, theory, semantics, and policy. We make the case for a multi-level and multi-factorial approach to the evaluation of secondary uses of clinical data and describe a methodological framework for best practice. We emphasise the importance of evaluating the governance of secondary uses of health data in maintaining trust, which is essential for such uses. We also offer examples of the re-use of routine health data to demonstrate how it can support evaluation of clinical performance and optimize health IT system design.

Conclusions: Great expectations are resting upon “Big Data” and innovative analytics. However, to build and maintain public trust, improve data reliability, and assure the validity of analytic inferences, there must be independent and transparent evaluation. A mature and evidence-based approach needs not merely data science, but must be guided by the broader concerns of applied health informatics.



Introduction

In this contribution, the IMIA working group on Technology Assessment & Quality Development in Health Informatics offers some evaluation considerations for secondary uses of clinical data. In setting out important principles for an evidence-based approach to policy and implementation, we identify two quite distinct conceptual categories of concern: first, evaluation of the secondary uses of data from philosophical, methodological, and ethical perspectives; and second, evaluation of health systems and services based upon secondary uses of data.

‘Secondary use’ needs definition. Simply put, ‘Secondary use of health data applies personal health information for uses outside of direct health care delivery’ [[2]]. Data is recorded for a particular purpose within a healthcare encounter, such as recording presenting problems, tentative diagnosis, and treatment action initiated. Few would argue against using that data to track an antigen batch, to audit quality of delivery, or to deal with a subsequent complaint, even though these uses were not anticipated at the time of recording. Secondary use occurs when data recorded for an operational care purpose is used to create new intelligence or knowledge away from its context of origin, without the originators necessarily being aware. Examples include clinical research, population health, epidemiology, and pharmaceutical effectiveness. Specific lines of research may include policy effectiveness (achieving objectives), treatment outcomes against intent, multi-morbidity patterns, polypharmacy outcomes, and changing illness patterns. Though potential sources of data, such as hospital records, may appear homogeneous, their components and the resultant analyses are very varied and focussed. The potential scope may include laboratory data, pharmacy, radiology, immunisation, emergency/elective attendances, primary care, mental health, social care, payers, public health, bio-surveillance, pharmacovigilance, and incident reporting, while external linkages enabling greater depth of interpretation in a big data modality may include census data, meteorological data, law enforcement data, education data, and housing data.

To set the scene, we highlight the fact that big data sets can be aggregated, and the results analysed, in different ways. In the commercial world, ‘big data’ may be seen as the aggregation of data from disparate, unrelated sources. For instance, data on consumer spending recorded on loyalty cards, weather forecasts, small-area socio-economic profiling, and television schedules might be combined to forecast product demand at supermarket branches and so drive the supply chain. By contrast, most large health data sets combine data of similar types, such as pooled anonymised primary care consultation and prescribing data used to study long-term outcomes. Increasingly, however, the health sector focus is on what might be called ‘hybrid’ large data sets, where different health data sets are analysed together. Far more than commercial ‘big data’, ‘hybrid’ health data faces ethical and governance issues and questions of public trust and acceptability. There are both practical and public perception issues about re-use that brings very different and operationally unlinked data together in analyses seeking new and unanticipated insights into personal behaviour, illness trajectories, and likely outcomes.

Timely and accurate health data spanning the continuum of care linked at patient level, and safely shared as necessary for care delivery purposes, has been recognized globally as an essential tool. Secondary use is seen as critical not just for the optimal delivery of individual health care interventions, but also for improving performance of health care systems and health outcomes of patients, for obtaining longer-term and real world evaluations of existing treatments including in a multi-morbidity context, for supporting the re-design and evaluation of new models of health care service delivery and for contributing to the discovery and evaluation of new treatments [[1]]. This is the foundation of both Smarter Healthcare [[3], [4]] and Learning Health Systems [[5]].

There is therefore a need for an evidence-based approach that balances innovation and evaluation with trust and governance. This in turn requires an evaluation lifecycle for secondary analysis, with both formative and summative elements [[6], [7]]. Such a lifecycle would build trust and support accountability, transparency, and regulation; one of its requirements would be transparent reporting of the evaluation of secondary use analyses.

This paper starts with the first conceptual category introduced above, evaluation of secondary uses of data. Sections two and three address evaluation issues from the perspectives of governance, trust, theoretical considerations, semantics, and context. In section four, we consider how national and regional policies are framing evaluation factors relevant to secondary uses (both evaluation of secondary uses and evaluation based upon them). Section five presents examples of how analysis based upon secondary uses has informed enhancements in clinical performance and health IT design. Finally, section six synthesises the two categories and demonstrates why a multi-level evaluation approach is needed. We conclude with summary recommendations.



2 Governance and Trust

Anxiety about health information confidentiality is an issue for many patients and care professionals, particularly when data are collected digitally and held virtually. Managing the safe use of health data is a major concern across the Organisation for Economic Co-operation and Development (OECD) countries. It has a direct impact on the sharing of personal health data and can even cause patients to engage in “privacy protective behaviours” (avoiding screening tests or treatment, or declining to be recruited into research protocols). The development and publication of suitable policies or guidelines greatly increases public transparency [[1], [5]].

The possibility of wide secondary use, for reasons and by agencies not known at the time of data collection, and without individual permission at the time of analysis, raises a multiplicity of new concerns about personal confidentiality and about the exploitation of knowledge for unknown purposes. Uncontrolled use for secondary purposes may thus lead to greater anxiety, greater potential for protective behaviours, and thereby also for incomplete and biased data sets [[8]]. Conversely, however, undue restriction on controlled secondary use closes down important research options without society having opportunity to debate this potential non-discovery of new knowledge.

These concerns could be magnified by the push towards “open data” [[9], [10]], even though data is intended to be aggregate and non-identifiable. However, a recent study of 13,000 US biobank participants reported that although 51% expressed worry about privacy, results did not suggest that open data sharing would adversely affect participant recruitment [[11]]. Given that biobank participation is based on explicit consent of some kind, this finding is not necessarily transferable to routine secondary use of “open data” from the general population. Indeed, a recent survey of over 20,000 citizens from across the European Union (EU) found a strong preference for not sharing anonymised health data with academic researchers [[12]].

A particular privacy concern relates to re-identification of “anonymised” data. Privacy legislation allows the disclosure of health data for secondary purposes without patient consent if the data are de-identified. De-identification is the act of reducing the information content in data to decrease the probability of discovering an individual’s identity. It has been argued that de-identification methods do not provide sufficient protection because they are easy to reverse, so that data can readily be re-identified. However, a systematic review [[13]] showed that only a few re-identification attacks have involved health data and, more importantly, that most of the data that were re-identified had not been de-identified according to existing standards.
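The section does not prescribe any particular de-identification method. As one widely used illustration (our choice, not one taken from [[13]]), the Python sketch below reduces information content by generalising quasi-identifiers (age to a ten-year band, a UK-style postcode to its district) and then measures k, the size of the smallest group of records that share the same quasi-identifier values; under this simple model the worst-case chance of singling out one person is 1/k. The toy records, the choice of quasi-identifiers, and the generalisation rules are all hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (age, postcode, diagnosis). Not real data.
RECORDS = [
    (34, "PO1 3HE", "asthma"),
    (36, "PO1 3HF", "diabetes"),
    (38, "PO1 2AB", "asthma"),
    (71, "PO4 8JJ", "COPD"),
    (74, "PO4 8JQ", "diabetes"),
]

# Indices of the fields treated as quasi-identifiers (age, postcode).
QUASI_IDENTIFIERS = (0, 1)


def generalise(record):
    """Reduce information content: age -> 10-year band,
    postcode -> outward code (postal district) only."""
    age, postcode, diagnosis = record
    start = (age // 10) * 10
    return (f"{start}-{start + 9}", postcode.split()[0], diagnosis)


def smallest_class_size(records, qi=QUASI_IDENTIFIERS):
    """k in the k-anonymity sense: the size of the smallest group of
    records sharing the same combination of quasi-identifier values."""
    groups = Counter(tuple(r[i] for i in qi) for r in records)
    return min(groups.values())


k_raw = smallest_class_size(RECORDS)
k_generalised = smallest_class_size([generalise(r) for r in RECORDS])
print(k_raw, k_generalised, 1 / k_generalised)  # prints: 1 2 0.5
```

Here generalisation raises k from 1 (every record unique) to 2. A k-anonymity check alone does not guarantee protection, for example when everyone in a group shares the same diagnosis, which is why the evidence on real re-identification attacks discussed in this section matters.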

To manage the harmful risks of re-identification, future research should focus on re-identification attacks on large databases that have been de-identified following existing standards, and success rates should be correlated with how well de-identification was performed. It is important to build an evidence-based understanding of the extent to which de-identification standards and practices protect against real re-identification attacks, and of how those standards and practices should be developed to meet future challenges. Given that “it can be impossible to assess re-identification risk with absolute certainty” [[14]], this and other citizen concerns demand open and transparent debate. Scotland is one example of a country that has sought open debate on local approaches and has made its policies clear [[15]].

Good governance is therefore an essential prerequisite for ensuring effective primary and secondary use of health IT systems. It provides a framework to create the trust necessary to enable full and willing secondary, or value-adding, use [[1]]. A definition of governance [[16]] specifies monitoring and evaluation as an integral part of any policy, and e-health is no exception. Developing and implementing national e-health strategies calls for monitoring and assessing progress towards the availability, usability, quality, and integrity of the data, as well as its safe sharing and transparency, i.e. data governance [[17]]. Evaluation is essential to identify good practices from which others can learn, supporting the move toward common best practices [[1]].

The OECD has long-standing interest and expertise in this area and has published eight key data governance mechanisms to support the strengthening of national health information systems and to enable multi-country projects to improve population health ([Box 1]). Each of these calls for some kind of assessment and evaluation.

Provision of good quality personal health data is a prerequisite for extracting good quality statistics for secondary use purposes, but it is merely a beginning. There is still much work needed to develop criteria and assess maturity of individual countries’ data governance systems related to secondary use of health data. Ensuring trust through conspicuous and transparent governance frameworks is an essential prerequisite. Sound examples exist, and continued evaluation is necessary to refine best and most effective practice.

Box 1 OECD key data governance mechanisms

  1. The health information system supports the monitoring and improvement of health care quality and system performance, as well as research innovations for better health care and outcomes. (There are indicators in the OECD e-health model survey for organisations to measure attainment of this principle.)

  2. The processing and the secondary use of data for public health, research, and statistical purposes are permitted, subject to safeguards specified in the legislative framework for data protection. (This principle calls for evaluation of national policies and legislation.)

  3. The public is consulted upon and informed about the collection and processing of personal health data. (Existence of this mechanism calls for policy analysis; public awareness can be monitored by citizen surveys.)

  4. A certification/accreditation process for the processing of health data for research and statistics is implemented.

  5. The project approval process is fair and transparent, and decision-making is supported by an independent, multidisciplinary, project review body.

  6. Best practices in data de-identification are applied to protect patient data privacy.

  7. Best practices in data security and management are applied to reduce re-identification and breach risks.

  8. Governance mechanisms are periodically reviewed at an international level to maximise societal benefits and minimise societal risks as new data sources and new technologies are introduced (see [[1]] section 5.1).



3 Theoretical Considerations, Semantics, and Context

Secondary use of clinical data carries several fairly obvious assumptions, all of which are fundamental to inferential statistics, but the limitations of which are not always acknowledged in the way health data are used or abused. Firstly, it is frequently believed that it is theoretically valid to re-use data outside their context of origination and that meaning can be safely asserted independently of that context. Secondly, it must be assumed that operational clinical data are of sufficient minimum quality to be reliable and usable (albeit with various “data cleansing” procedures required). Thirdly, it is held that sound population-level inferences can be drawn from such secondary use of data. In this section, we examine these assumptions.

Contextual Validity of Data

As Ingenerf summarises, in health informatics the problem of giving meaning to data that are communicated and then processed is the issue of semantic interoperability [[18]]: when communicating, healthcare professionals are accustomed to interacting dynamically at the syntactic and semantic levels until they reach a common understanding. When dealing with a patient case, a physician creates and tests a mental image while interpreting data into information, based on his or her entire professional context, and it is within this context that he or she communicates. The risk of electronic data transfer is losing the context by ‘lifting the ink off the paper’. Thus, the challenge is to ensure that context is faithfully carried with the data and information transferred. The physical reality, the clinician’s mental model, and the information model embodied in the electronic health record (EHR) or data exchange may be three or even four quite different things [[19], [20]].

In 1991, Johan van der Lei spoke and wrote adamantly against the misuse of data in computer-stored medical records and formulated the First Law of Medical Informatics: “Data shall be used only for the purpose for which they were collected” [[21]], with the explicit consequence that “If no purpose was defined prior to the collection of the data, then the data should not be used.” Van der Lei gave two major reasons for this: a) the quality of the data, and b) the context of the data. Given advances in technology, eliciting and processing such data is now quite tractable, at least in Western countries, and there is an increasing demand to exploit this data. So, we have to look carefully at and beyond the two barriers mentioned by van der Lei, with a focus on the fundamental principles applying to re-use of data and information. Perhaps it is time to re-formulate the First Law with words such as “Usage of data for purposes other than those for which they were generated is acceptable only when this has been validated stringently according to both ethical and scientific principles, including faithful reflection of context.” The ethical validation should consider the potential socio-economic benefit, but also the fact that patient concerns about data misuse can contribute to “censored” EHR content [[8]].



Assumptions about Data Quality and Provenance

There is a general organisational context and a specific clinical context of stored clinical data. Transferability is a very real and serious issue, for at least two reasons: analytical variation and biological variation. For example, even if a laboratory test has the same name in two clinical locations, the analytical methods may not be identical, and hence the data generated may vary significantly. The problem of transferability was investigated by many research groups in the 1980s and 1990s. For instance, the impact of various individual factors (e.g. technology, methodology, and terminology factors) on the validity of the outcome of decision support systems was clearly demonstrated [[22]]. Notably, that study showed that even for international standard clinical protocols there are differences in the local interpretation of the meaning of individual clinical signs and symptoms. There can be variation in the nosology: the state of knowledge with regard to the investigation or classification of the clinical problem(s) under examination, co-morbidities, previous clinical history, interventions, and drugs taken. There may even be cultural differences in clinical practice and in the technologies applied, or differences in the common language [[22]]. Our conceptual understanding and interpretation of ‘disease’ changes over time, as do diagnostic capability, treatment and care regimes, technical and pharmaceutical abilities, and political governance. Therefore, a technical solution to the problem of interoperability at a semantic level, for instance in terms of standardised terminologies, is only a partial solution.
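HbA1c gives one concrete instance of the analytical variation described above (our example, not one drawn from [[22]]). Results standardised to the DCCT/NGSP method are reported as a percentage, whereas IFCC-standardised results are reported in mmol/mol, so two fields both labelled ‘HbA1c’ cannot be pooled without knowing which standard applies. The published master equation relating the two scales is:

```latex
\[
\mathrm{HbA1c}_{\mathrm{IFCC}}\,[\mathrm{mmol/mol}]
  = 10.929 \times \left( \mathrm{HbA1c}_{\mathrm{DCCT}}\,[\%] - 2.15 \right)
\]
\[
7.0\,\% \;\mapsto\; 10.929 \times (7.0 - 2.15) = 10.929 \times 4.85 \approx 53\ \mathrm{mmol/mol}
\]
```

So a value of 7.0% on the DCCT scale and one of 53 mmol/mol on the IFCC scale describe essentially the same result, and a naive average of raw values taken from the two scales would be meaningless.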



Justification of Inferences: Scientific and Technological Advances

Uncritical secondary use of data from medical records based on blind trust in the semantic interoperability of such data is irresponsible. Such unconstrained secondary usage could, for instance, erroneously extrapolate the results of a drug trial conducted in a cohort of young to middle-aged men with a single illness to a very different context, such as prescribing the drug to small children, postmenopausal or older women, or people with co-morbidities. Knowledge drawn into a new setting needs to be assessed with respect to its context and origins, to decide whether it is applicable to that setting and what verification, adjustments, or safety parameters are needed.

Epidemiological differences may to some extent be compensated for by normalisation procedures. Terminological differences can be addressed by standardization and mapping, which have required decades of sustained effort and funding. A recent example was the Observational Medical Outcomes Partnership (OMOP), which developed a common data model to support analysis of heterogeneous data from operational EHRs, adverse incident reports, and financial claims [[23]]. Methodological differences and differences with respect to analytical quality may also sometimes be compensated for by normalisation procedures. Such calculations are feasible when the exact correlations and the valid context for interpretation are known; alternatively, normalisation may be performed against the local reference intervals in use at the point of clinical intervention.
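A minimal sketch of the two remedies named in this paragraph, assuming a hypothetical mapping table and made-up reference intervals: local laboratory codes from two sites are mapped to one shared concept (in the spirit of a common data model such as OMOP, though the concept names here are placeholders, not OMOP concept identifiers), and each value is then expressed relative to the local reference interval of the site that produced it. This position-in-interval scaling is only one possible normalisation, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical mapping: (site, local test code) -> shared concept name.
# Placeholder strings, not identifiers from any real common data model.
LOCAL_TO_COMMON = {
    ("site_a", "K"): "serum_potassium",
    ("site_b", "POT"): "serum_potassium",
}


@dataclass
class LabResult:
    site: str
    local_code: str
    value: float
    ref_low: float   # lower limit of the site's own reference interval
    ref_high: float  # upper limit of the site's own reference interval


def to_common(result: LabResult) -> Tuple[str, float]:
    """Map the local code to the shared concept and rescale the value so
    that 0.0 is the local lower limit and 1.0 the local upper limit;
    values outside [0, 1] fall outside the local reference interval."""
    concept = LOCAL_TO_COMMON[(result.site, result.local_code)]
    width = result.ref_high - result.ref_low
    if width <= 0:
        raise ValueError("reference interval must have positive width")
    return concept, (result.value - result.ref_low) / width


# Made-up values: similar raw numbers sit at different points of
# their respective local reference intervals.
a = to_common(LabResult("site_a", "K", 5.0, 3.5, 5.0))
b = to_common(LabResult("site_b", "POT", 5.1, 3.6, 5.4))
print(a)  # prints: ('serum_potassium', 1.0)
print(b)
```

An unmapped local code raises a `KeyError` here, which is deliberate: silently passing an unknown code through would hide exactly the terminological mismatch the mapping is meant to resolve.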

Scientific efforts have been made in large European Union Research and Development (EU R&D) projects, as well as in smaller national projects, to merge or compare clinical data from several databases in various countries. The EU-ADR project used normalisation to score clinical events, and the PSIP project used a crude normalisation of laboratory data (relative to a population mean) but did not merge the various databases [[24]].
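The crude normalisation attributed to PSIP can be pictured as follows. This is a sketch of the general idea only, with made-up values and hypothetical database names; the actual PSIP procedure may have differed. Each database is normalised against its own mean and never merged with the others, so values reported on different scales become comparable ratios.

```python
from statistics import mean

# Made-up values for one analyte in two national databases. The second
# database reports on a scale ten times larger (e.g. different units).
DATABASES = {
    "db_country_x": [4.0, 4.2, 4.4, 5.4],
    "db_country_y": [40.0, 42.0, 44.0, 54.0],
}


def relative_to_mean(values):
    """Express each value as a ratio to the mean of its own database.
    Databases are processed separately; no records are merged."""
    m = mean(values)
    return [v / m for v in values]


normalised = {name: relative_to_mean(v) for name, v in DATABASES.items()}
for name, ratios in normalised.items():
    print(name, [round(r, 3) for r in ratios])
```

The price of this simplicity is that a genuine difference in population levels between countries is normalised away together with the measurement difference, which is why such an approach is described as crude.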

It is feasible to provide a solution through a structured definition, for each element, of the level of contextual detail needed to enable valid usage of data. The means is meta-data, meta-information, and meta-knowledge for each individual datum, piece of information, and piece of knowledge applied [[25], [26]]. Such data, information, and knowledge exist, but they are distributed and, unfortunately, constitute a large, albeit necessary, overhead in secondary use processing.
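A minimal sketch of attaching meta-data to each datum, in Python. The metadata fields (concept, unit, analytical method, original purpose of collection) and the pooling rule are illustrative assumptions, not a published schema, and the HbA1c values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """A value carrying the context needed to interpret it.
    The choice of metadata fields is an illustrative assumption."""
    value: float
    concept: str        # what was measured
    unit: str           # unit of measurement
    method: str         # analytical method or standardisation
    collected_for: str  # purpose for which the value was originally recorded


def can_pool(a: Datum, b: Datum) -> bool:
    """Permit pooling only when concept, unit, and method all agree.
    The original purpose need not match, but it travels with the datum
    so that any secondary use can be reviewed against it."""
    return (a.concept, a.unit, a.method) == (b.concept, b.unit, b.method)


x = Datum(7.0, "HbA1c", "%", "DCCT/NGSP", "diabetes clinic follow-up")
y = Datum(53.0, "HbA1c", "mmol/mol", "IFCC", "routine primary care")
z = Datum(48.0, "HbA1c", "mmol/mol", "IFCC", "hospital admission")

print(can_pool(x, y))  # prints: False (same name, different standardisation)
print(can_pool(y, z))  # prints: True
```

The check requires every source to supply the metadata and every analysis to consult it, which is the overhead the paragraph refers to.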

The state of the art in the technological aspects of secondary use of data is progressing rapidly. The astonishing potential of digital technology to offer high-quality, high-volume, routine data to generate a virtuous circle of data-driven quality improvements to both direct patient care and secondary uses supporting operational management, public health, and research has stimulated massive investment [[27]]. A promising example is the “Green Button” project, which ultimately aims to offer real-time EHR cohort analysis to provide decision support for the many cases where gold standard randomized controlled trial (RCT) evidence is lacking [[28], [29]]. The Patient-Centered Outcomes Research Institute (PCORI) promotes the development of methodological standards for research that can enhance the development of evidence-based patient-centred health [[30], [31]]. The approach is founded on a systematic process involving public comment, engagement, and revision. The aim is to promote research that is scientifically sound, meaningful, and patient-centred. This approach parallels many of the developments in precision medicine, which can be defined as prevention and treatment strategies that account for individual variability [[32]]. The applicability of precision medicine has increased dramatically with the development of large-scale biologic databases involving genomics, proteomics, and metabolomics, along with the computational tools for dealing with these data. Major developments in patient-centred outcomes research and precision medicine are in turn underpinned by major work to ensure patient consent (e.g., the Data Segmentation for Privacy (DS4P) initiative) [[33]] and patient safety monitoring (e.g., work on establishing common formats to allow for the uniform collection and reporting of patient safety data by patient safety organisations) [[34]].

It may be that current developments in machine learning, using previously unimaginable levels of computational power and quantities of diverse but linkable data, will leapfrog traditional approaches to the issues described here (at least the technical ones) [[35]], but it seems premature to regard this as a foregone conclusion or a comprehensive solution.



4 Evaluation Aspects in National and Regional Policy

OECD Findings

The overall ambition of OECD member states is to better integrate e-health into their health policies and better align e-health investments with health needs [[11]]. Already in 2012, most OECD countries participating in an OECD study [[36]] reported a national plan or policy to implement EHRs (22 of 25 countries). Most had also begun to implement that plan (n=20), and a majority (n=18) had included some form of secondary use of EHRs within their national plan. The most commonly included secondary uses were public health monitoring and health system performance monitoring (n=15). Half of the countries also indicated that their plans included enabling physicians to query the data to support treatment decisions. The least commonly reported planned data use was for facilitating or contributing to clinical trials (n=10). Regular use of EHR data for secondary analyses was already underway, mainly for public health monitoring (n=13) and general research (n=11) [[36]].



Key Elements in Evaluation from the Viewpoint of Secondary Use of Health Data

As noted above, an important prerequisite for secondary use of personal health data is the transferability of data, which requires organizational, technical, semantic, and legal interoperability, as well as quality and protection of personal data [[37]–[39]]. As countries develop and implement their e-health strategies, they will need to monitor progress to ensure that these requirements are met and that e-health efforts are indeed contributing to health policy goals. For example, the section on global collaboration of the EU e-health action plan [[40]] stated that from 2013 the Commission should enhance its work on data collection and benchmarking activities in health care with relevant national and international bodies, to include more specific e-health indicators and to assess the impact and economic value of e-health implementation. Close collaboration with the OECD and other actors is required to harmonize e-health indicators, including the OECD work on indicators for availability and usage of e-health [[41]] and the Nordic e-health indicator work [[42]], which has also defined some common Nordic indicators for interoperability, protection, and quality of personal health data, all key elements in evaluation from the secondary use viewpoint. From the methodological viewpoint, triangulation of methods is needed to cover all the aspects relevant to the use of personal health data.



5 Examples: Using Routine Clinical Data for HIT Evaluation and Quality Indicators

We now turn to practical examples of evaluation based upon secondary uses of data. The increasing uptake of EHRs and other health information systems has made routine collection and analysis of clinical data to evaluate and improve clinical performance an easier and faster undertaking. Furthermore, this provides opportunities to create a fine-grained picture of systems’ effects on quality of care by analysing interaction data that are a by-product of their use [[43]]. This section discusses two examples: how routinely collected data can be used to evaluate clinical performance, and how routine clinical and interaction data can be combined to study the mechanisms of health information systems in detail and optimize their efficacy.

Re-use of Routinely Collected Clinical Data to Systematically Evaluate Clinical Performance

Health professionals need measures to judge the quality of care they provide in order to identify areas for improvement. Further, due to societal pressure for transparency and accountability, governments, accreditation organizations, patient associations, and insurance companies have greatly increased the number of quality indicators to be measured. The current number of quality indicators makes their manual calculation impracticable. Besides being time-intensive, causing registration burden, and lacking timeliness, manual calculation is also error-prone and can jeopardize the reproducibility, validity, and comparability of quality measure results [[44]]. Therefore, quality indicators should be calculated automatically from routinely collected EHR data.

Quality indicators are often compared over time and among health care institutions or care providers to identify outliers, which may then require quality improvement activities. Results of these benchmarking activities can have serious negative consequences for those who underperform, in the form of financial restrictions imposed by insurance companies, loss of faith by patients, and loss of motivation among care providers. Aspects such as reproducibility, validity, and comparability of quality indicators are hence of utmost importance. However, these aspects are hampered by the fact that quality indicators are often ambiguously defined in natural language, which impedes their automated computability. Therefore, quality indicators should be formalized before their release and application to routinely collected EHR data. The CLIF method developed by Dentler et al. [[44]] transforms quality indicators, which are typically described in unstructured text, into precise queries that can be computed on the basis of routinely collected clinical data. The method includes eight steps to formalize the numerator and denominator of a quality indicator and ensures that the formalizations obtained faithfully represent the meaning of the indicator. In the first step, clinical concepts such as diagnoses and procedures are extracted from the text describing the quality indicator. These concepts need to be coded with standard terminologies such as SNOMED CT or ICD-9/10, depending on the national coding system in use. In the second step, these concepts are bound to concepts in the EHR’s underlying information model. In step three, the temporal aspects of the indicator (e.g. a procedure should be performed before another procedure) are formalized. Step four formalizes numeric criteria (e.g. HbA1c value must be below 53 mmol/mol). In steps five and six, the Boolean criteria (e.g. three codes for diabetes are combined with OR) are formalized and grouped.
Step seven formalizes the exclusion criteria and negations, and in step eight, criteria that apply only to the numerator and not to the denominator are identified. The generalizability and reproducibility of CLIF have been positively evaluated [[44], [45]]. Whilst CLIF may not directly solve re-use challenges such as missing data and poor data quality, it can guide implementation of local EHRs with respect to how clinical data items should be collected to increase data quality.
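To make the eight steps more concrete, the Python sketch below evaluates one hypothetical indicator (the proportion of patients with diabetes whose latest HbA1c recorded on or after diagnosis is below 53 mmol/mol, excluding patients with a palliative care code) directly on toy records. It is not the CLIF method itself, which yields formal queries rather than Python code; the indicator, the ICD-10 value sets, the data model, and the patients are all assumptions, and the comments map each part to the corresponding step.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple

# Step 1: clinical concepts coded with a standard terminology.
# Illustrative ICD-10 codes; a real indicator defines its own value sets.
DIABETES_CODES = {"E10", "E11", "E14"}   # combined with OR (steps 5-6)
EXCLUSION_CODES = {"Z51.5"}              # palliative care (step 7)
HBA1C_TARGET = 53.0                      # mmol/mol (step 4)


# Step 2: binding to an information model; here a hypothetical dataclass.
@dataclass
class Patient:
    pid: str
    diagnoses: List[Tuple[str, date]]  # (ICD-10 code, date recorded)
    hba1c: List[Tuple[float, date]] = field(default_factory=list)  # (mmol/mol, date)


def diabetes_onset(p: Patient) -> Optional[date]:
    """Earliest date on which any diabetes code was recorded (Boolean OR)."""
    dates = [d for code, d in p.diagnoses if code in DIABETES_CODES]
    return min(dates) if dates else None


def in_denominator(p: Patient) -> bool:
    """Patients with diabetes, minus those meeting an exclusion criterion."""
    excluded = any(code in EXCLUSION_CODES for code, _ in p.diagnoses)
    return diabetes_onset(p) is not None and not excluded


def in_numerator(p: Patient) -> bool:
    """Step 8: the HbA1c criterion applies to the numerator only."""
    onset = diabetes_onset(p)
    # Step 3: temporal criterion, only measurements on or after onset count.
    after = [(v, d) for v, d in p.hba1c if onset is not None and d >= onset]
    if not after:
        return False  # no qualifying measurement: not counted as controlled
    latest_value, _ = max(after, key=lambda vd: vd[1])
    return latest_value < HBA1C_TARGET


def indicator(patients: List[Patient]) -> Tuple[int, int]:
    denominator = [p for p in patients if in_denominator(p)]
    numerator = [p for p in denominator if in_numerator(p)]
    return len(numerator), len(denominator)


PATIENTS = [
    Patient("p1", [("E11", date(2020, 1, 1))],
            [(60.0, date(2020, 3, 1)), (50.0, date(2021, 3, 1))]),
    Patient("p2", [("E10", date(2019, 5, 1))], [(58.0, date(2020, 6, 1))]),
    Patient("p3", [("E11", date(2020, 1, 1)), ("Z51.5", date(2021, 1, 1))],
            [(45.0, date(2021, 2, 1))]),
    Patient("p4", [("J45", date(2020, 1, 1))]),  # asthma only
    Patient("p5", [("E14", date(2021, 1, 1))], [(48.0, date(2020, 12, 1))]),
]

print(indicator(PATIENTS))  # prints: (1, 3)
```

Writing the criteria out this explicitly exposes decisions that natural-language definitions leave open, such as whether a patient with no qualifying HbA1c measurement counts as controlled; this sketch assumes not.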



Unobtrusive Quantitative Process Evaluations to Optimize Health Information Systems

Formalised quality indicators and guidelines are presented in electronic health information systems such as clinical decision support (CDS) and audit and feedback (A&F) systems. These systems have been moderately successful at ensuring that patients receive improved care, but their effectiveness is highly variable [[46], [47]]. CDS provides clinicians with case-specific advice at the point of care (e.g., alerts or reminders) [[48]], whereas A&F provides population-level performance feedback on quality indicators over a period of time [[46]]. The reasons for their variable effectiveness are unclear because the mechanisms behind A&F’s success or failure are poorly understood [[49]], which limits the ability to design better interventions [[50]]. The electronic nature of modern A&F systems opens new possibilities to study the mechanisms of A&F quantitatively and unobtrusively, by harnessing data that are routinely captured as a by-product of using the systems in real life [[43]].

Exploring the mechanism through which interventions bring about change is crucial to understanding both how the effects of a specific intervention occurred and how these effects might be replicated by similar future interventions [[51]]. Coiera [[19]] describes this mechanism as an information value chain that connects the use of a system to health outcomes. The chain begins with a user interacting with a system, and some of these interactions will provide information. Some of this information may cause the user to change her decision, which in turn can change the process of care. Finally, only some process changes affect health outcomes. For example, suppose that a general practitioner prescribing a non-selective beta-blocker to a patient with asthma is alerted by a CDS system that this may cause exacerbations (“interaction”). When the general practitioner notices the alert (“information received”) and decides to cancel the prescription (“decision changed”), this will affect the patient’s medication regimen (“care process altered”) and can ultimately reduce the risk of asthma exacerbations and unscheduled hospital admissions (“outcome changed”). Whereas most A&F studies only investigate the relationship between exposure (i.e., inviting health professionals to interact with the system) and care processes or outcomes (stages 4 and 5), electronic health information systems can produce usage logs that allow us to evaluate the relationships between all other stages in the information value chain, often with high fidelity [[43]]. Using measurements from all stages can provide a more comprehensive picture of the intervention process and help explain the observed variability in its effectiveness. In particular, analysing the number and types of events at each stage may help to identify obstructions in the chain that prevent value from progressing to the subsequent stage, and reveal the determinants of successful progression.

We emphasize, however, that analysing the information value chain does not make qualitative process evaluations obsolete. Whereas a quantitative approach will reveal that certain events occurred (e.g., users declining an alert), a qualitative approach is more suitable for exploring why these events occurred (e.g., the alert conflicted with patient preferences). The two approaches are therefore complementary: quantitative evaluations may discover gaps in the intervention process, which qualitative work can then fill in.
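As an illustration of the quantitative side, the following Python sketch counts how many cases reach each stage of the information value chain in a usage log and reports the weakest stage-to-stage transition, a candidate obstruction. The log format (pairs of a case identifier and a stage label), the function names, and the assumption that a case reaching a later stage has passed through all earlier ones are hypothetical simplifications; in practice, outcome-stage data typically have to be linked in from sources other than the system’s own logs.

```python
from collections import Counter

# Stages of the information value chain as described in the text, in order.
STAGES = ["interaction", "information received", "decision changed",
          "care process altered", "outcome changed"]


def stage_counts(events):
    """Count cases reaching each stage.

    events: iterable of (case_id, stage) pairs from a hypothetical usage log.
    Simplifying assumption: a case that reached a later stage passed all earlier ones.
    """
    furthest = {}
    for case_id, stage in events:
        idx = STAGES.index(stage)
        furthest[case_id] = max(furthest.get(case_id, -1), idx)
    counts = Counter()
    for idx in furthest.values():
        for stage in STAGES[: idx + 1]:
            counts[stage] += 1
    return [counts[s] for s in STAGES]


def transition_rates(counts):
    """Proportion of cases at each stage that progressed to the next stage."""
    return [
        (STAGES[i], STAGES[i + 1], counts[i + 1] / counts[i] if counts[i] else 0.0)
        for i in range(len(STAGES) - 1)
    ]


def weakest_link(counts):
    """The transition that loses the largest share of cases."""
    return min(transition_rates(counts), key=lambda t: t[2])


log = [
    ("alert1", "interaction"), ("alert1", "information received"), ("alert1", "decision changed"),
    ("alert2", "interaction"), ("alert2", "information received"),
    ("alert3", "interaction"),
    ("alert4", "interaction"), ("alert4", "information received"), ("alert4", "decision changed"),
    ("alert4", "care process altered"), ("alert4", "outcome changed"),
]
counts = stage_counts(log)
print(counts)                # [4, 3, 2, 1, 1]
print(weakest_link(counts))  # ('decision changed', 'care process altered', 0.5)
```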



6 The Need for Multi-level Evaluation – Key Evaluative Criteria

Health systems, and their supporting technologies, should continuously learn and improve, as postulated by the Learning Health System approach [[3]]; evaluation of the means and processes of secondary use of health data is thus vital and constitutes essential good practice. Particular foci of evaluation should be: i) the consumers of secondary analysis of health data (e.g., health care managers, policy makers, clinicians, researchers, therapeutics developers, and society as the ultimate beneficiary of better services); ii) considerations related to the utilisation of secondary use of data; and iii) ensuring the validity and quality of the secondary use of clinical data [[52]].

Consumers of the Secondary Analysis of Health Data

Health care is becoming increasingly complex: not only is there a growing prevalence of multi-morbidity (neonates surviving with ongoing health conditions; ageing populations at greater risk of health events), but there is also increased specialisation of service delivery, which can lead to fragmentation. Secondary use of data based on robust data linkage techniques has the potential to improve our understanding of the breadth and course of health care delivery [[53]]. But while the secondary use of data continues to expand into a fast-growing industry, there are important concerns about whether consumers are sufficiently aware of how data are being used. For instance, is there sufficient public awareness [[2]] of the benefits and challenges associated with secondary use of data?

The utilisation of health system data is a sensitive community topic. Any mistrust or lack of confidence about the way data are handled could inhibit their application and severely affect their utilisation [[54]]. Major questions about the use of secondary data in health [[2]] continue to revolve around whether patients have the right to audit or place constraints on the use of their data. How does society ensure that the use of secondary data is transparent and safeguarded? Several countries (e.g., Australia and the United Kingdom) are considering “opt-out” models of data consent, which give patients the right to opt out of having their personal information used for purposes beyond their direct care; however, this may well lead to bias, for instance by social group or by health condition. This right is also reversible [[54]]. These issues relate very strongly to public trust, which we discussed in section 2 of this contribution.

The secondary use of health system data relies on key principles, including transparency and coordination with all stakeholders [[55]]. It also involves establishing mechanisms that can monitor, detect, and report on the application of knowledge derived from secondary use of health data (including any adverse incidents) and help to enhance its impact [[56]].



Considerations Related to the Utilisation of Secondary Use of Data

The increasing availability and accessibility of large volumes of data from clinical and non-clinical sources have helped to broaden the scope and utilisation of secondary health system data [[57]]. The technological ability to merge, link, re-use, and exchange data has outpaced the establishment of policies, procedures, and processes that monitor the ethics and legality of secondary use of data [[2]]. Types of data brought into integrated secondary analysis may include:

  • Web and social media data (e.g., Twitter, Facebook)

  • Machine-to-machine data (e.g., sensors, vital signs)

  • Biometric data (e.g., genetics, medical images)

  • Human-generated data (e.g., EMRs) [[58]]

These data can be clinical or non-clinical [[57]]. Common clinical repositories include data from EHRs and disease registries, which are used to monitor patient care. These may be linked to administrative records and other non-clinical sources, such as data on over-the-counter medications, finance, and other consumer data sources. Each of these sources and types of data comes with its own nomenclature and definitions, e.g., de-identified data, anonymised data, and reversible anonymised data [[2]].
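The distinction between de-identified and reversibly anonymised (pseudonymised) data can be sketched in Python as follows. De-identification here simply drops direct identifiers, while a keyed hash (HMAC-SHA256 from the standard library) produces pseudonyms that allow records from different sources to be linked and, through a lookup table held by a trusted party, re-identified when authorised. The `Pseudonymiser` class, the field names, and the key handling are hypothetical. Terminology differs between jurisdictions, and removing direct identifiers alone does not guarantee anonymity, since quasi-identifiers such as dates or postcodes can still enable re-identification.

```python
import hashlib
import hmac
import secrets

DIRECT_IDENTIFIERS = ("patient_id", "name", "address")  # hypothetical field names


def deidentify(record):
    """Copy of the record without direct identifiers (quasi-identifiers are NOT handled)."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


class Pseudonymiser:
    """Hypothetical service run by a trusted party that holds the secret key."""

    def __init__(self, key=None):
        self._key = key if key is not None else secrets.token_bytes(32)
        self._lookup = {}  # pseudonym -> original identifier, kept by the trusted party only

    def pseudonym(self, identifier):
        # HMAC-SHA256: the same identifier and key always give the same pseudonym,
        # which is what allows linkage across data sources.
        digest = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        self._lookup[digest] = identifier
        return digest

    def reidentify(self, pseudonym):
        # Reversal is only possible through the lookup table, not from the hash itself.
        return self._lookup.get(pseudonym)

    def pseudonymise(self, record):
        out = deidentify(record)
        out["pid"] = self.pseudonym(record["patient_id"])
        return out


svc = Pseudonymiser()
ehr_row = svc.pseudonymise({"patient_id": "P001", "name": "A. Patient", "hba1c": 48})
registry_row = svc.pseudonymise({"patient_id": "P001", "diagnosis": "E11"})
print(ehr_row["pid"] == registry_row["pid"])            # True: rows can be linked
print(svc.reidentify(ehr_row["pid"]))                   # P001 (authorised reversal)
print(deidentify({"patient_id": "P001", "hba1c": 48}))  # {'hba1c': 48}
```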

The conceptual framework for secondary use of health system data analytics is similar to, and can be based in part on, traditional health informatics processes such as the de-identification and anonymisation of data [[2]]. But there are also some important additional conceptual (architectural) considerations. In most cases, the user interfaces of traditional analytical tools differ from those of “big data” tools, which require different informatics skills and often involve open-source software to address complex issues related to the retrieval, pooling, processing, and warehousing of data. These tools currently lack the support and user-friendliness of traditional analytical packages [[58]].



Ensuring the Validity and Quality of Secondary Use of Clinical Data

In the past, large silos of traditional paper records remained dormant and were seldom analysed, so they played little or no role in enhancing the effectiveness and safety of health care [[57]]. Methodological considerations that must be addressed to ensure that the products of secondary use of health data are valid, reliable, and applicable include:

  • Consideration of the quality of data

  • Understanding context to ensure that meanings inferred from the data are not distorted

  • Promoting transparency and governance [[59]]

The discipline of health informatics has been built in large part on optimising key standards and considerations for data quality and data metrics [[60]–[62]]. These include consideration of the:

  • Data accuracy

  • Data comparability

  • Data completeness

  • Data consistency

  • Data relevance

  • Data usability

  • Data validity
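Several of these dimensions can be operationalised as simple proportions. The Python sketch below computes completeness, validity, and consistency for a handful of made-up records. The definitions used (share of non-missing values, share of values passing a plausibility rule, share of records satisfying a cross-field rule), the function names, and the plausible HbA1c range of 20 to 200 mmol/mol are illustrative assumptions, since data quality frameworks define these dimensions in different ways.

```python
from datetime import date


def _present(value):
    return value not in (None, "")


def completeness(records, field):
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    return sum(_present(r.get(field)) for r in records) / len(records)


def validity(records, field, is_valid):
    """Share of non-missing values of `field` that pass a plausibility rule."""
    values = [r[field] for r in records if _present(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def consistency(records, rule):
    """Share of records that satisfy a cross-field logical rule."""
    if not records:
        return 0.0
    return sum(1 for r in records if rule(r)) / len(records)


records = [  # made-up example records
    {"birth": date(1950, 1, 1), "admission": date(2016, 3, 1), "hba1c": 48},
    {"birth": date(1960, 5, 5), "admission": date(2016, 4, 2), "hba1c": None},
    {"birth": date(2017, 1, 1), "admission": date(2016, 4, 9), "hba1c": 900},
    {"birth": date(1945, 2, 2), "admission": date(2016, 6, 1), "hba1c": 55},
]
print(completeness(records, "hba1c"))                                  # 0.75
print(round(validity(records, "hba1c", lambda v: 20 <= v <= 200), 3))  # 0.667
print(consistency(records, lambda r: r["birth"] <= r["admission"]))    # 0.75
```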

The translation of data from secondary analysis into reliable and applicable knowledge that can be used to enhance the quality of care also relies on the proper and effective choice of study design. Large data sources may enhance the potential for evaluation, but they still depend on the formulation of robust evaluation questions and topics, a proper study design, and appropriate tools to support rigorous measurement and assessment [[63]–[66]].



Methodological Frameworks for Secondary Uses

One framework for secondary use is SPIRIT (Systematic Planning of Intelligent Reuse of Integrated Clinical Routine Data), a best-practice framework and procedure model for planning such reuse systematically [[67]]. Unlike other methods that concentrate on the analysis part, such as the KDD (Knowledge Discovery in Databases) process proposed by Fayyad et al. in 1996 [[68]] or OLAP (OnLine Analytical Processing) as proposed by Codd in 1993 [[69]], SPIRIT offers a holistic view of secondary use and supports the structured, stepwise planning and conduct of secondary use of clinical data in heterogeneous environments, with a special focus on the objectives of data analysis and on supporting the reproducibility of data analysis. Its application can and should be evaluated in various ways.

First, after secondary data analysis, project management should evaluate whether the defined goals of secondary data reuse have been fulfilled. Scope creep, i.e., a shift from the originally intended goals to other or additional goals, is common. This is not bad in itself, but it should be made transparent. How can this evaluation be done? One approach is to evaluate whether the generated reports respond to the originally defined goals.
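One simple way to operationalise this check is a traceability comparison between the original goals and the goals each report actually addresses. In the Python sketch below, the goal labels, report names, and report-to-goal mapping are hypothetical, and the sketch is not part of SPIRIT itself. Goals that no report addresses are flagged as unmet; goals addressed by reports but absent from the original plan are flagged as possible scope creep that should be made transparent.

```python
def goal_coverage(original_goals, reports):
    """Compare originally defined goals with the goals addressed by reports.

    reports: mapping of report name -> set of goal labels it addresses (hypothetical).
    """
    planned = set(original_goals)
    addressed = set().union(*reports.values())
    return {
        # Planned goals that no report responds to.
        "unmet": sorted(planned - addressed),
        # Goals served by reports but absent from the original plan (possible scope creep).
        "added": sorted(addressed - planned),
    }


goals = ["G1 readmission rates", "G2 length of stay"]
reports = {
    "monthly_dashboard": {"G1 readmission rates"},
    "ad_hoc_cost_report": {"G3 cost per case"},
}
print(goal_coverage(goals, reports))
# {'unmet': ['G2 length of stay'], 'added': ['G3 cost per case']}
```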

Second, the acceptance by stakeholders should also be evaluated: how do various stakeholders see the information and reports derived from secondary data analysis? Do they find them helpful? Do they use them regularly? Do they engage in continuous reporting? Are there unexpected or adverse effects of secondary use of clinical data, e.g., changes in processes with the sole aim of optimizing reported indicators? This evaluation assesses whether the chosen indicators of secondary data analysis respond to stakeholders’ needs and fulfil the defined goals.



7 Conclusion

In conclusion, we postulate that although analysis of “Big Data” is politically sexy and attracts funding, it nonetheless needs serious evaluation and evidence-based thinking. As with all health informatics activities and innovations, to do less – and thus condone imperfect or erroneous outcomes – would be unethical. Such evaluation should be based not just on data science, but also on broader evidence-based health informatics considerations, which are needed to build and underpin trust and to ensure feasibility and policy effectiveness [[70]].

