Introduction: What Are Laboratory Tests for?
Laboratory tests are part of the clinical process rather than a distinct area of care.[1][2] Consequently, tests must be considered as a health care intervention.[3][4] This holds for laboratory methods in the field of thrombosis and hemostasis just as for any other test in medicine. The ultimate aim of laboratory testing is therefore to improve health outcomes. Improved outcomes benefit patients, the caregiver team, the provider organization, the health care insurance, and society as a whole.[2][5] This is an enormous task that clinicians and laboratory staff must accept in order to develop the future of laboratory medicine. If we neglect one or more of these areas of responsibility, our work will be questioned by patients, caregivers, health care organizations, and society.
Optimized laboratory tests can have a significant beneficial effect on clinical decision
making as well as on health care processes. Asymptomatic patients could be screened effectively, diagnoses made reliably, future clinical events predicted accurately, and the effects of treatments monitored sensitively. How can we
achieve this? As Christopher P. Price and Robert H. Christensen put it in the preface
of their excellent book Evidence-based laboratory medicine: “We have to ask (1) the right clinical question, (2) using the right test, (3) in the right patient, (4) at the right time, (5) with the analysis using the right result, (6) which yields the right interpretation, (7) with the right decision being made, and (8) the right action being taken, (9) at the right cost.”[2] The task of academic laboratory medicine is to generate knowledge to decide what right is.
To date, we have done this job insufficiently. It appears that most laboratory tests implemented in clinical practice have not been adequately evaluated and that current biomarker research often does not address unmet clinical needs.[5][6] International standards are scarce, and little research has been done on developing methodology.[7] Hence, researchers are often not aware of the various methodological tools available.
With the present article, I summarize current problems as well as concepts and methodological
tools for evaluating laboratory tests and propose a framework to be used in future
evaluation projects.
Current Problems
The use of biomarkers in clinical practice and scientific inquiry is rapidly increasing due to new technologies and the potential associated with precision medicine. The implementation of these biomarkers, defined as the process of putting them to use in routine clinical practice,[8] is often done without appropriate evaluation studies.[7][9][10][11][12] Implementing and applying medical tests before adequate evaluation might lead not only to high health care costs but may also harm patients and healthy individuals through unnecessary labeling, incorrect diagnoses, delays in starting appropriate treatment, or hazardous therapy.[13][14] An overview of potential risks of applying laboratory tests to patients is given in [Table 1]. As a consequence, overdiagnosis and overtreatment are regarded as a significant threat to human health and a major contributor to health care costs.[13] Data from a large number of studies using a variety of study designs suggest that the incidence of diagnostic error is unacceptably high.[15][16][17] Recognition of this problem has led to international initiatives such as the “Choosing Wisely” campaign[18] or the IFCC Task Force on the Impact of Laboratory Medicine on Clinical Management and Outcomes.[19]
Table 1 Potential risks of applying laboratory tests to patients

Risk | Description
Direct adverse event | Blood drawing might lead to hematoma, syncope, arterial puncture, and/or thrombus formation
False diagnosis | False-positive results may result in incorrect diagnoses and application of unnecessary and/or risky treatments
Rejection of correct diagnosis | False-negative results may lead to rejection of correct diagnoses and unnecessary delays in starting appropriate treatment
Initiation of additional investigations | Uncertainty in interpretation of positive or negative test results may result in further investigations
Withdrawal of treatments | False-positive results may lead to suspicion of certain diseases, which constitutes a contraindication for treatment of other diseases
Increased costs | False-positive results may lead to additional investigations and/or treatments with a relevant increase of costs
Adverse emotional effects | Receiving a test result may have a lasting impact on mental health, increase anxiety, stress, and/or lead to depression
Adverse social effects | Results of medical tests may affect relationships and social interactions
Adverse cognitive effects | Receiving a test result influences patients' beliefs, perceptions and understanding of their condition that may affect patients' adaptive behavior
Adverse behavioral effects | Test results alter risk perceptions and anxiety, which may influence patients' behavior, for example, with regard to adherence with follow-ups, investigations, and treatments as well as preventive lifestyle

Source: Adapted from Nagler.[10]
The following examples illustrate some of the problems. Sensitivity and specificity are often used as fixed measures of test performance rather than as a description of test behavior under specific circumstances (such as prevalence), and little attention is paid to issues of validity, variability, applicability, and precision.[7][20] Sample size calculations are rarely employed in diagnostic accuracy studies,[21] and a range of methodological shortcomings can result in biased estimates.[22][23][24][25] Reproducibility and consistency of measurements have not been evaluated adequately in a relevant proportion of laboratory tests.[9][10][12][26] The application of sensitivity, specificity, likelihood ratios, and Bayes' theorem to diagnostic reasoning has limitations because these measures vary between subgroups of patients.[27][28][29] Also, studies examining the various outcomes associated with the use of laboratory tests are lacking in most cases.[30] In general, the majority of factors known to affect the utility of laboratory measurements have not been studied adequately, and this is also the case for laboratory tests used in the area of thrombosis and hemostasis.[10] This is a major challenge for scientists, physicians, policymakers, and funding agencies.
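To make the prevalence problem concrete, the short sketch below (Python; the sensitivity, specificity, and prevalences are purely illustrative assumptions, not data from the cited studies) shows how the positive and negative predictive values of a test with fixed sensitivity and specificity change with disease prevalence.

```python
# Illustrative sketch: predictive values of a test with fixed sensitivity and
# specificity depend strongly on disease prevalence. All numbers are hypothetical.

def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a given prevalence using Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

if __name__ == "__main__":
    sens, spec = 0.90, 0.90  # assumed accuracy of a hypothetical test
    for prevalence in (0.50, 0.20, 0.05, 0.01):
        ppv, npv = predictive_values(sens, spec, prevalence)
        print(f"prevalence {prevalence:5.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
    # The same test yields a PPV of 0.90 at 50% prevalence,
    # but only about 0.08 at 1% prevalence.
```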
What are the reasons for this evaluation gap? This knowledge void has most probably
arisen from the absence of a comprehensive conceptual framework and a generally accepted, standardized approach to research. Researchers and clinicians
are often not aware of the methodological tools available. Summarizing the evidence
from existing studies is also difficult because generally accepted criteria for completeness
and methodological quality are only available regarding diagnostic accuracy studies.
Also, current methodological tools have limited applicability to diagnostic problems
where an appropriate reference standard test is absent or to very rare diseases.
Concepts of Evidence-Based Laboratory Medicine
What Does Evidence-Based Laboratory Medicine Mean?
The concept of evidence-based medicine (EBM) was introduced in the 1990s to promote
evidence from clinical research as the primary approach to clinical decision making.[31] EBM was a significant advance because it has provided methodological tools to allow
clinicians to identify relevant studies, critically appraise the literature, and apply
the findings to their clinical context.[32] At the same time, EBM emphasizes patient values and preferences as essential contributors to decision making.[33] Applying EBM to diagnostic processes means asking precise clinical questions as well as applying probabilistic reasoning. Both issues are discussed later, and a comprehensive overview
of evidence-based laboratory medicine is given elsewhere.[3]
Applying Clinical Questions
If laboratory medicine is to support clinical decision making, and if clinical decision
making is to be based on rational arguments rather than intuition, then a precise
clinical question must be raised.[32] The question should capture the clinical problem that arises in the process of care,
and there must also be evidence that the laboratory test can answer the question posed.[3] Any given result from a laboratory test can only have an impact on clinical decision
making if the clinical question is clearly defined upfront. This holds not only for
individual patients and particular clinical problems but also for the evaluation and
implementation of new laboratory tests into clinical practice. Of course, a precise question is an essential, but not a sufficient, requirement for a laboratory test to be
useful. [Fig. 1] illustrates the process of decision making using laboratory tests embedded in the
process of care. Broad categories of clinical questions that can be addressed using
laboratory tests are displayed in [Table 2].
Fig. 1 Process of decision making supported by laboratory tests embedded in the process
of care.
Table 2 Main clinical questions to be addressed by laboratory tests in the process of care

Clinical problem | Associated question | Possible actions | Clinical outcomes
Diagnostic | Does my patient have the disease? | Treat, test further, wait | Improvement or deterioration
Prognostic | Will my patient suffer an adverse event? | Start or stop treatment and decide on treatment details | Improvement or deterioration
Monitoring | Is the treatment effective and safe? | Intensify or reduce treatment | Improvement or deterioration

Notes: Diagnostic problems also apply to screening programs. Prognostic information is used in risk stratification. A combination of a diagnostic and a monitoring problem appears in critically ill patients: the extent of physiologic derangement is assessed (diagnostic problem) and the treatment is adjusted accordingly (monitoring).
How should we formulate clinical questions appropriately? Drawing from the principles
of EBM in general and with a view on diagnostic accuracy studies, a clinical question should
include (1) the patient population to which the respective laboratory test is applied
(population), (2) the diagnostic test under investigation (intervention), (3) the reference (gold) standard test (comparator), and (4) a measure of diagnostic performance (outcome). This approach is generally described as the PICO method.[34] Of course, this principle of questioning can be adapted to fit studies in other
areas of research as well as decision making for individual patients.
To give an example, I would like to sketch a scenario relating to prognosis. The head of vascular medicine calls the laboratory manager and asks for the implementation
of a new “ThrombClot” device. He heard at a scientific meeting that this laboratory
assay could predict recurrent venous thromboembolism better than the current standard. Based on this summary, the following questions might be raised: In adult patients
with a recent deep vein thrombosis or pulmonary embolism (population), does determination of the “ThrombClot” assay in addition to the established “unprovoked
index event” criterion (intervention) result in a more accurate prediction of recurrent venous thromboembolic events per
100 patient-years (outcome) than using the “unprovoked” criterion alone (comparator)?
The Diagnostic Process
The diagnostic process will be illustrated as an example of probabilistic reasoning.
Following history taking and physical examination, clinicians develop a list of potential
diagnoses.[35] Probabilistic reasoning then seeks to estimate the probability of each diagnosis
and to adjust the probability as new information such as laboratory test results becomes
available.[36] The probability of the presence of a specific disease is called the pre-test probability, and it corresponds to the proportion of patients who suffer from the disease among
all patients with the same presentation.[32] As estimation of the pre-test probability is generally done intuitively, it is prone to error. Intuition is strongly influenced
by recent or previous dramatic events, and clinicians may have limited experience
with certain diseases. The probability of the disease after new information is incorporated
into the assessment is called the post-test probability. The extent to which a laboratory test result changes the probability can be expressed
by likelihood ratios, which are estimated from diagnostic accuracy studies. [Fig. 2] illustrates the diagnostic process from pre- to post-test probabilities and illustrates test and treatment thresholds. If the post-test probability is above a certain threshold, the disease is regarded as sufficiently likely, and
treatment is started. If the post-test probability is below another threshold, the disease is unlikely, and any pre-test interventions
that may have been initiated are stopped. Additional testing will be performed if
the post-test probability is between the test and treatment threshold. Informing clinicians clearly as to how
test results change the probability of a disease would be anticipated to improve health
care processes considerably.
Fig. 2 Changing probabilities as well as test and treatment thresholds in the diagnostic
process.
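The probability update sketched in [Fig. 2] can be made explicit with the odds form of Bayes' theorem: post-test odds equal pre-test odds multiplied by the likelihood ratio. The following minimal sketch (Python; the pre-test probability, likelihood ratios, and thresholds are hypothetical illustration values, not recommendations) converts a pre-test probability into a post-test probability and maps the result onto test and treatment thresholds.

```python
# Minimal sketch of probabilistic diagnostic reasoning (hypothetical numbers).
# post-test odds = pre-test odds * likelihood ratio (Bayes' theorem in odds form)

def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

def decision(prob: float, test_threshold: float, treat_threshold: float) -> str:
    """Translate a post-test probability into a clinical action."""
    if prob >= treat_threshold:
        return "disease sufficiently likely: start treatment"
    if prob <= test_threshold:
        return "disease unlikely: stop work-up / withhold treatment"
    return "uncertain: perform additional testing"

if __name__ == "__main__":
    pre_test = 0.20                        # assumed pre-test probability
    lr_positive, lr_negative = 10.0, 0.1   # assumed likelihood ratios of the test
    test_thr, treat_thr = 0.05, 0.70       # assumed test and treatment thresholds

    for label, lr in (("positive result", lr_positive), ("negative result", lr_negative)):
        post = post_test_probability(pre_test, lr)
        print(f"{label}: post-test probability {post:.2f} -> "
              f"{decision(post, test_thr, treat_thr)}")
```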
A Conceptual Framework for the Implementation of Laboratory Tests
A Phased Approach
Building on the work of several others[3][5][19][30][37] and existing guidance for diagnostic accuracy studies (STARD,[38] QUADAS[39]), and drawing from our previous study publications, I propose a conceptual framework
to be used in future projects evaluating laboratory tests in thrombosis and hemostasis
as well as in other disciplines ([Fig. 3]). The proposed phased approach has several essential advantages. In particular,
the completeness and methodological quality of evaluation studies can be defined and
monitored more easily. A full list of arguments for a phased approach is given in
[Table 3]. The following sections describe the components of the phased conceptual framework.
Fig. 3 A conceptual framework for the implementation of laboratory tests.
Table 3 Arguments for a phased approach to laboratory test evaluation

Argument | Rationale
Completeness | All test characteristics are assigned to a particular study phase and incomplete evaluation can be identified easily
Structure | Essential characteristics are determined first and studies of a particular phase take results of the previous phase into account
Quality | Aims and methodological requirements can clearly be defined for every study phase
Synthesis | Review of the literature for assessing the value of a test will be simplified
Standards | Scientific societies and authorities can define minimal methodological requirements for every phase of the evaluation process
Costs | Evaluation can be stopped early in case of inadequate study results and costs associated with more expensive later stages will be saved
Phase 1: Technical Performance
There is consensus that studies of analytical accuracy and technical performance should precede more elaborate studies.[40] Once a biomarker is identified, an analytical technique should be selected that can determine the marker accurately while meeting the application needs.[5][10] If this selection and evaluation procedure is done improperly, subsequent diagnostic accuracy studies might produce inaccurate or even biased results. In addition, imprecise and inaccurate results, as well as inconvenient application characteristics, hamper implementation in routine practice.[40]
Analytical sensitivity and analytical specificity are essential characteristics to determine (does the test measure what it aims to measure?). The detection method must be defined, as well as methods of standardization and calibration. Important performance characteristics to study in this phase are the reportable range (including limits of detection and quantification), linearity, precision (agreement within and between series), reproducibility (agreement between observers, devices, as well as laboratories), and interferences (including intravascular hemolysis, lipemia, and icterus). A further problem is that results for the same analyte might not be consistent if measured on different analytical platforms.
A particular issue of test performance, which is often neglected, is interobserver reproducibility. Many tests require some degree of interpretation. One example is flow cytometry, which is used in the hemostasis laboratory to assess platelet function. Poor interobserver reproducibility severely reduces the value of a diagnostic test, including tests in the area of thrombosis and hemostasis.[41]
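Interobserver reproducibility of a dichotomously interpreted test can be quantified, for example, with Cohen's kappa. The sketch below (Python; the paired ratings are invented for illustration) computes kappa for two observers classifying the same set of samples as positive or negative.

```python
# Illustrative sketch: Cohen's kappa as a measure of interobserver agreement
# for a dichotomously interpreted test. The ratings below are invented.

def cohens_kappa(ratings_a, ratings_b):
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, based on each observer's marginal frequencies
    p_a_pos = sum(ratings_a) / n
    p_b_pos = sum(ratings_b) / n
    expected = p_a_pos * p_b_pos + (1 - p_a_pos) * (1 - p_b_pos)
    return (observed - expected) / (1 - expected)

if __name__ == "__main__":
    observer_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # 1 = positive, 0 = negative
    observer_2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
    print(f"Cohen's kappa: {cohens_kappa(observer_1, observer_2):.2f}")
```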
How shall we design evaluation studies adequately addressing the analytical accuracy
and technical performance of a laboratory test, and how do we select suitable performance
criteria? I argue that two critical criteria must be taken into account to generate
informative results: (1) what is the clinical question to be answered and (2) what
is the adequate experimental design for the specific analytical method? Uniform checklists
considering only the number of samples might grossly overestimate technical performance
obtained in routine laboratory practice. As an example, we aimed to study the consistency of thromboelastometry measurements, a point-of-care method that is often used in the perioperative and acute trauma setting. It was unclear whether the measurements are
reproducible among different devices, among different channels of the same device,
and within individuals. Faced with the problem of labile sample material, we used
a distinct study design and data analysis method, thus identifying critical inconsistencies.[14]
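Imprecision in this phase is commonly summarized as coefficients of variation (CV) within and between runs. The following simplified sketch (Python with NumPy; the replicate values are invented, and formal precision protocols use more elaborate nested designs) illustrates the basic calculation from repeated measurements of one sample.

```python
import numpy as np

# Illustrative sketch: within-run and between-run imprecision expressed as
# coefficients of variation (CV). Replicate values are invented example data.

# rows = analytical runs (e.g., days or devices), columns = replicates of one sample
replicates = np.array([
    [101.2,  99.8, 100.5, 101.9],
    [103.4, 104.1, 102.8, 103.9],
    [ 98.7,  99.5,  98.1,  99.0],
])

within_run_cv = (replicates.std(axis=1, ddof=1) / replicates.mean(axis=1)).mean() * 100
between_run_cv = replicates.mean(axis=1).std(ddof=1) / replicates.mean() * 100

print(f"mean within-run CV: {within_run_cv:.1f}%")
print(f"between-run CV:     {between_run_cv:.1f}%")
```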
Phase 2: Preanalytical Factors
Preanalytical issues are regarded as the most frequent source of error in laboratory medicine.[1][42] The preanalytical phase is consequently the second issue to be studied in evaluation projects. It covers specimen collection, sample processing (e.g., centrifugation), transportation, as well as storage. Tests conducted in the hemostasis laboratory are particularly susceptible to preanalytical artifacts due to the activation of platelets, coagulation factors, and inhibitors.[1] A large number of such factors have been found to affect hemostasis tests, including misidentification of samples, traumatic blood drawing, drawing from vascular access devices, incomplete distribution of anticoagulant within the tube, underfilling of tubes, vigorous shaking, use of very small or very large needles, use of activating collection containers, use of anticoagulants other than citrate, tourniquet use, pneumatic tube transport, long processing times, and incorrect centrifugation schemes. Thus, evaluation projects should adequately address all aspects of sample handling and processing.
Again, the study design and the performance criteria are defined by the analytical
method and the clinical question to be answered. As an example, we aimed to evaluate
a rapid, high-speed centrifugation scheme to be implemented in a routine hemostasis
laboratory, thus promoting efficiency in laboratory automation. An experimental design
in consecutive patients with suspected abnormal hemostasis tests was chosen in order
to study the full range of values obtained in clinical practice.[43]
Phase 3: Biological Variation and Clinical Factors
The results of laboratory tests vary between individuals, and this must be taken into account for interpretation.[44] The most obvious examples are differences between males and females, between newborns and adults, during pregnancy, and between overweight and underweight individuals. Some analytes vary according to circadian rhythms or longer cycles (e.g., the menstrual cycle). Also, laboratory test results might differ in patients taking particular drugs or undergoing major surgery.[45][46] If recognized and taken into account, these effects can even improve the accuracy of laboratory tests. As an example, the age-dependent increase of D-dimer levels has been implemented in diagnostic algorithms to rule out venous thromboembolism, thus improving their diagnostic accuracy.[47][48]
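As a concrete illustration of such an adjustment, a widely used rule raises the D-dimer cutoff in patients older than 50 years to age × 10 µg/L instead of the conventional 500 µg/L, provided the clinical probability is not high. The sketch below (Python) encodes this rule for illustration only; it is not a substitute for the validated algorithms cited above.

```python
# Illustrative sketch: age-adjusted D-dimer cutoff for ruling out venous
# thromboembolism (age x 10 µg/L for patients older than 50 years; otherwise
# the conventional 500 µg/L). Illustration only, assumes non-high clinical probability.

def d_dimer_cutoff_ug_per_l(age_years: int) -> float:
    return age_years * 10.0 if age_years > 50 else 500.0

def below_cutoff(d_dimer_ug_per_l: float, age_years: int) -> bool:
    """True if the D-dimer result is below the (age-adjusted) cutoff."""
    return d_dimer_ug_per_l < d_dimer_cutoff_ug_per_l(age_years)

if __name__ == "__main__":
    for age, value in [(45, 520.0), (78, 650.0)]:
        print(f"age {age}, D-dimer {value:.0f} µg/L -> "
              f"cutoff {d_dimer_cutoff_ug_per_l(age):.0f} µg/L, "
              f"below cutoff: {below_cutoff(value, age)}")
```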
Phase 4: Interpretability
Interpretation of test results is a critical step in the process of care because it
determines the course of action to be taken ([Fig. 1]). Interpretation can be straightforward in the case of dichotomous outcomes, i.e., the outcome is either positive (factor is present) or negative (factor is absent). Typical examples are genetic polymorphisms or the presence of viruses and antibodies.
However, most test results are provided on a continuous scale that makes decision
making much more difficult (even dichotomous test results reflect quantitative values
using certain thresholds). At what level should the test result be regarded as normal, implying that the clinical question is to be answered with a no (e.g., the diagnosis is rejected)? And, at which point should the test result be
regarded as abnormal, implying that the clinical question is to be answered with a yes (e.g., the diagnosis is accepted)?
Several different approaches have been used to solve this problem: (1) the reference range approach, (2) the target cohort approach, (3) the diagnostic definition of normal or risk factor definition, and (4) the therapeutic definition of normal.[32] Each of these options is associated with certain drawbacks.[32] The reference range approach is used most often, and clinicians and laboratory specialists are generally
familiar with it. Here, the distribution of test results is determined in a cohort
of healthy individuals (e.g., blood donors) and a statistical cutoff on both sides
of the mean or median is defined (2 standard deviations or the interval between the
2.5th and the 97.5th percentile). The main problem with this method is that an abnormal
test result does not automatically mean that a disease is present, and a normal test
result does not always exclude a disease state. As an example, the prothrombin time
(PT) is not only used for the monitoring of vitamin K antagonists but as a screening
tool in patients with suspected bleeding disorders. The results are usually reported
in seconds, percentages (PT ratio; quick percent), or as international normalized
ratio against a reference range established for the respective reagent and coagulometer.
An abnormal PT will not be associated with a bleeding disorder in the majority of
cases, and a bleeding disorder might be present in some patients with normal PT.
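In computational terms, the reference range approach amounts to estimating the central 95% of results in a healthy cohort. The sketch below (Python with NumPy; the donor values are simulated, not real data) derives a nonparametric reference interval from the 2.5th and 97.5th percentiles.

```python
import numpy as np

# Illustrative sketch of the reference range approach: estimate the central 95%
# interval (2.5th to 97.5th percentile) of results from a healthy cohort.
# The "donor" values are simulated, not real data.

rng = np.random.default_rng(seed=1)
healthy_donor_results = rng.normal(loc=100.0, scale=12.0, size=240)  # e.g., PT in %

lower, upper = np.percentile(healthy_donor_results, [2.5, 97.5])
print(f"reference interval: {lower:.1f} to {upper:.1f}")

# Note: a result outside this interval is statistically unusual in healthy
# individuals, but it does not by itself establish the presence of disease.
```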
Through the use of the target cohort approach, clinicians can differentiate patients with the disease from patients without the
disease in a cohort of patients with similar signs and symptoms. The advantage of
this approach is that it reflects the clinical question. However, studies must be
conducted with adequately powered cohorts of patients with signs and symptoms of the
target indication, which are tested against a reference standard test. Another drawback
of this method is that some tests are used to answer several different clinical questions,
which makes reporting of the test results challenging. A typical example is the platelet
function analyzer (PFA). The diagnostic accuracy for von Willebrand disease was established
in patients with suspected bleeding disorders, and respective cutoffs were established.[49] Among other reasons, the interpretation of test results is difficult because the
PFA does not capture other common bleeding disorders (platelet function disorders).
In the diagnostic definition of normal/risk factor definition, the interpretation of test results is achieved according to the diagnostic or predictive
value at certain thresholds. For example, D-dimer tests to rule out pulmonary embolism utilize a cutoff level of 500 µg/L because this corresponds to a high predictive value of not having the disease (the likelihood ratio is well below 1).[47] D-dimer tests might also be used in the risk assessment for recurrent venous thromboembolism.
The higher the level of D-dimer, the higher the risk for recurrent venous thromboembolism.
This information can even be quantitatively implemented in clinical prediction models.[50] Another example is immunoassays for the diagnosis of heparin-induced thrombocytopenia.
Higher cutoffs are associated with higher (positive) likelihood ratios, which facilitate
clear clinical decisions.[51] Even though the diagnostic definition of normal/risk factor definition approach ensures that clinical decisions are made taking the actual risk of the patients
into account, it does not automatically mean that this is associated with an improvement
in clinical outcomes. The drawback of this approach is that large and well-designed
clinical studies are necessary to obtain the estimates needed. In addition, the definition
may change over time as new studies emerge.
The therapeutic definition of normal is the most intuitive approach. Laboratory values that define a patient population benefiting from a certain treatment are used as a cutoff. A recent example is treatment with intravenous iron in patients with heart failure. Two randomized controlled trials demonstrated that
intravenous iron is beneficial with regard to clinical outcomes in patients with heart
failure and iron deficiency. To define iron deficiency, a ferritin cutoff level of
100 ng/mL was chosen.[52] The drawbacks of this approach are, however, that the abnormal definition is applicable only to a certain patient population, and it is very costly
to conduct the underlying studies.
Phase 5: Diagnostic or Prognostic Accuracy
Laboratory test results are typically used to substantiate a suspected diagnosis (diagnostic problem) or to inform a risk assessment to decide on treatment characteristics (prognostic problem). Thus, the diagnostic (or prognostic) accuracy is a crucial characteristic of a
laboratory test. Unfortunately, sensitivity and specificity have been regarded as
fixed properties of a test and too little attention is given to how these measures
are generated, the settings and circumstances to which these values apply, and what
these parameters mean for clinical decision making. In this section, I will discuss
the major issues that apply to diagnostic accuracy studies. As prognostic studies
correspond to classic epidemiological studies, the reader is referred to major textbooks
of epidemiology.
Evidence from a large number of studies has made it clear that sensitivity, specificity,
and related measures only describe the behavior of a test under specific circumstances (the circumstances of the evaluation study) and that these circumstances often do
not resemble the situation in clinical practice.[7] Besides, it has become clear that diagnostic accuracy studies with a suboptimal design can produce biased results.[22][23][24][25]
The selection of patients is an essential characteristic of the study design because this selected population
defines the target population to which the test can be applied. Ideally, the study
population resembles the target population perfectly in terms of patient characteristics
as well as signs and symptoms. Thus, the diagnostic accuracy measures are determined
from a representative range of patients including those that are “slightly” ill (usually
with lower levels of the index test than “seriously” ill patients) as well as patients
with other disorders that exhibit similar signs and symptoms (often with higher levels
of the index test than healthy volunteers). In contrast, previous studies have generally
selected a group of seriously ill patients and a control group of healthy volunteers,
which has resulted in impressive (biased) diagnostic accuracy measures. This can result
in harm to many patients if the test is implemented prematurely. A well-known example
is the screening for prostate-specific antigen as an indication of prostate cancer.[53] In this instance, including selected rather than consecutive patients referred for
the workup of the suspected disorder leads to a spectrum or selection bias. Verifying the workup modalities of patients can help identify a respective risk
of bias in study populations.[7]
The determination of the index test should be done in exactly the same way as done in clinical practice. Overestimation
of the diagnostic accuracy can occur if the conditions are better in the study situation.
Typical examples are duplicate measurements, test modifications, and interpretation
by specially trained investigators. The operators performing the index test must not
be aware of the results of the reference standard test.
The diagnostic accuracy is estimated against a reference standard test. Suboptimal reference standard tests will result in biased estimates of diagnostic
accuracy. The best available method should therefore be selected. Less stringent reference standards might lead to misclassification
and reference standard bias.[7]
Partial verification bias can occur if only a population subgroup is tested against the reference standard[54] and differential verification bias if several reference standard tests are used. Sometimes, a panel of experts reviews
patient charts, clinical data, index test results, and treatment course of the patient
and this represents the reference standard. In this situation, the diagnostic accuracy
is often overestimated due to incorporation bias and because the experts tend to consider cases with a typical presentation as diseased (whether or not the diagnosis is true), while atypical cases tend to be neglected.[7] Again, knowledge of the index test result while interpreting the reference standard
test can lead to biased estimates.
Characteristics of flow and timing of the study procedures might additionally introduce biased estimates. Inappropriately
designed studies may result in awareness of the index test results while interpreting
the reference standard (and vice versa), and the detectable presence of the disease
may vary with time. Besides, the natural course of the disease might be affected by any intervention. This might change the detectable presence of the disease in the period between the performance of the index test and the performance of the reference standard and may cause bias if both tests are interpreted at different time points during the disease course. Ideally, the index test is compared with the reference standard test at the same time point.
How shall we analyze data from diagnostic accuracy studies? Traditionally, data are arranged in 2 × 2 tables, and sensitivity and specificity
are calculated. This approach is associated with a number of pitfalls and drawbacks,
however. First, 2 × 2 tables neglect inconclusive results, both with the index as
well as the reference standard test. Excluding inconclusive results from the analysis
is an important source of bias. Thus, data should be analyzed according to the intention-to-diagnose approach.[55] Inconclusive results of the index test are rated as negative if the reference standard
is positive and classified as positive if the reference standard is negative. Observations
are excluded if the reference standard test is inconclusive. Second, the post hoc
definition of the index test cutoff often leads to overestimation of the diagnostic
accuracy.[7] Thus, the threshold should already be defined in the study protocol based on the
preliminary studies (or alternatively via a separate training set of observations).
Third, point estimates of diagnostic accuracy may be imprecise and spurious in smaller
studies. A power calculation is, however, rarely done in diagnostic accuracy studies.[21] A sample size calculation based on realistic assumptions and reporting of confidence intervals are essential aspects of diagnostic accuracy studies.[21][56][57] Fourth, sensitivity and specificity as well as (positive and negative) predictive
values are often used as measures of diagnostic accuracy, but it is known that they
are generally not applicable to other patient cohorts because of changes in prevalence
and various patient characteristics.[7][20][28] Fifth, the use of likelihood ratios and Bayes' theorem to aid diagnostic reasoning
is questionable because these measures vary between subgroups of patients.[27][28][29] In contrast, multivariate prediction models not only take covariables into account
(representing subgroups of patients), but they can also determine the added value
of a new laboratory test to existing diagnostic pathways.[58]
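To tie the first points together, the following sketch (Python; all counts are invented) analyzes a 2 × 2 diagnostic accuracy study in an intention-to-diagnose manner: inconclusive index test results are counted against the test (as false negatives when the reference standard is positive and as false positives when it is negative), and sensitivity and specificity are reported with Wilson score confidence intervals rather than as bare point estimates.

```python
from math import sqrt

# Illustrative sketch: intention-to-diagnose analysis of a 2x2 diagnostic
# accuracy study with Wilson score confidence intervals. Counts are invented.

def wilson_ci(successes: int, n: int, z: float = 1.96):
    """Wilson score 95% confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Invented study counts (reference standard positive / negative)
tp, fn, inconclusive_pos = 86, 10, 4    # disease present
tn, fp, inconclusive_neg = 170, 22, 8   # disease absent

# Intention-to-diagnose: inconclusive index test results count against the test
tp_itd, fn_itd = tp, fn + inconclusive_pos
tn_itd, fp_itd = tn, fp + inconclusive_neg

sens = tp_itd / (tp_itd + fn_itd)
spec = tn_itd / (tn_itd + fp_itd)
lr_pos = sens / (1 - spec)
lr_neg = (1 - sens) / spec

sens_lo, sens_hi = wilson_ci(tp_itd, tp_itd + fn_itd)
spec_lo, spec_hi = wilson_ci(tn_itd, tn_itd + fp_itd)
print(f"sensitivity {sens:.2f} (95% CI {sens_lo:.2f}-{sens_hi:.2f})")
print(f"specificity {spec:.2f} (95% CI {spec_lo:.2f}-{spec_hi:.2f})")
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```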
Phase 6: Utility (Clinical and Health Care Outcomes)
The ultimate means of assessing the utility of a laboratory test is to study its effect on health outcomes.[3][30] What outcomes should we focus on? Most interesting and obvious for clinicians are clinical outcomes. Similar to outcome assessment in randomized controlled trials of interventions, clinical characteristics such as mortality and morbidity are crucial. Randomized controlled trials assessing a testing-and-treatment strategy against the absence of such a strategy represent the most rigorous study design. The particular morbidity measure depends on the individual disease involved. In the case of deep vein thrombosis, this might be recurrent venous thromboembolism or the presence of severe postthrombotic syndrome according to the Villalta score, as well as major bleeding events. However, a testing-and-treatment strategy may lead to a variety of adverse events, and testing for all possible events can be difficult.
The assessment of unmet clinical needs is suggested as the first step in the development
and evaluation of new laboratory tests and a respective checklist is available[5] ([Fig. 4]). This approach has the potential to increase value and reduce waste in biomedical
research. As the development of biomarkers usually evolves from new analytical technologies and knowledge from basic science, however, this checklist might be difficult to implement.
Fig. 4 Proposed checklist for the identification of unmet clinical needs (adapted from Monaghan
et al[5] with modifications). The list can be applied by researchers to identify unmet clinical
needs before developing, evaluating, and implementing new laboratory tests.
Patient-reported outcomes such as pain, anxiety, and functioning add valuable additional measures to the assessment
of clinical outcomes. Also, generic and disease-specific questionnaires measuring
the quality of life are available.
Process outcomes measure how the use of the laboratory test affects health care processes. Is the
risk assessment refined? Is the diagnosis obtained more quickly? Are the processes
simplified? Studies investigating these issues are usually performed in clinical practice
and the study design must be highly adapted to the individual research question.
Finally, laboratory medicine must be cost-effective to be relevant for patients, caregivers, provider organizations, health care insurances, and society as a whole. Thus, costs are an important outcome to study in order to assess the utility of a laboratory
test. Conducting a cost-effectiveness study is, however, difficult in the diagnostic
area because a large number of variables must be taken into account.[59] Thus, such an evaluation is rarely performed in laboratory medicine, though it would
be beneficial given the rising costs of health care.
Perspectives
Laboratory testing aims to improve outcomes—not only for patients but also for caregivers,
provider organizations, health care insurances, and society generally. Clinical decision
making must effectively and efficiently be promoted by laboratory testing to achieve
this goal. A broad range of test characteristics must be assessed, and a number of
methodological tools used to demonstrate the utility of a laboratory test in this
process.
To date, laboratory tests are rarely assessed adequately prior to implementation.
Consequently, overdiagnosis and overtreatment are regarded as a major threat to human
health and health care systems. To address this issue, a comprehensive methodological
framework to be used by researchers, clinicians, and decision makers in authorities,
health care insurances, and provider organizations should be developed and provided.
A phased approach—similar to that used in the assessment of new treatment innovations—has
many important advantages, and an outline has been proposed in this article.
An important question is who should be responsible for conducting the evaluation projects for individual laboratory tests. One might argue that authorization processes similar to the approval of new drugs should be established. This approach needs a clear and
detailed catalogue of requirements, however, which is not available so far. Future
research can be anticipated to identify methodological shortcomings in current evaluation
projects and to develop new methodological tools to address these issues. Specific
issues, such as the absence of a reference standard or appropriate testing of rare
diseases, should also be addressed. We look forward to scientific societies and authorities
supporting this process through the development of definitive requirements and acceptance
criteria for every phase of the evaluation process of various laboratory medicine
tests.