CC BY-NC-ND 4.0 · J Lab Physicians 2022; 14(01): 090-098
DOI: 10.1055/s-0041-1734019
How Do I Do It?

How to Analyze the Diagnostic Performance of a New Test? Explained with Illustrations

1   Department of Community Medicine, Dr Baba Saheb Ambedkar Medical College and Hospital, Rohini, Delhi, India
2   Lady Hardinge Medical College, Delhi, India
3   Department of Statistics, University of Calcutta, Kolkata, West Bengal, India


Diagnostic tests are pivotal in modern medicine because they support statistical decision-making about confirming or ruling out the presence of a disease in patients. Sensitivity and specificity are the two most important and most widely used measures of the inherent validity of a diagnostic test with a dichotomous outcome, assessed against a gold standard test. Other diagnostic indices, namely positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio, and accuracy, are also discussed, together with the effect of prevalence on these indices. For a diagnostic test on a continuous scale, we present the performance of a classification model at all classification thresholds by reviewing the receiver operating characteristic (ROC) curve, which depicts the tradeoff between sensitivity and (1–specificity) across a series of cutoff points. The area under the ROC curve (AUROC) and the comparison of AUROCs of different tests are also discussed. The reliability of a test is defined in terms of its repeatability: a reliable test gives consistent results when repeated more than once on the same individual or material under the same conditions. We present the calculation of the kappa coefficient, a simple measure of agreement between two observers that corrects the overall percentage of agreement for the agreement expected by chance. When the prevalence of a disease in the population is low, a prospective study becomes increasingly difficult to conduct with the conventional design. We therefore describe three further designs alongside the conventional one, present the sensitivity and specificity calculations for each, and offer guidance on choosing the most suitable of these four designs depending on a number of factors.
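Apart from the ROC-based measures, every index named in the paragraph above can be computed from the four cells of a 2×2 table that cross-classifies the new test against the gold standard. The Python below is a minimal sketch, assuming hypothetical counts and illustrative names (`TwoByTwo`, `ppv_at_prevalence`, `npv_at_prevalence`) that do not come from the article. The last two functions apply Bayes' theorem to show the effect of prevalence: sensitivity and specificity stay fixed while the predictive values change.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts of a new test cross-classified against a gold standard."""

    tp: int  # test positive, disease present (true positive)
    fp: int  # test positive, disease absent (false positive)
    fn: int  # test negative, disease present (false negative)
    tn: int  # test negative, disease absent (true negative)

    def sensitivity(self) -> float:
        # Proportion of diseased subjects correctly called positive
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of non-diseased subjects correctly called negative
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # P(disease | positive result); depends on the prevalence in the sample
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # P(no disease | negative result); depends on the prevalence in the sample
        return self.tn / (self.tn + self.fn)

    def lr_positive(self) -> float:
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        return (1 - self.sensitivity()) / self.specificity()

    def accuracy(self) -> float:
        # Proportion of all subjects classified correctly
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


def ppv_at_prevalence(sens: float, spec: float, prev: float) -> float:
    """PPV via Bayes' theorem for a given prevalence (hypothetical helper)."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))


def npv_at_prevalence(sens: float, spec: float, prev: float) -> float:
    """NPV via Bayes' theorem for a given prevalence (hypothetical helper)."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)


if __name__ == "__main__":
    # Hypothetical counts for illustration only: 100 diseased, 400 non-diseased
    table = TwoByTwo(tp=90, fp=40, fn=10, tn=360)
    print(f"Sensitivity {table.sensitivity():.3f}")  # 0.900
    print(f"Specificity {table.specificity():.3f}")  # 0.900
    print(f"PPV {table.ppv():.3f}")  # 0.692
    print(f"NPV {table.npv():.3f}")  # 0.973
    print(f"LR+ {table.lr_positive():.2f}")  # 9.00
    print(f"LR- {table.lr_negative():.3f}")  # 0.111
    print(f"Accuracy {table.accuracy():.3f}")  # 0.900
    # Same sensitivity and specificity, but only 1% prevalence
    print(f"PPV at 1% prevalence {ppv_at_prevalence(0.9, 0.9, 0.01):.3f}")  # 0.083
```

With these hypothetical counts the prevalence is 20%, and `ppv_at_prevalence(0.9, 0.9, 0.2)` reproduces the table's PPV of about 0.692; at 1% prevalence the same test gives a PPV of only about 0.083. The sketch does not guard against empty cells, so a specificity of exactly 1 makes LR+ undefined (division by zero).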
The ultimate aim of this article is to provide the basic conceptual framework and interpretation of various diagnostic test indices, ROC analysis, comparison of the diagnostic accuracy of different tests, and the reliability of a test, so that clinicians can use them effectively. Several R packages mentioned in this article can prove handy for the quantitative synthesis of clinical data related to diagnostic tests.
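The ROC analysis and the kappa coefficient summarized above can also be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the article's own procedure or the API of any R package it mentions: higher scores are assumed to indicate disease, a result counts as positive when score >= cutoff, ties count as one-half in the AUROC, and all scores, ratings, and function names are hypothetical. It computes empirical ROC points over every observed cutoff, the AUROC both by the trapezoidal rule and as the probability that a diseased subject scores higher than a non-diseased one (the two agree for the empirical curve), and Cohen's kappa for two observers.

```python
from typing import List, Sequence, Tuple


def roc_points(diseased: Sequence[float], healthy: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical ROC points (1 - specificity, sensitivity), one per observed cutoff.

    Assumed convention: positive when score >= cutoff (higher = more likely diseased).
    """
    points = [(0.0, 0.0)]  # cutoff above every score: nobody is called positive
    for cutoff in sorted(set(diseased) | set(healthy), reverse=True):
        sens = sum(x >= cutoff for x in diseased) / len(diseased)
        fpr = sum(x >= cutoff for x in healthy) / len(healthy)  # 1 - specificity
        points.append((fpr, sens))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def mann_whitney_auc(diseased: Sequence[float], healthy: Sequence[float]) -> float:
    """AUROC as P(diseased score > healthy score), counting ties as 1/2."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters classifying the same n items."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions
    categories = set(rater_a) | set(rater_b)
    p_expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical test scores for illustration only
    diseased = [7.1, 8.4, 6.2, 9.0, 5.5]
    healthy = [4.0, 5.5, 3.2, 6.0, 4.8]
    print(f"{mann_whitney_auc(diseased, healthy):.2f}")  # 0.94
    print(f"{trapezoid_auc(roc_points(diseased, healthy)):.2f}")  # 0.94

    # Hypothetical readings of 10 specimens by two observers
    obs_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
    obs_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg"]
    print(f"{cohen_kappa(obs_1, obs_2):.3f}")  # 0.583
```

In the kappa example the observers agree on 8 of 10 specimens (80%), but with both calling 40% positive the chance agreement is 0.4 × 0.4 + 0.6 × 0.6 = 0.52, so kappa is (0.80 − 0.52)/(1 − 0.52) ≈ 0.583. The double loop in `mann_whitney_auc` is quadratic in sample size, which is acceptable for illustration; comparing the AUROCs of two tests with confidence intervals needs further methods not shown here.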

Publication History

Article published online:
08 September 2021

© 2021. The Indian Association of Laboratory Physicians. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon.

Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India
