Abstract
Objectives Analysis of health care real-world data (RWD) provides an opportunity to observe
the actual patient diagnostic, treatment, and outcome events. However, researchers
should understand the possible limitations of RWD. In particular, the dates in these
data may be shifted from their actual values, which might affect the validity of study
conclusions.
Methods A methodology for detecting the presence of shifted dates in RWD was developed by
considering various approaches to confirm the expected occurrences of medical events,
including unique temporal occurrences as well as recurring seasonal or weekday patterns
in diagnoses or procedures. Diagnosis and procedure data were obtained from 71 U.S.
health care data provider organizations (HCOs), members of the TriNetX global research
network. Synthetic data were generated for the diagnoses and procedures studied to show
the patterns expected under various degrees of date shifting, including no shift.
These patterns were compared with those produced for each HCO to predict the presence
and degree of date shifting. The predictions were then compared with statements of date
shifting by the originating HCOs to determine the predictive accuracy of the methods
studied.
Results Twenty-eight of the 71 HCOs analyzed were predicted by the methodology and confirmed
by their data providers to have shifted data. Likewise, 39 were predicted and confirmed
not to have shifted data. For four HCOs, agreement between predicted and stated date
shifting status was not obtained. The occurrence of routine medical exams, which happen
only on weekdays at these U.S. HCOs, was most predictive (correlation coefficient 0.92)
of the presence or absence of date shifting.
Conclusion The presence of date shifting for U.S. HCOs may be reliably detected by assessing
whether routine exams, which are expected to occur only on weekdays, fall on weekdays
in the data.
Keywords
data quality - electronic health record - secondary use - real-world data
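
The weekday check described in the Results and Conclusion can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uniform random shift, the 30-day shift window, and the 5% decision threshold are all assumptions chosen for illustration. The idea is that routine exam dates that all fall on weekdays suggest unshifted data, whereas randomly shifted dates land on weekends roughly two-sevenths of the time.

```python
from datetime import date, timedelta
import random


def weekend_fraction(dates):
    """Return the share of dates that fall on Saturday or Sunday."""
    dates = list(dates)
    if not dates:
        raise ValueError("no dates supplied")
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    weekend = sum(1 for d in dates if d.weekday() >= 5)
    return weekend / len(dates)


def shift_dates(dates, max_shift_days, rng):
    """Shift each date by a random offset in [-max_shift_days, max_shift_days].

    Hypothetical shifting scheme used only to produce synthetic data.
    """
    return [
        d + timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        for d in dates
    ]


def looks_shifted(dates, threshold=0.05):
    """Flag data as shifted when weekend exams exceed an assumed threshold."""
    return weekend_fraction(dates) > threshold


# Synthetic routine exams: Monday to Friday for 52 weeks.
# 2021-01-04 is a Monday.
start = date(2021, 1, 4)
exam_dates = [
    start + timedelta(weeks=week, days=day)
    for week in range(52)
    for day in range(5)
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
shifted_dates = shift_dates(exam_dates, max_shift_days=30, rng=rng)

print("unshifted weekend fraction:", weekend_fraction(exam_dates))
print("shifted weekend fraction:", weekend_fraction(shifted_dates))
print("unshifted flagged:", looks_shifted(exam_dates))
print("shifted flagged:", looks_shifted(shifted_dates))
```

In practice the threshold would need to tolerate the small number of genuine weekend exams present in real data, and a shift by a whole number of weeks would preserve the weekday pattern and go undetected by this check alone.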