Summary
Objectives: To develop an automated method for extracting and structuring numeric lab test comparison
statements from text and evaluate the method using clinical trial eligibility criteria
text.
Methods: Leveraging semantic knowledge from the Unified Medical Language System (UMLS) and
domain knowledge acquired from the Internet, Valx takes seven steps to extract and
normalize numeric lab test expressions: 1) text preprocessing, 2) numeric, unit, and
comparison operator extraction, 3) variable identification using hybrid knowledge,
4) variable-numeric association, 5) context-based association filtering, 6) measurement
unit normalization, and 7) heuristic rule-based comparison statement verification.
Our reference standard was the consensus-based annotation among three raters for all
comparison statements for two variables, i.e., HbA1c and glucose, identified from
all of Type 1 and Type 2 diabetes trials in ClinicalTrials.gov.
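
To make the extraction steps more concrete, the following is a minimal sketch of steps 2, 3, 4, and 6 (operator, number, and unit extraction; variable identification; variable-numeric association; unit normalization). It is not the Valx implementation. The lexicons, the regular expression, the function names, and the approximate glucose conversion factor of 18 are all assumptions made for illustration, whereas Valx draws its variable knowledge from the UMLS and online sources. Preprocessing, context-based filtering, and rule-based verification are omitted.

```python
import re

# Hypothetical mini-lexicon mapping surface forms to a canonical variable name.
VARIABLES = {
    "hba1c": "HbA1c",
    "a1c": "HbA1c",
    "glycosylated hemoglobin": "HbA1c",
    "glucose": "glucose",
}

# Hypothetical mapping of symbolic and verbal operators to a normalized form.
OPERATORS = {
    "<=": "<=", "≤": "<=", ">=": ">=", "≥": ">=",
    "<": "<", ">": ">", "=": "=",
    "less than": "<", "greater than": ">",
    "at least": ">=", "at most": "<=",
}

CANONICAL_UNITS = {"%": "%", "mg/dl": "mg/dL", "mmol/l": "mmol/L"}
MMOL_TO_MGDL = 18.0  # approximate conversion factor for glucose (assumption)


def _alternation(keys):
    # Longest alternatives first so "<=" wins over "<" and "hba1c" over "a1c".
    return "|".join(re.escape(k) for k in sorted(keys, key=len, reverse=True))


PATTERN = re.compile(
    r"\b(?P<var>" + _alternation(VARIABLES) + r")\b"
    r"[^<>=≤≥\d]{0,30}?"                      # short filler between variable and operator
    r"(?P<op>" + _alternation(OPERATORS) + r")\s*"
    r"(?P<num>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>%|mg/dl|mmol/l)?",
    re.IGNORECASE,
)


def normalize_unit(variable, value, unit):
    """Convert glucose in mmol/L to mg/dL; otherwise only canonicalize the unit label."""
    if unit is None:
        return value, None
    unit = unit.lower()
    if variable == "glucose" and unit == "mmol/l":
        return round(value * MMOL_TO_MGDL, 1), "mg/dL"
    return value, CANONICAL_UNITS.get(unit, unit)


def extract_comparisons(text):
    """Return (variable, operator, value, unit) tuples found in the text."""
    results = []
    for m in PATTERN.finditer(text):
        variable = VARIABLES[m.group("var").lower()]
        operator = OPERATORS[m.group("op").lower()]
        value, unit = normalize_unit(variable, float(m.group("num")), m.group("unit"))
        results.append((variable, operator, value, unit))
    return results


if __name__ == "__main__":
    criterion = "HbA1c ≤ 10.5% and fasting glucose greater than 7.0 mmol/L"
    for statement in extract_comparisons(criterion):
        print(statement)
```

A single regular expression keeps the sketch short, but it only associates a number with the nearest preceding variable; handling ranges, multiple values per variable, or negation would need the additional association and filtering steps listed above.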
Results: The precision, recall, and F-measure for structuring HbA1c comparison statements
were 99.6%, 98.1%, 98.8% for Type 1 diabetes trials, and 98.8%, 96.9%, 97.8% for Type
2 diabetes trials, respectively. The precision, recall, and F-measure for structuring
glucose comparison statements were 97.3%, 94.8%, 96.1% for Type 1 diabetes trials,
and 92.3%, 92.3%, 92.3% for Type 2 diabetes trials, respectively.
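
As a quick check on how these figures relate, the sketch below computes the F-measure from precision and recall, assuming the balanced F1 form (the harmonic mean of the two); the abstract does not state which weighting was used, and the function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # HbA1c, Type 1 diabetes trials: precision 99.6%, recall 98.1%
    print(round(f_measure(99.6, 98.1), 1))
    # HbA1c, Type 2 diabetes trials: precision 98.8%, recall 96.9%
    print(round(f_measure(98.8, 96.9), 1))
```

Because the inputs here are already rounded percentages, small differences from values computed on raw counts are possible.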
Conclusions: Valx is effective at extracting and structuring free-text lab test comparison statements
in clinical trial summaries. Future studies are warranted to test its generalizability
beyond eligibility criteria text. The open-source Valx enables its further evaluation
and continued improvement among the collaborative scientific community.
Keywords
Medical informatics - natural language processing - patient selection - clinical trial
- comparison statement