Methods Inf Med 2016; 55(03): 266-275
DOI: 10.3414/ME15-01-0112
Original Articles
Schattauer GmbH

Valx: A System for Extracting and Structuring Numeric Lab Test Comparison Statements from Text[*]

Tianyong Hao
1   Department of Biomedical Informatics, Columbia University, New York, NY, USA
2   Key Lab of Language Engineering and Computing of Guangdong Province, Guangdong University of Foreign Studies, Guangzhou, China
Hongfang Liu
3   Department of Health Sciences Research, Rochester, MN, USA
Chunhua Weng
1   Department of Biomedical Informatics, Columbia University, New York, NY, USA

Publication History

received: 26 August 2015

accepted: 07 February 2016

Publication Date: 08 January 2018 (online)


Summary

Objectives: To develop an automated method for extracting and structuring numeric lab test comparison statements from text and evaluate the method using clinical trial eligibility criteria text.

Methods: Leveraging semantic knowledge from the Unified Medical Language System (UMLS) and domain knowledge acquired from the Internet, Valx takes seven steps to extract and normalize numeric lab test expressions: 1) text preprocessing, 2) numeric, unit, and comparison operator extraction, 3) variable identification using hybrid knowledge, 4) variable-numeric association, 5) context-based association filtering, 6) measurement unit normalization, and 7) heuristic rule-based comparison statement verification. Our reference standard was the consensus-based annotation among three raters of all comparison statements for two variables, HbA1c and glucose, identified from all Type 1 and Type 2 diabetes trials in ClinicalTrials.gov.
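As a rough illustration of what steps 2, 3, 4, and 6 produce, the following minimal Python sketch turns free text into (variable, operator, value, unit) tuples. It is not the Valx implementation: the lookup tables, the single regular expression, the function name `extract_comparisons`, and the approximate glucose conversion factor are all assumptions made for this example, standing in for the UMLS and web-derived knowledge the system actually uses.

```python
import re

# Hypothetical lookup tables for illustration only; Valx draws on UMLS and
# web-derived domain knowledge rather than hard-coded dictionaries like these.
VARIABLES = {"hba1c": "HbA1c", "a1c": "HbA1c", "glucose": "glucose"}
OPERATORS = {">=": ">=", "≥": ">=", "<=": "<=", "≤": "<=",
             ">": ">", "<": "<", "=": "="}
UNITS = {"%": "%", "mg/dl": "mg/dL", "mmol/l": "mmol/L"}

# Assumption: approximate factor for converting glucose from mmol/L to mg/dL.
GLUCOSE_MMOL_TO_MGDL = 18.0


def _alternation(keys):
    """Build a regex alternation, longest key first so '>=' wins over '>'."""
    return "|".join(re.escape(k) for k in sorted(keys, key=len, reverse=True))


PATTERN = re.compile(
    r"\b(?P<var>" + _alternation(VARIABLES) + r")\b"
    r"[^\d<>=≤≥]{0,20}?"                       # short gap between variable and operator
    r"(?P<op>" + _alternation(OPERATORS) + r")\s*"
    r"(?P<num>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>" + _alternation(UNITS) + r")?",
    re.IGNORECASE,
)


def extract_comparisons(text):
    """Return a list of (variable, operator, value, unit) tuples found in text."""
    results = []
    for m in PATTERN.finditer(text):
        variable = VARIABLES[m.group("var").lower()]
        operator = OPERATORS[m.group("op")]
        value = float(m.group("num"))
        unit = UNITS.get((m.group("unit") or "").lower())  # None if no unit given
        # Normalize glucose values to a single unit (mg/dL) for comparison.
        if variable == "glucose" and unit == "mmol/L":
            value, unit = round(value * GLUCOSE_MMOL_TO_MGDL, 1), "mg/dL"
        results.append((variable, operator, value, unit))
    return results
```

For example, `extract_comparisons("HbA1c >= 7.5% and fasting glucose < 7 mmol/L")` yields one tuple for HbA1c and one for glucose, with the glucose value converted to mg/dL. Context-based filtering (step 5) and rule-based verification (step 7) are omitted from this sketch.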

Results: The precision, recall, and F-measure for structuring HbA1c comparison statements were 99.6%, 98.1%, 98.8% for Type 1 diabetes trials, and 98.8%, 96.9%, 97.8% for Type 2 diabetes trials, respectively. The precision, recall, and F-measure for structuring glucose comparison statements were 97.3%, 94.8%, 96.1% for Type 1 diabetes trials, and 92.3%, 92.3%, 92.3% for Type 2 diabetes trials, respectively.
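The summary does not state which F-measure variant was used. Assuming the balanced F-measure (F1, the harmonic mean of precision and recall), the relationship between the three reported figures can be sketched as follows; the function name `f_measure` is chosen for this example only.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the reported F-measure is the balanced F1; inputs and
    output share the same scale (fractions or percentages).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under that assumption, `f_measure(99.6, 98.1)` gives about 98.8, in line with the HbA1c figure for Type 1 diabetes trials. Values recomputed from the rounded percentages may differ from the reported ones in the last digit.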

Conclusions: Valx is effective at extracting and structuring free-text lab test comparison statements in clinical trial summaries. Future studies are warranted to test its generalizability beyond eligibility criteria text. Because Valx is open source, the scientific community can evaluate it further and continue to improve it.

* Supplementary material published on our website http://dx.doi.org/10.3414/ME15-01-0112