Valx: A System for Extracting and Structuring Numeric Lab Test Comparison Statements from Text

Tianyong Hao; Hongfang Liu; Chunhua Weng

doi:10.3414/ME15-01-0112


Methods Inf Med 2016; 55(03): 266-275
DOI: 10.3414/ME15-01-0112

Original Articles

Schattauer GmbH

Valx: A System for Extracting and Structuring Numeric Lab Test Comparison Statements from Text^[*]

Authors

Tianyong Hao

¹Department of Biomedical Informatics, Columbia University, New York, NY, USA

²Key Lab of Language Engineering and Computing of Guangdong Province, Guangdong University of Foreign Studies, Guangzhou, China

Hongfang Liu

³Department of Health Sciences Research, Rochester, MN, USA

Chunhua Weng

¹Department of Biomedical Informatics, Columbia University, New York, NY, USA

Further Information

Publication History

received: 26 August 2015

accepted: 07 February 2016

Publication Date:
08 January 2018 (online)


Summary

Objectives: To develop an automated method for extracting and structuring numeric lab test comparison statements from text and evaluate the method using clinical trial eligibility criteria text.

Methods: Leveraging semantic knowledge from the Unified Medical Language System (UMLS) and domain knowledge acquired from the Internet, Valx takes seven steps to extract and normalize numeric lab test expressions: 1) text preprocessing, 2) numeric, unit, and comparison operator extraction, 3) variable identification using hybrid knowledge, 4) variable-numeric association, 5) context-based association filtering, 6) measurement unit normalization, and 7) heuristic rule-based verification of comparison statements. Our reference standard was the consensus-based annotation by three raters of all comparison statements for two variables, HbA1c and glucose, identified from all Type 1 and Type 2 diabetes trials in ClinicalTrials.gov.
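The following is a minimal sketch of how steps 1 to 4 and 6 of such a pipeline could fit together; it is not the Valx implementation. The synonym lists, operator phrases, regular expression, function names, and the choice of mmol/L as the target glucose unit are all assumptions made for illustration (Valx derives its knowledge from the UMLS and web sources), and steps 5 and 7 are omitted.

```python
import re

# Hypothetical synonym lists standing in for UMLS- and web-derived knowledge.
VARIABLES = {
    "hba1c": ["hba1c", "a1c", "glycosylated hemoglobin", "glycated hemoglobin"],
    "glucose": ["fasting plasma glucose", "plasma glucose", "blood glucose", "glucose"],
}

# Illustrative operator phrases mapped to canonical symbols.
OP_MAP = {
    "greater than or equal to": ">=", "less than or equal to": "<=",
    "greater than": ">", "more than": ">", "less than": "<",
    "at least": ">=", "at most": "<=", "above": ">", "below": "<",
    ">=": ">=", "<=": "<=", "≥": ">=", "≤": "<=", ">": ">", "<": "<",
}
# Longest phrases first so ">=" is matched before ">".
OP_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(OP_MAP, key=len, reverse=True))
)
STATEMENT = re.compile(
    r"(?P<op>>=|<=|>|<)\s*(?P<num>\d+(?:\.\d+)?)\s*(?P<unit>%|mg/dl|mmol/l)?"
)


def preprocess(text):
    """Step 1: lowercase, canonicalize operators, collapse whitespace."""
    text = OP_PATTERN.sub(lambda m: f" {OP_MAP[m.group(0)]} ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def normalize(variable, value, unit):
    """Step 6: unit normalization (assumed targets: % and mmol/l)."""
    if variable == "glucose" and unit == "mg/dl":
        # 1 mmol/L of glucose is roughly 18 mg/dL.
        return round(value / 18.0, 2), "mmol/l"
    if variable == "hba1c" and unit is None:
        return value, "%"
    return value, unit


def extract(text):
    """Steps 2-4: find operator/number/unit triples, attach nearest preceding variable."""
    text = preprocess(text)
    mentions = [
        (m.start(), variable)
        for variable, names in VARIABLES.items()
        for name in names
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text)
    ]
    results = []
    for m in STATEMENT.finditer(text):
        preceding = [(pos, var) for pos, var in mentions if pos < m.start()]
        if not preceding:
            continue  # no variable to associate with this number
        _, variable = max(preceding)
        value, unit = normalize(variable, float(m.group("num")), m.group("unit"))
        results.append((variable, m.group("op"), value, unit))
    return results


print(extract("HbA1c ≥ 7.5% and fasting plasma glucose less than 126 mg/dL"))
```

Associating each number with the nearest preceding variable mention is the simplest possible association rule; the context-based filtering and rule-based verification steps described above exist precisely because such a rule produces false associations on real eligibility criteria.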

Results: The precision, recall, and F-measure for structuring HbA1c comparison statements were 99.6%, 98.1%, and 98.8% for Type 1 diabetes trials, and 98.8%, 96.9%, and 97.8% for Type 2 diabetes trials, respectively. The precision, recall, and F-measure for structuring glucose comparison statements were 97.3%, 94.8%, and 96.1% for Type 1 diabetes trials, and 92.3%, 92.3%, and 92.3% for Type 2 diabetes trials, respectively.
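As a quick check on how these three figures relate, the sketch below recomputes the F-measure from each reported precision and recall pair. It assumes the balanced F-measure (the harmonic mean of precision and recall), which the summary does not state explicitly; the function name and dictionary labels are illustrative. Because the inputs are already rounded to one decimal place, a recomputed value may differ from the reported one in the last digit.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) in percent, taken from the Results above.
reported = {
    "HbA1c, Type 1": (99.6, 98.1, 98.8),
    "HbA1c, Type 2": (98.8, 96.9, 97.8),
    "Glucose, Type 1": (97.3, 94.8, 96.1),
    "Glucose, Type 2": (92.3, 92.3, 92.3),
}

for label, (p, r, f_reported) in reported.items():
    print(f"{label}: recomputed {f_measure(p, r):.1f}, reported {f_reported}")
```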

Conclusions: Valx is effective at extracting and structuring free-text lab test comparison statements in clinical trial summaries. Future studies are warranted to test its generalizability beyond eligibility criteria text. Because Valx is open source, the scientific community can evaluate and improve it further.

Keywords

Medical informatics - natural language processing - patient selection - clinical trial - comparison statement

^* Supplementary material published on our website http://dx.doi.org/10.3414/ME15-01-0112


