Adaptive Semantic Tag Mining from Heterogeneous Clinical Research Texts

T. Hao; C. Weng

doi:10.3414/ME13-01-0130

Methods Inf Med 2015; 54(02): 164-170
DOI: 10.3414/ME13-01-0130

Original Articles

Schattauer GmbH

Adaptive Semantic Tag Mining from Heterogeneous Clinical Research Texts^[*]

Authors

T. Hao

¹Department of Biomedical Informatics, Columbia University, New York, NY, USA

²Key Laboratory of Language Engineering and Computing of Guangdong Province, Guangdong University of Foreign Studies, Guangzhou, China

C. Weng

¹Department of Biomedical Informatics, Columbia University, New York, NY, USA

Further Information

Publication History

received: 29 November 2013

accepted: 15 September 2014

Publication Date:
22 January 2018 (online)

Summary

Objectives: To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts.

Methods: We developed a “plug-n-play” framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection served as two example functional wrappers. We first compared the recall and efficiency of this approach for mining FSTs from ClinicalTrials.gov with those of a recently published tag-mining algorithm. We then assessed the adaptability of this approach to two other types of clinical research texts, clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across the three text types.
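The component-based design described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ngram_kernel`, `merge_equivalents`, `mine_frequent_tags`), the bigram kernel, the toy synonym table, and the use of document frequency as the tag frequency are all assumptions made for illustration. The point is only that the kernel and the functional wrappers are passed in as interchangeable components.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical component types: a kernel proposes candidate tags from raw text,
# and a functional wrapper post-processes a list of candidate tags.
Kernel = Callable[[str], List[str]]
Wrapper = Callable[[List[str]], List[str]]


def ngram_kernel(text: str, n: int = 2) -> List[str]:
    """Toy replaceable kernel (an assumption): lowercase word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def merge_equivalents(tags: List[str]) -> List[str]:
    """Toy semantic-equivalence wrapper using a hand-written synonym table."""
    synonyms = {"heart attack": "myocardial infarction"}  # illustrative only
    return [synonyms.get(tag, tag) for tag in tags]


def mine_frequent_tags(
    docs: Iterable[str],
    kernel: Kernel,
    wrappers: List[Wrapper],
    min_doc_freq: int,
) -> Dict[str, int]:
    """Run the kernel, apply each wrapper in order, keep tags that occur
    in at least min_doc_freq documents."""
    counts: Counter = Counter()
    for doc in docs:
        tags = kernel(doc)
        for wrapper in wrappers:
            tags = wrapper(tags)
        counts.update(set(tags))  # count each tag once per document
    return {tag: c for tag, c in counts.items() if c >= min_doc_freq}


if __name__ == "__main__":
    docs = [
        "history of heart attack",
        "prior myocardial infarction",
        "no heart attack",
    ]
    print(mine_frequent_tags(docs, ngram_kernel, [merge_equivalents], 2))
```

Swapping `ngram_kernel` for another callable with the same signature, or adding a temporal-expression wrapper to the list, would leave `mine_frequent_tags` unchanged, which is the sense of "plug-n-play" assumed here.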

Results: When mining FSTs from ClinicalTrials.gov, our approach improved average recall and speed over the baseline by 12.8% and 47.02%, respectively, and its overlap in relevant FSTs with the baseline ranged between 76.9% and 100% across varying FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FSTs were observed across the three text types as the data size or frequency threshold changed.
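As a rough illustration of how an overlap figure at varying frequency thresholds could be computed, the sketch below compares two tag-frequency tables. The summary does not define the overlap measure, so the definition used here (the share of the baseline's tags at a threshold that the other method also retains) is an assumption, as are the function names and the toy frequencies.

```python
from typing import Dict, Set


def tags_at_threshold(freqs: Dict[str, int], threshold: int) -> Set[str]:
    """Tags whose frequency meets or exceeds the threshold."""
    return {tag for tag, freq in freqs.items() if freq >= threshold}


def overlap(ours: Set[str], baseline: Set[str]) -> float:
    """Assumed definition: fraction of baseline tags also found by our method.
    Returns 1.0 when the baseline set is empty (nothing to miss)."""
    if not baseline:
        return 1.0
    return len(ours & baseline) / len(baseline)


if __name__ == "__main__":
    # Toy frequencies, not data from the study.
    ours = {"diabetes": 5, "pregnancy": 3, "age": 1}
    baseline = {"diabetes": 4, "pregnancy": 1, "hypertension": 3}
    for threshold in (3, 4):
        score = overlap(
            tags_at_threshold(ours, threshold),
            tags_at_threshold(baseline, threshold),
        )
        print(threshold, score)
```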

Conclusions: This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing recall. Its component-based architectural design is potentially generalizable and could improve the adaptability of other clinical text mining methods.

Keywords

Medical informatics - text mining - clinical trials - semantic tags - component-based architecture

^* Supplementary material is published on our website, www.methods-online.com
