Methods Inf Med 2015; 54(02): 164-170
DOI: 10.3414/ME13-01-0130
Original Articles
Schattauer GmbH

Adaptive Semantic Tag Mining from Heterogeneous Clinical Research Texts[*]

T. Hao (1, 2), C. Weng (1)
1   Department of Biomedical Informatics, Columbia University, New York, NY, USA
2   Key Laboratory of Language Engineering and Computing of Guangdong Province, Guangdong University of Foreign Studies, Guangzhou, China

Publication History

received: 29 November 2013

accepted: 15 September 2014

Publication Date:
22 January 2018 (online)

Summary

Objectives: To develop an adaptive approach to mine frequent semantic tags (FSTs) from heterogeneous clinical research texts.

Methods: We developed a “plug-n-play” framework that integrates replaceable unsupervised kernel algorithms with formatting, functional, and utility wrappers for FST mining. Temporal information identification and semantic equivalence detection served as two example functional wrappers. We first compared this approach’s recall and efficiency for mining FSTs from ClinicalTrials.gov with those of a recently published tag-mining algorithm. We then assessed its adaptability to two other types of clinical research texts, clinical data requests and clinical trial protocols, by comparing the prevalence trends of FSTs across the three text types.
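
The plug-n-play idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the kernel (a toy n-gram extractor), the wrapper (a one-entry synonym table), and every function name are hypothetical stand-ins, chosen only to show how a replaceable kernel and a chain of functional wrappers might be composed before document frequencies are thresholded.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical type aliases; the framework's real interfaces are not given in the abstract.
Kernel = Callable[[str], List[str]]          # raw text -> candidate tags
Wrapper = Callable[[List[str]], List[str]]   # candidate tags -> transformed tags


def ngram_kernel(text: str, max_n: int = 3) -> List[str]:
    """Toy unsupervised kernel (assumption): lowercase word n-grams up to max_n."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def equivalence_wrapper(tags: List[str]) -> List[str]:
    """Toy semantic-equivalence wrapper (assumption): map synonyms to one canonical tag."""
    synonyms = {"hypertension": "high blood pressure"}
    return [synonyms.get(tag, tag) for tag in tags]


def mine_fsts(
    docs: Iterable[str],
    kernel: Kernel,
    wrappers: List[Wrapper],
    min_doc_freq: int,
) -> Dict[str, int]:
    """Run the kernel, apply each wrapper in order, and keep tags that
    occur in at least min_doc_freq documents."""
    doc_freq: Counter = Counter()
    for doc in docs:
        tags = kernel(doc)
        for wrap in wrappers:
            tags = wrap(tags)
        doc_freq.update(set(tags))  # count each tag once per document
    return {tag: n for tag, n in doc_freq.items() if n >= min_doc_freq}


docs = ["history of hypertension", "high blood pressure", "no diabetes"]
with_wrapper = mine_fsts(docs, ngram_kernel, [equivalence_wrapper], min_doc_freq=2)
without_wrapper = mine_fsts(docs, ngram_kernel, [], min_doc_freq=2)
```

In this toy corpus the equivalence wrapper merges "hypertension" with "high blood pressure", so the tag reaches the threshold only when the wrapper is plugged in. Swapping `ngram_kernel` for another callable with the same signature leaves the rest of the pipeline unchanged.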

Results: Our approach increased average recall and speed by 12.8% and 47.02%, respectively, over the baseline when mining FSTs from ClinicalTrials.gov, and maintained an overlap in relevant FSTs with the baseline of between 76.9% and 100% across varying FST frequency thresholds. The FSTs saturated when the data size reached 200 documents. Consistent trends in the prevalence of FSTs were observed across the three text types as the data size or frequency threshold changed.
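
For readers unfamiliar with the two measures, the following sketch shows one common way to compute them. The definitions are assumptions, since the abstract does not state its formulas: recall is taken as the fraction of reference tags that were mined, and overlap as the fraction of the baseline's relevant FSTs that the new approach also found. The tag sets are invented toy data, not results from the study.

```python
from typing import Set


def recall(mined: Set[str], reference: Set[str]) -> float:
    """Assumed definition: share of reference tags recovered by the miner."""
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(mined & reference) / len(reference)


def overlap(ours_relevant: Set[str], baseline_relevant: Set[str]) -> float:
    """Assumed definition: share of the baseline's relevant FSTs that we also found."""
    if not baseline_relevant:
        raise ValueError("baseline set must not be empty")
    return len(ours_relevant & baseline_relevant) / len(baseline_relevant)


# Toy data only (hypothetical tags).
reference_tags = {"pregnancy", "diabetes", "hypertension", "age"}
mined_tags = {"pregnancy", "diabetes", "smoking"}
baseline_tags = {"pregnancy", "diabetes"}

toy_recall = recall(mined_tags, reference_tags)
toy_overlap = overlap(mined_tags, baseline_tags)
```

Repeating the overlap computation at several frequency thresholds would give a range of values comparable in form to the one reported above.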

Conclusions: This paper contributes an adaptive tag-mining framework that is scalable and adaptable without sacrificing recall. This component-based architectural design is potentially generalizable and could improve the adaptability of other clinical text mining methods.

* Supplementary material is published on our website, www.methods-online.com


 