Relating Complexity and Error Rates of Ontology Concepts

Hua Min; Ling Zheng; Yehoshua Perl; Michael Halper; Sherri de Coronado; Christopher Ochs

doi:10.3414/ME16-01-0085

Methods Inf Med 2017; 56(03): 200-208
DOI: 10.3414/ME16-01-0085

Paper

Schattauer GmbH

More Complex NCIt Concepts Have More Errors

Authors

Hua Min
¹Department of Health Administration and Policy, College of Health and Human Services, George Mason University, Fairfax, VA, USA

Ling Zheng
²Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA

Yehoshua Perl
²Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA

Michael Halper
³Information Technology Department, New Jersey Institute of Technology, Newark, NJ, USA

Sherri de Coronado
⁴National Cancer Institute, Center for Biomedical Informatics & Information Technology, National Institutes of Health, Rockville, MD, USA

Christopher Ochs
²Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA

Funding: Research reported in this publication was partially supported by the National Cancer Institute of the National Institutes of Health under Award Number R01CA190779. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Further Information

Publication History

received: 13 July 2016

accepted in revised form: 19 January 2017

Publication Date:
24 January 2018 (online)

Summary

Objectives: Ontologies are knowledge structures that lend support to many health-information systems. A study is carried out to assess the quality of ontological concepts based on a measure of their complexity. The results show a relation between the complexity of concepts and their error rates.

Methods: A measure of lateral complexity, defined as the number of exhibited role types, is used to distinguish between more complex and simpler concepts. Using a framework called an area taxonomy, a kind of abstraction network that summarizes the structural organization of an ontology, concepts are divided into two groups along these lines. Various concepts from each group are then subjected to a two-phase QA analysis to uncover and verify errors and inconsistencies in their modeling. A hierarchy of the National Cancer Institute thesaurus (NCIt) is used as our testbed. A hypothesis pertaining to the expected error rates of the complex and simple concepts is tested.
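
The grouping described in the Methods can be illustrated with a minimal Python sketch. It assumes that each concept is represented simply by the set of role types it exhibits, and that an area collects all concepts sharing exactly the same set. The concept names, the role names (`role_1`, `role_2`, `role_3`), and the `threshold` parameter are hypothetical and are not taken from the NCIt or from the study; the summary does not state the cutoff that separates the two groups.

```python
from collections import defaultdict

# Hypothetical toy data: concept name -> set of exhibited role types.
# Neither the concept names nor the role names come from the NCIt.
concept_roles = {
    "Concept A": {"role_1", "role_2"},
    "Concept B": {"role_1"},
    "Concept C": set(),
    "Concept D": {"role_1", "role_2"},
    "Concept E": {"role_1", "role_2", "role_3"},
}


def lateral_complexity(roles):
    """Lateral complexity: the number of role types a concept exhibits."""
    return len(roles)


def group_into_areas(concept_roles):
    """Group concepts that exhibit exactly the same set of role types."""
    areas = defaultdict(list)
    for concept, roles in concept_roles.items():
        areas[frozenset(roles)].append(concept)
    return dict(areas)


def split_by_complexity(concept_roles, threshold):
    """Split concepts into (complex, simple) groups.

    `threshold` is an assumed cutoff: concepts with at least this many
    role types are treated as complex, the rest as simple.
    """
    complex_group = sorted(
        c for c, r in concept_roles.items() if lateral_complexity(r) >= threshold
    )
    simple_group = sorted(
        c for c, r in concept_roles.items() if lateral_complexity(r) < threshold
    )
    return complex_group, simple_group


areas = group_into_areas(concept_roles)
complex_group, simple_group = split_by_complexity(concept_roles, threshold=2)
```

With the toy data above, `Concept A` and `Concept D` fall into the same area because they share the role set `{role_1, role_2}`, and a threshold of 2 places them, together with `Concept E`, in the complex group.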

Results: Our study was done on the NCIt’s Biological Process hierarchy. Various errors, including missing roles, incorrect role targets, and incorrectly assigned roles, were discovered and verified in the two phases of our QA analysis. The overall findings confirmed our hypothesis by showing a statistically significant difference between the numbers of errors exhibited by more laterally complex concepts vis-à-vis simpler concepts.
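
To make the idea of a significant difference between two error rates concrete, the following Python sketch compares two groups with a one-sided Fisher's exact test. This is an illustration only: the summary does not name the statistical test that was used, so Fisher's exact test is an assumed stand-in, and the function names and any counts passed to them are hypothetical rather than results from the study.

```python
from math import comb


def error_rate(errors, total):
    """Fraction of audited concepts found to be erroneous."""
    return errors / total


def fisher_one_sided(err_a, n_a, err_b, n_b):
    """One-sided Fisher's exact test p-value.

    Returns the probability, under the null hypothesis of equal error
    rates, that group A would contain at least `err_a` of the
    `err_a + err_b` erroneous concepts when `n_a` of the `n_a + n_b`
    audited concepts belong to group A (hypergeometric tail).
    """
    total_err = err_a + err_b
    n = n_a + n_b
    denom = comb(n, n_a)
    upper = min(total_err, n_a)
    tail = sum(
        comb(total_err, k) * comb(n - total_err, n_a - k)
        for k in range(err_a, upper + 1)
    )
    return tail / denom
```

A small p-value from `fisher_one_sided` would indicate that the higher error rate in group A (here standing for the more complex concepts) is unlikely to arise by chance alone under these assumptions.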

Conclusions: QA is an essential part of any ontology’s maintenance regimen. In this paper, we reported on the results of a QA study targeting two groups of ontology concepts distinguished by their level of complexity, defined in terms of the number of exhibited role types. The study was carried out on a major component of an important ontology, the NCIt. The findings suggest that more complex concepts tend to have a higher error rate than simpler concepts. These findings can be utilized to guide ongoing efforts in ontology QA.

Keywords

Ontology quality assurance - ontology modeling - ontology complexity - abstraction network - National Cancer Institute thesaurus

