Quality Assurance of UMLS Semantic Type Assignments Using SNOMED CT Hierarchies

H. Gu; Y. Chen; Z. He; M. Halper; L. Chen

doi:10.3414/ME14-01-0104


Methods Inf Med 2016; 55(02): 158-165
DOI: 10.3414/ME14-01-0104

Original Articles

Schattauer GmbH


H. Gu¹, Y. Chen², Z. He³, M. Halper⁴, L. Chen⁵

¹Computer Science Department, New York Institute of Technology, New York, NY, USA
²Computer Information Systems Department, BMCC, CUNY, New York, NY, USA
³Biomedical Informatics Department, Columbia University, New York, NY, USA
⁴Information Technology Department, New Jersey Institute of Technology, Newark, New Jersey, USA
⁵Science Department, BMCC, CUNY, New York, NY, USA


Publication History

received: 20 October 2014

accepted: 25 March 2015

Publication date: 08 January 2018 (online)


Summary

Background: The Unified Medical Language System (UMLS) is one of the largest biomedical terminological systems, with over 2.5 million concepts in its Metathesaurus repository. The UMLS’s Semantic Network (SN) with its collection of 133 high-level semantic types serves as an abstraction layer on top of the Metathesaurus. In particular, the SN elaborates an aspect of the Metathesaurus’s concepts via the assignment of one or more types to each concept. Due to the scope and complexity of the Metathesaurus, errors are all but inevitable in this semantic-type assignment process.

Objectives: To develop a semi-automated methodology to help assure the quality of semantic-type assignments within the UMLS.

Methods: The methodology uses a cross-validation strategy involving SNOMED CT’s hierarchies in combination with UMLS semantic types. Semantically uniform, disjoint concept groups are generated programmatically by partitioning the collection of all concepts in the same SNOMED CT hierarchy according to their respective semantic-type assignments in the UMLS. Domain experts are then called upon to review the concepts in any group having a small number of concepts. It is our hypothesis that a semantic-type assignment combination applicable only to a very small number of concepts in a SNOMED CT hierarchy is an indicator of potential problems.
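The partitioning step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the function names, the concept identifiers, and the size threshold `max_size` are all assumptions made for illustration, since the summary says only that groups with "a small number of concepts" are sent for expert review.

```python
from collections import defaultdict

# Hypothetical toy records: (concept id, SNOMED CT hierarchy, UMLS semantic types).
# The identifiers are invented placeholders, not real UMLS CUIs.
SAMPLE_CONCEPTS = [
    ("C-A1", "Clinical finding", ["Disease or Syndrome"]),
    ("C-A2", "Clinical finding", ["Disease or Syndrome"]),
    ("C-A3", "Clinical finding", ["Disease or Syndrome"]),
    ("C-A4", "Clinical finding", ["Finding"]),
    ("C-A5", "Clinical finding", ["Finding", "Disease or Syndrome"]),
    ("C-B1", "Procedure", ["Therapeutic or Preventive Procedure"]),
    ("C-B2", "Procedure", ["Therapeutic or Preventive Procedure"]),
]


def partition_by_semantic_types(concepts):
    """Partition concepts into disjoint groups.

    Two concepts share a group only if they lie in the same hierarchy
    and carry exactly the same combination of semantic types.
    """
    groups = defaultdict(list)
    for concept_id, hierarchy, semantic_types in concepts:
        # frozenset makes the key independent of the order of the types.
        key = (hierarchy, frozenset(semantic_types))
        groups[key].append(concept_id)
    return dict(groups)


def flag_small_groups(groups, max_size=1):
    """Return the groups small enough to be queued for expert review.

    max_size is an assumed cutoff, not a value taken from the paper.
    """
    return {key: ids for key, ids in groups.items() if len(ids) <= max_size}


groups = partition_by_semantic_types(SAMPLE_CONCEPTS)
for (hierarchy, types), ids in flag_small_groups(groups).items():
    print(hierarchy, sorted(types), ids)
```

On this toy input, the two single-concept groups in the Clinical finding hierarchy are flagged, while the larger groups are left alone. In practice the cutoff would have to be chosen by the auditors, and flagged concepts are only candidates for review, not confirmed errors.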

Results: The methodology was applied to the UMLS 2013AA release along with the January 2013 release of SNOMED CT. An overall error rate of 33% was found for concepts proposed by the quality-assurance methodology. Supporting our hypothesis, that number was four times higher than the error rate found in control samples.

Conclusion: The results show that the quality-assurance methodology can aid in effective and efficient identification of UMLS semantic-type assignment errors.

Keywords

Medical terminology - quality assurance - UMLS - SNOMED CT - semantic-type assignment - auditing of terminologies - UMLS auditing

