DOI: 10.1055/a-2576-1847
ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data
Funding
This work is partially supported by the National Natural Science Foundation of China (82174533 and 82204941), the Natural Science Foundation of Beijing (M21012), and the Key Project of Hubei Natural Science Foundation (2020CFA023).
Abstract
Background
Symptom phenotypes are crucial for diagnosing and treating various disease conditions. However, the diversity of symptom terminologies poses a significant challenge to the analysis and sharing of symptom-related medical data, particularly in the field of traditional Chinese medicine (TCM). This study aims to construct an Integrated Symptom Phenotype Ontology (ISPO) to support data mining of Chinese electronic medical records (EMRs) and real-world studies in the TCM field.
Methods
We manually annotated and extracted symptom terms from 21 classical TCM textbooks and 78,696 inpatient EMRs, and integrated them with five publicly available symptom-related biomedical vocabularies. Through a human–machine collaborative approach for terminology editing and ontology development, including term screening, semantic mapping, and concept classification, we constructed a high-quality symptom ontology that integrates both TCM and Western medical terminology.
Results
ISPO provides 3,147 concepts, 23,475 terms, and 23,363 hierarchical relationships. Compared with international symptom-related ontologies such as the Symptom Ontology, ISPO offers significant improvements in the number of terms and synonymous relationships. Furthermore, evaluation across three independent curated clinical datasets demonstrated that ISPO achieved over 90% coverage of symptom terms, highlighting its strong clinical usability and completeness.
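The coverage figure above can be illustrated with a small sketch. The abstract does not state how coverage was computed, so this assumes the simplest reading: the share of unique symptom terms in a dataset that match a preferred term or synonym of some ontology concept after light normalization. The concept IDs, the toy terms, and the function names (`normalize`, `build_term_index`, `coverage`) are hypothetical and are not taken from ISPO.

```python
from typing import Dict, Iterable, Set


def normalize(term: str) -> str:
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def build_term_index(concepts: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map every normalized term (preferred name or synonym) to its concept ID."""
    index: Dict[str, str] = {}
    for concept_id, terms in concepts.items():
        for term in terms:
            index[normalize(term)] = concept_id
    return index


def coverage(dataset_terms: Iterable[str], index: Dict[str, str]) -> float:
    """Fraction of unique dataset terms found in the ontology term index."""
    unique_terms = {normalize(t) for t in dataset_terms}
    if not unique_terms:
        return 0.0
    matched = sum(1 for t in unique_terms if t in index)
    return matched / len(unique_terms)


# Hypothetical toy ontology: concept ID -> set of terms (illustrative only).
toy_concepts = {
    "TOY:0001": {"cough", "coughing", "咳嗽"},
    "TOY:0002": {"insomnia", "sleeplessness", "失眠"},
}
toy_index = build_term_index(toy_concepts)

# Hypothetical dataset terms; "night sweats" has no match in the toy ontology.
toy_dataset = ["Cough", "失眠", "night sweats", "coughing"]
toy_coverage = coverage(toy_dataset, toy_index)  # 3 of 4 unique terms matched
```

Exact string matching is the most conservative choice here; a fuzzy or embedding-based matcher would report higher coverage on the same data, so the matching rule should be stated alongside any coverage number.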
Conclusion
ISPO represents the first clinical ontology globally dedicated to the systematic representation of symptoms. It integrates symptom terminologies from historical and contemporary sources, encompassing both TCM and Western medicine, thereby enhancing semantic interoperability across heterogeneous medical data sources and clinical decision support systems in TCM.
Keywords
medical ontology - symptom phenotypes - traditional Chinese medicine - biomedical terminology - electronic medical records
Data Availability Statement
ISPO is publicly available as a free web resource (http://www.tcmkg.com/ISPO/home) and has been available on BioPortal since May 2023.
Ethical Approval Statement
This study only utilizes clinical symptom terminology and does not involve human participants, patient data, or identifiable personal information.
Authors' Contribution
X.Z. Zhou, M.X., R.S. Zhang, X.J. Zhou, X.D.L., and B.Y. Liu conceived the study. Z.X.S. and R.H. analyzed the data. Z.X.S., R.H., and X.Z. Zhou drafted and revised the manuscript. All authors provided important contributions to data collection, processing, and review. All authors have proofread the manuscript.
* These authors contributed equally.
Publication History
Received: 04 February 2025
Accepted: 01 April 2025
Article published online: 06 May 2025
© 2025. Thieme. All rights reserved.
Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany