Methods Inf Med 2008; 47(06): 513-521
DOI: 10.3414/ME9127
Original Article
Schattauer GmbH

Development of a Medical-text Parsing Algorithm Based on Character Adjacent Probability Distribution for Japanese Radiology Reports

N. Nishimoto (1), S. Terae (2), M. Uesugi (1), K. Ogasawara (3), T. Sakurai (1)
1   Department of Medical Informatics, Hokkaido University, Sapporo, Japan
2   Department of Radiology, Hokkaido University, Sapporo, Japan
3   Department of Health Sciences, Hokkaido University, Sapporo, Japan

Publication History

Publication Date:
18 January 2018 (online)

Summary

Objectives: The objectives of this study were to investigate the transitional probability distribution of medical term boundaries between characters and to develop a parsing algorithm specifically for medical texts.

Methods: Medical terms in Japanese computed tomography (CT) reports were identified using the ChaSen morphological analysis system. MeSH-based medical terms (51,385 entries), obtained from the Metathesaurus of the Unified Medical Language System (UMLS, 2005AA), were added to ChaSen as a medical dictionary. A radiographer corrected the parsing results for 300 CT reports. In addition, two radiologists checked the medical term parsing of 200 CT sentences.

Results: We obtained modified inter-annotator agreement scores for the text corrected by the radiologists. We retrieved the transitional probabilities as conditional probabilities for uni-grams, bi-grams, and tri-grams. The highest transitional probability, P(C_i | C_{i-2} C_{i-1}), was 1.00. As an example of an anatomical location, the term “pulmonary hilum” was parsed as a tri-gram.
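
The conditional probabilities named in the Results can be illustrated with a minimal sketch. It estimates P(C_i | preceding n-1 characters) by simple relative frequency over character n-grams. The function name `transition_probabilities` and the three-term toy corpus are assumptions made for illustration only; this is not the authors' implementation, it does not use ChaSen or the UMLS dictionary, and the numbers it produces are unrelated to the study's results.

```python
from collections import Counter


def transition_probabilities(texts, n):
    """Estimate P(c_i | previous n-1 characters) by relative frequency.

    texts: iterable of strings (hypothetical, already segmented terms).
    n: n-gram order (1 = uni-gram, 2 = bi-gram, 3 = tri-gram).
    Returns a dict mapping (context, character) to a probability.
    """
    context_counts = Counter()
    ngram_counts = Counter()
    for text in texts:
        # Start at the first position that has a full (n-1)-character context.
        for i in range(n - 1, len(text)):
            context = text[i - n + 1:i]  # empty string when n == 1
            ngram_counts[(context, text[i])] += 1
            context_counts[context] += 1
    return {
        (context, ch): count / context_counts[context]
        for (context, ch), count in ngram_counts.items()
    }


# Hypothetical toy corpus, not taken from the CT reports in the study.
toy_terms = ["肺門部", "肺門", "肺野"]

unigram = transition_probabilities(toy_terms, 1)
bigram = transition_probabilities(toy_terms, 2)
trigram = transition_probabilities(toy_terms, 3)

# Tri-gram case: probability of the third character given the two before it.
print(trigram[("肺門", "部")])
```

In this toy corpus the two-character context 肺門 is always followed by 部, so the tri-gram estimate is 1.0, whereas the single-character context 肺 is followed by two different characters and yields lower bi-gram estimates. A real corpus would need smoothing for unseen contexts, which this sketch omits.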

Conclusions: Retrieval of transitional probability will improve the accuracy of parsing compound medical terms.