CC BY-NC-ND 4.0 · Yearb Med Inform 2018; 27(01): 184-192
DOI: 10.1055/s-0038-1667079
Section 9: Natural Language Processing
Survey
Georg Thieme Verlag KG Stuttgart

Advancing the State of the Art in Clinical Natural Language Processing through Shared Tasks

Michele Filannino
1  George Mason University, Fairfax, VA, USA
2  Massachusetts Institute of Technology, Cambridge, MA, USA
,
Özlem Uzuner
1  George Mason University, Fairfax, VA, USA
2  Massachusetts Institute of Technology, Cambridge, MA, USA
› Author Affiliations
Further Information

Publication History

Publication Date:
29 August 2018 (online)

  

Summary

Objectives: To review the latest scientific challenges organized in clinical Natural Language Processing (NLP) by highlighting the tasks, the most effective methodologies used, the data, and the sharing strategies.

Methods: We harvested the literature by using Google Scholar and PubMed Central to retrieve all shared tasks organized since 2015 on clinical NLP problems on English data.

Results: We surveyed 17 shared tasks. We grouped the data into four types (synthetic, drug labels, social data, and clinical data) which are correlated with size and sensitivity. We found named entity recognition and classification to be the most common tasks. Most of the methods used to tackle the shared tasks have been data-driven. There is homogeneity in the methods used to tackle the named entity recognition tasks, while more diverse solutions are investigated for relation extraction, multi-class classification, and information retrieval problems.

Conclusions: There is a clear trend in using data-driven methods to tackle problems in clinical NLP. The availability of more and varied data from different institutions will undoubtedly lead to bigger advances in the field, for the benefit of healthcare as a whole.

Author Contribution