Summary
Objectives: We survey recent work in biomedical NLP on building more adaptable or generalizable
models, with a focus on work dealing with electronic health record (EHR) texts, to
better understand recent trends in this area and identify opportunities for future
research.
Methods: We searched PubMed, the Institute of Electrical and Electronics Engineers (IEEE),
the Association for Computational Linguistics (ACL) anthology, the Association for
the Advancement of Artificial Intelligence (AAAI) proceedings, and Google Scholar
for the years 2018-2020. We reviewed abstracts to identify the most relevant and impactful
work, and manually extracted data points from each of these papers to characterize
the methods and tasks that were studied, the clinical domains they addressed, and the
current state-of-the-art results.
Results: The ubiquity of pre-trained transformers in clinical NLP research has contributed
to an increase in domain adaptation and generalization-focused work that uses these
models as the key component. Most recently, researchers have started to train biomedical
transformers and to extend the fine-tuning process with additional domain adaptation techniques.
We also highlight recent research in cross-lingual adaptation, as a special case of
adaptation.
Conclusions: While pre-trained transformer models have led to some large performance improvements,
general domain pre-training does not always transfer adequately to the clinical domain
due to its highly specialized language. There is also much work to be done in showing
that the gains obtained by pre-trained transformers are beneficial in real-world use
cases. The amount of work in domain adaptation and transfer learning is limited by
dataset availability, and creating datasets for new domains is challenging. The growing
body of research in languages other than English is encouraging, and more collaboration
between researchers across the language divide would likely accelerate progress in
non-English clinical NLP.
Keywords
Natural language processing - domain adaptation - transfer learning - electronic health
records