CC BY-NC-ND 4.0 · Yearb Med Inform 2019; 28(01): 208-217
DOI: 10.1055/s-0039-1677918
Section 10: Natural Language Processing
Survey
Georg Thieme Verlag KG Stuttgart

Recent Advances in Using Natural Language Processing to Address Public Health Research Questions Using Social Media and Consumer-Generated Data

Mike Conway¹, Mengke Hu¹, Wendy W. Chapman¹

¹ Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, United States

Publication Date: 16 August 2019 (online)

Summary

Objective: We present a narrative review of recent work on the use of Natural Language Processing (NLP) to analyse social media (including online health communities) for public health applications.

Methods: We conducted a literature review of NLP research that used social media or online consumer-generated text for public health applications, focussing on the years 2016 to 2018. Papers were identified in several ways, including PubMed searches and inspection of recent conference proceedings from the Association for Computational Linguistics (ACL), the Conference on Human Factors in Computing Systems (CHI), and the International AAAI (Association for the Advancement of Artificial Intelligence) Conference on Web and Social Media (ICWSM). Popular data sources included Twitter, Reddit, various online health communities, and Facebook.

Results: In the recent past, communicable diseases (e.g., influenza, dengue) have been the focus of much social media-based NLP health research. However, mental health and substance use and abuse (including the use of tobacco, alcohol, marijuana, and opioids) were the subject of an increasing volume of research in the 2016–2018 period. Associated with this trend, lexicon-based methods remain popular, given the availability of psychologically validated lexical resources suitable for mental health and substance abuse research. Finally, we found that in the period under review, “modern” machine learning methods (i.e., deep neural network-based methods), while increasing in popularity, remained less widely used than “classical” machine learning methods.
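In their simplest form, lexicon-based methods count how often words from predefined categories occur in a text. The sketch below illustrates that idea under stated assumptions: the category names and word lists in `TOY_LEXICON`, and the function `lexicon_scores`, are hypothetical and were invented for illustration. They are not drawn from any validated lexical resource or from the studies reviewed here.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; real studies rely on
# psychologically validated resources, not this word list.
TOY_LEXICON = {
    "negative_emotion": {"sad", "lonely", "hopeless", "anxious"},
    "substance": {"vape", "cigarette", "beer", "weed"},
}

# Simple lowercase word tokenizer (letters and apostrophes only).
TOKEN_RE = re.compile(r"[a-z']+")


def lexicon_scores(text, lexicon=TOY_LEXICON):
    """Return, per category, the fraction of tokens found in that category's word set."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        # Avoid division by zero for empty or non-alphabetic input.
        return {category: 0.0 for category in lexicon}
    counts = Counter(tokens)
    return {
        category: sum(counts[word] for word in words) / len(tokens)
        for category, words in lexicon.items()
    }


if __name__ == "__main__":
    post = "Feeling lonely and anxious tonight, going to grab a beer."
    print(lexicon_scores(post))
```

Normalising by token count makes scores comparable across posts of different lengths. For the example post (10 tokens), the scores are 0.2 for `negative_emotion` and 0.1 for `substance`.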