Summary
Objective: We present a narrative review of recent work on the use of Natural Language
Processing (NLP) to analyse social media (including online health communities),
specifically for public health applications.
Methods: We conducted a literature review of NLP research that utilised social media or online
consumer-generated text for public health applications, focussing on the years 2016
to 2018. Papers were identified in several ways, including PubMed searches and
inspection of recent conference proceedings from the Association for Computational
Linguistics (ACL), the Conference on Human Factors in Computing Systems (CHI), and
the International AAAI (Association for the Advancement of Artificial Intelligence)
Conference on Web and Social Media (ICWSM). Popular data sources included Twitter,
Reddit, various online health communities, and Facebook.
Results: In the recent past, communicable diseases (e.g., influenza, dengue) have been the
focus of much social media-based NLP health research. However, mental health and substance
use and abuse (including the use of tobacco, alcohol, marijuana, and opioids) have
been the subject of a growing body of research in the 2016 to 2018 period. In line
with this trend, lexicon-based methods remain popular, given the availability
of psychologically validated lexical resources suited to mental health and substance
abuse research. Finally, we found that in the period under review “modern” machine
learning methods (i.e., deep neural-network-based methods), while increasing in popularity,
remain less widely used than “classical” machine learning methods.
Keywords
Natural Language Processing - text mining - social media - public health