How to Exploit Twitter for Public Health Monitoring?

K. Denecke; M. Krieck; L. Otrusina; P. Smrz; P. Dolog; W. Nejdl; E. Velasco

doi:10.3414/ME12-02-0010

Methods Inf Med 2013; 52(04): 326-339
DOI: 10.3414/ME12-02-0010

Focus Theme – Original Articles

Schattauer GmbH

Authors

K. Denecke
¹Innovation Center Computer Assisted Surgery, Leipzig, Germany
⁵Forschungszentrum L3S, Hannover, Germany

M. Krieck
²Niedersächsisches Landesgesundheitsamt, Hannover, Germany

L. Otrusina
³Brno University of Technology, Brno, Czech Republic

P. Smrz
³Brno University of Technology, Brno, Czech Republic

P. Dolog
⁴Aalborg University, Aalborg, Denmark

W. Nejdl
⁵Forschungszentrum L3S, Hannover, Germany

E. Velasco
⁶Robert Koch Institut, Berlin, Germany

Further Information

Publication History

received: 07 November 2012

accepted: 02 May 2013

Publication Date: 20 January 2018 (online)

Summary

Objectives: Detecting hints of public health threats as early as possible is crucial for protecting the population from harm. However, many disease surveillance strategies rely on data whose collection requires explicit reporting (data transmitted from hospitals, laboratories, or physicians). Collecting these reports takes time, which delays the response. Moreover, context information on individual cases is often lost in the collection process. This paper describes a system that tries to address these limitations by processing social media to identify information on public health threats. The primary objective is to study the usefulness of this approach for supporting the monitoring of a population's health status.

Methods: The developed system works in three main steps: Data from Twitter, blogs, and forums as well as from TV and radio channels are continuously collected and filtered by means of keyword lists. Sentences of relevant texts are classified as relevant or irrelevant using a binary classifier based on support vector machines. The relevant sentences are then further analyzed with statistical methods known from biosurveillance, and signals are generated automatically when unexpected behavior is detected. From the generated signals, a subset is selected for presentation to a user by matching against user queries or profiles. In a set of evaluation experiments, public health experts assessed the generated signals with respect to correctness and relevance. In particular, the experiments assessed how many relevant and irrelevant signals were generated during a specific time period.
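The filtering and signal-generation steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the keyword list, the function names, and the parameter values are hypothetical, the support-vector-machine classification step is omitted, and a simple one-sided CUSUM stands in for the unspecified "statistical methods known from biosurveillance".

```python
from collections import Counter
from datetime import date

# Hypothetical keyword list; the system's real lists are not given in the summary.
KEYWORDS = {"measles", "norovirus", "outbreak", "fever"}


def matches_keywords(text, keywords=KEYWORDS):
    """Return True if any keyword occurs as a whole token in the text."""
    tokens = {tok.strip(".,;:!?#\"'()").lower() for tok in text.split()}
    return not tokens.isdisjoint(keywords)


def daily_counts(messages, days):
    """Count keyword-matching messages per day, ordered as in `days`.

    `messages` is an iterable of (day, text) pairs.
    """
    counts = Counter(day for day, text in messages if matches_keywords(text))
    return [counts.get(day, 0) for day in days]


def cusum_signals(counts, baseline, slack=0.5, threshold=4.0):
    """One-sided CUSUM over a count series (assumed stand-in detector).

    Accumulates the excess of each count over baseline + slack and
    reports the indices at which the accumulated sum exceeds threshold.
    """
    signals = []
    s = 0.0
    for i, c in enumerate(counts):
        s = max(0.0, s + (c - baseline - slack))
        if s > threshold:
            signals.append(i)
            s = 0.0  # restart accumulation after raising a signal
    return signals


if __name__ == "__main__":
    days = [date(2012, 6, 1), date(2012, 6, 2), date(2012, 6, 3)]
    messages = [
        (days[0], "Measles cases rising"),
        (days[0], "Nice weather"),
        (days[1], "Norovirus outbreak at school"),
        (days[1], "fever and cough since yesterday"),
    ]
    print(daily_counts(messages, days))
    print(cusum_signals([1, 0, 1, 2, 1, 6, 7, 1], baseline=1.0))
```

In a fuller pipeline, a trained sentence classifier would sit between the keyword filter and the counting step, so that only sentences judged relevant contribute to the daily counts.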

Results: The experiments show that the system provides information on health events identified in social media. Signals are mainly generated from Twitter messages posted by news agencies. Personal tweets, i.e. tweets from persons observing some symptoms, play only a minor role in signal generation, given the limited volume of relevant messages. Relevant signals referring to real-world outbreaks were generated by the system and monitored by epidemiologists, for example during the European football championship. However, the number of relevant signals among the generated signals is still very small: across the different experiments, between 5 and 20% of signals were regarded as “relevant” by the users. Vaccination or education campaigns communicated via Twitter, as well as the use of medical terms in contexts other than outbreak reporting, led to the generation of irrelevant signals.

Conclusions: The aggregation of information into signals reduces the monitoring effort compared to other existing systems. Contrary to expectations, only a few messages are of a personal nature, reporting on personal symptoms. Instead, media reports are distributed over social media channels. Despite the high percentage of irrelevant signals generated by the system, the users reported that monitoring aggregated information in the form of signals is less demanding than monitoring huge social media data streams manually. Developing strategies for reducing false alarms remains future work.

Keywords

Text mining - Web science - public health - population surveillance - epidemic intelligence - Medicine 2.0
