Best Paper Selection
20 November 2018 (online)
Daniulaityte R, Chen L, Lamy FR, Carlson RG, Thirunarayan K, Sheth A. “When ‘Bad’ is ‘Good’”: Identifying Personal Communication and Sentiment in Drug-Related Tweets. JMIR Public Health Surveill 2016 Oct 24;2(2):e162
Although several studies have reported on the development of automated approaches to analyse tobacco and e-cigarette-related tweet content, and to identify adverse effects associated with the medical use of pharmaceutical drugs, there have been very few attempts to apply automated content analysis techniques to analyse drug abuse–related tweets. This lack of research is partially related to the fact that drug-related content adds another layer of ambiguity and difficulty in the development of automated techniques because of the pervasive use of slang terminology and implied meanings. For the words that suggest a particular sentiment, traditional approaches that use sentiment lexicons may not perform well, and machine learning techniques, trained using manually coded data, could increase the accuracy of sentiment identification in drug-related tweets. The purpose of this study was to describe the development and performance of machine learning classifiers to automatically identify tweets by the type of communication (personal, official/media, or retail) and sentiment (positive, negative, or neutral) expressed in cannabis- and synthetic cannabinoid–related tweets. To reach a sample size of 4,000 tweets for the manually-labelled data set for machine learning, more than 8,000 tweets were manually reviewed and filtered using QDA Miner. The tweets for manual coding were extracted from the pool of 15,623,869 tweets that were collected by eDrugTrends between May and November 2015. The sample of 4,000 manually-labelled tweets was split into two subsamples, 1,000 were used to train a source classifier, and 3,000 were allocated for sentiment classification. The most discriminative unigram and bigram features that were identified by chi-square test reflect thematic groups as pertinent to sentiment categories: “want,” “love,” “need” for positive, in contrast to “don’t,” “shit,” “fake” for negative tweets. But the sentiment classifier tended to incorrectly classify tweets that expressed an opposing opinion to negative thoughts or actions related to cannabis use or its legalization. Furthermore, humorous and sarcastic tweets were also more difficult to classify correctly by the classifier. The identification of sentiment in personal, user-generated tweets is more relevant for drug abuse epidemiology research than an approach that includes media- and business-related tweets.
Freedman RA, Viswanath K, Vaz-Luis I, Keating NL. Learning from social media: utilizing advanced data extraction techniques to understand barriers to breast cancer treatment. Breast Cancer Res Treat 2016 Jul;158(2):395-405
To date, most studies examining barriers to care for diverse populations have been conducted within registry- or claims-based cohorts. Additional smaller studies using surveys, focus groups, and medical records are often limited to a single geographic area or institution and may not necessarily generalize across diverse populations. Furthermore, most surveys have structured formats and are subject to recall bias. Social media has been recognized as a potential source of patient data often underrepresented in studies using conventional research methodologies, emerging thus as a rich, yet largely untapped, resource for understanding what patients are candidly saying about their experiences and treatments. The purpose of this study was to utilize machine learning to identify key issues and themes that patients with breast cancer were sharing online, focusing on the barriers to treatment. Postings from a 365-day period, ending on January 31, 2015, on message boards, blogs, topical sites, content sharing sites, and social networks were examined. 3,200,128 unique posts that discussed breast cancer were identified. The analyses were limited to the 1,024,041 (32 %) posts about treatment. When possible, a phase of treatment (pre-diagnosis, diagnosis, assessment, decision to treat, or treatment) was identified by tagging posts based on cues for a user’s current situation through topical keywords and relevant self-reported experiences yielding 627,381 posts. Among these posts, overarching themes and treatment barriers were assigned for 387,238 (62% of 627,381). Organizational barriers generally increased from pre-diagnosis (6% of posts) to diagnosis (13%) and remained high during assessment (28%), decisions to treat (21%), and treatment (29%). Sociocultural barriers decreased over the treatment trajectory (24% of posts in the pre-diagnosis phase to 18–20% of posts about treatments) as did psychological barriers (43% to 19–25%). Situational barriers remained relatively constant over the treatment trajectory and were reported in a quarter of posts. For emotional barriers, most conversations reported fears, anxiety, denial, and depression. The most common belief-related sentiments were spiritual/religious (41%), although other prominent themes included misinformation (30%) and preferences/perceptions (29%). The most common physical concerns expressed were side effects (40%), followed by physical limitations (31%) and body changes (29%). Resource concerns included posts about insurance (49%), costs (33%), and logistics of treatment (18%). Dominant concerns raised within posts about healthcare perception barriers included poor communication (36 %), trust (22%), accessibility of services (21%), and negative experiences (21%). Among posts related to relationship barriers, the most dominant issues included problems with intimacy (35%), friends (34%), and children (31%). Duration and process barriers were categorized as issues with the regimens prescribed (41%), duration of treatment (23%), effects of the after treatment (19%), and complexity of care (17%). In 9,465 posts, users suggested that they refused recommended treatments that were recommended for their breast cancer. With this new type of “social intelligence” for research, mining the vast repository of unstructured big data for insight into patients’ concerns and experiences, the authors learned about barriers to care for a large and diverse population of users.
Hawkins JB, Brownstein JS, Tuli G, Runels T, Broecker K, Nsoesie EO, McIver DJ, Rozenblum R, Wright A, Bourgeois FT, Greaves F. Measuring patient-perceived quality of care in US hospitals using Twitter. BMJ Qual Saf 2016 Jun;25(6):404-13
Experiences and perception of patients receiving healthcare as well as the necessity for healthcare stakeholders to measure and report outcomes are usually based on structured questionnaires. Limitations of these surveys include significant time lag between an outcome and a report of that outcome, and low response rates. As Twitter is actively used by one out of five adults, the authors sought to identify and analyse the content of posts sent to hospitals as a novel real-time measure of quality, supplementing traditional survey-based approaches. Hawkins, et al., assessed the use of Twitter as a supplemental data stream for measuring patient-perceived quality of care in US hospitals and for comparing patient sentiments about hospitals with established quality measures. A machine learning approach was used to classify tweets associated with patient experiences.
Of the tweets directed to 2,349 US hospitals having an account on Twitter, over the period 1 October 2012 to 30 September 2013, 404,065 were analysed. Sentiment of patient experience was calculated for these tweets using natural language processing (the open source Python library TextBlob). A total of 11,602 tweets were manually categorised into patient experience topics, including food, money, pain, general, room condition, time, communication, discharge, medication instructions, side effects. Finally, 297 hospitals, representing 111 unique Twitter accounts with at least 50 patient-experience tweets were surveyed to understand how they use Twitter to interact with patients. The authors focused on the percentage of patients who rated a hospital at the highest levels on a validated scale of quality of care. The second validated measure of quality of care was the Hospital Compare 30-day hospital readmission rate calculated from the period 1 July 2012–30 June 2013 (https://www.medicare.gov/hospitalcompare/search.html). Roughly half of the hospitals in the US have a presence on Twitter (50.2%). Of the 297 surveyed hospitals, half responded and all confirmed that they closely monitor social media and interact with users. Of the tweets directed toward these hospitals, 34,725 (9.4%) were related to patient experiences, covering diverse topics. The top three topics discussed were: time management, money concerns, and communication with staff. Analyses limited to hospitals with at least 50 patient-experience tweets revealed that they were more active on Twitter, more likely to be below the national median of Medicare patients (p<0.001) and above the national median for nurse/patient ratio (p=0.006), and to be a non-profit hospital (p<0.001). After adjusting for hospital characteristics, they found that Twitter sentiment was not associated with Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) ratings; however, having a Twitter account was associated with HCAHPS score, although there was a weak association with 30-day hospital readmission rates (p=0.003). The authors showed that monitoring Twitter provides useful, unsolicited, and real-time data that might not be captured by traditional feedback mechanisms. Tweets describing patient experiences in hospitals cover a wide range of patient care aspects and can be identified using automated approaches. The authors recommended that patients, researchers, and policy makers also attempt to utilise this data stream to understand the experiences of healthcare consumers and the quality of care they receive.
Kondylakis H, Koumakis L, Hänold S, Nwankwo I, Forgó N, Marias K, Tsiknakis M, Graf N. Donor’s support tool: Enabling informed secondary use of patient’s biomaterial and personal data. Int J Med Inform 2017 Jan;97:282-92
The purpose of this paper was to study the current practices for obtaining consent for biobanking and the legal requirements for reusing the available biomaterial and data in EU. The authors present a novel modular IT tool named “Donor’s Support Tool” in order to ensure that patients actively provide and update their consent according to applicable national laws, thus enabling the secondary use of data and biomaterial. The legal landscape for the secondary use of biomaterial and data in the European Union is complex. There is no harmonized European regulation that covers both the processing and use of biosamples and associated personal or clinical data at the same time. Different regimes apply to each EU member. At present, the use of personal data enjoys the more harmonized framework. Informed consent is one of the best-known elements of medical ethics and bioethics, and is widely utilized in clinical practice and clinical research. But there are various types of consent: the consents that applies to a specific purpose or research study, the consent that is partially restricted to a domain of purposes or types of research studies, the consent that is multi-layered, wherein consent can apply to a number of unnamed or unspecified purposes or studies, or the broad consent which applies to any purpose or research study, named or unnamed. In clinical trials, only the specific consent is allowed, while different approaches, ranging from specific to broad, or even simply ‘presumed’ consent, could be applied in the processing of human tissues among EU member states. Similarly, for personal data processing, multiple approaches could possibly apply in the EU member states. The EU Clinical Trial Regulation (EU No 536/2014, https://ec.europa.eu/health/human-use/clinical-trials/regulation_en) requires that consent for a participation in a clinical trial be in a written form. National data protection laws usually require an explicit and written consent for the processing of sensitive data, except in the UK and Austria where no specific formal requirement has been set up. Regarding the identification and the authentication of the consent subject, even if a qualified electronic signature is desired, the usage of such signatures is not widespread among the European population. Transforming the legal requirements into information technology requirements, the authors designed and implemented the IT platform enabling citizens to actively provide and update their consent in real time. The three modules (personal information management system, donor’s generation module, and donor’s decision module) place participants at the heart of decision-making and allow individuals to tailor and manage their own consent preferences. Comparisons with six other relevant approaches are provided: SecureConsent, Mytrus, Educonsent, iMed-Consent, FORCS e-consent and Consentir. The system was also tested by the University College of London using retrospective data.
Massey PM, Leader A, Yom-Tov E, Budenz A, Fisher K, Klassen AC. Applying Multiple Data Collection Tools to Quantify Human Papillomavirus Vaccine Communication on Twitter. J Med Internet Res 2016 Dec 5;18(12):e318
The purpose of this study was to quantify Human Papilloma Virus (HPV) vaccine communication on Twitter, specifically focusing on (1) sentiment, (2) side effects, and (3) prevention and protection, and to describe a novel methodology using two data collection methods to analyse Twitter data. Two methods were used to collect and validate Twitter data related to HPV vaccination. From August 1, 2014 to July 31, 2015, 305,517 and 258,102 tweets were collected respectively using a prospective or a retrospective data collection method. Only English-language tweets were included. A corpus of 1,470 manually coded tweets was used to develop a machine learning classifier for each variable in the codebook. Binary variables were classified using a linear classifier (Moore-Penrose pseudoinverse), while a decision tree was applied to variables with more than two categorical responses. A total of 193,379 English-language tweets were collected, classified, and analysed between August 1, 2014 and July 31, 2015. Over 88.64% (191,515/216,060) of the final dataset included the keyword search term HPV, and nearly 34.91% (75,433/216,060) included HPV vaccine. Associated words varied with each keyword, with HPV being associated with personal words such as “I”, “me”, and “have”, and #HPV being associated with January (cervical cancer awareness month), prevent, and learn. Positive sentiment toward the vaccine was the largest type of sentiment in the sample, with 75,393 positive tweets (38.99% of the sample). Many more users participated in positive sentiment than in negative sentiment (36,283 vs 24,010 users, respectively). There is also an important relationship between tweet sentiment and tweet content: many more tweets that were classified as positive mentioned information about prevention or protection, whereas tweets classified as negative included a much greater discussion about side effects. This can be important information for health promotion and communication campaigns, specifically in terms of tailoring a message and joining a particular conversation.