Keywords
Environmental exposure - health informatics - precision medicine
1 Introduction
Most diseases result from the complex interplay between genetic and environmental
factors. The exposome can be defined as a systematic approach to acquire large data
sets corresponding to the environmental exposures of an individual over her/his lifetime[1] and to associate them with specific health and disease states[2]. In its broadest sense, the exposome encompasses not only exposures to environmental
stressors, but also the physical environment, socio-economic factors, the built environment[1], aspects related to access to health care, and individual life habits or behaviours[3].
The health informatics community has only recently become aware of the importance
of collecting environmental data in order to understand an individual's health[4]. There is a need for new digital methods and resources that select, annotate, organize,
and present reliable and updated information about environmental factors affecting
our health on both the population and individual/patient scales. The exposome demands
a systematic research effort equivalent to what has been done to characterize the
human genome (and also the human phenome)[5][6]. Precision medicine has explicitly acknowledged the need to acquire and integrate
individual-level genetic, environmental and clinical data to achieve a better understanding
of multifactorial diseases and for developing new preventive, diagnostic, and therapeutic
solutions, adapted to groups of individuals with similar risk factors[7].
To address these problems, the International Medical Informatics Association (IMIA)
created, in August 2017 in Hangzhou (China), a Working Group on informatics aspects
related to the exposome to help investigators, clinicians, and consumers navigate
the entire “data to knowledge” lifecycle: data collection, knowledge representation,
annotation, integration with genomic and phenomic data, analytics, and visualization.
This contribution summarizes the main findings of a panel organized by the IMIA
Exposome Informatics Working Group and held during MEDINFO in Lyon (France)
in August 2019. This panel was a follow-up to, and an update of, a well-attended
panel held at MIE 2018 in Gothenburg, Sweden.
The objective of this panel was to raise awareness within the health informatics community
about opportunities for research in this area, in the context of precision medicine
informatics. With that purpose, outstanding members of our community presented four
on-going research projects (PULSE, Digital exposome, Cloudy with a chance of pain, Wearable clinics), providing a very detailed and complete view of current challenges and accomplishments
in processing environmental and social data from a health research perspective. The
four projects illustrate a wide range of research methods, digital data collection
technologies, and analytics and visualization tools. This reinforces the idea that
this area is now ready for health informaticians to step in and contribute their expertise,
leading the application of informatics strategies to environmental health problems.
2 Participatory Urban Living for Sustainable Environments (PULSE)
Participatory Urban Living for Sustainable Environments (PULSE) is a project funded
by the European Commission within the Horizon 2020 program, with the final goal of
supporting cities in planning and implementing innovative public health policies in the
urban environment by relying on a “big data”-enabled information system[8].
The PULSE technological architecture is based on the integration of two main components:
i) a participatory data collection system, made of an app called “PulseAir”, which
allows volunteer citizens to provide information about their health, lifestyle, and
exposure patterns by merging questionnaire data and wearable/home sensor signals;
ii) a data analytics and decision support system made available to health care authorities
and city planners, where different data sources, including data collected by PulseAir
and data about health care and environment, are integrated and visualized. A distinctive
aspect of the entire system is the spatial enablement of the data analytics
platform, i.e., the ability to add a geographic reference to data and information.
This allows both main components to incorporate geo-referenced elements, such
as maps, webGIS, and geo-analytics dashboards[9].
In terms of disease risk assessment and prevention, PULSE focuses on the link between
air pollution and asthma, and on the relationships between physical inactivity and
type 2 diabetes. In both cases, health risk is seen as a combination of environmental
and social exposures (e.g., air pollution, poverty) and human behaviour (e.g., a sedentary
lifestyle). For this reason, PulseAir collects data to profile citizens in terms of
their health risk, specifically focusing on the combination of exposome and phenome
information.
Currently, the PULSE project is in its final stages of development. After the collection
of data coming from more than 1500 citizens in different cities, the final version
of the overall architecture is under development, and its release to the participating
cities is foreseen in April 2020. In this way, each city will be able to launch Public
Health Observatories (PHOs), which will be exploited to integrate, analyse, and visualize
data to inform health policy decisions[10].
As far as exposomics is concerned, several research activities have been carried out within
the PULSE project. In the following, we briefly describe two of them.
2.1 Personal Exposure
One of the hottest topics in exposomics is the influence of air quality on health.
Evaluating the individual exposure of each citizen to pollutants is a key issue. However,
current technological solutions exploited by cities are unable to measure air quality
with the spatial and time resolution needed for this purpose. Typically, environmental
agencies install a very limited number of air quality monitors. As an example, New
York City (NYC) has 18 stations for an area of 1034 km², i.e., a density of one station per 57 km², and the stations do not measure all pollutants. If we limit the analysis to the
most important air pollutant, PM2.5 (fine particulate matter), NYC has only
9 stations. Personal PM exposure, expressed as the estimated amount of PM inhaled in one day
by a given citizen, is currently evaluated only very roughly in
cities, by combining PM estimates and mobility patterns (extracted, for example, from
GPS positioning). Within the PULSE project, we are experimenting with different strategies
to evaluate personal exposure. On the one hand, we are combining the information coming
from city stations with data collected by low cost sensors that can be installed in
patients’ homes. In the “city lab” of Pavia, we have installed 42 low-cost PurpleAir
PA-II sensors, increasing the density to about 1.5 stations per km². Thanks to the PulseAir application, it is possible to estimate the inner-city trajectory
of each participating citizen, and thanks to the activity tracker it is possible
to derive their heart rate with high temporal resolution. The combination of these
data greatly increases the precision of the estimated exposure to all the pollutants
measured by the sensors. On the other hand, we are also testing “portable” air quality
sensors, i.e. sensors that can be worn by citizens when they move in the city, such
as the Dunavnet DV800 sensor. Although such sensors cannot yet be worn in
everyday life, they can be used to assess air quality in many city areas within a single day
and with high spatial precision. The Dunavnet sensor has been tested in New York, Barcelona, and
Birmingham.
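As a rough illustration of how the ingredients described above (fixed and low-cost sensor readings, a GPS trajectory, and heart rate) could be combined, the following Python sketch interpolates PM2.5 at each trajectory point by inverse-distance weighting and accumulates an inhaled dose. It is not the PULSE implementation: the interpolation scheme, the `Sensor` class, the function names, and in particular the linear heart-rate-to-ventilation relation are assumptions introduced only for this example.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Sensor:
    lat: float
    lon: float
    pm25: float  # PM2.5 concentration in micrograms per cubic meter


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def idw_pm25(lat, lon, sensors, power=2.0):
    """Inverse-distance-weighted PM2.5 estimate at (lat, lon)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for s in sensors:
        d = haversine_km(lat, lon, s.lat, s.lon)
        if d < 1e-6:  # the point coincides with a sensor
            return s.pm25
        w = 1.0 / d ** power
        weighted_sum += w * s.pm25
        weight_total += w
    return weighted_sum / weight_total


def ventilation_m3_per_h(heart_rate_bpm):
    """Hypothetical placeholder: a linear heart-rate-to-ventilation relation.

    The coefficients are illustrative only and would need to be replaced
    by a calibrated, ideally individual-specific, model.
    """
    return max(0.3, 0.02 * heart_rate_bpm - 0.9)


def inhaled_dose_ug(track, sensors):
    """Sum concentration x ventilation x duration over a trajectory.

    `track` is a list of (lat, lon, heart_rate_bpm, duration_h) tuples.
    """
    return sum(
        idw_pm25(lat, lon, sensors) * ventilation_m3_per_h(hr) * hours
        for lat, lon, hr, hours in track
    )
```

With such a function, a denser sensor network improves the concentration term, while the activity tracker improves the ventilation term; both contribute to the precision gain mentioned above.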
2.2 Transfer Learning
Deep and transfer learning to support image analytics have been exploited within PULSE
to investigate the relationships between the urban landscape of city areas and well-being
and healthcare indexes of citizens[11]. The goal of this activity is to design a tool for health care planners and decision-makers
to find clusters of city areas that share similar urban structures and similar health
care indexes, in order to plan “cluster-based” interventions, rather than relying
only on the geographical locations of areas. In order to demonstrate the feasibility
of this approach, images and health care data coming from New York City have been
analysed.
In particular, an image collected by the National Agriculture Imagery Program
(NAIP), which acquires aerial imagery during the agricultural growing seasons in the
continental United States, has been retrieved and subdivided into
square image blocks with 512-meter edges. Moreover, health care data have been collected
from the 500 Cities project repository. The 500 Cities project is a collaboration
between the Centers for Disease Control and Prevention (CDC), the Robert Wood Johnson
Foundation, and the CDC Foundation. The project provides city- and census tract-level
small area estimates of many health care indexes for the largest 500 cities in the
United States. NYC is divided into 2166 census tracts. We have exploited 2017 measures
about major risk behaviours that lead to illness, pain, and early death, as well as
the conditions and diseases that are the most common, costly, and preventable of all
health problems.
A mapping between each image and health care data has been performed, in order to
provide an estimate of health care indexes for each image square block. Then, each
image has been processed with the “Painters” deep neural network made available in
the Orange software[12]. The latent variables extracted by the deep network have been used to cluster images.
Finally, health care indexes of each image have been correlated with the clusters,
showing that clusters (which were derived only on the basis of the urban images) can
be predicted on the basis of health care data. This data analytics pipeline thus showed
that it is possible to correlate the urban landscape with health care indicators
at the whole-city level. In the NYC case, this correlation appears particularly relevant,
probably because social factors in US society, and in large cities in particular,
tie health indicators to the urban areas where people live.
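The clustering and comparison steps of this pipeline can be sketched in a few lines of Python. The sketch assumes that the latent vectors have already been extracted (in the study this was done with the “Painters” embedder in Orange) and that each block already carries a health care index. The source does not state which clustering algorithm was used; k-means with a deterministic farthest-point initialization is an assumption made here for illustration, and all function names are hypothetical.

```python
from statistics import mean


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns one cluster label per input vector."""
    # Farthest-point initialization keeps the example deterministic.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [mean(column) for column in zip(*members)]
    return labels


def mean_index_per_cluster(labels, health_index):
    """Average a health care index over the blocks of each cluster."""
    groups = {}
    for label, value in zip(labels, health_index):
        groups.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in groups.items()}


# Toy stand-ins for image embeddings and a per-block health care index.
embeddings = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
              [5.0, 5.1], [5.1, 5.0], [5.05, 5.05]]
index = [10, 11, 12, 30, 31, 32]
labels = kmeans(embeddings, k=2)
summary = mean_index_per_cluster(labels, index)
```

If clusters derived only from images show clearly different average indexes, as in this toy data, the urban structure carries information about health, which is the relationship the study set out to test.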
This work, while demonstrating that deep neural networks designed to encode
image data can be successfully reused within transfer learning approaches, also shows
that it is possible to profile city areas, and that the urban structure is a component
of the exposome of citizens.
3 Cloudy with a Chance of Pain
The weather has been thought to influence health for millennia[13]. One of the best-known beliefs is that the weather influences pain in people living
with arthritis and other long-term pain conditions. Indeed, around 80% of people living
with arthritis believe in such an association, and around half believe they can forecast
the weather based on their symptoms.
Many researchers have tried to study this association, but results have been inconclusive[14]. Some of the limitations include small sample sizes or short durations of follow-up.
Another important limitation is the quality of the data to support the analysis, including
regular information about pain or other symptoms, and high-quality information about
the weather to which research participants are exposed. Many studies assume that patients
stay at their postcode, or within their town or region. However, people are mobile
and it would be helpful to link their moving geolocation to the local weather.
Smartphones offer new opportunities for conducting health research at scale[15]. This includes tracking symptoms on a more frequent basis, integrated into participants’
daily lives, as well as making use of sensor data from within the smartphone. Cloudy
with a Chance of Pain is a national UK smartphone study designed to address the age-old
question of weather and pain[16]. The study recruited over 13,000 participants in 2016 and asked them to record their
daily symptoms for up to six months, while the phone's GPS automatically recorded
their location so that it could be linked to local weather data. Ultimately, the study collected 5.1 million symptom scores
with daily weather data accessible from the 154 UK Met Office weather stations. The
analysis used a case-crossover design to compare the weather at times of increased
pain to ‘control periods’ within the same month when pain was not increased. The results
demonstrated that high relative humidity, stronger winds, and low pressure were associated
with more painful days[16].
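The logic of the case-crossover comparison can be illustrated with a short Python sketch. Each participant acts as her or his own control: within every participant-month, the weather on days with increased pain is compared with the weather on the remaining days. The record layout and the function name are assumptions for this example, and the simple difference of means stands in for the formal statistical model (typically conditional logistic regression) that a real analysis would use.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def case_control_differences(records):
    """Mean humidity on case days minus control days, per participant-month.

    `records` is an iterable of (participant_id, day, pain_increased, humidity)
    tuples. Strata without both case and control days are dropped, because
    they carry no within-person contrast.
    """
    strata = defaultdict(lambda: {"case": [], "control": []})
    for participant, day, pain_increased, humidity in records:
        key = (participant, day.year, day.month)
        strata[key]["case" if pain_increased else "control"].append(humidity)

    differences = {}
    for key, groups in strata.items():
        if groups["case"] and groups["control"]:
            differences[key] = mean(groups["case"]) - mean(groups["control"])
    return differences


# Illustrative records only; these are not study data.
records = [
    ("p1", date(2016, 3, 1), True, 90),
    ("p1", date(2016, 3, 2), True, 80),
    ("p1", date(2016, 3, 3), False, 70),
    ("p1", date(2016, 3, 4), False, 60),
    ("p2", date(2016, 3, 1), False, 75),  # no case day: stratum is dropped
]
differences = case_control_differences(records)
```

Because every comparison stays within one person and one month, stable individual characteristics and slow seasonal trends cancel out of the contrast.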
In addition to addressing the primary question of weather and pain, this study demonstrated
how the use of consumer devices could support novel health research at scale. Thinking
specifically about the exposome, the study allowed participants to move around the
country and still provide accurate local weather data. The average user was linked
to 9/154 possible UK weather stations (interquartile range 4-14)[16]. The most mobile person required data from 82/154 weather stations, which shows how
mobile participants were during the study and how important it is to account
for mobility when assigning local weather data. Similar methods
could be used for future studies that need to gather regular self-reported information
alongside accurate exposome data, using the smartphone GPS linked to appropriate datasets
to define exposure.
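A minimal Python sketch of this kind of linkage is given below: each GPS fix is assigned to the nearest weather station by great-circle distance, and the number of distinct stations per participant gives a simple measure of mobility. The station identifiers and coordinates are hypothetical placeholders, not actual Met Office stations, and nearest-station matching is an assumption about how the linkage might be done rather than a description of the study's procedure.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def nearest_station(lat, lon, stations):
    """Return the id of the station closest to (lat, lon)."""
    return min(
        stations,
        key=lambda sid: haversine_km(lat, lon, *stations[sid]),
    )


def stations_visited(gps_fixes, stations):
    """Set of distinct stations linked to a participant's GPS fixes."""
    return {nearest_station(lat, lon, stations) for lat, lon in gps_fixes}


# Hypothetical station ids with approximate city coordinates (lat, lon).
stations = {
    "A": (53.48, -2.24),
    "B": (51.51, -0.13),
    "C": (55.95, -3.19),
}
fixes = [(53.5, -2.2), (51.5, -0.1), (53.4, -2.3)]
visited = stations_visited(fixes, stations)
```

Counting `len(visited)` for every participant would yield the kind of per-user station count reported above.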
4 The Digital Component of the Exposome
The exposome has traditionally been associated with physical or socio-economic environmental
factors that are specific to an individual at a given moment. Our current
lifestyle is increasingly reliant on the internet and digital tools, which affect
almost every aspect of our lives. This has increased the amount of
time citizens spend online on platforms such as social networks[17]. These online and digital interactions have driven the definition of new elements that
could be related to health and the creation of new concepts such as the
“digital phenotype”[18], a term coined in 2015 to refer to the digital footprint of an individual that
could eventually be related to a disease. This concept is linked to the
idea of digital biomarkers, i.e., objective and quantifiable data
measured by digital devices[19]. However, these concepts concern the tools used to measure
the digital responses of individuals to different stimuli. On the other hand, the digital
world represents a rich and new environment to which individuals are increasingly
exposed and that may have health consequences.
Drugs and chemicals are paradigmatic examples of relevant factors considered to be
an important element of the exposome. In a similar fashion the digital environment
offers the opportunity to identify digital analogies and similarities to those well
recognised components of the exposome. Drugs in the physical environment have different
health effects ranging from addiction disorders to treatment of different conditions.
In the digital environment it is possible to identify similar examples of how digital
exposures may have different health effects. Internet addiction disorder would be
an example of how an inadequate use of digital tools and contents may lead to an addiction
disorder. On the other hand, digital exposures have also been applied to treating
different health conditions, particularly in the mental health domain with the development
of computer-based cognitive behavioural therapy (CBT). In this regard, a clinical
trial was published in 2017 comparing the use of virtual reality exposures and in
vivo exposures in CBT, showing that the virtual exposure was effective[20][21]. These examples show the need to consider digital exposures
as another intrinsic element of the exposome. For this reason, in 2017 the concept
of the digital exposome, or the digital component of the exposome, was introduced as “the whole set
of tools and platforms (including contents) that an individual use and the activities
and processes that an individual engage with as part of his digital life”[22], reflecting the relevance of this element in the broader definition
of the exposome and complementing the already existing concepts of digital phenotypes
and biomarkers.
Biomedical informatics has over the years developed a vast array of methodologies
and tools in areas such as natural language processing, self-monitoring, or participatory
health that might be applied to characterise components of the digital exposome and
may capture the digital activity of an individual. However, this very possibility
of tracking the digital activities of individuals raises a series of important
challenges, ranging from how these data will be analysed longitudinally to
how and where they will be stored and how they will be shared. Ethics represents a
major challenge in this scenario, where 24/7 surveillance of individuals is technically
feasible, and should be carefully considered in practical implementations of the digital
exposome.
5 Wearable Clinics
People with long-term conditions (LTCs) typically interact with healthcare services
through rigid pathways that poorly match the dynamic nature of their condition[23]. On the one hand, clinic visits may be unnecessary during times of remission, thus
wasting the time and resources of patients and professionals. On the other hand, ‘one
size fits all’ care is seldom timely or specific enough to arrest relapses before they
lead to serious exacerbations or costly hospitalisations. “Wearable Clinics”[24][25] are a vision for digitally transformed healthcare services that enable new forms
of collaborative care for LTC management through dynamic personal care plans that
adapt to the changing state of the individuals and the world around them. The aim
is to empower patients to become managers of their own care through actionable care-planning
information and mobile/wearable technologies.
The development of Wearable Clinics is characterized by a number of engineering and
translational research challenges: 1) The design of multimodel, adaptive sensing,
and signal compression algorithms for high-resolution wearable sensing data, minimising
communication demand, and maximising sensor operating lifetime; 2) The integration
of passive wearable sensing modalities (e.g. accelerometer, GPS, heart rate) with
active mobile sensing modalities (e.g. ecological momentary assessments through smartphones[26]); 3) The real-time prediction of disease relapse risks based on integrated data
from electronic health records, passive wearable sensing, and active mobile sensing;
4) The adaptive, personalised care planning that takes into account predicted risks,
individual health and care goals, and available care resources in the patient's specific
environment; and 5) The support of future real-world deployment through the assessment of
user acceptability, potential health and economic benefits, patient safety and data
security risks, and regulatory challenges associated with clinical deployment. These
challenges highlight the highly transdisciplinary nature of Wearable Clinic research
and engineering: it requires the involvement of bioengineers, computer scientists,
statisticians, artificial intelligence researchers, software engineers, health psychologists,
health economists, and patient safety experts.
The Universities of Manchester and York are currently developing prototypes of Wearable
Clinics for severe mental illnesses (in particular, schizophrenia) and ambulatory
blood pressure monitoring (ABPM). Both wearable clinics consist of a patient-facing
mobile app with associated sensing modalities and a clinic-facing web dashboard that
summarizes patients’ status in real time. The wearable clinic app for schizophrenia
integrates short smartphone questionnaires for psychosis symptom assessment[27][28][29], GPS sensing for behavioural phenotyping[30][31][32][33], and real-time risk assessment using cluster Hidden Markov models[34]. The wearable clinic for ABPM is an example of multimodal wearable sensor integration
and uses activity classification from accelerometer data[35][36] to determine the optimal moments for blood pressure measurement[37].
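To give a sense of how accelerometer data could gate blood pressure measurements, the Python sketch below labels short windows of samples as rest or movement and proposes a measurement once the wearer has been still for several consecutive windows. This is a deliberately simple, threshold-based stand-in and not the classifier used in the Wearable Clinic prototype; the threshold, the window handling, and all names are assumptions made for illustration.

```python
from math import sqrt
from statistics import pstdev


def magnitude(sample):
    """Euclidean norm of one (x, y, z) accelerometer sample, in g."""
    x, y, z = sample
    return sqrt(x * x + y * y + z * z)


def is_rest(window, threshold_g=0.05):
    """A window counts as rest if the magnitude barely varies.

    The 0.05 g threshold is a hypothetical value, not a validated one.
    """
    return pstdev([magnitude(s) for s in window]) < threshold_g


def measurement_moments(windows, min_rest_windows=3, threshold_g=0.05):
    """Indices of windows at which a measurement could be triggered.

    A trigger is emitted once per rest period, at the window where the
    run of consecutive rest windows first reaches `min_rest_windows`.
    """
    moments = []
    run = 0
    for i, window in enumerate(windows):
        run = run + 1 if is_rest(window, threshold_g) else 0
        if run == min_rest_windows:
            moments.append(i)
    return moments


# Illustrative windows: four samples each, values in g.
still = [(0.0, 0.0, 1.0)] * 4
moving = [(0.0, 0.0, 1.0), (0.5, 0.5, 1.2), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
windows = [moving, still, still, still, still, moving, still]
moments = measurement_moments(windows)
```

Requiring a sustained run of rest windows, rather than a single quiet window, is one way to avoid triggering a measurement during a brief pause in activity.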
6 Conclusions
This panel had a very special meaning. First, it was held in Lyon (France), the city
where the International Agency for Research on Cancer is located. The former director
(2009-2019) of this WHO institute, Dr. Christopher Wild, coined the term exposome in 2005.
As he stated in[1]: “There is a desperate need to develop methods with the same precision for an individual's
environmental exposure as we have for the individual's genome. I would like to suggest
that there is need for an ‘exposome’ to match the ‘genome’.” Second, to our knowledge, this was the first panel dedicated to the subject
of exposome informatics at a MEDINFO conference.
The four initiatives presented during the panel offered a very rich overview of the
expanding range of applications that informatics is finding in the field of environmental
health, with a potential impact on precision medicine. [Table 1] summarizes the main aspects of these four projects, covering
their scope, data processing particularities, the diseases and environmental
factors involved, and their final users.
Table 1
Summary of the four projects according to several dimensions related to scope, goal,
data processing methods, disease, and risk factors

| Project | Objective | Final users | Diseases | Social and environmental factors | Data collection methods | Data processing methods | Data analytics and visualization methods | Study sample size | Other relevant info |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PULSE | To improve the urban environment. Public health observatory | Health care authorities and city planners | Asthma, type 2 diabetes | Poverty, air quality, physical activity and heart rate | PulseAir app with questionnaire and wearable. Environmental sensors. Portable air quality sensors. | Geo-reference, data integration, maps, WebGIS | Deep and transfer learning, image analytics, decision support, geo-analytics dashboards | 1500 | Participatory data collection system, patient individual risk |
| Cloudy with a Chance of Pain | To examine the association between weather and pain | The public, specifically people living with pain | Arthritis, long-term pain conditions | Weather (humidity, low pressure, strong winds) | Smartphone app, smartphone GPS, 5.1 million symptom scores | User data linked to 154 UK Met Office weather stations | Average user linked to 9 weather stations, the most mobile person linked to 82 weather stations | 13,000 | Case-crossover design |
| Digital Exposome | To characterize the health impact of Internet use and other digital technologies | Researchers, bioethics experts | Addictive behaviour, depression and other mental health problems | Use of Internet and digital technologies | Smartphone apps, questionnaires, electronic health records | Digital biomarkers, digital therapeutics | Individual digital footprints | n/a | Digital exposures as risk factors, but also as possible therapies |
| Wearable Clinics | To develop digitally transformed healthcare services | Clinicians and patients | Long-term conditions, chronic diseases, schizophrenia, hypertension | Mobility, other wearable sensor data | Electronic health records, active mobile sensing (EMA), passive wearable sensing | Multimodel adaptive sensing, signal compression algorithms | Dynamic and personal healthcare plans, real-time prediction of disease relapse risk, clinic dashboard, cluster hidden Markov models | 21 (first phase) | Patient empowerment, regulatory challenges |

In terms of their objectives, the featured projects illustrate applications that use
data on exposure to environmental and social risk factors for the investigation of
the causes of diseases, health care, patient empowerment, and public health. Therefore,
the final users of these systems can be researchers, clinical professionals, health
planners, and public health and urban planning authorities, with the participation
of experts in bioethics, given the challenges posed by some of these projects in terms
of data privacy and security.
The studies cover aspects related to long-term, chronic diseases and conditions
(pain, asthma, arthritis, mental health problems, hypertension), which is logical
considering that it is in these pathologies that the influence of environmental factors
is most important and, in many cases, still not well determined.
The authors have worked with a wide range of data describing individuals' exposure
to environmental and social factors that can affect health, specifically air quality,
physical activity patterns and mobility, weather, the use of digital technology, and
poverty. Without being exhaustive, of course, the studies included here offer very
representative examples of the challenges that researchers face when processing this
type of data.
In terms of data processing, the authors addressed several of the main aspects related
to data collection, data processing and integration with different data sources, as
well as the most appropriate methods for their analysis, visualization, and application
to health problems. Thus, in terms of data collection, the projects faced problems
derived from the use of smartphone apps, wearables, fixed and mobile environmental
sensors, questionnaires, and electronic medical records. These data had to be integrated
in some cases with other data sources, for example, those coming from meteorological
stations. Various methods were used for data management with a strong geographical
component (maps, geo-reference, webGIS), as well as for data preparation (signal compression
algorithms). Finally, in the analysis of these types of data, artificial intelligence techniques and
methods (deep and transfer learning) played a key role,
but the authors also want to highlight the development of dashboards for patients,
clinicians and health authorities, personal health plans, and tools for real-time
risk prediction.
These studies illustrate the need to move forward in participatory data collection
approaches (person-generated health data), new research designs, the characterization of
biomarkers and digital therapies, patient empowerment, and regulatory aspects.
We hope that this collection of projects, diverse but with common aspects and objectives,
will help the reader appreciate the importance of studies that integrate
genetic, clinical, and environmental information, all necessary for the development
of precision medicine. Our working group invites the health informatics community
to participate in on-going discussions and help shape the research agenda promoting
the use of informatics and data science in exposome and environmental health research,
particularly in the context of precision medicine.