Keywords
Polysomnography - Technology - Reproducibility of results - Sleep deprivation - Monitoring
Introduction
The measurement and quantification of sleep in population research and clinical settings
is of increasing importance due to its integral role in physical and mental health.[1] Diverse methods of monitoring and researching sleep have been extensively investigated
and validated in the literature.[2] Although polysomnography (PSG) is regarded as the ‘gold standard’ of sleep measurement,
it is a somewhat intrusive and expensive form of assessment.[3][4] Additionally, PSG typically requires an individual to sleep in an unfamiliar laboratory-based
clinic while sleep is assessed using multiple electrodes that monitor
neurophysiological and cardiorespiratory variables. This may be difficult and invasive
for many individuals and may compromise the ecological validity of the data obtained
outside of a strictly pathological sleep assessment.[2][5] Over the last decade, many emerging sleep-monitoring devices, such as commercially available
wearables, have demonstrated promising capability for tracking sleep and wake episodes.[1] A popular method of minimally invasive sleep monitoring is wrist actigraphy,
in which wearable devices continuously monitor movement during sleep,
with either automatic or manual scoring options available.[3]
Although various products are now available on the market, actigraphy generally involves
a device housed in a wristwatch, bracelet, or ring that contains an accelerometer
capable of sensing movement along each of three axes.[1][6] The tri-axial accelerometer samples multiple times per second, and the recorded limb
movement is used to estimate metrics of sleep and wake, including total sleep
time (TST), total time in bed (TIB), sleep efficiency (SE%), wake after sleep onset
(WASO), and sleep onset latency (SOL), as well as sleep and wake times. Data are then
stored in the device memory to be downloaded and either automatically or manually
scored.[1]
The advantage of actigraphy over traditional PSG is that it can record continuously
for 24 hours a day for days, weeks, or even longer,[7] and can easily be used to monitor sleep-wake patterns in home-based or ecologically
valid settings. As a result, it has also been proposed that actigraphy could be adapted
for use in primary care settings to improve sleep health in the community.[8] To date, actigraphy has been used widely in sleep research to provide continuous
monitoring of rest/wake activity rhythms in varying populations and environments, including residential
care patients,[9] elite athletes,[10] shift workers,[11][12] and operational settings such as firefighting[13] and the military.[14]
Manually scored actigraphy has historically been used in sleep research settings.[15][16] Despite this, it can be difficult to draw conclusions about the overall reliability
of manually scored actigraphy data given variations in scoring methods, different
brands of hardware, varying software, and inter/intra-scorer reliability.[15] Studies have shown that high inter-rater agreement for manually scored data (e.g.,
α = 0.975 for rest onset, and α = 0.998 for rest offset) can be achieved by trained researchers using clearly
defined scoring criteria.[17] However, limitations of manually scored actigraphy include the possibility of human
error and the time required to analyze data from large groups of participants.[5][7] Recent advances in technology have seen the emergence of automatically scored, commercially available
actigraph devices,[5][14] and the accuracy and reliability of these devices have improved considerably.[1][18] These developments include devices specifically tailored to detect periodic limb
movements and the introduction of new algorithms.[19] Most sleep-wake scoring algorithms are based on a combination of linear compilations
of activity levels (in predefined windows around the scored minute) and smoothing
or other logical decisions.
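The general structure described above (a weighted sum of activity levels in a window around the scored minute, followed by a logical rescoring step) can be sketched as follows. This is a minimal illustration only: the function names, weights, scale factor, threshold, and rescoring rule are all hypothetical and are not the coefficients or rules of any published or proprietary algorithm.

```python
from typing import List, Sequence


def score_epochs(counts: Sequence[float], weights: Sequence[float],
                 before: int, scale: float, threshold: float = 1.0) -> List[str]:
    """Label each epoch 'W' (wake) or 'S' (sleep) from a weighted activity sum.

    `weights` covers `before` epochs preceding the scored epoch, the scored
    epoch itself, and the remaining epochs after it. Epochs that fall outside
    the recording are treated as zero activity (an assumption).
    """
    labels = []
    for i in range(len(counts)):
        total = 0.0
        for k, w in enumerate(weights):
            j = i - before + k
            if 0 <= j < len(counts):
                total += w * counts[j]
        labels.append("W" if scale * total >= threshold else "S")
    return labels


def rescore_short_sleep(labels: Sequence[str], min_run: int = 3) -> List[str]:
    """Hypothetical logical rule: a run of sleep shorter than `min_run`
    epochs that is bordered by wake on both sides is relabelled as wake."""
    out = list(labels)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != "S":
            i += 1
            continue
        j = i
        while j < n and out[j] == "S":
            j += 1
        if i > 0 and j < n and (j - i) < min_run:
            for k in range(i, j):
                out[k] = "W"
        i = j
    return out


# Illustrative activity counts for ten 1-minute epochs (made-up values).
example_counts = [0, 0, 0, 50, 60, 0, 55, 0, 0, 0]
example_labels = score_epochs(example_counts, [0.1, 0.2, 0.4, 0.2, 0.1],
                              before=2, scale=0.05)
```

Because the weights spread each burst of movement across neighbouring minutes, the quiet minute between the two bursts in `example_counts` is still labelled wake, which is the kind of behaviour the windowing is intended to produce.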
Many of the commercial wearable devices on the market, such as the Readiband™ (RB),
contain tri-axial accelerometers that record the frequency and intensity of limb movement
that can be converted to sleep-wake periods using a built-in automated scoring algorithm.
In a study by Chinoy et al.[1] evaluating seven consumer sleep-tracking devices, the RB performed comparably to
other devices and displayed high intraclass correlations (>0.93) for overall epoch-by-epoch
sensitivity.[1] In one previous study, 50 adults wore an automatically scored RB device and
a manually scored device (ActiGraph GT3X+) for 7 nights, and sleep onset, sleep duration,
and wake time were compared.[3] The RB performed similarly to the manually scored device when measuring these sleep
metrics, both during an unfamiliar laboratory night stay and when worn at home in a familiar
environment. It was concluded that the RB could be used in the same capacity as the
ActiGraph for the collection of sleep metrics.[3]
With the emergence of new sleep-monitoring technologies, it is important to understand
what differences exist between manually and automatically scored devices to enable
decisions regarding whether the data obtained are comparable. Therefore, the aim of
the current study was to compare sleep metrics from a
manually scored actigraphy device with those from a commercially available actigraphy
device that uses automatic scoring, by evaluating 60 nights of sleep data from 20 healthy
adult participants wearing the two devices concurrently.
Methods
Participants
A total of 20 healthy adults (10 male, 10 female; age: 26 ± 10 years [mean ± SD])
participated in the current study. Participation in the study was voluntary and all
participants provided written consent before taking part, with inclusion dependent
on being free from any diagnosed sleep disorders. Ethical approval for the study was
obtained from the University's Human Research Ethics Committee (HREC) (Health) #2018–0.
Sample size was calculated for the current study using an a priori analysis based
on an expected r-value of 0.8, a precision value of ± 0.2 and 95% confidence levels using a web-based
calculator.[20] This calculation resulted in an estimated n = 18 participants.
Experimental Design
Participants were required to wear two different wrist actigraphs concurrently, the automatically
scored Readiband™ (AUTO) and the manually scored Micro MotionLogger® (MAN), and have
sleep recorded over a 3-day/night period, similar to the procedures of
Dennis et al.[18] In the current study, the MAN and AUTO devices were tightly secured together
with electrical tape, with the MAN on top of the AUTO, so that the devices could not
move independently of each other. Devices were initialized before being worn on whichever
wrist felt comfortable,[21] and data were recorded in 1-minute epochs.[18] Total sleep time (TST), sleep efficiency (SE), time in bed (TIB), sleep onset latency
(SOL), wake after sleep onset (WASO), wake episodes (WE), sleep onset time (SOT),
and wake time (WT) were all assessed ([Table 1]). The devices were removed for any water submersion activities and placed back on
the wrist immediately after the activity. Participants were instructed to maintain their
usual sleep habits and general daily activity patterns during the 3-day monitoring
period, after which the actigraphs were removed and the data downloaded.
Table 1
Definitions of each sleep variable measured and compared between the
Fatigue Science Readiband™ (AUTO) and Micro MotionLogger® (MAN) actigraphy devices.
Sleep indices | Units | Description
Total Sleep Time (TST) | Minutes | Total time spent asleep
Total Time in Bed (TIB) | Minutes | Total time spent in bed
Sleep Efficiency (SE) | % | Total sleep time divided by total time in bed, expressed as a percentage
Sleep Onset Latency (SOL) | Minutes | Time taken for sleep onset
Wake Episodes per Night (WE) | Number count | Total number of awakenings per night
Wake After Sleep Onset (WASO) | Minutes | Time spent awake after sleep onset per night
Sleep Onset Time (SOT) | Time of day (p.m.) | Time fell asleep at night
Wake Time (WT) | Time of day (a.m.) | Time woken in morning
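As an illustration of how the definitions in Table 1 relate to one another, the sketch below derives the duration-based metrics from a sequence of per-epoch sleep/wake labels. It rests on several assumptions that are not taken from either device's software: the labels span the whole time in bed, sleep onset is the first epoch scored as sleep, WASO counts only wake epochs between sleep onset and the final sleep epoch, and the function name `sleep_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def sleep_metrics(labels: Sequence[str], epoch_min: float = 1.0) -> Dict[str, float]:
    """Summarise one night of 'S' (sleep) / 'W' (wake) epoch labels.

    Assumes the labels cover the full time in bed, in chronological order.
    """
    tib = len(labels) * epoch_min
    sleep_idx = [i for i, lab in enumerate(labels) if lab == "S"]
    if not sleep_idx:
        # No sleep detected: the whole time in bed counts as latency.
        return {"TIB": tib, "TST": 0.0, "SE": 0.0, "SOL": tib, "WASO": 0.0, "WE": 0}

    onset, final = sleep_idx[0], sleep_idx[-1]
    tst = len(sleep_idx) * epoch_min
    # Epochs from sleep onset to the last sleep epoch, inclusive.
    between = list(labels[onset:final + 1])
    waso = sum(1 for lab in between if lab == "W") * epoch_min
    # One wake episode per sleep-to-wake transition inside that interval.
    wake_episodes = sum(1 for prev, cur in zip(between, between[1:])
                        if prev == "S" and cur == "W")
    return {
        "TIB": tib,
        "TST": tst,
        "SE": 100.0 * tst / tib,
        "SOL": onset * epoch_min,
        "WASO": waso,
        "WE": wake_episodes,
    }


# Sixteen 1-minute epochs of made-up labels.
night = sleep_metrics(list("WWWSSSSWWSSSWSSW"))
```

Under these assumptions, TIB equals SOL plus TST plus WASO plus any wake time after the final sleep epoch, which gives a quick consistency check on exported data.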
Automatic Scoring Actigraphy
The AUTO actigraph (Readiband™ version 5, Fatigue Science, Honolulu, USA) has been
previously used in sleep research[8][10][22] and records data at a sample rate of 16 Hz. The AUTO uses a patented algorithm to
automatically score sleep data derived from raw acceleration signals via specialized
Readiband Sync™ software.[14][23] The AUTO device has been shown to correctly distinguish sleep from wakefulness ∼82%
of the time in epoch-by-epoch comparison with PSG.[24] The RB has also been approved by the US Food and Drug Administration for measurement
of sleep.[23]
Manually Scored Actigraphy
The MAN actigraph (Micro MotionLogger®, Ardsley, New York, USA) uses a tri-axial
accelerometer, which has also been compared with PSG and accurately distinguished sleep from
wakefulness ∼80% of the time.[24] Data were collected using the device's zero-crossing mode and recorded in 1-minute
epochs.[25] Using the manufacturer's software (Action-W version 2.7.3045, Ambulatory Monitoring
Inc., Ardsley, New York, USA), sleep and wakefulness were estimated based on activity
counts using the Cole-Kripke algorithm.[26] Manual scoring of the sleep data involved one technician scoring all 60 nights'
sleep files individually for the ‘start time’ and ‘end time’ of the rest interval, and
for any wake periods throughout the rest interval for each participant.[7] Markers were placed on the computer file to indicate the intervals during which participants were
in bed and the times the device was removed. To assess the reliability of manual
selection of rest intervals, a randomly selected 33% (20 sleep files) were double
scored by a second independent trained researcher. Any discrepancies of more than
15 minutes for either ‘start time’ or ‘end time’ of the rest interval were flagged
and re-analyzed by both technicians. If agreement could not be reached on any file,
a third independent researcher would have been used for scoring; however, this did
not occur. A total of four files were re-analyzed by both researchers, with a final
accuracy rate of 87.9% achieved between the two researchers; this level has previously
been described as acceptable.[27]
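The double-scoring check described above, in which a file is flagged when the two scorers' ‘start time’ or ‘end time’ differ by more than 15 minutes, could be automated along the following lines. This is a sketch under assumptions: the data layout (one start/end pair per file identifier) and the function name `flag_discrepancies` are hypothetical, and the example times are invented.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# (start time, end time) of the rest interval marked by one scorer.
Interval = Tuple[datetime, datetime]


def flag_discrepancies(scorer_a: Dict[str, Interval],
                       scorer_b: Dict[str, Interval],
                       tolerance: timedelta = timedelta(minutes=15)) -> List[str]:
    """Return the files whose start or end times differ by more than `tolerance`.

    Only files scored by both scorers are compared.
    """
    flagged = []
    for file_id in sorted(scorer_a.keys() & scorer_b.keys()):
        a_start, a_end = scorer_a[file_id]
        b_start, b_end = scorer_b[file_id]
        if abs(a_start - b_start) > tolerance or abs(a_end - b_end) > tolerance:
            flagged.append(file_id)
    return flagged


# Invented example: the second file differs by 20 minutes at rest onset.
first = {
    "p01_n1": (datetime(2018, 5, 1, 22, 0), datetime(2018, 5, 2, 6, 0)),
    "p01_n2": (datetime(2018, 5, 2, 22, 30), datetime(2018, 5, 3, 6, 10)),
}
second = {
    "p01_n1": (datetime(2018, 5, 1, 22, 10), datetime(2018, 5, 2, 6, 5)),
    "p01_n2": (datetime(2018, 5, 2, 22, 50), datetime(2018, 5, 3, 6, 10)),
}
to_review = flag_discrepancies(first, second)
```

A difference of exactly 15 minutes is not flagged here, matching the "more than 15 minutes" wording; flagged files would then be re-analyzed by both scorers.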
Statistical Analysis
Simple group descriptive statistics are shown as means ± standard deviations unless
stated otherwise. Paired t-tests were used to compare AUTO and MAN metrics using the Statistical Package for the Social
Sciences (V. 22.0, SPSS Inc., Chicago, IL), with statistical significance set at p < 0.05. Inter-device agreement between AUTO and MAN was examined using Pearson correlation
coefficients (r) with 95% confidence intervals (95% CI) and interpreted using thresholds of <0.30:
poor, >0.30: fair, >0.50: moderately strong, >0.80: very strong.[28] The mean differences/bias and upper and lower limits of agreement (± 1.96 standard
deviations of the differences, covering 95% of a normally distributed population) between devices were determined
in absolute values for TST, SE, TIB, SOL, WE, and WASO. Between-device typical error
of measurement (TEM) was determined using a customized Excel spreadsheet.[29] Consistent with previous research, we defined a priori a difference between the two devices of <30 minutes for TST and <5% for SE as satisfactory.[30]
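For readers who wish to reproduce the agreement statistics on their own paired data, the sketch below computes a Pearson correlation with an approximate 95% CI, labels it with the thresholds given above, and derives the Bland-Altman bias and limits of agreement. It is not the SPSS or spreadsheet procedure used in the study: the Fisher z-transformation for the CI is an assumption about how such an interval could be obtained, a coefficient of exactly 0.30 is arbitrarily treated as poor, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient for paired observations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> Tuple[float, float]:
    """Approximate CI for r via the Fisher z-transformation (requires n > 3)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def label_r(r: float) -> str:
    """Apply the interpretation thresholds quoted in the text."""
    size = abs(r)
    if size > 0.80:
        return "very strong"
    if size > 0.50:
        return "moderately strong"
    if size > 0.30:
        return "fair"
    return "poor"


def bland_altman(x: Sequence[float], y: Sequence[float],
                 z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = z_crit * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


def within_criterion(bias: float, limit: float) -> bool:
    """Check an a priori criterion such as <30 minutes for TST or <5% for SE."""
    return abs(bias) < limit
```

For example, `bland_altman(man_tst, auto_tst)` followed by `within_criterion(bias, 30)` would apply the TST criterion to two hypothetical lists of nightly values; the order of subtraction determines the sign of the bias and should be reported alongside it.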
Results
There were no significant differences between devices (AUTO and MAN) for any of the
measured sleep variables (p > 0.05, [Table 2]). The mean difference between devices over the 60
nights of data was less than 1 minute for TST and 1.1% for SE ([Table 2]), with very strong correlations between devices for both these measures ([Table 3]).
Table 2
Mean ± SD values for both the automatically scored Readiband™ (AUTO) and the manually
scored Micro MotionLogger® (MAN) actigraphy devices, for all measured sleep variables
and p-values for each comparison.
Sleep Indices | AUTO | MAN | P-Value
Total Sleep Time (min) | 438.6 ± 87.5 | 439.1 ± 90.6 | 0.974
Sleep Efficiency (%) | 91.1 ± 5.1 | 92.2 ± 5.2 | 0.240
Total Time in Bed (min) | 459.1 ± 96 | 465.3 ± 92.6 | 0.717
Sleep Onset Latency (min) | 19.4 ± 14.5 | 16.7 ± 11.9 | 0.145
Wake Episodes per Night (Number) | 8.0 ± 4.9 | 9.7 ± 5.0 | 0.061
Wake After Sleep Onset (min) | 30.0 ± 23.3 | 32.6 ± 20.7 | 0.528
Sleep Onset Time (Time of day) | 21:49 ± 0:48 | 21:59 ± 0:42 | 0.231
Wake Time (Time of day) | 5:49 ± 1:01 | 5:55 ± 1:02 | 0.634
Table 3
Typical error of measurement (TEM), mean difference, range of difference and Pearson
correlations for each sleep metric between automatically scored Readiband™ (AUTO)
and manually scored Micro MotionLogger® (MAN) actigraphy devices.
Sleep Indices | TEM (95% CI) | Mean difference (± SD) | Range of mean difference (1.96 × SD) | Pearson correlation coefficient (95% CI)
Total Sleep Time (min) | 15.5 (12.3 to 17.7) | 0.53 ± 20.6 | -39.7 to 40.8 | 0.97 (0.96 to 0.98)
Sleep Efficiency (%) | 2.7 (2.3 to 3.4) | 1.1 ± 2.9 | -4.6 to 6.8 | 0.84 (0.74 to 0.90)
Total Time in Bed (min) | 29.6 (25.0 to 36.2) | 6.2 ± 29.4 | -51.4 to 63.9 | 0.95 (0.92 to 0.97)
Sleep Onset Latency (min) | 6.4 (5.4 to 7.8) | -2.6 ± 6.5 | -15.4 to 10.1 | 0.90 (0.83 to 0.94)
Wake Episodes per Night (Number) | 1.4 (1.2 to 1.7) | 1.6 ± 1.4 | -1.2 to 4.5 | 0.96 (0.93 to 0.97)
Wake After Sleep Onset (min) | 12.6 (10.7 to 15.4) | 2.5 ± 12.6 | -22.2 to 27.2 | 0.84 (0.75 to 0.90)
SOL, TIB, and WE showed very strong correlations between devices, with mean differences of -2.6 minutes, 6.2 minutes, and 1.6 episodes, respectively ([Table 3]). The corresponding TEMs were 6.4 minutes, 29.6 minutes, and 1.4 episodes
([Table 3]). The remaining variables, SE and WASO, also showed very strong correlations between devices, with TEM values of 2.7% and 12.6 minutes, respectively
([Table 3]). Level of agreement (Bland-Altman) plots showing ± 95% limits of agreement between
AUTO and MAN for key sleep variables of TST, SOL, and SE are displayed in [Fig. 1].
Fig. 1 Level of agreement (Bland-Altman) plots showing ± 95% limits of agreement between
automatically scored Readiband™ (AUTO) and manually scored Micro MotionLogger® (MAN)
for a) total sleep time (TST); b) sleep onset latency (SOL); c) sleep efficiency (SE).
Discussion
This study examined the differences between a commercially available, automatic-scoring
actigraph when compared with a manually scored actigraph in healthy adult participants
while wearing both devices concurrently. The aim of the study was to simply compare
the metrics coming from the two devices and not to evaluate the overall validity or
accuracy of actigraphy as a method of monitoring sleep. The correlation between these
manually and automatically scored devices was very strong for all sleep variables with no significant differences in any of the measured sleep
variables between devices. The automatically scored device performed comparably to
the manually scored device in the current study, suggesting a practical alternative
to achieve similar levels of accuracy without the time demand or expertise of a trained
technician required to score the actigraphy trace.
Werner et al.[30] stated that a difference between two devices of < 30 minutes for TST and < 5%
for SE can be considered satisfactory. Results from the current study were
within these thresholds, with mean differences between devices of
less than 1 minute for TST and 1.1% for SE over the 60 nights. Accordingly, based
on the suggestions of Werner et al.,[30] the differences between devices for these
key sleep metrics in the current study can be deemed acceptable, and the
AUTO can be considered an appropriate alternative for use in both practical and research
settings.
In the current study, TST, TIB, SOL, WASO, WE, and SE did not differ
significantly between AUTO and MAN, and all Pearson correlation coefficients were very strong. Dunican et al.[3] compared the automatically scored RB to a different manually scored actigraph (ActiGraph
GT3X+) during a laboratory observation night and when worn at home for 7 nights
in a healthy adult population. Dunican et al.[3] reported that TST showed no difference between the devices in the laboratory condition
(p = 0.58), but a longer sleep duration (38 ± 61 minutes, p < 0.001) and differences in time at lights-out for the RB in the at-home condition.
For SE, the RB estimated 5–12% less (p < 0.001), with longer SOL (22–36 minutes, p < 0.05), than the ActiGraph GT3X+ device in both conditions.[3] It was concluded that the differences between devices in the at-home and laboratory conditions
were due to inaccuracies in the RB's reporting of time at lights-out compared
with the ActiGraph's requirement to self-report (e.g., using a marker button on the watch),
and highlighted the challenge of accurately defining these metrics due to different
assessment methods.[3] By comparison, the results of the current study indicated no significant difference
between devices, suggesting similarities between the proprietary AUTO
algorithm and MAN scoring. Thus, the results align with the conclusion of Dunican
et al.,[3] who stated that the RB automated algorithm may be used in the same capacity as a
manually scored actigraph for the collection of key sleep measures.
Previous research has examined the inter-device reliability of the AUTO device used
in the current study.[5] In the study by Driller et al.,[5] participants wore two RB devices concurrently for 77 nights of sleep, and the sleep
data from the two devices were compared. The Driller et al. study[5] found no significant differences between devices for any of the measured sleep variables
(p > 0.05). Mean differences of 2.1 and 0.2 minutes for TST and SOL were associated with
a low TEM between devices (9.5 and 3.8 minutes, respectively). Interestingly, the non-significant
differences between devices observed in the Driller et al.[5] study for all sleep metrics are remarkably similar to those observed in the current
study. Driller et al.[5] also reported very high intraclass correlations between devices, indicating that the RB has acceptable inter-device
reliability. In comparison, while the current study had slightly higher TEMs for some
of the measures, this is somewhat expected, as the current study included two different
brands of device, with different algorithms and methods of scoring (manual versus
automatic). It is therefore promising that the differences were similar when comparing
the inter-device reliability (RB versus RB) and the comparison of two different devices
(MAN versus AUTO). The RB has been shown to be reliable (inter-device) and compares
favorably to MAN, indicating that the RB can be considered a valuable tool for monitoring sleep
metrics.
The current study is not without limitations. These include comparing AUTO and MAN
devices and associated algorithms only in an at-home environment and not in a laboratory
condition, where differing results may have occurred between devices.[3] We also acknowledge that not comparing to PSG was a limitation; however, this comparison
has already been made for both devices,[3][24][25] and the main aim of the current study was to compare AUTO and MAN devices.
In conclusion, AUTO may provide an accurate and practical solution for use in healthy
populations to give an indication of sleep/wake patterns. An automatically scored
device such as the RB does not require any expertise, and when compared with the time
required for manual scoring of actigraphy traces, the RB may provide a more time-efficient
alternative, therefore allowing large groups of individuals to be monitored effectively
with comparable accuracy.