Keywords:
Parkinson Disease - Activities of Daily Living - Reproducibility of Results
INTRODUCTION
Parkinson’s disease (PD) is a progressive neurological disorder characterized clinically by resting tremor, bradykinesia, rigidity, and postural instability[1]. These cardinal signs cause limitations in activities, such as walking[2] and manual dexterity[3], and affect the ability to perform activities of daily living (ADL). ADL consist of daily self-care activities, such as bathing/personal hygiene, dressing, self-feeding, and mobility[4]. Limitations in ADL are associated with poor health-related quality of life[5] and life satisfaction[6].
The objectives of rehabilitation interventions are to improve patients’ functionality as well as to help individuals and their families cope with the functional limitations of PD[7]. Therefore, the ADL assessment is relevant to the rehabilitation process and should be based on valid and reliable measures. Generic measures, such as the Barthel Index[8], Lawton and Brody scale[8], and the Functional Independence Measure[8], have been used to assess individuals with PD. However, these generic measures cannot assess the impact of PD-related impairments on performing ADL.
Likewise, specific measures, such as the Unified Parkinson’s Disease Rating Scale (UPDRS) Part II[2],[5],[9],[10], Movement Disorder Society-Sponsored Revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) Part II[11],[12], and Parkinson’s Disease Activities of Daily Living Scale[13], are also used. However, these instruments have the following limitations: ambiguity and redundancy of items, such as items overlapping in the ADL and motor sections (UPDRS or MDS-UPDRS Part II); lack of the patient’s perspective on the most important ADL (UPDRS, MDS-UPDRS Part II, Parkinson’s Disease Activities of Daily Living Scale); and no identification of the severity of limitations in specific ADL, because the result is a single global rating (Parkinson’s Disease Activities of Daily Living Scale)[13]. In addition, there are few specific instruments to measure ADL in the Brazilian PD population. Self-reported measurements can help rehabilitation professionals apply the principles of a client-centered approach by facilitating the patient’s involvement in goal setting, treatment planning, and the decision-making process[14].
The Activities of Daily Living Questionnaire (ADL Questionnaire) is a new tool developed based on the perspective of individuals with PD. It was originally created in Korea and later published in English[15]. The questionnaire assesses ADL performed at home and in community environments, and can be self-administered or conducted through a short interview[15]. It shows adequate validity, reliability[15], and clinical utility, since it has no cost, its instructions are clear, and it can be applied through a brief interview.
The majority of the development and testing of new measures has been conducted in English-speaking countries[16]. Since ADL performance is influenced by cultural factors, clinical professionals in non-English speaking countries must first culturally adapt these tools, using a specific methodology[16],[17],[18]. Cultural adaptation both reduces the cost of research and provides the possibility of comparing data within the national and international scientific communities[16],[19]. After adaptation, it is recommended that culturally adapted measures be reproducible (i.e., test-retest reliability and agreement) to determine the degree to which they provide error-free assessments[17],[19]. Therefore, the aim of this study was to adapt the ADL Questionnaire to the Brazilian culture and to analyze its reproducibility in individuals with PD.
METHODS
This study was approved by the Research Ethics Committee of the Universidade Federal de Minas Gerais, Brazil. Informed consent was obtained from all patients before they participated in the study. The questionnaire was translated after official permission was received from the original authors. Cultural adaptation followed the recommendations of Beaton et al.[17].
Initially, two translations (T1 and T2) were made independently by bilingual translators: one rehabilitation professional and one Brazilian English teacher. This process ensured that one version had more reliable equivalence from a measurement perspective and the other was more aligned with the language used by the general population. The two translations were then synthesized by a physical therapist and an occupational therapist, both with experience in the rehabilitation of individuals with PD and in the cultural adaptation of assessment tools. The synthesis of T1 and T2 resulted in a unified version (T3). Subsequently, two bilingual translators from North America independently produced two back-translations of T3 (V1 and V2). These translators had no access to the original version of the questionnaire. An expert committee consisting of a physical therapist, two occupational therapists, a translator, and a back-translator analyzed the semantic, idiomatic, cultural, and conceptual equivalence between V1, V2, T3, and the original version. This committee also consolidated these versions and developed a pre-final version to be tested on the target population. The pre-final version was first tested on 10 individuals with PD to confirm the syntax and comprehensibility of the questions. For this purpose, a question asking whether the item was easy or difficult to understand was added to each item of the questionnaire. Items that presented comprehension difficulties for 20% or more of the participants were revised. Following the pilot tests, a final meeting was organized in which all translators discussed the comments made by the patients. Subsequently, the third and final version of the ADL-Brazil Questionnaire was developed.
The psychometric evaluation of the ADL-Brazil Questionnaire was performed with reference to the Consensus-based Standards for the selection of health status Measurement Instruments (COSMIN) checklist[20]. The sample size for Phase 2 was calculated based on hypothesized intraclass correlation coefficients (ICCs) ≥0.85, with a 95% confidence interval (95%CI) of ±0.1. Thirty participants were required for the test-retest reliability study[3]. In this phase, the ADL-Brazil Questionnaire was applied by an examiner who followed the instructions proposed by the authors[15], through an interview, on two different occasions with an interval of seven to ten days[21].
The participants were recruited from two movement disorder outpatient clinics in Belo Horizonte, Brazil, from February to June 2019. Inclusion criteria were as follows: idiopathic PD according to the United Kingdom Parkinson’s Disease Society Brain Bank clinical diagnostic criteria[22] and stage 1 to 4 on the Hoehn-Yahr (HY) scale[23]. Participants were excluded if they had any other neurological or psychiatric disease and/or cognitive decline according to the Mini-Mental State Examination[24]. Patients who had other medical problems that seriously affected their ADL, such as orthopedic impairments, were also excluded. None of the patients had undergone deep-brain stimulation.
Demographic and clinical data of the patients were collected. The Unified Parkinson’s Disease Rating Scale (UPDRS) Part II was used to assess different aspects of their experiences of daily living[25]. The UPDRS and ADL Questionnaire scores were obtained in the “on” phase of the disease, which was defined according to each participant’s drug regimen. Information on the onset and duration of the effect of antiparkinsonian drugs was recorded with support from the neurologists.
The ADL Questionnaire consisted of 20 items scored on a 6-point scale (0=no problem; 1=slow but with no difficulty; 2=mildly difficult but do not need help or assistance; 3=moderately difficult and sometimes need help or assistance; 4=severely difficult and mostly need help or assistance; 5=incapable of performing the activity)[16]. Items were summed to produce a total score (range from 0 to 100), with higher scores indicating a worse level of limitation[15]. The questionnaire showed a high internal consistency (0.96–0.97) and acceptable test-retest reliability (0.63–0.98)[15].
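The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name `adl_total_score` and the input checks are assumptions added for the example.

```python
def adl_total_score(item_scores):
    """Sum the 20 item ratings (each 0-5) into a 0-100 total.

    Higher totals indicate a worse level of limitation.
    The validation rules below are illustrative assumptions.
    """
    if len(item_scores) != 20:
        raise ValueError("expected exactly 20 item scores")
    if any(score not in range(6) for score in item_scores):
        raise ValueError("each item must be an integer from 0 to 5")
    return sum(item_scores)


# Hypothetical respondent: 'mildly difficult' (2) on every item.
print(adl_total_score([2] * 20))
```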
Descriptive statistics were used to characterize the sample. Reproducibility was analyzed using test-retest reliability and agreement values. Test-retest reliability is “the extent to which scores for patients who have not changed are the same for repeated measurement over time”[26]. The test-retest reliability of the individual items was calculated using quadratic weighted Kappa coefficients (κw)[21] and classified according to Landis and Koch: weak (0.00<κw<0.20), fair (0.20<κw<0.40), moderate (0.40<κw<0.60), good (0.60<κw<0.80), and almost perfect (κw>0.80)[27]. The test-retest reliability of total scores was determined by calculating the intraclass correlation coefficient (ICC2,1)[21], classified as follows: very high (≥0.90), high (0.70≤ICC≤0.89), moderate (0.50≤ICC≤0.69), low (0.26≤ICC≤0.49), and very low (ICC≤0.25)[28].
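As an illustration of the item-level analysis, the sketch below computes a quadratic weighted Kappa for one item rated 0 to 5 on two occasions and labels it with the bands quoted above. It is a plain-Python sketch under stated assumptions: the function names are hypothetical, the handling of values exactly on a band boundary is a choice made here, and the study itself used SPSS.

```python
def quadratic_weighted_kappa(test, retest, n_categories=6):
    """Quadratic weighted kappa for two lists of integer ratings (0..n_categories-1)."""
    n = len(test)
    if n == 0 or n != len(retest):
        raise ValueError("test and retest must be non-empty and of equal length")
    k = n_categories
    # Observed cross-tabulation of test (rows) by retest (columns).
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(test, retest):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - weighted_observed / weighted_expected


def classify_kappa(kappa):
    """Label a kappa value with the bands used in the text (boundary handling assumed)."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "weak"


# Hypothetical ratings of one item by six respondents on two occasions.
test_scores = [0, 1, 2, 3, 4, 5]
retest_scores = [0, 1, 2, 3, 5, 5]
kappa = quadratic_weighted_kappa(test_scores, retest_scores)
print(round(kappa, 2), classify_kappa(kappa))
```

Quadratic weights penalize a two-category disagreement four times as much as a one-category disagreement, which suits ordinal scales such as this one.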
Agreement assesses how close the results of the repeated measurements are, by estimating the measurement error in repeated measurements[29]. The limits of agreement were verified using the Bland-Altman plot, which provides a visual representation of systematic bias between the mean values of the two test occasions[21]. The standard error of measurement (SEM) and the minimum detectable change (MDC) were calculated as SEM=SD×√(1−ICC), where SD is the standard deviation found in the first application, and MDC=1.96×SEM×√2[29]. The SEM% and the MDC% were calculated by expressing the SEM and the MDC, respectively, as percentages of the mean of the test and retest measurements[30]. For the SEM%, values <10% were considered acceptable[31], while for the MDC%, values <30% were considered acceptable[32]. All analyses were performed using SPSS statistical software v21.0 for Windows, with a significance level of 5%.
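The two formulas above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the input values are the summary statistics reported in the Results (test SD 20.96, ICC 0.98, test and retest means 43.4 and 44.5), and taking the average of the two means as the denominator for the percentages is an assumption about how "the mean of the test and retest measurements" was computed.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD x sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def minimum_detectable_change(sem, z=1.96):
    """MDC = z x SEM x sqrt(2), with z = 1.96 for 95% confidence."""
    return z * sem * math.sqrt(2.0)


# Summary statistics taken from the Results section of this paper.
sd_test = 20.96
icc = 0.98
grand_mean = (43.4 + 44.5) / 2  # assumed denominator for the percentages

sem = standard_error_of_measurement(sd_test, icc)
mdc = minimum_detectable_change(sem)
sem_pct = 100.0 * sem / grand_mean
mdc_pct = 100.0 * mdc / grand_mean

print(f"SEM = {sem:.1f} ({sem_pct:.1f}%)")
print(f"MDC = {mdc:.1f} ({mdc_pct:.1f}%)")
```

Under these assumptions the calculation gives an SEM of about 3 points and an MDC of about 8 points, in line with the values reported in Table 4.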
RESULTS
Translation
No divergence was identified between the original and back-translated versions, which demonstrated adequate semantic and conceptual equivalence. With regard to idiomatic equivalence, in the analysis of the pre-final version the committee of experts detected the need to change the wording of all items in which the verbs were used in the gerund form. Thus, to adapt this version to Brazilian Portuguese linguistic norms, all items were changed to the infinitive form.
During the evaluation of cultural equivalence, the committee of experts concluded that the item ‘Using a spoon and chopsticks’ was culturally inappropriate in the Brazilian context. It was first translated as ‘Usar uma colher e hashi’ and then adapted to the Brazilian context as ‘Usar uma colher e garfo’ (Using a spoon and fork), replacing ‘chopsticks’ with a utensil that requires similar manual dexterity.
Ten patients with PD answered the pre-final version. They had 4 to 11 years of schooling and the mean age was 70±10 years. All participants understood the items of the pre-final version, with a maximum application time of 10 minutes. No difficulties or doubts were identified in understanding the items during the application of the questionnaire ([Table 1]).
Table 1 Final translated version of the ADL-Brazil Questionnaire.
Psychometric analysis
The sample characteristics are shown in [Table 2]. The Bland-Altman plot ([Figure 1]) showed no systematic changes in the mean test-retest scores, but identified one individual who showed atypical behavior: the difference in this patient's scores between the first and second applications exceeded the 95% limits of agreement, and the patient was considered an outlier. The mean and standard deviation of the test and retest total scores were 43.4 (±20.96) and 44.5 (±24.65), respectively. In the test-retest reliability analysis, the Kappa coefficient was >0.80 (almost perfect) for 11 items (2, 5, 7, 10, 12, 13, 15, 17, 18, 19, 20) and between 0.60 and 0.80 (good) for 9 items (1, 3, 4, 6, 8, 9, 11, 14, 16), indicating good reliability. Mean test and retest scores, Kappa values, and the 95%CI of Kappa for each item are shown in [Table 3]. Regarding the total score, a very high reliability value was found (ICC=0.98; 95%CI 0.93–0.99; p<0.0001). The calculation of the changes in the mean test-retest values showed that the mean difference (đ) was −1.1 points and that its 95%CI included zero, i.e., no systematic changes were observed. The SEM (SEM%) and MDC (MDC%) values were considered adequate ([Table 4]).
Figure 1 Bland-Altman plot of test-retest scores agreement of ADL-Brazil Questionnaire (n=30). The x-axis represents the mean test-retest scores. The y-axis displays the difference between the scores of the first (test) and the second (retest) application of the ADL-Brazil Questionnaire.
Table 2 Clinical and demographic characteristics of the participants.
| Characteristic | n=30 |
|---|---|
| Age (years), mean±SD (min–max) | 64.5±11.9 (41–86) |
| Sex (men), n (%) | 19 (63) |
| Educational level (years), mean±SD (min–max) | 7.8±4.9 (1–17) |
| Disease duration since diagnosis (years), mean±SD (min–max) | 9.7±6.1 (2–30) |
| Cognition (MMSE), mean±SD (min–max) | 25.4±3.0 (19–30) |
| Family context, n (%) | |
| Living alone | 4 (13) |
| Living with family or partner | 26 (87) |
| Occupation, n (%) | |
| Salaried person | 2 (7) |
| Unemployed | 4 (13) |
| Retired | 24 (80) |
| Modified Hoehn & Yahr score, mean±SD (min–max) | 2.2±0.85 (1–4) |
| Stage 1, n (%) | 5 (17) |
| Stage 1.5, n (%) | 4 (13) |
| Stage 2, n (%) | 8 (27) |
| Stage 2.5, n (%) | 4 (13) |
| Stage 3, n (%) | 7 (23) |
| Stage 4, n (%) | 2 (7) |
| UPDRS II, mean±SD (min–max) | 11.2±8.5 (0–30) |
PD: Parkinson’s disease; MMSE: Mini Mental State Examination; UPDRS: Unified Parkinson’s Disease Rating Scale; SD: standard deviation.
Table 3 Descriptive statistics and test-retest reliability results of individual items of the ADL-Brazil Questionnaire (n=30).
| Item | Mean test | Median (min–max) test | Mean retest | Median (min–max) retest | Kappa values* | Kappa (95%CI) |
|---|---|---|---|---|---|---|
| Get in/out of bed | 2.5 | 3 (0–5) | 2.3 | 2.5 (0–4) | 0.63 | 0.28–0.97 |
| Sitting on/getting up from the floor | 2.8 | 3 (0–5) | 2.6 | 3 (0–5) | 0.87 | nc |
| Dressing | 2.3 | 2 (0–5) | 2.2 | 2.5 (0–4) | 0.75 | 0.47–1 |
| Taking a bath/shower | 1.7 | 2 (0–5) | 1.8 | 2 (0–4) | 0.75 | 0.47–1 |
| Writing | 2.3 | 2 (0–5) | 2.2 | 2 (0–5) | 0.84 | nc |
| Swallowing | 1.7 | 2 (0–5) | 1.7 | 1 (0–5) | 0.78 | nc |
| Walking | 2.4 | 2.5 (0–5) | 2.3 | 2 (0–5) | 0.88 | nc |
| Turning | 2.7 | 3 (0–5) | 2.6 | 2 (0–5) | 0.76 | 0.54–0.99 |
| Walking up/down stairs | 2.1 | 2 (0–4) | 2.3 | 2 (0–5) | 0.68 | 0.37–0.99 |
| Getting in/out of a car | 2.6 | 2 (0–5) | 2.5 | 2 (0–5) | 0.81 | 0.58–1 |
| Turning around in bed | 2.3 | 2 (0–5) | 2.4 | 2 (0–5) | 0.72 | 0.48–0.96 |
| Sitting on/rising from a chair | 2.0 | 2 (0–5) | 2.3 | 2 (0–5) | 0.87 | nc |
| Standing | 1.9 | 2 (0–4) | 2.1 | 2 (0–5) | 0.85 | nc |
| Using the toilet | 2.0 | 2 (0–5) | 2.0 | 2 (0–5) | 0.78 | nc |
| Using a spoon and chopsticks | 1.8 | 2 (0–4) | 2.1 | 2 (0–5) | 0.85 | nc |
| Talking | 1.5 | 1.5 (0–4) | 1.8 | 2 (0–5) | 0.73 | 0.46–0.99 |
| Taking the first step | 2.0 | 2 (0–4) | 2.2 | 2 (0–5) | 0.81 | nc |
| Moving an object | 2.2 | 2 (0–5) | 2.1 | 2 (0–5) | 0.86 | nc |
| Crossing the street | 2.3 | 2 (0–5) | 2.6 | 3 (0–5) | 0.86 | nc |
| Getting on/off a bus or subway | 2.4 | 2.5 (0–5) | 2.5 | 3 (0–5) | 0.86 | 0.74–0.98 |
*Quadratic weighted Kappa values. nc: not calculated (used when the data included a substantial proportion of zeros).
Table 4 Reproducibility measures of the ADL-Brazil Questionnaire total scores (n=30).
| Measure | n=30 |
|---|---|
| ADL-Brazil Questionnaire scores (test), mean±SD (range) | 43±21 (0–87) |
| ADL-Brazil Questionnaire scores (retest), mean±SD (range) | 45±25 (0–90) |
| ICC (95%CI) | 0.98 (0.93–0.99) |
| d (95%CI) | -1.1 (-4.2 to 1.98) |
| SEM (SEM%) | 3.0 (6.75) |
| MDC (MDC%) | 8.2 (18.7) |
ICC: intraclass correlation coefficient; 95%CI: 95% confidence interval; d: difference; SEM: standard error of the measurement; MDC: minimum detectable change.
DISCUSSION
The ADL-Brazil Questionnaire had adequate test-retest reliability, and its agreement values were within acceptable limits. These results indicate its potential for application in clinical practice and research to evaluate ADL based on the perception that individuals with PD have of their own performance.
ADL are activities performed routinely and necessary for the care of one’s own body[4]. The UPDRS and MDS-UPDRS Part II are extensively applied for the evaluation of disabilities, but these scales mix items directly related to daily activities with patient perceptions of primary disease manifestations, such as speech, salivation, swallowing, tremor, and freezing[25],[33]. Of the 13 items of each scale, only six in UPDRS Part II and eight in MDS-UPDRS Part II assess the performance of ADL[25],[33]. In this context, the ADL Questionnaire is relevant because it was developed from the patient’s perspective, according to their own limitations in performing daily activities. In the development of the ADL Questionnaire, movement disorders and neurorehabilitation specialists created a preliminary 45-item questionnaire that included household, outdoor, and social activities drawn from previous ADL scales. Afterwards, individuals with PD at different stages of the disease were asked to select, from this preliminary version, the three items most important to them. Next, clinical and statistical analyses were performed to determine the final ADL Questionnaire version, containing 20 items[15]. Among the selected items were daily activities not covered by other questionnaires commonly used to assess this population, such as walking up/down stairs, taking the first step, and crossing the street. Additionally, the final version includes ADL that may be impaired at different stages of PD, which makes it possible to assess individuals with different levels of functionality[15].
The ADL Questionnaire was developed in Korea and made available in English; hence, cultural adaptation was required for its application in Brazil. Cultural adaptation for the country in which the questionnaire will be applied is essential, since there may be differences in definitions, beliefs, or behaviors between cultures[17]. This process has advantages: it allows an instrument to be applied in different cultures and in multicenter studies, and it requires fewer financial resources than the development of a new instrument[16],[17]. However, the methodology proposed by Beaton et al. guarantees only the face and content validity of the adapted version, which is the first step in the validation process. Thus, it is necessary to investigate other measurement properties of the adapted version, such as reproducibility, i.e., test-retest reliability and agreement, in order to determine whether the instrument is suitable for use in clinical and research contexts[34],[35],[36].
The Bland-Altman plot revealed a homogeneous distribution of score differences around the mean, demonstrating that the participants rated their ability to perform ADL similarly on both occasions; that is, there was no systematic bias between the mean values from the two test occasions. However, one participant had a particularly large difference in scores between the first and second test occasions and was considered an outlier[36],[37], although this did not affect the stability of the measure. The interval of seven to ten days between the test and retest applications was considered too short for relevant changes in the ability to perform ADL, but other personal factors, such as mood changes and fluctuations in the clinical condition, may have interfered with stability[29],[36]. For this specific patient, other aspects, such as the brief description of the questionnaire items and the low level of education, may also have caused difficulties in interpreting the items[29],[36].
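The agreement check discussed above can be sketched as follows: the bias is the mean of the test-minus-retest differences, and the 95% limits of agreement are the bias ± 1.96 standard deviations of those differences. This is a minimal illustration with hypothetical scores, not the study data; the function name and the returned fields are assumptions.

```python
import statistics


def bland_altman_limits(test, retest, z=1.96):
    """Return (bias, lower limit, upper limit, indices outside the limits)."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    differences = [a - b for a, b in zip(test, retest)]
    bias = statistics.mean(differences)
    spread = statistics.stdev(differences)  # sample SD of the differences
    lower = bias - z * spread
    upper = bias + z * spread
    outliers = [i for i, d in enumerate(differences) if d < lower or d > upper]
    return bias, lower, upper, outliers


# Hypothetical total scores for eight respondents on two occasions.
test_totals = [12, 25, 33, 41, 47, 58, 66, 80]
retest_totals = [13, 24, 35, 40, 49, 57, 68, 79]
bias, lower, upper, outliers = bland_altman_limits(test_totals, retest_totals)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
print("respondents outside the limits:", outliers)
```

A respondent whose difference falls outside the limits, like the single outlier described above, is flagged by index so that the case can be inspected individually.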
In this study, the weighted Kappa coefficient exceeded 0.60 for all individual items, indicating good reliability. Additionally, the ICC value showed very high reliability of the total score, indicating the consistency of the questionnaire when assessing self-perceived ADL performance. Similar test-retest reliability values (0.63–0.98) were reported in a previous study[15]. The adequate reliability values found in the present study reflect the stability of the measure. This may be due to the selection, wording, and clarity of the items, which describe the ADL in a simple and objective structure[35], and to the visual structure used for quantifying the answer, which may facilitate understanding. This is particularly relevant in the Brazilian scenario, in which 30% of the population has difficulty understanding and expressing themselves through letters and numbers in daily activities, which can compromise the formulation of self-perception concepts[38].
Although the ICC is a common method of assessing reliability, evaluating reliability based only on ICC values may lead to misleading conclusions. This is because the ICC depends strongly on between-subject variability and may not be the most appropriate method to assess the methodological quality of instruments selected to measure changes over time[21],[29]. To evaluate individual variation between two tests, it is necessary to differentiate real changes from random measurement error, which was done here using the SEM and MDC[29].
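To make this dependence on between-subject variability concrete, the sketch below computes ICC(2,1) from the mean squares of a two-way layout (subjects by occasions), following the standard Shrout and Fleiss definition. It is an illustrative plain-Python sketch with hypothetical data, not the SPSS procedure used in the study; the function name is an assumption.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` is a list of rows, one row per subject, one column per occasion.
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 subjects and 2 occasions, with no missing values")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Mean squares for subjects (rows), occasions (columns), and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical data: the same retest deviations (+1, -1, +1, -1, +1) applied to
# a heterogeneous sample and to a homogeneous sample.
wide_sample = [[10, 11], [30, 29], [50, 51], [70, 69], [90, 91]]
narrow_sample = [[48, 49], [49, 48], [50, 51], [51, 50], [52, 53]]
print(round(icc_2_1(wide_sample), 3), round(icc_2_1(narrow_sample), 3))
```

In this hypothetical example the measurement error is identical in both samples, yet the ICC is much lower in the homogeneous sample, which is why SEM and MDC are reported alongside it.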
The SEM (SEM%) values were within the limits considered acceptable (<10%)[31]. From a clinical perspective, the SEM quantifies the range within which an individual's score is expected to vary because of measurement error and should therefore be considered in clinical decision-making[39]. Our results indicate that a variation of approximately three points between the first and second applications is related to measurement error and not to a real change in ADL performance. For example, for an individual with a score of 70, a score ranging from 67 to 73 can be expected in a subsequent application, which would reflect measurement error rather than a real change in ADL performance[37].
The MDC (MDC%) values were also within the acceptable range (<30%). From a clinical perspective, the MDC reflects the minimum magnitude of change that is real and not due to random variation or measurement error. Thus, the results of the present study suggest that changes of ≥8 points in the ADL Questionnaire are needed to reflect real changes beyond measurement error when repeated measures are used. This threshold can help clinicians and researchers determine with reasonable confidence whether a change between repeated measurements is real. Our results cannot be compared with previous studies, since this was the first study to use the ICC for test-retest reliability analysis and to determine the SEM and MDC of the ADL Questionnaire.
The results of this study can be generalized to individuals with characteristics similar to those of the sample. We used a single rater and a test-retest design to assess reliability. Although this promotes consistency within the study, it does not strengthen the generalizability of the findings. The ADL-Brazil Questionnaire showed adequate test-retest reliability and agreement values, and can be used in research and by clinicians. In addition, the questionnaire can be applied quickly (around 10 minutes), and its scores are easy to interpret.
The strength of the present study is that it included individuals with a wide range of impairment (HY 1–4) and of ability to perform ADL (ADL Questionnaire scores: 0–90). However, this study has some limitations. Although the participants were not informed of their previous responses, recall bias could not be ruled out. Additionally, individuals were not randomly selected and may not completely represent the entire population of individuals with PD. Since recruitment relied on voluntary participation, the participants may differ from other individuals with PD in the community. A sample size calculation indicated a minimum of 30 individuals. According to COSMIN recommendations, this sample size is moderate, but it meets the minimum indicated by the calculation, and satisfactory results were found in previous studies with a similar sample[36]. Future studies should be conducted to determine other clinimetric properties of the ADL-Brazil Questionnaire, such as concurrent validity, discriminant validity, and responsiveness to change, in community-dwelling individuals with PD.