Keywords
clinical research - statistics - carpal tunnel syndrome - methodology - hand
Introduction
Hand surgery has experienced important technical advances since the first description
of small-artery repair in the digits by Harold Kleinert in 1963.[1] Developments include digital replantation, toe-to-hand transfers, free flaps for
hand reconstruction, hand transplantation, and even improvements in nerve microsurgery.
However, the quality of our clinical research is still only fair.[2] The quality of reporting in randomized controlled trials (RCTs) remains poor, with a median score of 2 points on the modified Jadad scale, which classifies the quality of RCTs from 0 (worst quality) to 5 (best quality).[3] Although wrist and shoulder arthroscopy began to be practiced at almost the same time, the efficacy of arthroscopically performed wrist interventions has only been studied in four RCTs; the median modified Jadad score for the wrist RCTs was 0.5 (range: 0–1), compared with 3.0 for the 50 RCTs of significantly higher quality assessing interventions performed through shoulder arthroscopy.[4] More than 80% of the papers published in The Journal of Hand Surgery European Volume (JHSE) and 68% of those published in The Journal of Hand Surgery American Volume (JHSA) presented level-IV evidence (that is, mostly case series). The percentages of published papers with the highest levels of evidence, including high-quality RCTs and systematic reviews (SRs), were 0.9% (level I) and 5.0% (level II) in the JHSE, and 8.3% (level I) and 10% (level II) in the JHSA, which is very low.[5]
Hand surgery, as a scientific discipline, requires an appropriate and systematic analysis of all variables to demonstrate the significance of the observations that surgeons face every day in their clinical practice. There are two approaches to this issue: the first is to use the experience of the hand surgeon to answer the questions that arise in daily practice. The second is to complement the knowledge acquired after years of practice with a scientific evaluation of those observations through clinical research based on statistics, the science of numerical evaluation that can help determine the real value of a hand surgical intervention. We cannot conduct good clinical research in hand surgery without applying the most appropriate and accepted statistical procedures.
The purpose of the present paper was to approach the basics of data analysis using a carpal tunnel syndrome (CTS) database, covering the data matrix, the generation of variables, the descriptive statistics, the choice of the most appropriate statistical tests based on how the data were collected, parameter estimation (inferential statistics), the appropriate use of the p-value and of the confidence interval (CI), and, finally, the important concept of generalized linear models (GLMs) or regression analysis.
Before Starting
The main part of clinical research takes place in the project phase, before the statistical analysis. Based on our observations in clinical practice, followed by a systematic review of the scientific literature, we can establish a research question. After that, we have to select the study population or sample to answer our question, the study design (case series, cross-sectional study, case-control study, cohort study, experimental study), and the appropriate instruments and measurements.[2] Only then can we start the data analysis.
Example and Data Matrix
1–Study Description
In this part, we present a study using part of a CTS database originating from an experimental clinical design (an RCT) designed to analyze the effect of reconstructing the transverse carpal ligament on the outcomes of open carpal tunnel (CT) release.
The researcher should preestablish the variables to be analyzed based on the clinical
design. The teaching purposes in the present study were to demonstrate the following
hypotheses:
Hypothesis 1: Smokers have worse symptoms related to CTS than non-smokers.
Hypothesis 2: Reconstruction of the transverse carpal ligament improves the outcomes
of open CT release.
2–Generating the original matrix data
In order to generate the matrix data, it is important to use a well-designed form
to collect the data of the variables, such as Access software (Microsoft Corp., Redmond,
WA, USA), or to use a second blinded check of the data entry, in case the researcher
has used directly the Excel (Microsoft Corp., Redmond, WA, USA) worksheet for collecting
the data. Otherwise, problems with the missing values will be encountered, which will
also be approached in the present paper.
In [Fig. 1], we can observe the initial matrix data in an Excel worksheet with the following
variables: “id”, “sex”, “smoking” (cigarettes/day [c/d]), “sspre” (severity of symptoms
related to CTS before surgery based on the CTS-6 scale), “sspost” (severity of symptoms
related to CTS 3 months after surgery), and “reconstruction”. The CTS-6 scale is the shorter version of the CTS questionnaire developed by Levine et al in 1993.[6] The CTS-6 scale was developed by Atroshi et al[7][8] and measures the severity of symptoms related to CTS. In the present paper, the Spanish version of the CTS-6 was used,[9] which was developed from the extended Spanish version of the CTS questionnaire,[10] and has a good level of reliability and validity for measuring outcomes in patients with CTS.[11]
3–Preparing the data for the analysis: generating and labeling variables
Once we have the main variables and the matrix data, the Excel worksheet can be imported
by most statistics computer software. In the present paper, we have used Stata version
14.5 (StataCorp, College Station, TX, USA), and all the charts and graphics have been
created using the same software. The statistical techniques for analyzing a data matrix will vary based on the measurement scale (quantitative vs. categorical) of the
variables. For teaching purposes, in the present paper, we have generated different
categorical variables from the variables “smoking,” “sspre” and “sspost” to demonstrate
the use of different statistical tests. In that way, we have generated the following
categorical variables: “HabitSmoke” (Smoking habit), “SmokLevel” (cigarettes/day),
and “NivSSpre” (severe level of symptoms before surgery), based on the potential need
for surgery, which was defined by Atroshi et al[12] as a CTS symptom severity score of 3.2. We have also generated a new quantitative
variable, “sschange”, which measures the change in the severity of symptoms between
before and after the surgery, based on the variables “sspre” and “sspost”[13]. The software enables us to recode the original string variable “reconstruction”
([Fig. 1]) to a numerical binary variable: reconst (reconstruction; “0” = No, “1” = Yes).
Observe that the statistical software was able to detect the missing values shown
as “.”, which will be important for the analysis ([Chart 1]).
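As an illustration, these preparation steps could be reproduced in Stata with commands along the following lines. This is only a sketch: the Excel file name and the string values of “reconstruction” are assumptions, and the 3.2 cut-off is assumed to define the severe level as a score of at least 3.2; the remaining variable names follow those described above.
* import the original Excel matrix ([Fig. 1])
import excel "CTS_matrix.xlsx", firstrow clear
* smoking habit (0 = No, 1 = Yes) from cigarettes/day
gen HabitSmoke = (smoking > 0) if !missing(smoking)
* smoking level: 0 = none, 1 = 1-15 c/d, 2 = >15 c/d
recode smoking (0 = 0) (1/15 = 1) (16/max = 2), gen(SmokLevel)
* severe level of symptoms before surgery (cut-off of 3.2, Atroshi et al[12])
gen NivSSpre = (sspre >= 3.2) if !missing(sspre)
* change in symptom severity (positive values = improvement)
gen sschange = sspre - sspost
* recode the string variable "reconstruction" to a numerical binary variable
gen reconst = (reconstruction == "Yes") if reconstruction != ""
label define yesno 0 "No" 1 "Yes"
label values HabitSmoke NivSSpre reconst yesno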
Fig. 1 Original database in Excel worksheet.
Chart 1
Matrix data from Stata after preparing the data for the analysis
Observe the new variables generated as the categorical variables: “HabitSmoke” (Smoking
habit “0” = NO, “1”= Yes), “SmokLev” (Level of smoking cigarettes/day; “0”= No, “1”=
1–15, “2”= >15), “NivSSpre” (severe level of symptoms before surgery, “0” = No, “1” = Yes),
“NivSSpos” (severe level of symptoms after surgery, “0” = No, “1” = Yes). A new quantitative
variable “sschange”, which measures the change in the severity of the symptoms between
before and after the surgery, was generated from the variables “sspre” and “sspost.”
Observe that the initial nominal variable “RECONSTRUCTION” in the original matrix data from Excel ([Fig. 1]) has been recoded to a numerical categorical variable: Reconst (Reconstruction;
“0” = No, “1” = Yes). Observe that the statistical software was able to detect the
missing values represented by a “.”, which will be important for the analysis.
An important issue is converting a quantitative variable to a categorical variable.
It is generally not recommended to convert a quantitative variable, such as “smoking”
and “sspre”, into a categorical variable because it results in loss of information
in the analysis and in the conclusions. This is done here for teaching purposes.
The variable “age” should also be handled appropriately. Usually, we ask the patients about their age and they give us the truncated age (45, 56, 34 years old, etc.). A more correct way is to record the exact time (age, follow-up, etc.). For this purpose, we record the date of birth and the date of entry into the study, and we can calculate “Age” (exact time in years) = (entry date – birth date)/365.25, obtaining the exact “age” variable with decimals. If we only have the truncated age, we need to apply a correction by adding 0.5 to the truncated age. This correction of 0.5 does not change the dispersion of the data or the standard deviation (SD), but only the mean ([Chart 2]).
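A minimal Stata sketch of this calculation; the date variable names and the "DMY" string format are assumptions, and only the arithmetic reproduces the formula above.
* exact age in years from the dates of birth and of entry into the study
gen birth = date(birth_str, "DMY")
gen entry = date(entry_str, "DMY")
format birth entry %td
gen Age = (entry - birth)/365.25
* correction when only the truncated (integer) age is available
gen AgeC = truncated_age + 0.5
summarize Age AgeC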
Chart 2
Handling the age variable
Observe the mean and standard deviation (SD). The correction of +0.5 points in the variable Age only changes the mean, not the SD. The dispersion of the data in the variable AgeC is the same as that of the variable Age.
4–Handling missing values
The amount of missing values is an index of the quality of the study and of the data
collection process in the research. A common mistake in the analysis is to exclude
all study participants with missing values (observed not only among young researchers
in hand surgery: “If we have enough patients in our study, just go ahead and delete all those with
missing values”). Exclusion of the individuals with missing values could be a big
problem. If the individuals excluded because of missing values constitute a random
sub-sample of the total study population, the only problem will be decreased power
and precision of the analysis. However, when the missing values are related to the
values of some of the variables of the study, then we have a more important methodological
problem (bias in the parameter estimation process in a multivariate model, for example).
Based on the origin of missing values, Stata divides the missing values into: system
missing (data without value, values incoherent with the variable format, and impossible
values in a data transformation, such as 5/0); and user missing or extended missing
values; for example, if a questionnaire item asks about smoking (Yes or No) and the
next item asks about the number of cigarettes per day, it is expected to have an empty
value or a value of 99 in non-smoking respondents, and it should be assigned as user
missing. Based on the degree of possible bias, missing system values can be classified
into: missing completely at random (MCAR), such as forgetting to record some data;
missing at random (MAR), when the missing value is related to an independent confounder variable (X); for example, when studying whether losing weight could improve the symptoms of CTS, a high number of missing values was observed among younger individuals, because younger subjects are less inclined to cooperate in the registration of their habits; and nonrandom missing (NRM), when the missing value is related to the dependent variable (Y); for example, when analyzing the relationship between high blood pressure (HBP) and age group, individuals with HBP and older people are expected to cooperate better and, consequently, to present fewer missing values for these variables.
If we look at our data ([Chart 1]), we have some missing values. The first step before starting our study is to analyze
the missing values; probably, they are MCAR. The missing value analysis estimates
and describes those values ([Chart 3]), and it can give information about: total number and percentage of missing values,
number of patients with missing values, number of missing values per patient, and
the patterns of missing values. At the same time, most software packages generate a new variable (called _Nmiss or similar) that can be used to drop the individuals with missing values or to perform further analyses to obtain more information about the missing values.
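In Stata, for example, the missing-value description shown in [Chart 3] can be approximated with commands such as the following; the variable list passed to rowmiss is illustrative.
* number and patterns of missing values per variable
misstable summarize
misstable patterns
* number of missing values per patient (analogous to _Nmiss)
egen Nmiss = rowmiss(AgeC Gender smoking HabitSmoke SmokLevel sspre sspost)
tabulate Nmiss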
Chart 3
Missing values analysis
Observe that the total number of missing values was 11: two missing values in each of the variables AgeC, Smoking, HabitSmoke, and NivTab, and three missing values in Gender. Twenty-eight patients (90.32%) presented no missing values, two patients (6.45%) presented five missing values each, and only one patient (3.23%) presented one missing value.
Most statistical software packages have two different approaches for data analysis with missing values: pairwise selection (using the maximum number of subjects available for each variable) and listwise selection (using only the subjects with valid values for all variables). For example, in multivariate linear regression analysis, Stata always uses listwise selection in the analysis of the model[14] ([Charts 4A] and [4B]).
Chart 4
Descriptive statistics using pairwise strategy (A) and listwise strategy (B)
Observe that the means, standard deviations (SD), and numbers of observations change based on the strategy used to handle the missing values. In listwise selection, the software only uses the patients with valid values, that is, those with Nmiss = 0.
There are no fixed rules about the acceptable level of missing values. If we have more than 10% of missing values, it is recommended to compare the descriptive statistics of the variables of the subjects with complete valid values with those of the subjects with some missing values. Most statistical software packages provide statistical methods to handle missing values, such as mean imputation and multiple imputation (used in multivariate analysis).[15] The mean imputation process ([Chart 5]) assigns the mean of the variable to the missing values. Consequently, the imputation will decrease the SD, but will not change the mean of the variable.[14]
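A minimal sketch of mean imputation in Stata; the variable names follow [Chart 5], and multiple imputation (the mi suite) would be preferred for multivariate models.
* mean imputation: replace missing ages with the observed mean
egen meanAge = mean(AgeC)
gen AgeMean = cond(missing(AgeC), meanAge, AgeC)
* the mean is unchanged, but the SD decreases
summarize AgeC AgeMean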
Chart 5
Descriptive statistics before and after mean imputation for missing values
Observe that, after mean imputation (AgeMean), the mean remained the same; however, the standard deviation (SD) decreased.
Descriptive Statistics
After generating, recoding, and labeling the variables and handling the missing values, we have to check the data. We have created a dataset, and now we need to verify the work done in defining it. Checking the accuracy of our data entry is also our first statistical look at the data, and it is very important for detecting possible errors in data entry.
The description of the data constitutes the starting point of our analysis and sometimes
a proper objective of a study. Descriptive statistics are based on the calculation
of several indexes and graphics, which give us information about the distribution
of the data (central tendency, dispersion, symmetry, etc.) for every variable.
1. Description of quantitative variables.
We have two types of statistical indexes for describing quantitative variables: measures based on moments (mean, SD, variance, skewness, and kurtosis) and indexes based on ordination (median, quartiles, minimum, and maximum).
The mean represents the center of gravity of the distribution of a specific variable and describes the central tendency if there is no asymmetry. The SD (S) is the square root of the variance (S²), which is obtained by dividing the sum of squares by the degrees of freedom (df): S² = SS/df. The SD (S) is interpretable only if the distribution is normal and symmetric. In this case, the interval mean ± 1 SD contains 68% of the observations of that variable, the interval mean ± 2 SD includes 95% of the observations, and the interval mean ± 3 SD includes 99.7% of the cases.
Other measures based on moments are skewness and kurtosis. Measures based on ordination are the median and the quartiles; those measures are not affected by the asymmetry of the distribution. The median and the mean are equal when the distribution is symmetric. The symmetry is measured by the skewness index, which is positive when the mean is higher than the median, and negative when the mean is lower than the median. The degree of flattening of the distribution is measured by the kurtosis index (in Stata, kurtosis > 3 implies a distribution more peaked than the normal distribution).
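In Stata, for example, both families of indexes can be obtained at once for the sspre variable of the example; the commands below are a sketch of that step.
* moment-based (mean, SD, variance, skewness, kurtosis) and
* ordination-based (percentiles, minimum, maximum) descriptive statistics
summarize sspre, detail
tabstat sspre, statistics(mean sd skewness kurtosis p25 p50 p75)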
In [Fig. 2], we can observe the P-P normal plot, the box plot, and the stem-and-leaf diagram, which give an image of the distribution of the data of the sspre variable. The stem-and-leaf diagram allows us to calculate in an easy way the median and the quartiles Q1 (percentile 25) and Q3 (percentile 75); Q2 (percentile 50) is equal to the median. The box plot is also very useful to see the distribution of the data: the median and the Q1 and Q3 quartiles form the box, and the minimum and maximum form the whiskers. Box plots can be used to assess symmetry and outliers, and to demonstrate change in a variable such as the symptoms related to CTS if we compare the box plot of sspre with that of sspost and see the improvement in symptoms after surgery ([Fig. 3]). The descriptive statistics alerted us that there were errors during data entry, because we could find values > 5 in [Fig. 2] (max = 5.4) and, in [Fig. 3], the upper whiskers of the box plots of sspre and sspost were > 5, which is the maximum score that any item of the CTS-6 questionnaire can achieve. Consequently, the researcher should check the original data entry, even the original questionnaires, to overcome those mistakes. Stata can give us the Id numbers with sspre and sspost values > 5 (list Id if sspre >5 & sspre < . | sspost >5 & sspost < .), which were Ids 2, 3, 19, and 30. If the researcher cannot access the original matrix or questionnaires, or if it is impossible to know the exact values of those variables in those individuals, the researcher should assign a missing value to those entries. The matrix data of the present paper were created for teaching purposes, but the rest of the analyses were done using the correct values in the database.
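The graphs in [Figs. 2] and [3] and the range check described above can be reproduced, for example, with the following Stata commands; the specific graph commands are illustrative, while the list command is the one quoted in the text.
* P-P normal plot, box plots, and stem-and-leaf diagram
pnorm sspre
graph box sspre sspost
stem sspre
* list the patients with out-of-range CTS-6 scores (valid range: 1 to 5)
list Id if (sspre > 5 & sspre < .) | (sspost > 5 & sspost < .)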
Fig. 2 Descriptive statistics of the quantitative variable “sspre.” From left to right: P-P plot for normal distribution (observe that the data almost coincide with the line); box plot (the box is delimited by the p25 and p75 quartiles and the median, and the whiskers are the minimum and maximum); and the stem-and-leaf diagram. The descriptive statistics showed data with values higher than the maximum, and the normality tests demonstrated that the data of the sspre variable follow a normal distribution (p > 0.05). Abbreviation: sspre, carpal tunnel syndrome symptoms at baseline, before surgery.
Fig. 3 Box plot of the sspre and sspost variables. Observe the change in the severity of the CTS symptoms between before and after the surgery. Notice that the box plot of sspost presents a positive asymmetry. Abbreviations: CTS, carpal tunnel syndrome; sspost, severity of symptoms related to CTS 3 months after surgery; sspre, severity of symptoms related to CTS before surgery based on the CTS-6 scale.
2. Description of categorical variables
Categorical variables (such as “gender”, “HabitSmoke”, “SmokLevel”, etc.) are described
by creating a table of frequencies that classifies the individuals based on the category
and calculates percentages ([Fig. 4]).
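For example, the frequency table and the cross-tabulation of [Fig. 4] can be obtained in Stata with commands along these lines.
* frequency table of a categorical variable
tabulate SmokLevel
* cross-tabulation of severe symptoms by smoking level, with column percentages
tabulate NivSSpre SmokLevel, column chi2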
Fig. 4 Descriptive statistics of categorical variables. Cross tabulation of severe level
of CTS symptoms (NivSSpre) and smoking level (SmokLevel) and bar graph. Abbreviations:
CTS, carpal tunnel syndrome.
One of the main purposes of the descriptive statistics is to check our data: they constitute the main opportunity to discover errors in our data. The main checklist during the descriptive statistics is the following:
- Id. Check that all records present an Id number and that there are no duplicated numbers.
- Categorical variables. Check that the values belong to the set of valid values (e.g., look for values different from 0 or 1 in the “gender” variable).
- Quantitative variables. Check that the values are included in the interval that defines the valid values (for example, the sspre variable ranges from 1 to 5, so if the summary of the variable detects values over 5, it indicates an error in the collection or in the entry of the data).
- Dates. Check that the dates are correct if the researcher has used them to generate variables such as “age” or “follow-up” time.
- Consistency between variables. Check that the values of a variable are consistent with the others (for example, negative values in “age” would indicate that the recorded date of birth is posterior to the date of entry into the study).
If we detect errors during the checking process of the descriptive statistics:
- Replace the erroneous data with the correct data if the researcher has the original records and information or access to the subjects of the study.
- Replace the erroneous data with missing values if the researcher does not have the correct information.
Inference Statistics
Decision-making in clinical research implies testing whether our hypotheses are true or false based on empirical results obtained from samples of individuals. The results of a specific study are based on the measurements made in one of the infinite samples that can be obtained from the reference population. Consequently, the value of our measurements can vary due to chance. Inference statistics assumes that the random variability of sampling follows known laws, which allows the researchers to quantify that variability, facilitates decision-making about the established hypotheses, and enables conclusions to be drawn.[14]
1. P-value and significance test of the null hypothesis (H0).
Fisher[16]
[17] established in 1922 the significance test for the null hypothesis. For example, our
clinical observations allow us to suppose that the reconstruction of the transverse
carpal ligament during open CT release could not improve the severity of the symptoms
related to CTS. To demonstrate that hypothesis, a sample of 31 individuals was randomized
to receive only open CT release (sample A: no reconstruction, nA = 16) versus open CT release plus ligament reconstruction (sample B: reconstruction,
nB = 15). The severity of symptoms score ranged from 1 to 5. Group A presented a mean
change in the severity of symptoms score between before the surgery and after the
surgery (sschange variable) of
x̄A = 1.48 (improvement in symptoms), with an SD of SA = 1.33. Group B presented a mean of x̄B = 1.13, with an SD of SB = 1.004. This hypothesis entails that there is a population A: “change in the severity
of symptoms after open CT release”, and a second population, B: “change in the severity
of symptoms in open CT release plus ligament reconstruction.” The researcher wants
to know if the mean in population A (µA) is higher than the mean in population B (µB). However, the researcher does not know these values. To know whether or not the
reconstruction improved the “sschange,” the researcher needs a reference distribution
to assess if the observed difference in “sschange” between groups A and B (d0 = 0.35) ([Fig. 5]) is true or caused by the random fluctuation of the sample. Still, the researcher
does not know the magnitude of the difference (δ) between the means of “sschange”
in the populations; consequently, this hypothesis cannot be demonstrated. A second
option is that the researcher can formulate the null hypothesis that the reconstruction
does not improve “sschange” (H0: δ = µA - µB = 0). Now, it is possible to establish a reference distribution around the H0, because it is a very specific hypothesis (δ = 0). The reference distribution will
be the sample distribution of the observed difference with a mean equal to 0 (µ = 0)
and with an SD that will be the standard error of the difference (SEd), which is obtained by combining the SEs of samples A and B (SEd = √(SEA² + SEB²), where SE = S/√n).[18] The SDs of the real populations are not known but, assuming that the difference between the samples follows a normal distribution, we can replace them with the SDs of the samples. From here, we can calculate the t0 statistic (dividing d0 by SEd). Finally, we want to know the probability of finding a value of T equal to or greater than t0, taking into account the degrees of freedom (df), which are equal to the total number of individuals (n = 31) minus 2 (2 groups): df = 29. The obtained probability is higher than 0.05,
which means that the difference between groups is compatible with the null hypothesis
H0.[18] Consequently, reconstruction of the transverse carpal ligament did not significantly improve the change in the severity of the symptoms after CT release. In this example, given
the teaching purpose of this paper, we have not taken into account the sample size,
the power of analysis and the type II error, which will be explained below.
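The whole calculation can be reproduced from the summary figures given above with Stata's immediate t-test command; this is only a sketch of the arithmetic described in this paragraph.
* two-sample t-test from summary data: nA, meanA, sdA, nB, meanB, sdB
ttesti 16 1.48 1.33 15 1.13 1.004
* d0 = 1.48 - 1.13 = 0.35, df = 29, and the two-tailed p-value is > 0.05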
Fig. 5 Comparison of two means. Inference statistics. Note: (µA; σA) = mean and SD in population A; (µB; σB) = mean and SD in population B; x̄A; SA = mean and SD in sample A; x̄B; SB = mean and SD in sample B; SE = standard error (based on Domenech[18][26]). Abbreviation: SD, standard deviation.
2. Null hypothesis (H0) versus alternative hypothesis (Ha). Alpha (α) and Beta (β) Risks
In 1928, Neyman et al,[19]
[20]
[21] based on the Fisher significance test, developed hypothesis testing from a different point of view, apparently very similar to the one proposed by Fisher but conceptually quite different. Neyman et al started from probability theory to establish a rule for decision-making between two complementary hypotheses (H0 = null hypothesis and Ha = alternative hypothesis). That theory constituted a frontal opposition to the inference proposed by Fisher. Neyman et al proposed the α and β risks and the type I and type II errors.
Type I error happens when H0 is true but is rejected. Normally, in clinical research, we use a type I error risk of 0.05 (α = 0.05). The interpretation of α = 0.05 is very important: the alpha risk represents the conditional probability of being wrong when the null hypothesis is true. When a researcher rejects the H0 with a risk α of 0.05, it does not mean that the researcher is wrong 5 out of 100 times, because that would only be true if the null hypothesis were always true.[22]
Type II error occurs when Ha is true and the researcher accepts H0 because the test was not significant, for example due to a small sample size. The probability of the type II error is called beta risk (β). The complementary probability, 1 - β, represents the probability of accepting Ha when Ha is true, and it is called the power of the test. Usually, in health sciences or in clinical research, we use a power of 80%.
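As an illustration (not part of the original study design), Stata can compute the sample size needed for a given power; the assumed difference of 1 point and the SD of about 1.2 are taken loosely from the example data.
* sample size per group for 80% power, alpha = 0.05, to detect a 1-point difference
power twomeans 0 1, sd(1.2) power(0.8)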
3. Confidence interval versus p-value.
In 1934, Neyman,[23] based on the hypothesis test, proposed a new method to answer questions such as the one established in the CTS and ligament reconstruction study explained before ([Fig. 5]), based on Domenech.[18] A good alternative to hypothesis testing is the CI around a mean. If the samples are representative of the reference populations, we can calculate an interval that can be considered the set of all hypotheses compatible with the data. In that way, a hypothesis located outside the interval is not credible and can be rejected. It is not necessary to know the p-value to conclude whether the difference observed in [Chart 6] and [Fig. 5] is significant. If the H0 is included in the 95% confidence interval (95% CI), we can conclude that the difference is not significant (mean difference = 0.3545833; 95% CI: -0.5167852 to 1.225952). The interpretation of the 95% CI is the following: with a confidence of 95%, we can locate the mean difference in “sschange” between -0.52 and 1.23; since the H0 value (0) is included in the interval, the difference is not significant ([Fig. 6]).
Fig. 6 Change in CTS symptoms in ligament reconstruction vs. open CT release. Interpretation of the results using the 95% CI (mean difference = 0.3545833; 95% CI: -0.5167852 to 1.225952). The H0 is included in the 95% CI (result not significant). The criterion (1 point in the CTS-6 questionnaire, which measures the severity of the symptoms related to CTS) is also included in the CI (inconclusive CI) (based on Atroshi et al[8] and Ozyürekoğlu et al[24]). Abbreviations: CI, confidence interval; CT, carpal tunnel; CTS, carpal tunnel syndrome.
Chart 6
T-Student-Fisher test for comparing two means from independent samples. Change of
symptoms severity (ChangeSS) in reconstruction vs no reconstruction of the transverse
carpal ligament in open carpal tunnel release
(A) Observe that the results of the one-tailed comparison of the two means are similar to those observed in [Fig. 5]. Conclusion: the results of the one-tailed and two-tailed tests were not significant, assuming equal variances and a normal distribution of the differences in sschange. (B) Observe the same results (difference of means = 0.3545833 = β coefficient [No reconstruction]) when the two means are compared using regression analysis.
This concept, introduced by Neyman, was forgotten for a long time. Today, however, it is very common to use the 95% CI instead of presenting many p-values in the result charts. The 95% CI has the advantage of giving information about the magnitude of the effect or of an important clinical change. If we already know the minimal important clinical difference (MICD) expected in our outcome variables (such as sschange) in a specific population, we can not only determine whether the difference observed in the 95% CI is significant, but also assess whether that difference (effect) is clinically important. If we look at [Fig. 7], we can see the relationship between significance and effect assessed by different 95% CIs. The criterion for the MICD in the change in the severity of symptoms after CT release, measured using the CTS-6 PRO instrument, is of ∼ 0.9 to 1 point.[8]
[24] The first CI (A) is located above the MICD, which means that the 95% CI is significant
(it does not include the H0) and presents an important effect. Confidence interval “B” includes the MICD in the
interval; however, it does not include the H0; consequently, this 95% CI is inconclusive about the effect; however, it is statistically
significant. Confidence interval “C” includes the criterion and the H0, and it constitutes an inconclusive and not significant 95% CI. Confidence interval
“D” is located below the criterion, but it does not include the H0; consequently, it is a significant interval, but with a non-important clinical effect.
Finally, CI “E” is below the criterion and includes the H0; therefore, this 95% CI is not significant and the effect is not important.[18] In [Fig. 6], the readers can see, based on the 95% CI of the “sschange” difference between open CT release and open CT release plus ligament reconstruction, that the difference was not significant, because the interval includes the H0, and that the CI is inconclusive regarding the magnitude of the effect, because it includes the criterion.
Fig. 7 Comparison of the results of different confidence intervals. Interpretation of the
results based on the 95% CI and a criterion (δ*) or minimal important clinical effect.
(based on Domenech[18]). Abbreviations: CI, confidence interval.
Another aspect of the 95% CI is the precision of the interval. A narrower 95% CI implies a better precision, because the SE is lower. If we have to select between two 95% CIs that are significant (that is, that do not include the H0) and that show an important effect (both CIs above the criterion or MICD), we will choose the narrower 95% CI.
Statistical Tests for Data Analysis
Based on the hypotheses established in the Study Description section of the present paper, different statistical tests will be used to answer those research questions. Before starting the data analysis proper, we need to explore the sample with goodness-of-fit tests. An important issue, especially with sample sizes < 30, is to assess whether the distribution of the data follows a normal distribution in the population. We have different statistical tests and graphics for that purpose. The most commonly used test of normality is the Shapiro-Wilk test, in which the H0 is that there is no difference between the sample and a normal distribution. Observe in [Fig. 2] that all tests of normality (Shapiro-Wilk, and the skewness and kurtosis tests) were not significant (p > 0.05); therefore, we do not reject the H0 and can conclude that the variable “sspre” follows a normal distribution.
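These normality tests correspond, for example, to the following Stata commands.
* goodness-of-fit tests for normality of sspre
swilk sspre     // Shapiro-Wilk test
sktest sspre    // skewness and kurtosis test for normality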
The statistical tests to be used will depend on how the researcher recorded the variables in the study. Following the example and the database used in the present paper, the researcher can face different scenarios based on the exposure and outcome variables:
1. Exposure and Outcome (dichotomous or binary variables). When the exposure and the outcome are binary variables, we have to create a cross-tabulation in which the exposure or independent variable (IV) is located in the columns, and the outcome or dependent variable (DV), in the rows. The proportions in each column are compared by the chi-squared (χ2) or by the Fisher exact test, and the clinical relevance of the association is assessed by the prevalence or risk ratio (PR or RR) and the odds ratio (OR).[25] For example, for the association between severe level of CTS symptoms (NivSS) and smoking habit (comparison of two proportions), a PR or OR > 1 means positive association, a PR or OR = 1 constitutes the H0 (no association), and a PR or OR < 1 means negative association ([Fig. 8]) ([Chart 7]). The proper use of the OR or of the PR will depend on the clinical design of the study. In a cross-sectional study ([Chart 8]), the results come in terms of prevalence difference (0.215686), OR of prevalence (3.75), and prevalence ratio (1.3235), with the 95% CI. For cohort or experimental studies ([Chart 9]), the results come in terms of risk difference (0.215686), relative risk or risk ratio (RR) (1.3235), and OR (3.75), with the 95% CI. However, for case-control studies, the results come only in terms of the OR ([Chart 9]). Observe that the results obtained by Stata in [Charts 8], [9] and [10] are similar to those shown in [Fig. 8], which were calculated by hand.
2. Exposure (categorical variable) and Outcome (dichotomous or binary variable). When the exposure variable presents several categories (c > 2) and the outcome variable is binary, we have to check whether the IV is an ordinal categorical variable, because in that case we have to assess the tendency with a trend test or a Mantel-Haenszel test, and with the deviation-from-linearity test. If we look at [Fig. 8] (association between NivSSpre and smoking level: comparison of several proportions), the proportion in the columns of NivSSpre (severe level of symptoms related to CTS) increases with the number of cigarettes/day (c/d) (smoking level), but the linear trend analysis was not significant ([Chart 10]). Otherwise, if the categories have no ordinal structure, the statistical test will again be a χ2 test for multiple comparisons, or a regression model[18][25] ([Chart 10]), which showed that there was no association between severe level of symptoms and smoking level.
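In Stata, for instance, the trend analysis of this scenario could be sketched as follows.
* ordinal exposure vs. binary outcome: homogeneity and score test for trend of odds
tabodds NivSSpre SmokLevel
* overall chi-squared test if the ordinal structure is ignored
tabulate NivSSpre SmokLevel, chi2
* the same comparison expressed as a logistic regression model
logistic NivSSpre i.SmokLevel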
3. Exposure (binary variable) and Outcome (quantitative variable). When the IV is a binary variable (smoking habit) and the outcome is quantitative (sspre = severity of symptoms related to CTS), the statistical test compares two means by the Student-Fisher t-test, and the magnitude of the effect is assessed by the 95% CI ([Fig. 8]). Apart from the test for normality of the distribution when the sample size is not large (n < 30), we need to test for homogeneity of variances with the Levene test, whose H0 is that the two samples compared present equal variances. If the Levene test is significant, we have to use the Student t-test for unequal variances, which only changes the dfs.[26] Besides, we have to differentiate between the Student t-test for independent samples, as in this case (comparison of sspre in smoking and non-smoking samples) ([Fig. 8]) or in the example in [Fig. 3] and [Chart 6] (comparison of sschange in a ligament reconstruction sample versus a CT release-only sample), and the paired Student t-test for dependent samples (for example, the comparison of sspre with sspost to know whether there was a significant improvement in the symptoms between before the surgery and 3 months after the surgery).
When the sample is small or not normally distributed, the non-parametric tests used for comparing two quantitative variables are the Wilcoxon signed-rank test, in the case of dependent samples, and the Mann-Whitney U test (Wilcoxon rank-sum test), in the case of independent samples.
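These comparisons correspond, for example, to the following Stata commands (robvar performs a Levene-type test of equal variances).
* homogeneity of variances (Levene test)
robvar sspre, by(HabitSmoke)
* Student t-test for independent samples (add the option "unequal" if variances differ)
ttest sspre, by(HabitSmoke)
* paired t-test: symptoms before vs. 3 months after surgery
ttest sspre == sspost
* non-parametric alternatives
ranksum sspre, by(HabitSmoke)    // Mann-Whitney U (Wilcoxon rank-sum), independent samples
signrank sspre = sspost          // Wilcoxon signed-rank, dependent samples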
4. Exposure (categorical variable) and Outcome (quantitative variable). When the IV is a categorical variable with more than 2 categories (smoking level) and the DV is a quantitative variable, the analysis of variance (ANOVA) is used, defining the reference category for the analysis (for example, the association between sspre and smoking level [cigarettes/day]) ([Fig. 8]).[26] The ANOVA is a generalization of the Student-Fisher t-test in which it is assumed that the data of the “c” categories are random samples of “c” populations with equal variances and normal distributions if some of the samples present a size < 30. Consequently, before the ANOVA it is mandatory to test the samples for normality (Shapiro-Wilk test) and for homogeneity of variances (Levene test). A non-parametric version of the ANOVA is the Kruskal-Wallis test. The ANOVA can also be analyzed using a linear regression model ([Chart 11]).[26]
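A brief Stata sketch of this scenario, with variable names as defined above.
* one-way ANOVA of sspre across smoking levels, with group means
oneway sspre SmokLevel, tabulate
* non-parametric alternative (Kruskal-Wallis)
kwallis sspre, by(SmokLevel)
* the same ANOVA expressed as a linear regression model, as in [Chart 11]
regress sspre i.SmokLevel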
5. Exposure and Outcome (quantitative variables). When both variables are quantitative, linear regression analysis is the selected test. A linear equation (y = A + BX) is calculated, in which the slope, or β coefficient, gives us information about the contribution of the smoking level (cigarettes/day) to the severity of the symptoms related to CTS (sspre) ([Fig. 8]).[26]
6. Exposure (quantitative variable) and Outcome (categorical variable). In the final part of [Fig. 8], the outcome variable is categorical, in the present case binary (NivSS = severe level of symptoms), and the exposure is quantitative (smoking: number of cigarettes/day); the statistical test will be a logistic regression.[26]
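Scenarios 5 and 6 correspond, for example, to the following regression commands in Stata.
* both variables quantitative: simple linear regression (y = A + BX)
regress sspre smoking
* quantitative exposure and binary outcome: logistic regression (reports odds ratios)
logistic NivSSpre smoking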
Fig. 8 Statistical tests based on the exposure and outcome variables. Comparison of two proportions when the exposure and the outcome are binary variables. Comparison of several proportions when the exposure is categorical and the outcome is a binary variable. Comparison of two means when the exposure is binary and the outcome is a quantitative variable. Comparison of several means when the exposure is categorical and the outcome is a quantitative variable. Simple linear regression when both (exposure and outcome) variables are quantitative. Logistic regression model when the outcome is binary and the exposure is quantitative (based on Domenech[26]).
Chart 7
The association between severe level of carpal tunnel syndrome symptoms and smoking
habit
(A) Comparison of two proportions using the Z statistic, which in acceptable sample sizes follows a normal distribution (Wald test). (B) The same comparison using the chi-squared (χ2) test. Observe that Z = 1.4122, and that Z2 = 1.9943 is equal to the Pearson χ2 statistic. The proportions of severe symptoms in smokers (0.882353) and in non-smokers (0.666667) are the same as those described as risks in the χ2 cross-tab. The risk difference in the cross-tab (0.215686) is the same as the one shown in the Z test (diff). (C) The odds ratio (OR) in the logistic regression analysis is similar to the OR observed in (B), with a similar likelihood ratio (χ2 = 1.98).
Chart 8
Measuring of the association between NivSSpre and smoking habit for a cross-sectional
clinical design
Abbreviation: NivSSpre, severe level of symptoms before surgery. *recommended confidence
interval (CI).
Observe that, in a cross-sectional study, the results come in terms of prevalence, prevalence difference, prevalence ratio (PR), and odds ratio (OR) of prevalence. For the PR and the OR, the H0 = 1, and the intervals include the H0; consequently, the association was not significant.
Chart 9
Measuring of the association between NivSSpre and smoking habit for a cohort and experimental
clinical design
Abbreviation: NivSSpre, severe level of symptoms before surgery. *recommended confidence
interval (CI).
For experimental and cohort clinical designs, the results are shown in the same way,
based on risk ratio (RR) and odds ratio (OR). However, in case-control studies, the
results of the same analysis come only in terms of OR.
Chart 10
Association between smoking level (cigarettes/day) and severe level of carpal tunnel
syndrome symptoms
(A) Observe that the prevalence of a severe level of symptoms before surgery (NivSSpre) increases with the level of smoking. The deviation-from-linearity test presented a non-significant result (p = 0.7171), which implies that the proportions are located on a straight line. The Mantel-Haenszel (MH) trend test showed that the increase in the prevalence of severe symptoms with the smoking level was not significant (p = 0.1574), which means that the line is compatible with a horizontal rather than an ascending line (no significant linear trend). (B) The same analysis of association when there is no ordinal categorization. (C) The same analysis, with similar odds ratios, using a logistic regression model.
Chart 11
Association between the symptoms related to carpal tunnel syndrome (sspre) and smoking
level (SmokLevel)
(A) The analysis of variance (ANOVA) to assess the association between a quantitative outcome variable (sspre) and a categorical exposure variable (NivTab; c/d = cigarettes/day). The results demonstrated that there was no association. The contrast analysis assessed the association of sspre in each group of NivTab, using the “no smoking” group as reference. (B) The same ANOVA using a regression model. Observe that the F statistic is the same (0.82), that the p-value is the same (p = 0.4535), and that the t-values (0.31; 1.26) and the mean difference values (0.1416667; 0.4734848) in the contrast analysis coincide with those of the β coefficients in the regression model.
Generalized Linear Models (GLMs)
We have studied the association between symptoms related to CTS and smoking. This
association has been analyzed in different ways based on the type of exposure and
outcome variables. In [Chart 10], we observed a non-significant association between the severe level of CTS symptoms
(NivSSpre) and smoking level (SmokLevel; c/d), with an OR = 2.5 (1–15 c/d versus no
smoking); and an OR = 5 (> 15 c/d versus no smoking) in the cross tab analysis. The
questions are: are those results the real effect of smoking on the severity level
of the symptoms? Is there another variable that could affect this association? The same applies to the example presented in [Chart 6], the effect of ligament reconstruction on the change in CTS symptoms after open CT release. Using regression models, the observed effect can be adjusted for modifier (confounder and/or interaction) variables. The decision to adjust for a confounder variable in a regression model should not be based on a significant statistical test, but on important changes in the effect.
The analysis of the association between severe level of symptoms and smoking level
(cigarettes/day) ([Chart 10]) demonstrated that there was no significant association based on the Mantel-Haenszel
trend analysis and on the χ2 test. The OR1 = 2.5 (95% CI: 0.2136439 to 29.25428) and OR2 = 5 (95% CI: 0.4625826 to 54.0444) were not significant because the 95% CI included
the H0 = 1. A similar analysis using, in this case, a logistic regression model, allows
us to include, in the model, more variables that can affect the effect of smoking
level on the severity level of the symptoms. If we observe [Chart 12], we can find that the same model adjusted for gender increased the effect (OR) by more than 10%, which is clinically important, despite the fact that the adjusted ORs (OR1 = 4.966647; OR2 = 13.60615) are still non-significant. Consequently, the association between severe level of CTS symptoms and smoking level (cigarettes/day) should be adjusted for gender, and gender constitutes a variable that modifies the association between smoking level and severe level of CTS symptoms.[18][26][27]
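For example, the unadjusted and adjusted models discussed here could be fitted in Stata as follows; Gender is assumed to be coded as a categorical variable, and the other variable names follow those generated above.
* effect of smoking level on severe symptoms, unadjusted and adjusted for gender
logistic NivSSpre i.SmokLevel
logistic NivSSpre i.SmokLevel i.Gender
* effect of ligament reconstruction on the change in symptoms, unadjusted and adjusted
regress sschange i.reconst
regress sschange i.reconst i.Gender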
Chart 12
Advantage of the generalized linear models (GLMs)
(A) Analysis of the association of the severe level of symptoms (LevSSpre) with smoking level (cigarettes per day [c/d]) shown in [Chart 10]. The logistic regression enables the inclusion of modifier variables in the model, showing the association between the severe level of symptoms and SmokLev adjusted for gender.
(B) The effect of ligament reconstruction on the change in carpal tunnel syndrome (CTS) symptoms using the linear regression model (shown in [Chart 6]), and the same effect analysis adjusted for the variable “Gender.”
Observe that the percentage of change in the effect (49.66423% for 1–15 c/d; 63.251912% for > 15 c/d; 11.17793% for no reconstruction) is higher than 10%.
Consequently, the variable “gender” is a modifier variable of the association between severe level of CTS symptoms and smoking level, as well as of the effect of the ligament reconstruction on the change in symptoms related to CTS.
In a similar manner, we can adjust for modifier variables in a different regression model, as in the analysis of the effect of ligament reconstruction on the change in symptoms after open CT release ([Chart 6]). The analysis can be done with a Student t-test but, using a multiple linear regression model, the effect (difference in the mean change of symptoms) increased from 0.3545833 to 0.3992063 in the model adjusted for gender, which constitutes a change in the effect of 11.17793%, also considered clinically important.
The possibility of adjusting the models for different modifier variables is the most
important advantage of using regression models to perform these analyses. In 1989, based on the work developed by McCullagh et al,[28] the GLMs were introduced: a set of models constituted by a linear combination of predictor variables (X1, X2, X3, … Xi), which can be a mix of quantitative (continuous and discrete) and categorical variables, and a dependent variable (Y) that can be quantitative (linear regression model), binary (logistic regression model), ordinal (ordinal logistic regression model), nominal (multinomial logistic regression model), a count (Poisson, negative binomial, and zero-inflated Poisson regression models), etc. Consequently, the GLMs are a broad class of models that includes linear regression, ANOVA, Poisson regression, log-linear models, etc.
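In Stata, the same models can be written explicitly as GLMs by choosing the family and the link function; the following is a minimal sketch using the variables of the example.
* linear regression as a GLM (Gaussian family, identity link)
glm sschange i.reconst i.Gender, family(gaussian) link(identity)
* logistic regression as a GLM (binomial family, logit link); eform reports odds ratios
glm NivSSpre i.SmokLevel i.Gender, family(binomial) link(logit) eform
* count outcomes would use family(poisson) or family(nbinomial)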
Conclusions
The aim of the present update article is to serve as a gentle introduction to data
analysis for clinical research in hand surgery. Early steps in statistics are important
to improve the quality of our scientific papers. Clinical practice in hand surgery
can be improved by good clinical research, and statistics is a fundamental support
tool.