DOI: 10.1055/s-0038-1675587
Basics of Statistics for Clinical Research in Hand Surgery
Estadísticas básicas para la investigación clínica en cirugía de mano
Publication History
03 October 2018
11 October 2018
Publication Date: 07 December 2018 (online)
Abstract
Statistics, the science of numerical evaluation, helps in determining the real value of a hand surgical intervention. Clinical research in hand surgery cannot improve without considering the application of the most appropriate statistical procedures. The purpose of the present paper is to approach the basics of data analysis using a carpal tunnel syndrome (CTS) database to understand the data matrix, the generation of variables, the descriptive statistics, the most appropriate statistical tests based on how data were collected, the parameter estimation (inference statistics) with p-value or confidence interval, and, finally, the important concept of generalized linear models (GLMs) or regression analysis.
Resumen
La “Estadística” es la ciencia de la evaluación numérica que nos permite determinar el valor real de las intervenciones quirúrgicas. La Investigación Clínica en Cirugía de la mano no puede mejorar sin considerar la correcta aplicación de los test estadísticos. El objetivo del presente trabajo de actualización es abordar los principios de los análisis estadísticos usando una base de síndrome de túnel carpiano (STC) para entender la matriz de datos, la generación de variables, la estadística descriptiva, los test más apropiados en base a cómo ha sido recolectada la información, la estimación de parámetros (inferencia estadística) mediante valores “p” o intervalos de confianza, y, finalmente, la introducción de los conceptos de “Modelos Lineales Generalizados”.
Palabras clave
investigación clínica  estadística  síndrome del túnel carpiano  metodología  mano
Introduction
Hand surgery has experienced important technical advances since the first description of small-artery repair in the digits by Harold Kleinert in 1963.[1] Developments include digital replantation, toe-to-hand transfers, free flaps for hand reconstruction, hand transplantation, and even improvements in nerve microsurgery. However, the quality of our clinical research is still only fair.[2] The quality of the reporting in randomized controlled trials (RCTs) is still poor, with a median score of 2 points in the modified Jadad scale, which classifies the quality of RCTs from 0 (worst quality) to 5 (best quality).[3] Despite the fact that the practice of wrist and shoulder arthroscopy started almost at the same time, the efficacy of arthroscopically performed wrist interventions has only been studied in four RCTs; the median modified Jadad score for the wrist RCTs was 0.5 (range 0–1), compared with 3.0 for the 50 RCTs of significantly higher quality assessing interventions performed through shoulder arthroscopy.[4] More than 80% of the papers published in The Journal of Hand Surgery European Volume (JHSE) and 68% of those published in The Journal of Hand Surgery American Volume (JHSA) presented level-IV evidence (that is, mostly case series studies). The percentages of published papers with the highest levels of evidence, including high-quality RCTs and systematic reviews (SRs), were 0.9% (level I) and 5.0% (level II) in the JHSE, and 8.3% (level I) and 10% (level II) in the JHSA, which is very low.[5]
Hand surgery, as a scientific discipline, requires an appropriate and systematic analysis of all variables to demonstrate the significance of the observations that surgeons face every day in their clinical practice. There are two approaches to this issue. The first one is to use the experience of the hand surgeon to answer questions that arise in daily practice. The second one is to complement the knowledge acquired after years of practice with a scientific evaluation of the observations reached by clinical research based on statistics, the science of numerical evaluation that can help thoroughly determine the real value of a hand surgical intervention. We cannot conduct good clinical research in hand surgery without considering the application of the most appropriate and accepted statistical procedures.
The purpose of the present paper is to approach the basics of data analysis using a carpal tunnel syndrome (CTS) database to understand the data matrix, the generation of variables, the descriptive statistics, the most appropriate statistical tests based on how the data are collected, the parameter estimation (inference statistics), the appropriate use of the p-value and the confidence interval (CI), and, finally, the important concept of generalized linear models (GLMs) or regression analysis.
Before Starting
The main part of clinical research starts with the project before the statistical analysis. Based on our observations in clinical practice followed by a systematic review of the scientific literature, we can establish a research question. After that, we have to select the study population or sample for answering our question, the study design (case series, cross-sectional study, case-control study, cohort study, experimental study), and the appropriate instruments and measurements.[2] Finally, we can start the data analysis.
Example and Data Matrix
1–Study Description
In this part, we present a study using part of a CTS database originating from an experimental clinical design, or RCT, aimed at analyzing the effect of reconstructing the transverse carpal ligament on the outcomes of open carpal tunnel (CT) release. The researcher should preestablish the variables to be analyzed based on the clinical design. For teaching purposes, the present study tests the following hypotheses:
Hypothesis 1: Smokers have worse symptoms related to CTS than nonsmokers.
Hypothesis 2: Reconstruction of the transverse carpal ligament improves the outcomes of open CT release.
2–Generating the original matrix data
In order to generate the data matrix, it is important to collect the data using a well-designed form, such as one built in Access (Microsoft Corp., Redmond, WA, USA), or to use a second blinded check of the data entry if the researcher has entered the data directly into an Excel (Microsoft Corp., Redmond, WA, USA) worksheet. Otherwise, problems with missing values will be encountered, which are also addressed in the present paper.
In [Fig. 1], we can observe the initial data matrix in an Excel worksheet with the following variables: “id”, “sex”, “smoking” (cigarettes/day [c/d]), “sspre” (severity of symptoms related to CTS before surgery based on the CTS-6 scale), “sspost” (severity of symptoms related to CTS 3 months after surgery), and “reconstruction”. The CTS-6 scale is the shorter version of the CTS questionnaire developed by Levine et al in 1993.[6] The CTS-6 scale was developed by Atroshi et al[7] [8] and measures the severity of symptoms related to CTS. In the present paper, the Spanish version of the CTS-6 was used,[9] which was developed from the extended Spanish version of the CTS questionnaire,[10] with a good level of reliability and validity for measuring outcomes in patients with CTS.[11]
3–Preparing the data for the analysis: generating and labeling variables
Once we have the main variables and the data matrix, the Excel worksheet can be imported by most statistical software. In the present paper, we have used Stata version 14.5 (StataCorp, College Station, TX, USA), and all the charts and graphics have been created using the same software. The statistical techniques for analyzing a data matrix will vary based on the measurement scale (quantitative vs categorical) of the variables. For teaching purposes, in the present paper, we have generated different categorical variables from the variables “smoking,” “sspre” and “sspost” to demonstrate the use of different statistical tests. In that way, we have generated the following categorical variables: “HabitSmoke” (smoking habit), “SmokLevel” (cigarettes/day), and “NivSSpre” (severe level of symptoms before surgery), based on the potential need for surgery, which was defined by Atroshi et al[12] as a CTS symptom severity score of 3.2. We have also generated a new quantitative variable, “sschange”, which measures the change in the severity of symptoms between before and after the surgery, based on the variables “sspre” and “sspost”.[13] The software enables us to recode the original string variable “reconstruction” ([Fig. 1]) into a numerical binary variable: reconst (reconstruction; “0” = No, “1” = Yes). Observe that the statistical software was able to detect the missing values shown as “.”, which will be important for the analysis ([Chart 1]).

Observe the new variables generated as categorical variables: “HabitSmoke” (Smoking habit; “0” = No, “1” = Yes), “SmokLev” (Level of smoking, cigarettes/day; “0” = No, “1” = 1–15, “2” = >15), “NivSSpre” (severe level of symptoms before surgery; “0” = No, “1” = Yes), “NivSSpos” (severe level of symptoms after surgery; “0” = No, “1” = Yes). A new quantitative variable, “sschange”, which measures the change in the severity of the symptoms between before and after the surgery, was generated from the variables “sspre” and “sspost.” Observe that the initial nominal variable “RECONSTRUCTION” in the original data matrix from Excel ([Fig. 1]) has been recoded to a numerical categorical variable: Reconst (Reconstruction; “0” = No, “1” = Yes). Observe that the statistical software was able to detect the missing values represented by a “.”, which will be important for the analysis.
An important issue is the conversion of a quantitative variable into a categorical one. It is generally not recommended to convert quantitative variables, such as “smoking” and “sspre”, into categorical variables because this results in a loss of information in the analysis and in the conclusions. It is done here only for teaching purposes.
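The paper performs this recoding in Stata; as an illustrative sketch only, the “SmokLevel” categories described above (“0” = no smoking, “1” = 1–15 cigarettes/day, “2” = >15) can be expressed in Python, where the helper name `smok_level` is our own and `None` plays the role of Stata's “.” missing value:

```python
# Hypothetical recode helper mirroring the "SmokLevel" categories in the text:
# "0" = no smoking, "1" = 1-15 cigarettes/day, "2" = > 15 cigarettes/day.
def smok_level(cigs_per_day):
    """Collapse the quantitative 'smoking' variable into an ordinal category."""
    if cigs_per_day is None:      # propagate missing values unchanged
        return None
    if cigs_per_day == 0:
        return 0
    return 1 if cigs_per_day <= 15 else 2

# A small toy column with a missing value (None stands in for Stata's ".")
smoking = [0, 10, 25, None, 15]
print([smok_level(c) for c in smoking])  # [0, 1, 2, None, 1]
```

Note that the recode keeps the missing values as missing, exactly as the statistical software does when it detects “.” in the imported worksheet.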
The management of the variable “age” should be appropriate. Usually, we ask the patients about their age and they give us the truncated age (45, 56, 34 years old, etc.). A more correct way is to record the exact time (age, follow-up, etc.). For this purpose, we record the date of birth and the date of entry into the study, and we can calculate “Age” (exact time in years) = (entry date – birth date)/365.25, obtaining the exact “age” variable with decimals. If we use the truncated age, we need to apply a correction by adding 0.5 to the truncated age. This correction of 0.5 does not change the dispersion of the data or the standard deviation (SD), but only the mean ([Chart 2]).

Observe the mean and standard deviation (SD). The correction of + 0.5 point in the variable Age only changes the mean but not the SD. The dispersion of the data in the variable AgeC is the same as that of the variable Age.
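The two age calculations described above can be sketched in Python; the dates used below are invented for illustration, and the helper names are our own:

```python
from datetime import date

# Exact age in years with decimals: (entry date - birth date) / 365.25,
# as described in the text.
def exact_age(birth, entry):
    return (entry - birth).days / 365.25

# Truncated-age correction: add 0.5 to the integer age reported by the patient.
# This shifts the mean by 0.5 but leaves the dispersion (SD) unchanged.
def corrected_age(truncated):
    return truncated + 0.5

print(round(exact_age(date(1975, 3, 10), date(2018, 10, 3)), 2))  # ~43.57
print(corrected_age(45))  # 45.5
```

Because the same constant 0.5 is added to every patient, the deviations from the mean, and hence the SD, are identical for the corrected and uncorrected variables.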
4–Handling missing values
The amount of missing values is an index of the quality of the study and of the data collection process in the research. A common mistake in the analysis is to exclude all study participants with missing values (observed not only among young researchers in hand surgery: “If we have enough patients in our study, just go ahead and delete all those with missing values”). Excluding the individuals with missing values can be a big problem. If the individuals excluded because of missing values constitute a random subsample of the total study population, the only consequence will be decreased power and precision of the analysis. However, when the missing values are related to the values of some of the variables of the study, we have a more important methodological problem (bias in the parameter estimation process in a multivariate model, for example). Based on the origin of the missing values, Stata divides them into: system missing (data without value, values incoherent with the variable format, and impossible values in a data transformation, such as 5/0); and user missing or extended missing values; for example, if a questionnaire item asks about smoking (Yes or No) and the next item asks about the number of cigarettes per day, it is expected to have an empty value or a value of 99 in nonsmoking respondents, and it should be assigned as user missing. Based on the degree of possible bias, system missing values can be classified into: missing completely at random (MCAR), such as forgetting to record some data; missing at random (MAR), when the missing value is related to an independent confounder variable (X), for example, when studying whether losing weight could improve the symptoms of CTS, a high number of missing values was observed among younger individuals because younger subjects have less tendency to cooperate in the registration of their habits; and nonrandom missing (NRM), when it is related to the dependent variable (Y). For example, when analyzing the relationship between high blood pressure (HBP) and age group, it is expected that the individuals with HBP and older people cooperate better and consequently present fewer missing values for these variables.
If we look at our data ([Chart 1]), we have some missing values. The first step before starting our study is to analyze the missing values; probably, they are MCAR. The missing value analysis estimates and describes those values ([Chart 3]), and it can give information about: the total number and percentage of missing values, the number of patients with missing values, the number of missing values per patient, and the patterns of missing values. At the same time, most software packages generate a new variable called _Nmiss, or similar, which can be used to drop the individuals with missing values or to perform other analyses to get more information about the missing values.

Observe that the total number of missing values was 11: two missing values in each of the variables AgeC, Smoking, HabitSmoke and NivTab, and three missing values in Gender. Twenty-eight patients (90.32%) presented 0 missing values, two patients (6.45%) presented five missing values each, and only one patient (3.23%) presented one missing value.
Most statistical software packages have two different approaches for data analysis with missing values: pairwise selection (using the maximum number of subjects available for each variable) and listwise selection (using only subjects with valid values in all variables). For example, in multivariate linear regression analysis, Stata always uses listwise selection in the analysis of the model[14] ([Charts 4A] and [4B]).

Observe that the means, standard deviation (SD), and number of observations change based on the strategy used to handle the missing values. In listwise selection, the software only uses the patients with valid values, which means that _Nmiss = 0.
There are no rules about the accepted level of missing values. If we have more than 10% of missing values, it is recommended to compare the descriptive statistics of the variables between the subjects with complete valid values and those with some missing values. Most statistical software packages include methods to handle missing values, such as mean imputation and multiple imputation (used in multivariate analysis).[15] The mean imputation process ([Chart 5]) assigns the mean of the variable to the missing value. Consequently, we will obtain a decreased SD, but no change in the mean of the variable after the imputation.[14]

Observe that after mean imputation (AgeMean), the mean remained the same; however, the standard deviation (SD) decreased.
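The effect of mean imputation described above (unchanged mean, shrunken SD) can be verified on a toy age column; this is a minimal Python sketch, not the study data, and the helper name `mean_impute` is our own:

```python
from statistics import mean, stdev

# Mean imputation: missing entries (None) are replaced by the mean of the
# observed values. The mean is unchanged, but the SD shrinks because the
# imputed points add no deviation while the degrees of freedom increase.
def mean_impute(values):
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

age = [34, 45, 56, None, 61, None, 40]       # toy data, None = missing
observed = [v for v in age if v is not None]
imputed = mean_impute(age)

print(round(mean(observed), 2), round(mean(imputed), 2))   # 47.2 47.2
print(round(stdev(observed), 2), round(stdev(imputed), 2)) # 11.17 9.12
```

This is why the text warns that mean imputation artificially increases the apparent precision of the variable.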
Descriptive Statistics
After generating, recoding, labeling and handling missing values in the variables, we have to check the data. Checking the accuracy of our data entry is also our first statistical look at the data, and it is very important for detecting possible errors in data entry.
The description of the data constitutes the starting point of our analysis and sometimes a proper objective of a study. Descriptive statistics are based on the calculation of several indexes and graphics, which give us information about the distribution of the data (central tendency, dispersion, symmetry, etc.) for every variable.
1. Description of quantitative variables.
We have two types of statistical indexes for describing quantitative variables: measures based on moments (mean, SD, variance, skewness and kurtosis) and measures based on ordering (median, quartiles, minimum, and maximum).
The mean represents the gravity center of the distribution of a specific variable and describes the central tendency if we do not have asymmetry. The SD (S) is the square root of the variance (S^{2}), which is obtained by dividing the sum of squares by the degrees of freedom (df) (S^{2} = SS/df). The SD (S) is interpreted only if the distribution is normal and symmetric. In this case, the interval mean ± 1 SD contains 68% of the observations of that variable, the interval mean ± 2 SD includes 95% of the observations, and the interval mean ± 3 SD includes 99.7% of the cases.
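The 68%/95% coverage rule above can be checked empirically; the sketch below uses synthetic normal data (not the CTS data) generated in Python:

```python
import random
from statistics import mean, stdev

# Empirical check of the normal-interval rule: for normally distributed data,
# mean +/- 1 SD covers ~68% of observations and mean +/- 2 SD covers ~95%.
random.seed(42)
sample = [random.gauss(0, 1) for _ in range(1000)]   # synthetic normal sample

m, s = mean(sample), stdev(sample)
within1 = sum(m - s <= x <= m + s for x in sample) / len(sample)
within2 = sum(m - 2*s <= x <= m + 2*s for x in sample) / len(sample)
print(round(within1, 2), round(within2, 2))   # close to 0.68 and 0.95
```

With skewed data, these coverages no longer hold, which is why the text restricts the interpretation of the SD to symmetric, normal distributions.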
Other measures based on moments are skewness and kurtosis. Measures based on ordering are the median and the quartiles; those measures are not affected by the asymmetry of the distribution. The median and the mean are equal when the distribution is symmetric. The symmetry is measured by the skewness index, which will be positive when the mean is higher than the median, and negative when the mean is lower than the median. The degree of flattening of the distribution is measured by the kurtosis index (in Stata, kurtosis > 3 implies a distribution with more sharpness than a normal distribution). In [Fig. 2], we can observe the P-P normal plot, the box plot and the stem-and-leaf diagram, which give an image of the distribution of the data of the sspre variable. The stem-and-leaf diagram allows us to calculate in an easy way the median and the quartiles Q_{1} (25th percentile) and Q_{3} (75th percentile); Q_{2} (50th percentile) is equal to the median. The box plot is also very useful to see the distribution of the data: the median and the Q_{1} and Q_{3} quartiles form the box, and the minimum and maximum form the whiskers. Box plots can be used to assess symmetry, to detect outliers, and to demonstrate change in a variable such as symptoms related to CTS: if we compare the box plot of sspre with that of sspost, we can see the improvement in symptoms after surgery ([Fig. 3]). The descriptive statistics alerted us that there were errors during data entry, because we could find values > 5 in [Fig. 2] (max = 5.4), and, in [Fig. 3], the upper whiskers of the box plots of sspre and sspost were > 5, which is the maximum score that any item of the CTS-6 questionnaire can achieve. Consequently, the researcher should check the original data entry, even the original questionnaires, to overcome those mistakes. Stata can give us the Id numbers with sspre or sspost values > 5 (list Id if (sspre > 5 & sspre < .) | (sspost > 5 & sspost < .)), which were Id: 2, 3, 19, and 30. If the researcher cannot access the original matrix or questionnaires, or if it is impossible to know the exact values of those variables in those individuals, the researcher should assign a missing value to those entries. The data matrix of the present paper was created for teaching purposes, but the rest of the analyses were done using the correct values in the database.
2. Descriptions of categorical variables
Categorical variables (such as “gender”, “HabitSmoke”, “SmokeLevel”, etc.) are described by creating a table of frequencies that classifies the individuals based on the category and calculates percentages ([Fig. 4]).
Descriptive statistics are important for checking our data: they constitute the main opportunity for discovering errors in our data. The main checklist during the descriptive analysis is:

Id. Check that all records present an Id number and that no number is duplicated.

Categorical variables. Check that the values belong to the set of valid values (e.g., observe values different from 0 or 1 in the “gender” variable)

Quantitative variables. Check that the values are included in the interval that defines the valid values (for example, the sspre variable ranges from 1 to 5, and if the summarized information of the variable detects values over 5, it indicates an error in the collection or in the entry of the data).

Dates. Check that the dates are correct if the researcher has used them for generating variables such as “age” or “follow-up” time.

Consistency between variables. Check that the values of a variable are consistent with the others (for example, if we observe negative values in “age”, it would indicate that the date of birth is later than the date of entry into the study).
If we detect errors during the checking process of the descriptive statistics:

Replace the erroneous data with correct data if the researcher has the original records and information or access to the subjects of the study.

Replace the erroneous data with missing values if the researcher does not have the correct information.
Inference Statistics
Decision-making in clinical research implies testing whether our hypotheses are true or false based on empirical results obtained in samples of individuals. The results of a specific research study are based on the measurements made in one of the infinite samples that can be obtained from the reference population. Consequently, the value of our measurements can vary due to chance. Inference statistics assume that the random variability of sampling follows known laws, which allows the researchers to quantify that variability, facilitates decision-making about the established hypotheses, and enables drawing conclusions.[14]
1. P-value and significance test of the null hypothesis (H_{0}).
Fisher[16] [17] established the significance test for the null hypothesis in 1922. For example, our clinical observations allow us to suppose that the reconstruction of the transverse carpal ligament during open CT release might not improve the severity of the symptoms related to CTS. To test that hypothesis, a sample of 31 individuals was randomized to receive only open CT release (sample A: no reconstruction, n_{A} = 16) versus open CT release plus ligament reconstruction (sample B: reconstruction, n_{B} = 15). The severity of symptoms score ranged from 1 to 5. Group A presented a mean change in the severity of symptoms score between before the surgery and after the surgery (sschange variable) of x̄_{A} = 1.48 (improvement in symptoms), with a SD of S_{A} = 1.33. Group B presented a mean of x̄_{B} = 1.13, with a SD of S_{B} = 1.004. This hypothesis entails that there is a population A, “change in the severity of symptoms after open CT release”, and a second population, B, “change in the severity of symptoms after open CT release plus ligament reconstruction.” The researcher wants to know if the mean in population A (µ_{A}) is higher than the mean in population B (µ_{B}). However, the researcher does not know these values. To know whether or not the reconstruction improved the “sschange,” the researcher needs a reference distribution to assess if the observed difference in “sschange” between groups A and B (d_{0} = 0.35) ([Fig. 5]) is true or caused by the random fluctuation of the sample. Still, the researcher does not know the magnitude of the difference (δ) between the means of “sschange” in the populations; consequently, this hypothesis cannot be tested directly. A second option is that the researcher can formulate the null hypothesis that the reconstruction does not improve “sschange” (H_{0}: δ = µ_{A} − µ_{B} = 0). Now, it is possible to establish a reference distribution around the H_{0}, because it is a very specific hypothesis (δ = 0).
The reference distribution will be the sampling distribution of the observed difference, with a mean equal to 0 (µ = 0) and with a SD that will be the standard error of the difference (SE_{d}), which combines the SEs of samples A and B (SE_{d} = √(SE_{A}² + SE_{B}²)).[18] The SDs of the real populations are not known, but if we assume that the differences follow a normal distribution, we can replace them with the SDs of the samples. From here, we can calculate the t_{0} statistic (dividing d_{0} by SE_{d}). Finally, we want to know the probability of finding a value of T similar to or higher than t_{0}, taking into account the degrees of freedom (df), which equal the total number of individuals (n = 31) minus 2 (2 groups) (df = 29). The obtained probability is higher than 0.05, which means that the difference between groups is compatible with the null hypothesis H_{0}.[18] Consequently, reconstruction of the transverse carpal ligament does not improve the change in the severity of the symptoms after CT release. In this example, given the teaching purpose of the present paper, we have not taken into account the sample size, the power of the analysis, and the type II error, which will be explained below.
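The paper runs this test in Stata; as a sketch, the same t statistic can be reproduced in Python from the group summaries reported above (using the rounded means, so the result differs slightly from the Stata output), assuming equal variances and a pooled SD with S² = SS/df:

```python
from math import sqrt

# Group summaries reported in the text (rounded):
# A = no reconstruction, B = reconstruction.
nA, mA, sA = 16, 1.48, 1.33
nB, mB, sB = 15, 1.13, 1.004

# Pooled variance (S^2 = SS/df) and SE of the difference, then t0 = d0 / SE_d.
sp2 = ((nA - 1) * sA**2 + (nB - 1) * sB**2) / (nA + nB - 2)
se_d = sqrt(sp2) * sqrt(1/nA + 1/nB)
t0 = (mA - mB) / se_d

print(round(t0, 2))      # ~0.82
print(abs(t0) < 2.045)   # True: below the tabulated two-tailed 5% critical
                         # value for df = 29, so H0 is not rejected
```

The critical value 2.045 is the standard tabulated two-tailed 5% value of the t distribution with 29 degrees of freedom, matching the df = 29 of the example.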
2. Null hypothesis (H_{0}) versus alternative hypothesis (H_{a}). Alpha (α) and Beta (β) Risks
In 1928, Neyman et al,[19] [20] [21] building on the Fisher significance test, developed hypothesis testing from a different point of view, apparently very similar to the one proposed by Fisher but conceptually distinct. They started from probability theory to establish a rule for decision-making between two complementary hypotheses (H_{0} = null hypothesis and H_{a} = alternative hypothesis). That theory constituted a frontal opposition to the inference proposed by Fisher. Neyman et al proposed the α and β risks and the type I and type II errors.
Type I error happens when H_{0} is true but has been rejected. Normally, in clinical research, we use a risk of type I error of 0.05 (α = 0.05). The interpretation of α = 0.05 is very important. The alpha risk represents the conditional probability of being wrong when the null hypothesis is true. When a researcher rejects the H_{0} with a risk α of 0.05, it does not mean that the researcher is wrong 5 out of 100 times, because that would only be true if the null hypothesis were always true.[22]
Type II error occurs when the H_{a} is true and the researcher accepts the H_{0} because the test was not significant due to a small sample size. The probability of a type II error is called the beta risk (β). The complementary probability, 1 − β, represents the probability of accepting the H_{a} when H_{a} is true, and it is called the power of the test. Usually, in health sciences or in clinical research, we use a power of 80%.
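The meaning of the alpha risk can be illustrated by simulation; the sketch below uses synthetic data (not the study data) with the same group sizes as the CTS example, repeating the experiment many times when H_{0} is true: a test at α = 0.05 should falsely reject in roughly 5% of the experiments.

```python
import random
from math import sqrt
from statistics import mean, stdev

# Simulation of the alpha risk: both groups are drawn from the SAME normal
# population (H0 is true), so every rejection is a type I error.
random.seed(1)

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    sp2 = ((len(a)-1)*stdev(a)**2 + (len(b)-1)*stdev(b)**2) / (len(a)+len(b)-2)
    return (mean(a) - mean(b)) / (sqrt(sp2) * sqrt(1/len(a) + 1/len(b)))

rejections = 0
for _ in range(2000):
    a = [random.gauss(0, 1) for _ in range(16)]   # sizes mirror the example
    b = [random.gauss(0, 1) for _ in range(15)]
    if abs(pooled_t(a, b)) > 2.045:               # two-tailed critical value, df = 29
        rejections += 1

print(rejections / 2000)   # close to 0.05, the type I error rate
```

The same simulation scheme, run with a true difference between the populations, would estimate the power (1 − β) for a given sample size.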
3. Confidence interval versus p-value.
In 1934, Neyman,[23] based on the hypothesis test, proposed a new method to answer questions such as the one established in the CTS and ligament reconstruction study explained before ([Fig. 5]), based on Domenech.[18] A good alternative to hypothesis testing is the CI around a mean. If the samples are representative of the reference populations, we can calculate an interval that can be considered the set of all credible hypotheses. In that way, a hypothesis located outside the interval is not credible and can be rejected. It is not necessary to know the p-value to conclude whether the difference observed in [Chart 6] and [Fig. 5] is significant. If the H_{0} is included in the 95% confidence interval (95% CI), we can conclude that the difference is not significant (mean difference = 0.3545833; 95% CI = −0.5167852 to 1.225952). The interpretation of the 95% CI is: with a confidence of 95%, we can locate the mean difference of “sschange” between −0.52 and 1.23; because the H_{0} (δ = 0) is included in the interval, the difference is not significant ([Fig. 6]).
(A) Observe that the results of the one-tailed comparison of the two means are similar to the ones observed in [Fig. 5]. Conclusion: the results of the one-tailed and two-tailed tests were not significant, assuming equal variances and a normal distribution of the differences in sschange. (B) Observe the same results (diff. of means = 0.3545833 = β coefficient [No reconstruction]) when comparing the two means using regression analysis.
This concept, introduced by Neyman, was forgotten for a long time. Today, however, it is very common to use the 95% CI instead of presenting many p-values in result charts. The 95% CI has the advantage of giving information about the magnitude of the effect or of an important clinical change. If we already know the minimal important clinical difference (MICD) expected in our outcome variables (such as sschange) in a specific population, we can not only determine whether the difference observed in the 95% CI is significant, but also assess if that difference (effect) is clinically important. If we look at [Fig. 7], we can see the relationship between the significance and the effect assessed by different 95% CIs. The criterion for the MICD in the change in the severity of symptoms after CT release, which is ∼ 0.9 to 1, has been measured using the CTS-6 PRO instruments.[8] [24] The first CI (A) is located above the MICD, which means that the 95% CI is significant (it does not include the H_{0}) and presents an important effect. Confidence interval “B” includes the MICD in the interval but does not include the H_{0}; consequently, this 95% CI is inconclusive about the effect, although it is statistically significant. Confidence interval “C” includes the criterion and the H_{0}, and it constitutes an inconclusive and not significant 95% CI. Confidence interval “D” is located below the criterion, but it does not include the H_{0}; consequently, it is a significant interval, but with a nonimportant clinical effect. Finally, CI “E” is below the criterion and includes the H_{0}; therefore, this 95% CI is not significant and the effect is not important.[18] In [Fig. 6], the readers can see, based on the 95% CI of the “sschange” difference between open CT release and open CT release plus ligament reconstruction, that the difference was not significant because the interval includes the H_{0}, and the CI is inconclusive regarding the magnitude of the effect because it includes the criterion.
Another aspect of the 95% CI is the precision of the interval. A narrower 95% CI implies a better precision because the SE is lower. If we have to select between two 95% CIs that are significant (that is, that do not include the H_{0}) and that show an important effect (both CIs above the MICD criterion), we will choose the narrower 95% CI.
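The CI reasoning for the CTS example can be reproduced in Python from the reported group summaries (rounded, so the endpoints differ slightly from the Stata output); the MICD criterion of ∼ 0.9 and the t critical value of 2.045 (df = 29) are taken from the text and from standard t tables:

```python
from math import sqrt

# Group summaries reported in the text (rounded values).
nA, mA, sA = 16, 1.48, 1.33
nB, mB, sB = 15, 1.13, 1.004

# 95% CI for the difference of means: d +/- t_crit * SE_d (pooled variance).
sp2 = ((nA-1)*sA**2 + (nB-1)*sB**2) / (nA + nB - 2)
se_d = sqrt(sp2) * sqrt(1/nA + 1/nB)
d = mA - mB
lo, hi = d - 2.045*se_d, d + 2.045*se_d

print(round(lo, 2), round(hi, 2))   # -0.52 1.22, close to the reported CI
print(lo <= 0 <= hi)                # True: includes H0 -> not significant
print(lo <= 0.9 <= hi)              # True: includes the MICD -> inconclusive effect
```

The two boolean checks mirror the classification of [Fig. 7]: this interval includes both the H_{0} and the criterion, so it is not significant and inconclusive regarding the effect.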
Statistical Tests for Data Analysis
Based on the hypotheses established in the Study Description section of the present paper, we will have different statistical tests for answering those research questions.
Before starting the proper data analysis, we need to explore the sample with goodness-of-fit tests. An important issue, especially with sample sizes < 30, is to assess whether the distribution of the data follows a normal distribution in the population. We have different statistical tests and graphics for that purpose. For testing the sample for normality, the most commonly used test is the Shapiro-Wilk test, with which we establish H_{0} = no difference between the sample and a normal distribution. Observe in [Fig. 2] that all tests of normality (Shapiro-Wilk, and the skewness and kurtosis tests for normality) were not significant (p > 0.05); therefore, we accept the H_{0} and can conclude that the variable “sspre” follows a normal distribution.
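The moment-based indexes behind the skewness and kurtosis tests can be computed by hand; the sketch below uses a toy sample (not the study data) and the population-moment definitions (skewness = m₃/m₂^1.5, kurtosis = m₄/m₂²; software may apply small-sample corrections), where a normal distribution has skewness 0 and kurtosis 3 on the scale Stata reports:

```python
from statistics import mean

# Moment-based skewness and kurtosis, computed from the central moments.
def moments(xs):
    m = mean(xs)
    m2 = mean([(x - m)**2 for x in xs])   # variance (population form)
    m3 = mean([(x - m)**3 for x in xs])   # third central moment
    m4 = mean([(x - m)**4 for x in xs])   # fourth central moment
    return m3 / m2**1.5, m4 / m2**2       # skewness, kurtosis

skew, kurt = moments([1, 2, 3, 4, 5])     # toy symmetric sample
print(round(skew, 2), round(kurt, 2))     # 0.0 1.7: symmetric, flatter than normal
```

A symmetric sample gives skewness 0, and a kurtosis below 3 indicates a distribution flatter than the normal, matching the interpretation given in the text.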
The statistical tests to be used will depend on how the researcher recorded the variables in the study. Following with the example and the database used in the present paper, the researcher can face different scenarios based on the exposure and outcomes variables:
1. Exposure and Outcome (dichotomous or binary variables). When the exposure and the outcome are binary variables, we have to create a cross-tab in which the exposure, or independent variable (IV), is located in the columns, and the outcome, or dependent variable (DV), in the rows. The proportions in each column are compared with the chi-squared (χ^{2}) or the Fisher exact test, and the clinical relevance of the association is assessed with the prevalence ratio (PR) and the odds ratio (OR).[25] For example, for the association between a severe level of CTS symptoms (NivSS) and smoking habit (a comparison of two proportions): PR or OR > 1 means positive association, PR or OR = 1 constitutes the H_{0} (no association), and PR or OR < 1 means negative association ([Fig. 8]) ([Chart 7]). The proper use of the OR or of the PR depends on the clinical design of the study. In a cross-sectional study ([Chart 8]), the results come in terms of prevalence difference (0.215686), OR of prevalence (3.75), and prevalence ratio (1.3235), with the 95% CI. For cohort or experimental studies ([Chart 9]), the results come in terms of risk difference (0.215686), relative risk or risk ratio (RR) (1.3235), and OR (3.75), with the 95% CI. However, for case-control studies, the results come only in terms of the OR ([Chart 9]). Observe that the results obtained by Stata in [Charts 8], [9] and [10] are similar to those shown in [Fig. 8], which were obtained with a hand calculator.
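These 2 × 2 measures can be reproduced by hand. The cell counts used below (15/2 severe/non-severe among smokers, 8/4 among nonsmokers) are inferred from the proportions and statistics reported in the charts, so treat them as an illustration rather than the study data:

```python
def two_by_two(a, b, c, d):
    """Association measures for a 2 x 2 cross-tab laid out as:
                exposed   unexposed
    event          a          b
    no event       c          d
    """
    p1, p0 = a / (a + c), b / (b + d)   # proportion of events per column
    rd = p1 - p0                        # risk (or prevalence) difference
    rr = p1 / p0                        # risk (or prevalence) ratio
    odds_ratio = (a * d) / (b * c)      # odds ratio
    n = a + b + c + d
    # Pearson chi-squared statistic with 1 degree of freedom
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return rd, rr, odds_ratio, chi2

rd, rr, odds_ratio, chi2 = two_by_two(15, 8, 2, 4)
print(rd, rr, odds_ratio, chi2)  # ~0.2157, ~1.3235, 3.75, ~1.9943
```

With these counts the function reproduces the prevalence difference (0.215686), prevalence ratio (1.3235), OR (3.75), and χ² (1.9943) quoted above.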
2. Exposure (categorical variable) and Outcome (dichotomous or binary variable). When the exposure variable presents several categories (c > 2) and the outcome variable is binary, we have to check whether the IV is an ordinal categorical variable, because in that case we have to assess the tendency with a trend test or a Mantel-Haenszel test, together with the deviation-from-linearity test. If we look at [Fig. 8] (association between NivSSpre and smoking level: comparison of several proportions), the proportion with a severe level of CTS symptoms (NivSSpre) increases with the number of cigarettes/day (c/d) (smoking level). However, the linear trend analysis was not significant ([Chart 10]). Otherwise, if the categories have no ordinal aspect, the statistical test will again be a χ^{2} test for multiple comparisons, or a regression model[18] [25] ([Chart 10]), which showed that there was no association between a severe level of symptoms and smoking level.
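A chi-squared test for linear trend in proportions (the Cochran-Armitage form, closely related to the Mantel-Haenszel trend test mentioned above) can be sketched as follows; the event counts, group sizes, and integer scores in the example are hypothetical:

```python
def chi2_trend(events, totals, scores):
    """Chi-squared statistic for linear trend in proportions across
    ordered exposure categories (Cochran-Armitage form, 1 df)."""
    N = sum(totals)
    p = sum(events) / N                      # overall event proportion
    sx = sum(n * x for n, x in zip(totals, scores))
    sxx = sum(n * x * x for n, x in zip(totals, scores))
    num = (sum(a * x for a, x in zip(events, scores)) - p * sx) ** 2
    den = p * (1 - p) * (sxx - sx * sx / N)
    return num / den

# hypothetical severe-symptom counts per smoking level, scored 0/1/2
print(chi2_trend([2, 10, 20], [10, 20, 30], [0, 1, 2]))
```

When the proportions are constant across categories the statistic is 0 (a horizontal line); values above 3.84 are significant at the 0.05 level with 1 degree of freedom.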
3. Exposure (binary variable) and Outcome (quantitative variable). When the IV is a binary variable (smoking habit) and the outcome is quantitative (SSpre = severity of symptoms related to CTS), the statistical test compares two means with the Student-Fisher t-test, and the magnitude of the effect is given by the 95% CI ([Fig. 8]). Apart from testing the normality of the distribution of the sample when the sample size is not large (n < 30), we need to test for homogeneous variances with the Levene test, which establishes the H_{0} that the two samples compared have equal variance. If the Levene test is significant, we have to use the Student t-test for unequal variances, in which only the degrees of freedom change.[26] Besides, we have to differentiate between the Student t-test for independent samples, as in this case (comparison of sspre in smoking and nonsmoking samples) ([Fig. 8]), or in the example in [Fig. 3] and [Chart 6] (comparison of sschange in a ligament reconstruction sample versus a CT release-only sample); and the paired Student t-test for dependent samples (for example, the comparison of sspre with sspost to know whether there was a significant improvement in the symptoms between before surgery and 3 months after surgery).
When the sample is small or not normally distributed, the nonparametric tests used for comparing two quantitative variables are the Wilcoxon signed-rank test, in the case of dependent samples, and the Mann-Whitney U test (Wilcoxon rank-sum test), in the case of independent samples.
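The two flavors of the independent-samples t statistic, pooled-variance (Student) and unequal-variance (Welch), differ only in the standard error and in the degrees of freedom, as sketched below with made-up sample data:

```python
import math
import statistics

def t_independent(x, y, equal_var=True):
    """t statistic and degrees of freedom for two independent samples.
    equal_var=True  -> pooled-variance Student t-test.
    equal_var=False -> Welch's t-test for unequal variances."""
    nx, ny = len(x), len(y)
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    if equal_var:
        sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
        se = math.sqrt(sp2 * (1 / nx + 1 / ny))
        df = nx + ny - 2
    else:
        se = math.sqrt(vx / nx + vy / ny)
        # Welch-Satterthwaite approximation to the degrees of freedom
        df = (vx / nx + vy / ny) ** 2 / (
            (vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return (mx - my) / se, df
```

This illustrates why a significant Levene test only changes the degrees of freedom (and the standard error), not the basic form of the comparison.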
4. Exposure (categorical variable) and Outcome (quantitative variable). When the IV is a categorical variable with more than two categories (smoking level) and the DV is a quantitative variable, the analysis of variance (ANOVA) is used, defining the reference category for the analysis (that is, the association between SSpre and smoking level [cigarettes/day]) ([Fig. 8]).[26] The ANOVA is a generalization of the Student-Fisher t-test in which it is assumed that the data of the “c” categories are random samples of “c” populations with equal variance, and with normal distributions if some of the samples have a size < 30. Consequently, it is mandatory for ANOVA to test the samples for normality (Shapiro-Wilk test) and for homogeneity of the variances (Levene test). A nonparametric version of ANOVA is the Kruskal-Wallis test. The ANOVA can also be performed using a linear regression model ([Chart 11]).[26]
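The one-way ANOVA F statistic is simply the ratio of the between-group to the within-group mean squares, which the following sketch computes for hypothetical groups:

```python
def anova_f(groups):
    """One-way ANOVA F statistic: the ratio of the between-group
    mean square to the within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

print(anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # 3.0
```

With two groups the F statistic equals the square of the pooled-variance t statistic, which is the sense in which ANOVA generalizes the t-test.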
5. Exposure and Outcome (quantitative variables). When both variables are quantitative, linear regression analysis is the selected test. A linear equation (y = A + BX) is calculated, in which the slope, or β coefficient, gives us information about the contribution of the smoking level (cigarettes/day) to the severity of the symptoms related to CTS (sspre) ([Fig. 8]).[26]
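The least-squares estimates of A and B in y = A + BX have a closed form, sketched here on made-up data:

```python
def linear_fit(x, y):
    """Least-squares estimates for y = A + B*x; B is the slope,
    or beta coefficient, of the regression line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

a, b = linear_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

The slope B estimates the change in the outcome per unit of exposure (here, per additional cigarette/day), which is exactly the quantity the 95% CI of the β coefficient quantifies.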
6. Exposure (quantitative variable) and Outcome (categorical variable). In the final part of [Fig. 8], the outcome variable is categorical, in the present case binary (NivSS = severe level of symptoms), and the exposure is quantitative (smoking: number of cigarettes/day); the statistical test will be a logistic regression.[26]
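A logistic regression with a single predictor can be fitted with a few lines of Newton-Raphson. This is only a sketch (real analyses use a statistical package); the data in the test are 2 × 2 counts consistent with the OR of 3.75 quoted from [Fig. 8], for which exp(slope) must reproduce that OR:

```python
import math

def logistic_fit(x, y, iters=25):
    """Logistic regression of a binary outcome y on one predictor x,
    fitted by Newton-Raphson; returns (intercept, slope).
    exp(slope) is the odds ratio per unit of x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # fitted probability
            w = p * (1.0 - p)                             # Newton weight
            g0 += yi - p            # gradient, intercept
            g1 += (yi - p) * xi     # gradient, slope
            h00 += w                # Hessian entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

With a binary predictor the fitted exp(slope) equals the cross-tab OR exactly, which is why the logistic model in [Fig. 8] (C) matches the OR of panel (B).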

(A) Comparison of two proportions using the Z statistic, which in acceptable sample sizes follows a normal distribution (Wald test). (B) The same comparison using the chi-squared (χ^{2}) test. Observe that Z^{2} = 1.412^{2} is approximately equal to the Pearson statistic χ^{2} = 1.9943. The proportions of severe symptoms among smokers (0.882353) and nonsmokers (0.666667) are similar to those described as risk in the χ^{2} cross-tab. The risk difference in the cross-tab (0.215686) is similar to the one shown in the Z test (diff). (C) The odds ratio (OR) in the logistic regression analysis is similar to the OR observed in (B), with a similar likelihood ratio (χ^{2} = 1.98).

Abbreviation: NivSSpre, severe level of symptoms before surgery. *recommended confidence interval (CI).
Observe that the results in a cross-sectional study come in terms of prevalence, prevalence difference, prevalence ratio (PR), and odds ratio (OR) of prevalence. For the PR and the OR, the H_{0} = 1, and the intervals include the H_{0}; consequently, the association was not significant.

Abbreviation: NivSSpre, severe level of symptoms before surgery. *recommended confidence interval (CI).
For experimental and cohort clinical designs, the results are shown in the same way, based on the risk ratio (RR) and the odds ratio (OR). However, in case-control studies, the results of the same analysis come only in terms of the OR.

(A) Observe that the prevalence of a severe level of symptoms before surgery (NivSSpre) increases with the level of smoking. The deviation-from-linearity test presented a nonsignificant result (p = 0.7171), which implies that the proportions lie on a straight line. The Mantel-Haenszel (MH) test showed that the increase in the prevalence of symptoms with the smoking level was not significant (p = 0.1574), which means that there is no linear trend because the line is horizontal rather than ascending. (B) The same analysis of association when there is no ordinal categorization. (C) The same analysis, with similar odds ratios, using a logistic regression model.

(A) The analysis of variance (ANOVA) for assessing the association between a quantitative outcome variable (sspre) and a categorical exposure variable (NivTab). (c/d = cigarettes/day.) The results demonstrated that there was no association. The contrast analysis assessed the association of sspre with each group of NivTab, using the “no smoking” group as the reference. (B) The same ANOVA using a regression model. Observe that the F statistic is the same (0.82), that the p-value is the same (p = 0.4535), and that the t-values (0.31; 1.26) and the mean differences (0.1416667; 0.4734848) in the contrast analysis coincide with those of the β coefficients in the regression model.
#
Generalized Linear Models (GLMs)
We have studied the association between symptoms related to CTS and smoking. This association has been analyzed in different ways based on the type of exposure and outcome variables. In [Chart 10], we observed a nonsignificant association between a severe level of CTS symptoms (NivSSpre) and smoking level (SmokLevel; c/d), with an OR = 2.5 (1–15 c/d versus no smoking) and an OR = 5 (> 15 c/d versus no smoking) in the cross-tab analysis. The questions are: are those results the real effect of smoking on the severity level of the symptoms? Is there another variable that could affect this association? The same applies to the example presented in [Chart 6], the effect of ligament reconstruction on the change in CTS symptoms after open CT release. Using regression models, the observed effect can be adjusted for modifier (confounder and/or interaction) variables. The decision to adjust for a confounder variable in a regression model should not be based on a significant statistical test, but on important changes in the effect estimate.
The analysis of the association between a severe level of symptoms and smoking level (cigarettes/day) ([Chart 10]) demonstrated that there was no significant association based on the Mantel-Haenszel trend analysis and on the χ^{2} test. The OR_{1} = 2.5 (95% CI: 0.2136439 to 29.25428) and the OR_{2} = 5 (95% CI: 0.4625826 to 54.0444) were not significant because the 95% CIs included the H_{0} = 1. A similar analysis using, in this case, a logistic regression model allows us to include in the model more variables that can modify the effect of smoking level on the level of severity of the symptoms. If we observe [Chart 12], we can see that the same model adjusted by gender changed the effect (OR) by > 10%, which is clinically important, although the adjusted ORs (OR_{1} = 4.966647; OR_{2} = 13.60615) are still nonsignificant. Consequently, the association between a severe level of CTS symptoms and smoking level (cigarettes/day) should be adjusted for gender; gender constitutes a variable that modifies the association between smoking level and a severe level of CTS symptoms.[18] [26] [27]

(A) Analysis of the association of the severe level of symptoms (LevSSpre) with smoking level (cigarettes per day [c/d]) shown in [Chart 11]. The logistic regression allows the inclusion of modifier variables in the model, yielding the association of sspre and SmokLevel adjusted for gender.
(B) The effect of ligament reconstruction on the change in carpal tunnel syndrome (CTS) symptoms using the linear regression model (shown in [Chart 6]).
The same effect analysis adjusted for the variable “Gender.”
Observe that the percentage of change in the effect (49.66423% for 1–15 c/d; 63.251912% for > 15 c/d; 11.17793% for no reconstruction) is higher than 10%.
Consequently, the variable “gender” is a modifier variable of the association between severe level of CTS symptoms and smoking level; as well as of the effect of the ligament reconstruction on the change in symptoms related to CTS.
In a similar manner, we can adjust for modifier variables in a different regression model, as in the analysis of the effect of ligament reconstruction on the change in symptoms after open CT release ([Chart 6]). The analysis can be done with a Student t-test, but using a multiple linear regression model, the effect (difference in the mean change of symptoms) increased from 0.3545833 to 0.3992063 in the model adjusted by gender, which constitutes a change in the effect of 11.17793%, also considered clinically important.
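The percentage of change in the effect can be checked by hand. Note that the percentages reported in this paper use the adjusted estimate as the denominator (which is what reproduces 11.17793% from the two values above); the helper below, a hypothetical name, follows that convention:

```python
def effect_change_pct(crude, adjusted):
    """Percent change in an effect estimate after adjusting for a
    potential confounder, with the adjusted estimate as denominator.
    A change above ~10% is the usual criterion for keeping the
    adjustment variable in the model."""
    return abs(adjusted - crude) / abs(adjusted) * 100

# crude vs gender-adjusted ligament-reconstruction effect (Chart 6)
print(effect_change_pct(0.3545833, 0.3992063))  # ~11.18 -> clinically important
# crude vs gender-adjusted OR for 1-15 c/d (Charts 10 and 12)
print(effect_change_pct(2.5, 4.966647))         # ~49.66
```

The same rule applied to the > 15 c/d OR (5 versus 13.60615) gives the 63.25% change quoted above.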
The possibility of adjusting the models for different modifier variables is the most important advantage of using regression models for these analyses. The GLMs, systematized in the monograph by McCullagh and Nelder,[28] are a set of models constituted by a linear combination of predictor variables (X_{1}, X_{2}, X_{3}, … X_{i}), which can be a mix of quantitative (continuous and discrete) and categorical variables, and a dependent variable (Y) that can be quantitative (linear regression model), binary (logistic regression model), ordinal (ordinal logistic regression model), nominal (multinomial logistic regression model), a count (Poisson, negative binomial, and zero-inflated Poisson regression models), etc. Consequently, the GLMs are a broad class of models that includes linear regression, ANOVA, Poisson regression, log-linear models, etc.
#
Conclusions
The aim of the present update article is to serve as a gentle introduction to data analysis for clinical research in hand surgery. Early steps in statistics are important to improve the quality of our scientific papers. Clinical practice in hand surgery can be improved by good clinical research, and statistics is a fundamental support tool.
#
#
No conflict of interest has been declared by the author(s).
Acknowledgments
The authors wish to thank Dr. D. Jose M. Domenech, professor of statistics at the Universitat Autònoma de Barcelona; without his teachings, the present paper could not have been written. The authors also wish to thank Dr. Fernando Corrella, chief editor of the Ibero-American Journal of Hand Surgery; without his enthusiasm, this paper would not have been written.

References
 1 Kleinert HE, Kasdan ML, Romero JL. Small blood-vessel anastomosis for salvage of severely injured upper extremity. J Bone Joint Surg Am 1963; 45A: 788-796
 2 Rosales RS. Clinical research in hand surgery. J Hand Surg Eur Vol 2015; 40 (05) 546-548 DOI: 10.1177/1753193415583624
 3 Gummesson C, Atroshi I, Ekdahl C. The quality of reporting and outcome measures in randomized clinical trials related to upper-extremity disorders. J Hand Surg Am 2004; 29 (04) 727-734, discussion 735-737
 4 Tadjerbashi K, Rosales RS, Atroshi I. Intervention randomized controlled trials involving wrist and shoulder arthroscopy: a systematic review. BMC Musculoskelet Disord 2014; 15: 252
 5 Rosales RS, Reboso-Morales L, Martin-Hidalgo Y, Diez de la Lastra-Bosch I. Level of evidence in hand surgery. BMC Res Notes 2012; 5: 665
 6 Levine DW, Simmons BP, Koris MJ, et al. A self-administered questionnaire for the assessment of severity of symptoms and functional status in carpal tunnel syndrome. J Bone Joint Surg Am 1993; 75 (11) 1585-1592
 7 Atroshi I, Lyrén PE, Gummesson C. The 6-item CTS symptoms scale: a brief outcomes measure for carpal tunnel syndrome. Qual Life Res 2009; 18 (03) 347-358
 8 Atroshi I, Lyrén PE, Ornstein E, Gummesson C. The six-item CTS symptoms scale and palmar pain scale in carpal tunnel syndrome. J Hand Surg Am 2011; 36 (05) 788-794
 9 Rosales RS, Atroshi I. Spanish versions of the 6-item carpal tunnel syndrome symptoms scale (CTS-6) and palmar pain scale. J Hand Surg Eur Vol 2013; 38 (05) 550-551
 10 Rosales RS, Delgado EB, Díez de la Lastra-Bosch I. Evaluation of the Spanish version of the DASH and carpal tunnel syndrome health-related quality-of-life instruments: cross-cultural adaptation process and reliability. J Hand Surg Am 2002; 27 (02) 334-343
 11 Rosales RS, Martin-Hidalgo Y, Reboso-Morales L, Atroshi I. Reliability and construct validity of the Spanish version of the 6-item CTS symptoms scale for outcomes assessment in carpal tunnel syndrome. BMC Musculoskelet Disord 2016; 17: 115 DOI: 10.1186/s12891-016-0963-5
 12 Atroshi I, Gummesson C, Johnsson R, McCabe SJ, Ornstein E. Severe carpal tunnel syndrome potentially needing surgical treatment in a general population. J Hand Surg Am 2003; 28 (04) 639-644
 13 Rosales RS, Diez de la Lastra I, McCabe S, Ortega Martinez JI, Hidalgo YM. The relative responsiveness and construct validity of the Spanish version of the DASH instrument for outcomes assessment in open carpal tunnel release. J Hand Surg Eur Vol 2009; 34 (01) 72-75
 14 Doménech JM, Navarro JB. Regresión Lineal Múltiple con predictores categóricos y cuantitativos. 11ª ed. Barcelona: Signo; 2018
 15 Rubin DB, Schenker N. Multiple imputation in health-care databases: an overview and some applications. Stat Med 1991; 10 (04) 585-598
 16 Fisher RA. On the mathematical foundations of theoretical statistics. Philos Trans R Soc Lond A 1922; 222A: 309-368
 17 Fisher RA. Statistical Methods for Research Workers. 5th ed. Edinburgh: Oliver and Boyd; 1934
 18 Doménech JM. Fundamentos de Diseño y Estadística. UD7. Comprobación de hipótesis: Pruebas de significación, pruebas de hipótesis y tamaño de los grupos. 18ª ed. Barcelona: Signo; 2017
 19 Neyman J, Pearson ES. On the use and interpretation of certain test criteria for purposes of statistical inference, Part I. Biometrika 1928; 20A: 175-240
 20 Neyman J, Pearson ES. On the use and interpretation of certain test criteria for purposes of statistical inference, Part II. Biometrika 1928; 20A: 263-294
 21 Neyman J, Pearson ES. On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc Lond A 1933; 231: 289-337
 22 Schwartz D. Métodos estadísticos para médicos y biólogos. Barcelona: Herder; 1985
 23 Neyman J. On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J R Stat Soc 1934; 97 (04) 558-625
 24 Ozyürekoğlu T, McCabe SJ, Goldsmith LJ, LaJoie AS. The minimal clinically important difference of the Carpal Tunnel Syndrome Symptom Severity Scale. J Hand Surg Am 2006; 31 (05) 733-738, discussion 739-740
 25 Doménech JM. Fundamentos de Diseño y Estadística. UD9. Comparación de dos proporciones. Medidas de asociación y de efecto. 18ª ed. Barcelona: Signo; 2017
 26 Doménech JM. Análisis estadístico de un estudio sobre Tabaco y Carboxihemoglobina. 4ª ed. Barcelona: Signo; 2017
 27 Kleinbaum DG, Kupper LL, Muller KE, Nizam A. Applied Regression Analysis and Other Multivariable Methods. 4th ed. Pacific Grove (CA): Duxbury Press; 2008
 28 McCullagh P, Nelder JA. Generalized Linear Models. 2nd ed. London: Chapman and Hall; 1989