Neuropediatrics
DOI: 10.1055/a-2731-5088
Letter to the Editor

One Size Doesn't Fit All: Four Score in the Pediatric ICU

Authors

  • Prateek Kumar Panda

    1   Pediatric Neurology Division, Department of Pediatrics, All India Institute of Medical Sciences, Rishikesh, Uttarakhand, India
  • Indar Kumar Sharawat

    1   Pediatric Neurology Division, Department of Pediatrics, All India Institute of Medical Sciences, Rishikesh, Uttarakhand, India

Funding Information None declared.

To the Editor.

I read the letter to the editor titled “One Size Doesn't Fit All: Four Score in the Pediatric ICU” concerning my article “Comparison of FOUR Score and GCS Score for Prediction of Outcome in Children with Impaired Consciousness,” published in Neuropediatrics.

I would like to address the concerns raised by Dr. Indar Kumar Sharawat, DM, Additional Professor and Chief, Pediatric Neurology Division, Department of Pediatrics, All India Institute of Medical Sciences, Rishikesh, Uttarakhand, India-249203.

Query: A primary concern lies in the application of the FOUR score across a broad pediatric age spectrum without prior validation or adaptation for younger children. The FOUR score was initially designed for adult populations and includes parameters such as brainstem reflexes and respiratory patterns that are age-dependent and less reliable in infants and toddlers. While the GCS has well-established pediatric modifications, no such validated adaptation of the FOUR score exists for children under 5 years of age. The authors included children aged 1 to 14 years but did not perform stratified analysis or discuss age-related validity, which raises concerns about content and construct validity. Previous studies have acknowledged the limited applicability of the FOUR score in preverbal children without age-specific calibration.

Reply: We included children between 2 and 18 years of age; the age range of 1 to 14 years stated in the letter is incorrect. Most previous studies enrolled a similar age group and assessed the prognostic value of the FOUR score in children aged ≥2 years.[1] [2] [3] [4] [5]

Query: Furthermore, inter-rater reliability, which is critical in observational studies involving subjective scoring tools, was not assessed in this work. The GCS and FOUR scores are known to have variable inter-observer agreement, particularly when performed by different cadres of clinicians (e.g., nurses, residents, intensivists). Czaikowski et al[3] demonstrated that inter-rater agreement for the FOUR score in pediatric patients can be excellent when scorers are adequately trained (weighted κ ≈ 0.95), but significantly lower when untrained or inconsistent scorers are involved. The current study notes that senior residents performed the assessments but does not mention any standardization or agreement testing, introducing the possibility of measurement bias that could affect the results.

Reply: Inter-rater reliability did not need to be assessed because, in our study, both scores were measured by a single rater, the principal investigator (a junior resident).

Query: We are also concerned by the use of the unmodified adult FOUR score in preverbal children, particularly as this group constitutes a significant portion of the sample. In this age group, certain components such as eye tracking or specific respiratory patterns may not be assessable or may not correlate with underlying brain injury severity in the same way they do in older children. A pediatric version or at minimum, a justification for unaltered application of the adult version, would have improved confidence in the score's relevance. Prior literature has suggested the development of pediatric-adapted tools like the Pediatric FOUR (P-FOUR) score, which could have been a better choice than the original FOUR score for this study.[4]

Reply: A Pediatric FOUR (P-FOUR) score has not yet been developed. “Preverbal” refers to the stage of early human development before speech or articulate language emerges; the term is often used to describe infants who have not yet acquired spoken language but can communicate in other ways, such as gestures, vocalizations, and facial expressions. All previous studies used the unmodified FOUR score in children.[1] [2] [3] [4] [5]

The reference cited for the P-FOUR score is incorrect. Büyükcam et al[4] also used the same unmodified score to predict outcome in children aged 2 to 17 years with head trauma.

Query: We also note that mortality was the sole outcome assessed, while functional outcomes among survivors were not explored. In pediatric critical care, survival alone is insufficient—recovery with meaningful neurologic function is a primary objective. Validated scales such as the Pediatric Cerebral Performance Category or Functional Status Scale are frequently used to evaluate neurologic outcome in survivors. Previous work by Pollack et al[5] has demonstrated the relevance of such scales in PICU populations and emphasized their importance over crude binary outcomes like death or survival. Incorporating functional status would have provided a richer understanding of the scores' prognostic power.

Reply: In our study, we used the Modified Rankin Scale (mRS) to assess recovery of neurologic function, and we reported functional outcomes in the survivor group. The Modified Rankin Scale, like the Pediatric Cerebral Performance Category and the Functional Status Scale, is frequently used to evaluate neurologic outcome in survivors.

Query: We also observed that the study lacks a priori sample size or power calculation, despite comparing the predictive accuracy of two tools via ROC analysis. With only 78 children enrolled and a small number of outcome events, the study may be underpowered to detect significant differences in AUC. This is especially concerning because the reported AUCs had overlapping confidence intervals, and no statistical comparison (e.g., DeLong's test) was provided to validate any superiority claims. Without such analysis, conclusions about the relative performance of FOUR versus GCS remain speculative.

Reply: The sample size was calculated using the following formula:

  • $n = \dfrac{Z_{\alpha/2}^{2}\, P\,(100 - P)}{E^{2}}$

  • Z = standard normal variate, P = prevalence rate

  • E = allowable error, n = required minimum sample size

  • Here, Z_(α/2) = 1.96 at the 95% confidence level

  • P = 5.34%, E = 5%

  • Hence, n = 77.6, rounded up to 78

  • Therefore, the sample size for our study was 78.
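
The arithmetic behind this figure can be checked with a short sketch. It assumes the formula is the usual single-proportion sample size expression with P and E entered as percentages, as the values above suggest; the function name `sample_size_for_prevalence` is hypothetical and is not taken from the original study.

```python
import math


def sample_size_for_prevalence(p_percent, error_percent, z=1.96):
    """Return (raw, rounded-up) sample size for estimating a prevalence.

    Implements n = Z^2 * P * (100 - P) / E^2, with P and E given as
    percentages (for example 5.34 and 5). Illustrative helper only.
    """
    raw = (z ** 2) * p_percent * (100 - p_percent) / (error_percent ** 2)
    # Round up, since a fractional participant cannot be enrolled.
    return raw, math.ceil(raw)


# Values quoted in the reply: Z = 1.96, P = 5.34%, E = 5%.
raw_n, n = sample_size_for_prevalence(5.34, 5)
print(f"raw n = {raw_n:.2f}, required n = {n}")
```

With the quoted inputs the raw value is about 77.67, which rounds up to the 78 children stated in the reply.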

Query: Finally, score assessments were not standardized in terms of timing relative to sedation or resuscitation. The use of sedatives, paralytics, or recent seizure activity can transiently alter consciousness levels, potentially leading to score misclassification. Standard protocols—such as assessing after a sedation hold or at a uniform time post-resuscitation—would enhance comparability and reliability. Previous studies have highlighted how even minor timing discrepancies can lead to significant score variation, particularly in pediatric populations.[3] [5]

Reply: The study excluded children with traumatic brain injury (TBI), spinal cord injury, or intellectual, motor, visual, or hearing impairment, as well as those who had a seizure in the preceding hour. Patients who were receiving neuromuscular blocking agents or were heavily sedated were also excluded.



Publication History

Received: 02 August 2025

Accepted: 17 October 2025

Article published online:
10 November 2025

© 2025. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany