Abstract:
Predictor variables for multivariate rules are frequently selected by methods that
maximize likelihood rather than information. We compared the discrimination and reproducibility
of a prediction rule for pneumonia derived using extended dependency analysis (EDA),
an information-maximizing variable selection program, with that of a validated rule
derived using logistic regression. Discrimination was measured by receiver-operating
characteristic (ROC) analysis, and reproducibility by rederivation of the rule on
200 replicate samples of size 250 and 500, generated from a training cohort of 905
patients using Monte Carlo techniques.
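A minimal Python sketch of the two evaluation measures follows. It rests on two assumptions. First, the ROC area is computed as the Mann-Whitney probability that a randomly chosen patient with pneumonia scores higher than a randomly chosen patient without it, with ties counted as one half. Second, the Monte Carlo replicates are drawn with replacement from the training cohort. The abstract specifies neither the ROC estimator nor the sampling scheme, and the function names `roc_area` and `replicate_samples` are hypothetical.

```python
import random


def roc_area(scores, labels):
    """ROC area as the Mann-Whitney probability that a positive case
    (label 1) outscores a negative case (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def replicate_samples(cohort, size, n_replicates=200, seed=0):
    """Draw n_replicates Monte Carlo samples of `size` patients from `cohort`.
    Sampling with replacement is an assumption, not stated in the abstract."""
    rng = random.Random(seed)
    return [rng.choices(cohort, k=size) for _ in range(n_replicates)]
```

For example, `roc_area([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` returns 0.75, and `replicate_samples(list(range(905)), 250)` yields 200 lists of 250 patient indices. Each replicate would then be passed to the variable selection procedure so that the rule can be rederived and its ROC area recomputed.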
Four of the five predictor variables selected by EDA were identical to those selected
by logistic regression. With each variable weighted by its conditional contribution
to total information transmission, EDA discriminated between patients with and without pneumonia in
the training cohort with an ROC area of 0.800 (vs 0.816 for logistic regression, p = 0.60),
and in the validation cohort with an area of 0.822 (vs 0.821 for logistic regression,
p = 0.98). EDA demonstrated reproducibility comparable to that of logistic
regression according to most criteria for replicability. Replicate EDA models showed
good discrimination in the training and testing cohorts, and met statistical criteria
for validation (no significant difference in ROC areas at a one-tailed alpha level
of 0.05) in 80.8% to 94.2% of replicates.
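A minimal Python sketch of the information measures involved follows. It assumes binary (0/1) predictors stored in per-patient dictionaries, and it takes "information transmission" to mean Shannon mutual information in bits. Each predictor is weighted by its conditional mutual information with the outcome, given the predictors ahead of it in a supplied order. The ordering rule and all function names (`entropy`, `conditional_weights`, `weighted_score`, and so on) are hypothetical. This is not EDA's actual implementation.

```python
import math
from collections import Counter


def entropy(rows, cols):
    """Shannon entropy (bits) of the joint distribution of the given columns."""
    n = len(rows)
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def information_transmitted(rows, x, y):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(rows, [x]) + entropy(rows, [y]) - entropy(rows, [x, y])


def conditional_information(rows, x, y, given):
    """Conditional mutual information I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z),
    where Z is the list of predictors already in the model."""
    z = list(given)
    return (entropy(rows, [x] + z) + entropy(rows, [y] + z)
            - entropy(rows, [x, y] + z) - entropy(rows, z))


def conditional_weights(rows, predictors, outcome):
    """Hypothetical weighting: each predictor, in the given order, is weighted
    by the information it adds about the outcome beyond earlier predictors."""
    return {p: conditional_information(rows, p, outcome, predictors[:i])
            for i, p in enumerate(predictors)}


def weighted_score(row, weights):
    """Rule score for one patient: sum of weights of the 0/1 predictors present."""
    return sum(w * row[name] for name, w in weights.items())
```

By the chain rule for mutual information, these conditional weights sum to the total information the predictor set transmits about the outcome, I(X_1, ..., X_k; Y). That is one plausible reading of "conditional contribution to total information transmission". The resulting `weighted_score` values could then be ranked to trace an ROC curve.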
We conclude that extended dependency analysis selected the most important variables
for predicting pneumonia, as judged against a validated logistic regression model. The information-theoretic
model showed good discriminatory power, and demonstrated reproducibility according
to clinically reasonable criteria. Information-theoretic variable selection by extended
dependency analysis appears to be a reasonable basis for developing clinical prediction
rules.
Keywords:
Clinical Prediction Rules - Information Theory - Extended Dependency Analysis - Discrimination - Reproducibility