Alternatives to Stepwise Selection
Given the problems with stepwise selection methods, alternatives have been considered (Harrell et al., 1985). The most obvious strategy is to fit a fixed set of pre-defined predictors, chosen on the basis of firm clinical knowledge and information from other studies.
As an intermediate option between fitting a full model and stepwise selection at the standard significance level (alpha = 0.05), we may consider stepwise selection with a high alpha. We may, for example, exclude covariables with p-values exceeding 0.50, arguing that these probably contribute more noise than predictive information to the model (Steyerberg et al., 2000a).
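For a covariable with 1 degree of freedom, a selection alpha can be read as a minimum value of its chi-square statistic (likelihood ratio or Wald), because a 1-df chi-square is the square of a standard normal deviate. The sketch below, using only the Python standard library, shows how low that bar becomes at alpha = 0.50; the function name `chi2_1df_cutoff` is a hypothetical helper for illustration.

```python
from statistics import NormalDist


def chi2_1df_cutoff(alpha: float) -> float:
    """Chi-square (1 df) value a covariable must exceed to be retained at level alpha.

    For 1 df, chi-square = Z**2, so the cutoff is the squared two-sided normal quantile.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


for alpha in (0.05, 0.20, 0.50):
    print(f"alpha={alpha:<5} retain if chi-square > {chi2_1df_cutoff(alpha):.2f}")
```

At alpha = 0.05 the familiar cutoff of about 3.84 appears, whereas at alpha = 0.50 a covariable is retained once its chi-square exceeds roughly 0.45.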
We may also apply Akaike's Information Criterion (AIC). For covariables with 1 degree of freedom (df), selection by AIC is equivalent to using a p-value of 0.157, and it corresponds closely to selection by Mallows' Cp in all-subsets regression. Note that AIC was originally intended to compare pre-specified models of different complexity (as indicated by the degrees of freedom of the predictors). It was never intended specifically for stepwise procedures, in which many models are compared sequentially. So, instead of 0.157, one might also argue for a p-value of 0.20 or 0.50.
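The 0.157 figure follows from the definition AIC = -2 log L + 2k: adding a term with d parameters lowers AIC only if the likelihood ratio chi-square exceeds 2d, so the implied alpha is P(chi-square with d df > 2d). The sketch below computes this with standard closed-form expressions for the chi-square upper tail at integer df; the helper names `chi2_sf` and `aic_implied_alpha` are hypothetical, and a statistics library would normally be used instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-h) * sum(h**i / math.factorial(i) for i in range(df // 2))
    # Odd df: the 1-df tail plus a finite correction sum.
    tail = math.erfc(math.sqrt(h))
    tail += math.exp(-h) * sum(
        h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, df // 2 + 1)
    )
    return tail


def aic_implied_alpha(df: int) -> float:
    """AIC retains a term with `df` parameters if its LR chi-square exceeds 2 * df."""
    return chi2_sf(2.0 * df, df)


for d in range(1, 5):
    print(f"df={d}: AIC-implied alpha = {aic_implied_alpha(d):.3f}")
```

For 1 df this reproduces the value of 0.157; for terms with more degrees of freedom the implied alpha is somewhat lower.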
With higher p-values, the stability of the selection increases, as does the power to include true predictors. The increase in power comes with a risk of including noise variables, which may, however, be less severe than omitting true predictors. Further, a higher p-value for selecting variables reduces the biases in the estimation of variances, p-values, and regression coefficients (Steyerberg et al., 1999).
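The trade-off can be illustrated for a single coefficient under a simplifying assumption: its Wald statistic Z (estimate divided by standard error) is taken as normally distributed with mean delta and variance 1, and the coefficient is selected when |Z| exceeds the two-sided cutoff. A noise variable (delta = 0) is then selected with probability alpha, while a true predictor is selected with the power computed below. The mean of Z given selection, divided by delta, measures how much selected coefficients are overestimated. This is a one-predictor sketch with a hypothetical delta of 1.5, not a simulation of full stepwise selection.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def selection_summary(delta: float, alpha: float) -> tuple:
    """Return (power, overestimation ratio) for a Wald test with Z ~ N(delta, 1).

    Assumes delta != 0; for a noise variable the selection probability is simply alpha.
    """
    c = STD_NORMAL.inv_cdf(1 - alpha / 2)
    p_upper = 1 - STD_NORMAL.cdf(c - delta)  # P(Z > c)
    p_lower = STD_NORMAL.cdf(-c - delta)     # P(Z < -c)
    power = p_upper + p_lower
    # Partial expectations of a normal variable: E[Z; Z > c] and E[Z; Z < -c].
    e_upper = delta * p_upper + STD_NORMAL.pdf(c - delta)
    e_lower = delta * p_lower - STD_NORMAL.pdf(-c - delta)
    mean_if_selected = (e_upper + e_lower) / power
    return power, mean_if_selected / delta


DELTA = 1.5  # hypothetical standardized effect of a weak true predictor
for alpha in (0.05, 0.157, 0.20, 0.50):
    power, ratio = selection_summary(DELTA, alpha)
    print(f"alpha={alpha:<5} noise inclusion={alpha:.2f} "
          f"power={power:.2f} overestimation={ratio:.2f}x")
```

Raising alpha from 0.05 to 0.50 increases the power for this predictor and shrinks the overestimation of its selected coefficient toward 1, at the cost of including each noise variable with probability alpha.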
QUESTION 7.4
In smaller medical data sets, better predictions are generally obtained by selecting variables:
Selection of Interaction Terms
The selection of covariables usually concentrates on "main effects," meaning that the covariables enter the regression formula without interaction terms. Interaction terms may be considered as a check of the additivity assumption. A complicating factor, however, is that the number of potential interaction terms explodes when a substantial number of covariables is included in the model: k covariables already give k(k-1)/2 possible two-way interactions. Some pre-selection is therefore desirable, e.g., on clinical grounds (Harrell et al., 1996).
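A small sketch of how quickly candidate two-way interactions accumulate, and how a pre-selection rule shrinks the set. The covariable names and the `involving` filter (for example, restricting attention to interactions with one clinically motivated variable) are hypothetical illustrations.

```python
from itertools import combinations
from math import comb


def pairwise_interactions(names, involving=None):
    """List two-way interaction terms; optionally keep only those containing `involving`."""
    terms = [(a, b) for a, b in combinations(names, 2)]
    if involving is not None:
        terms = [t for t in terms if involving in t]
    return [f"{a}:{b}" for a, b in terms]


for k in (5, 10, 20):
    print(f"{k} covariables -> {comb(k, 2)} two-way interaction terms")

covariables = ["age", "sex", "bp", "treatment"]  # hypothetical names
print(pairwise_interactions(covariables))
print(pairwise_interactions(covariables, involving="treatment"))
```

Restricting to interactions with a single pre-specified covariable reduces k(k-1)/2 candidates to k-1.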
In small data sets, the statistical power will be too limited for a reasonable assessment of interaction terms. We may then rely on a prediction model with main effects only, which implies that the effect of each predictor is modeled as an average over the other covariables (Brenner and Blettner, 1997).
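The "average effect" interpretation can be checked in a simplified setting: a linear model fitted by ordinary least squares, with a binary predictor x that is balanced within each level of a binary covariable z. The cell means are hypothetical: the effect of x is 1.0 when z = 0 and 0.2 when z = 1. Fitting the main-effects model y ~ 1 + x + z then gives a coefficient for x equal to the average of the two subgroup effects, weighted by the share of each z level. For logistic regression this correspondence is only approximate.

```python
# Hypothetical cell means: effect of x is 1.0 when z = 0 and 0.2 when z = 1.
CELL_MEAN = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 0.0, (1, 1): 0.2}


def make_data(n_z0, n_z1):
    """Noise-free data with x balanced within each z stratum (n rows per x level)."""
    rows, y = [], []
    for z, n in ((0, n_z0), (1, n_z1)):
        for x in (0, 1):
            for _ in range(n):
                rows.append([1.0, float(x), float(z)])
                y.append(CELL_MEAN[(x, z)])
    return rows, y


def ols(rows, y):
    """Least squares via the normal equations (fine for tiny, well-conditioned designs)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    a = [row[:] + [b] for row, b in zip(xtx, xty)]  # augmented matrix
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


for n_z0, n_z1 in ((50, 50), (20, 80)):
    b0, b_x, b_z = ols(*make_data(n_z0, n_z1))
    print(f"z=0 share {n_z0 / (n_z0 + n_z1):.1f}: main-effect coefficient of x = {b_x:.2f}")
```

With equal strata the main-effects coefficient of x is 0.6, the plain average of 1.0 and 0.2; when 80% of the data have z = 1, it moves to 0.36, closer to the effect in the larger subgroup.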