- Aim for a collaborative analysis of data. Pooling data from several
  centers or research groups increases the sample size, which addresses
  the major problem of predictive modeling: a limited sample size. The
  resulting model may also generalize better when patients are drawn from
  a broader spectrum.
- Be clear on the aim of the modeling. The aim of obtaining insight into
  a prognostic problem is often confused with that of obtaining adequate
  predictions. Obtaining insight is related to hypothesis testing, and
  procedures such as stepwise selection may have some role here in
  identifying important predictors. In contrast, prediction is an
  estimation problem rather than a testing problem. The best predictions
  may often be obtained with shrunk, pre-specified models that incorporate
  external knowledge (Harrell et al., 1996; Steyerberg et al., 2000a).
- Choose relevant outcomes. A predictive model should address an outcome
  that is clinically relevant.
- Code predictors without looking at the outcome, using external
  information. Categorical predictors may be regrouped on the basis of
  frequency tables, and clinically related covariables may be combined
  into summary predictors (Harrell et al., 1985).
- Use continuous functions for continuous variables. In addition to a
  linear term, one may add smooth nonlinear functions such as restricted
  cubic splines.
- Select predictors using external information. Careful consideration of
  the set of candidate covariables for a predictive model is of paramount
  importance and may be guided by external information, such as findings
  from other studies and clinical knowledge.
- If stepwise selection is applied, use the backward (or "step-down")
  variant with a high p-value threshold for elimination (e.g., 0.50).
- Estimate the required shrinkage of the regression coefficients. When the
  data set is large, little or no shrinkage will be required; in a small
  data set, substantial shrinkage may be needed.
- Aim for a readily applicable presentation. Selecting a limited number of
  predictors may be harmful for predictive purposes, especially when the
  selection is based on stepwise methods. A more complex model containing
  more predictors can still be presented clearly with a score chart, a
  nomogram, or a table.
- Assess internal validity. The bootstrap should preferably be used,
  replaying all modeling choices (Efron and Tibshirani, 1993). If a
  split-sample approach is chosen, the final model should be estimated on
  the full sample.
- Assess external validity, if possible. External validity can be assessed
  in a non-random part of the data set under study (e.g., patients from a
  different time period or place) or in a new data set that differs in
  time or place.
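The guideline to code predictors without looking at the outcome mentions regrouping categorical predictors on the basis of frequency tables. The minimal Python sketch below illustrates that idea; the function name `collapse_rare_categories`, the `min_count` cutoff, the `"other"` label, and the example categories are all hypothetical. In practice, clinical knowledge should decide which sparse categories belong together, rather than lumping them into one residual group.

```python
from collections import Counter


def collapse_rare_categories(values, min_count, other="other"):
    """Merge categories seen fewer than `min_count` times into one `other` group.

    Hypothetical helper: only the predictor's own frequency table is used,
    and the outcome is never consulted.
    """
    counts = Counter(values)  # frequency table of the predictor
    return [v if counts[v] >= min_count else other for v in values]


# Hypothetical example: the two sparse categories are merged.
tumor_site = ["lung", "lung", "colon", "colon", "colon", "skin", "bone"]
print(collapse_rare_categories(tumor_site, min_count=2))
# ['lung', 'lung', 'colon', 'colon', 'colon', 'other', 'other']
```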
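The guideline on continuous variables suggests adding restricted cubic splines to a linear term. As a sketch, assuming one common unnormalized truncated-power parameterization, the basis for a single predictor can be computed as follows. The knot locations are arbitrary assumptions (in practice knots are often placed at fixed quantiles of the predictor), and `rcs_basis` is a hypothetical name. The returned columns would enter the regression model as extra predictors; the constraints make the fitted curve linear below the first knot and above the last.

```python
def rcs_basis(x, knots):
    """Return [x, X_1, ..., X_{k-2}] for a restricted cubic spline with k knots.

    Each nonlinear term X_j combines truncated cubes so that the fitted
    function is linear beyond the outer knots (no normalization applied).
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos3(u):
        # truncated cube: (u)_+^3
        return max(u, 0.0) ** 3

    t_km1, t_k = t[-2], t[-1]
    d = t_k - t_km1
    terms = [x]  # the linear term
    for j in range(k - 2):
        tj = t[j]
        terms.append(
            pos3(x - tj)
            - pos3(x - t_km1) * (t_k - tj) / d
            + pos3(x - t_k) * (t_km1 - tj) / d
        )
    return terms


knots = [40.0, 55.0, 70.0]  # hypothetical knots for age in years
for age in (30.0, 50.0, 80.0):
    print(age, rcs_basis(age, knots))
```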
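The recommendation to use backward ("step-down") selection with a high threshold such as p = 0.50 can be sketched as a loop that repeatedly refits and drops the least significant predictor. The model fit itself is left abstract: `fit_pvalues` is a hypothetical user-supplied function that refits the model on the given predictors and returns a p-value per predictor, and the fixed p-values in the usage example are invented purely to show the mechanics (in a real refit, p-values change after each removal).

```python
from typing import Callable, Dict, List


def backward_eliminate(
    candidates: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    p_remove: float = 0.50,
) -> List[str]:
    """Backward elimination: drop the worst predictor until all have p <= p_remove."""
    selected = list(candidates)
    while selected:
        pvals = fit_pvalues(selected)  # refit with the current predictor set
        worst = max(selected, key=lambda name: pvals[name])
        if pvals[worst] <= p_remove:
            break  # every remaining predictor meets the threshold
        selected.remove(worst)  # remove the least significant predictor
    return selected


# Hypothetical, fixed p-values used only to demonstrate the loop.
toy = {"age": 0.01, "sex": 0.30, "bmi": 0.62, "smoking": 0.75}
fake_fit = lambda preds: {p: toy[p] for p in preds}
print(backward_eliminate(list(toy), fake_fit))                 # liberal threshold 0.50
print(backward_eliminate(list(toy), fake_fit, p_remove=0.05))  # conventional 0.05
```

With the liberal threshold, `sex` is retained; with 0.05, only `age` survives, which shows why a high threshold keeps more potentially useful predictors.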
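For the guideline on estimating the required shrinkage, one well-known option is the heuristic shrinkage factor of van Houwelingen and le Cessie, s = (model chi-square - df) / model chi-square, where the model chi-square is the likelihood ratio statistic of the fitted model and df is the number of estimated coefficients (excluding the intercept). The bootstrap is an alternative that is not shown here. The sketch below assumes this heuristic; the function names are hypothetical, and re-estimating the intercept after shrinking the slopes is also left out.

```python
def heuristic_shrinkage(model_chi2, df):
    """Heuristic shrinkage factor s = (model_chi2 - df) / model_chi2.

    Values near 1 mean little shrinkage is needed; values near or below 0
    signal severe overfitting.
    """
    if model_chi2 <= 0:
        raise ValueError("model_chi2 must be positive")
    return (model_chi2 - df) / model_chi2


def shrink(coefs, s):
    """Multiply every slope by s (the intercept must be re-estimated separately)."""
    return {name: s * beta for name, beta in coefs.items()}


# A model that is weak relative to its df, as is typical of small data sets:
print(heuristic_shrinkage(50.0, 10))    # 0.8
# A strong model on the same df, as is typical of large data sets:
print(heuristic_shrinkage(1000.0, 10))  # 0.99
```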
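The guideline on a readily applicable presentation mentions score charts. A minimal sketch, assuming the common approach of dividing each coefficient by a reference coefficient, scaling, and rounding to integer points, is shown below. The coefficients, the choice of reference, and the names `score_chart` and `total_score` are hypothetical. A complete chart would also need a table mapping each total score back to a predicted risk.

```python
def score_chart(coefs, reference, points_per_reference=1):
    """Convert regression coefficients to integer points.

    Each coefficient is divided by `reference`, multiplied by
    `points_per_reference`, and rounded with Python's round()
    (exact .5 ties go to the even integer).
    """
    return {
        name: round(beta / reference * points_per_reference)
        for name, beta in coefs.items()
    }


def total_score(points, patient):
    """Sum of points times predictor values (0/1 for binary predictors)."""
    return sum(points[name] * patient.get(name, 0) for name in points)


# Hypothetical logistic regression coefficients.
coefs = {"age_per_decade": 0.42, "diabetes": 0.85, "shock": 1.70}
points = score_chart(coefs, reference=0.42, points_per_reference=2)
print(points)  # {'age_per_decade': 2, 'diabetes': 4, 'shock': 8}
print(total_score(points, {"age_per_decade": 6, "diabetes": 1, "shock": 0}))  # 16
```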
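The internal-validity guideline prefers the bootstrap, replaying all modeling choices. The sketch below shows one common bootstrap scheme, optimism correction: refit on each resample, measure how much better the refitted model performs on its own resample than on the original data, and subtract the average of that gap from the apparent performance. The `fit` and `performance` callables are placeholders you would supply. `fit` must repeat every modeling step (coding, selection, shrinkage) for the correction to be honest. The toy "model" (a sample mean scored by negative mean squared error) is only an illustration.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple, TypeVar

Row = TypeVar("Row")
Model = TypeVar("Model")


def bootstrap_optimism(
    data: Sequence[Row],
    fit: Callable[[Sequence[Row]], Model],
    performance: Callable[[Model, Sequence[Row]], float],
    n_boot: int = 200,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (apparent, optimism, optimism-corrected) performance."""
    rng = random.Random(seed)
    apparent = performance(fit(data), data)
    gaps = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # n rows drawn with replacement
        model_b = fit(sample)  # replay all modeling steps on the resample
        gaps.append(performance(model_b, sample) - performance(model_b, data))
    optimism = statistics.fmean(gaps)
    return apparent, optimism, apparent - optimism


# Toy illustration: the "model" is a mean; performance is negative MSE.
ys = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 1.5, 3.7]
fit_mean = lambda rows: statistics.fmean(rows)
neg_mse = lambda m, rows: -statistics.fmean((y - m) ** 2 for y in rows)
apparent, optimism, corrected = bootstrap_optimism(ys, fit_mean, neg_mse)
print(apparent, optimism, corrected)
```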
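The external-validity guideline mentions assessing a model in a non-random part of the data, such as patients from a different period. A minimal sketch of such a temporal split follows; the record layout, the `admission_date` key, and the cutoff date are hypothetical. The model would be developed on the earlier patients only and then evaluated, unchanged, on the later ones.

```python
from datetime import date


def temporal_split(records, cutoff, date_key="admission_date"):
    """Non-random split: develop before `cutoff`, validate on or after it."""
    development = [r for r in records if r[date_key] < cutoff]
    validation = [r for r in records if r[date_key] >= cutoff]
    return development, validation


# Hypothetical records; only the date field matters for the split.
records = [
    {"id": 1, "admission_date": date(2015, 3, 1)},
    {"id": 2, "admission_date": date(2017, 11, 20)},
    {"id": 3, "admission_date": date(2019, 6, 1)},
]
dev, val = temporal_split(records, cutoff=date(2018, 1, 1))
print([r["id"] for r in dev], [r["id"] for r in val])  # [1, 2] [3]
```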