Interactive Textbook on Clinical Symptom Research





Chapter 8: Statistical Models for Prognostication: Practical Advice for Regression Modeling
        

What To Do

  • Aim for a collaborative analysis of data. When data are available from several centers or research groups, pooling them increases the sample size. This addresses the major problem of predictive modeling, which is limited sample size. The resulting model may also generalize better because patients are included from a broader spectrum.

  • Be clear on the aim of the modeling. The aim of obtaining insight into a prognostic problem is often confused with the aim of obtaining adequate predictions. Obtaining insight is related to hypothesis testing, and procedures such as stepwise selection may have some role here in identifying important predictors. In contrast, prediction is an estimation problem rather than a testing problem. The best predictions are often obtained with pre-specified models whose coefficients are shrunk, combined with external knowledge (Harrell et al., 1996) (Steyerberg et al., 2000a).

  • Choose relevant outcomes. A predictive model should predict an outcome that is clinically relevant.

  • Code predictors using external information and without looking at the outcome. Categorical predictors may be re-grouped on the basis of frequency tables, and clinically related covariables may be combined into summary predictors (Harrell et al., 1985).

  • Use continuous functions for continuous variables rather than categorizing them. In addition to a linear term, one may add smooth nonlinear terms such as restricted cubic splines.

  • Select predictors with external information. Careful consideration of the set of candidate covariables for a predictive model is of paramount importance. This choice may be guided by external information, such as findings in other studies and clinical knowledge.

  • If stepwise selection is applied, use the backward (or "step-down") variant with a high p-value threshold for removal (e.g. 0.50).

  • Estimate the required shrinkage of the regression coefficients. With a large data set little or no shrinkage is required, whereas with a small data set substantial shrinkage may be needed.

  • Aim for a readily applicable presentation. Restricting the model to a limited number of predictors to ease presentation may harm predictive performance, especially when those predictors are chosen by stepwise selection. A more complex model with more predictors can still be presented clearly as a score chart, a nomogram, or a table.

  • Assess internal validity. The bootstrap is the preferred method. It should replay all modeling choices, including any predictor selection, in each bootstrap sample (Efron and Tibshirani, 1993). If a split-sample approach is chosen, the final model should be estimated on the full sample.

  • Assess external validity, if possible. External validity can be assessed in a non-random part of the data set under study, e.g. patients from a different period or center. It can also be assessed in a new data set from a different time or place.
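
The advice to code predictors without looking at the outcome can be made concrete for categorical predictors. The Python sketch below shows only one possible rule. It merges categories that occur fewer than a chosen number of times into a single residual category. It consults the predictor's own frequency table and never the outcome. The function name `collapse_rare_categories`, the threshold `min_count`, and the example variable are illustrative assumptions, not part of the chapter.

```python
from collections import Counter
from typing import Hashable, List


def collapse_rare_categories(
    values: List[Hashable], min_count: int = 10, other_label: str = "other"
) -> list:
    """Merge categories observed fewer than `min_count` times into `other_label`.

    Only the frequency table of the predictor itself is used; the outcome is
    deliberately not an argument, so the re-grouping cannot be tuned to it.
    The default threshold of 10 is an arbitrary, illustrative choice.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Hypothetical tumor-histology variable with two sparse categories.
histology = ["adeno"] * 12 + ["squamous"] * 15 + ["large cell"] * 3 + ["small cell"] * 2
recoded = collapse_rare_categories(histology, min_count=10)
print(Counter(recoded))
```

In practice it may be more sensible to merge a sparse category with a clinically similar one than to put it into a single residual group. The frequency table only signals which categories are too sparse to stand alone.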
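
The restricted cubic splines recommended for continuous predictors can be generated as extra covariates. The sketch below uses the truncated-power parameterization described by Harrell (Regression Modeling Strategies), without the optional rescaling by (t_k - t_1)^2. The function name and the knot values are hypothetical. With k knots the function returns the linear term plus k - 2 nonlinear terms. Any weighted sum of these terms is linear below the first knot and above the last knot.

```python
from typing import List, Sequence


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Restricted cubic spline basis for a single value x.

    Returns [x, X_2, ..., X_{k-1}] for k knots t_1 < ... < t_k, with
    X_{j+1} = (x - t_j)+^3
              - (x - t_{k-1})+^3 * (t_k - t_j) / (t_k - t_{k-1})
              + (x - t_k)+^3     * (t_{k-1} - t_j) / (t_k - t_{k-1})
    for j = 1..k-2, where (u)+ = max(u, 0).
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u: float) -> float:
        # Truncated cube: zero for u <= 0, u**3 otherwise.
        return max(u, 0.0) ** 3

    denom = t[k - 1] - t[k - 2]  # t_k - t_{k-1}
    terms = [float(x)]  # the linear term
    for j in range(k - 2):  # 0-based index for t_1 .. t_{k-2}
        terms.append(
            pos3(x - t[j])
            - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / denom
            + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / denom
        )
    return terms


# Hypothetical knots; in practice they are often placed at quantiles of x.
for value in (0.5, 2.5, 5.0, 6.0):
    print(value, rcs_basis(value, [1.0, 2.0, 3.0, 4.0]))
```

Each returned term would be entered as a separate covariate in the regression. A test of the nonlinear coefficients then indicates whether the linear term alone suffices.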
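
The backward (step-down) procedure with a liberal removal threshold can be sketched as follows. The callback `fit_pvalues` is a hypothetical placeholder for whatever routine refits the model (for example a logistic regression) on the current predictor set and returns one p-value per predictor. The stub `demo_fit` simply returns fixed, made-up p-values. In a real analysis the p-values change after every refit.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    predictors: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    p_remove: float = 0.50,
) -> List[str]:
    """Backward ("step-down") elimination.

    Starting from the full model, repeatedly refit, find the predictor with
    the largest p-value and drop it if that p-value exceeds `p_remove`.
    Stops as soon as every remaining predictor has p <= p_remove.
    """
    current = list(predictors)
    while current:
        pvals = fit_pvalues(current)  # refit on the current set
        worst = max(current, key=lambda name: pvals[name])
        if pvals[worst] <= p_remove:
            break  # all remaining predictors meet the threshold
        current.remove(worst)  # drop one predictor at a time, then refit
    return current


# Hypothetical p-values; a real fit would recompute them for each subset.
FIXED_P = {"age": 0.01, "sex": 0.70, "bmi": 0.40, "smoking": 0.90}


def demo_fit(names: List[str]) -> Dict[str, float]:
    return {name: FIXED_P[name] for name in names}


print(backward_eliminate(list(FIXED_P), demo_fit))  # liberal threshold 0.50
print(backward_eliminate(list(FIXED_P), demo_fit, p_remove=0.05))
```

With the threshold of 0.50 only predictors with little apparent information are dropped. The conventional 0.05 removes more predictors, as the second call illustrates.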
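
One commonly used way to estimate the required shrinkage is the heuristic uniform shrinkage factor of van Houwelingen and le Cessie. It is offered here as an illustration, not as the chapter's prescribed method. The factor is s = (model chi2 - df) / model chi2. Here model chi2 is the likelihood-ratio chi-square of the fitted model, and df is the number of degrees of freedom of all candidate predictors considered. The sketch shows why a large data set needs little shrinkage: the same df is divided by a much larger model chi-square. The function names are hypothetical, the model chi-square values are made up, and clamping s at zero is an added assumption.

```python
from typing import Dict


def heuristic_shrinkage(model_chi2: float, df: int) -> float:
    """Heuristic shrinkage factor s = (model_chi2 - df) / model_chi2.

    df should count all candidate predictors considered, not only those
    retained after selection. The result is clamped at 0 (an assumption)
    for models whose chi-square does not exceed df.
    """
    if model_chi2 <= 0:
        raise ValueError("model chi-square must be positive")
    return max(0.0, (model_chi2 - df) / model_chi2)


def shrink(coefs: Dict[str, float], s: float) -> Dict[str, float]:
    """Multiply every slope by s. The intercept is not handled here; it would
    be re-estimated afterwards so that mean predicted and observed risk agree."""
    return {name: s * b for name, b in coefs.items()}


small = heuristic_shrinkage(30.0, 10)   # small data set: about 0.67
large = heuristic_shrinkage(500.0, 10)  # large data set: 0.98
print(small, large)
print(shrink({"age": 0.05, "male": 0.40}, small))
```

A bootstrap estimate of the shrinkage factor is an alternative to this closed-form heuristic.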
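
A score chart can be derived from a fitted model by scaling and rounding its coefficients. The sketch below assumes a logistic model and multiplies each coefficient by an arbitrary constant (10), rounding the result to whole points. The total score is converted back to an approximate risk. The coefficients, intercept, and function names are hypothetical.

```python
import math
from typing import Dict


def score_chart(coefs: Dict[str, float], multiplier: float = 10.0) -> Dict[str, int]:
    """Points per unit of each predictor: coefficient * multiplier, rounded."""
    return {name: round(multiplier * b) for name, b in coefs.items()}


def risk_from_score(total_points: int, intercept: float, multiplier: float = 10.0) -> float:
    """Approximate predicted probability for a total score (logistic model).

    Dividing the points by the multiplier recovers an approximate linear
    predictor; rounding of the points makes it inexact.
    """
    linear_predictor = intercept + total_points / multiplier
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical fitted coefficients (log odds ratios) and intercept.
coefs = {"age_per_decade": 0.42, "male": 0.31, "diabetes": 0.68}
intercept = -3.0
chart = score_chart(coefs)
print(chart)

# A hypothetical male patient with diabetes, at the reference age.
total = chart["male"] + chart["diabetes"]
print(total, risk_from_score(total, intercept))
```

The precision lost by rounding can be checked by comparing risks from the total score with risks from the unrounded linear predictor.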
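
The bootstrap assessment of internal validity can be sketched as an estimate of optimism, following Efron and Tibshirani (1993). First, fit the model on the full data to obtain the apparent performance. Next, in each bootstrap sample, repeat every modeling step, and record how much better the model performs in that bootstrap sample than in the original data. The average difference is the optimism, which is subtracted from the apparent performance. In the sketch a one-predictor least-squares line and R^2 stand in for a prognostic model and its performance measure (such as the c statistic). The names `fit_line`, `r_squared` and `optimism_corrected`, and the default of 200 bootstrap samples, are assumptions.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Model = Tuple[float, float]  # (intercept, slope)


def fit_line(data: List[Point]) -> Model:
    """Least-squares line; stands in for the complete modeling strategy."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx  # ZeroDivisionError if all x values are identical
    return my - slope * mx, slope


def r_squared(model: Model, data: List[Point]) -> float:
    """Proportion of variance in y explained by `model` within `data`."""
    a, b = model
    my = sum(y for _, y in data) / len(data)
    sse = sum((y - (a + b * x)) ** 2 for x, y in data)
    sst = sum((y - my) ** 2 for _, y in data)
    return 1.0 - sse / sst


def optimism_corrected(
    data: List[Point],
    fit: Callable[[List[Point]], Model],
    perf: Callable[[Model, List[Point]], float],
    n_boot: int = 200,
    seed: int = 1,
) -> Tuple[float, float, float]:
    """Return (apparent, optimism, optimism-corrected) performance."""
    rng = random.Random(seed)
    apparent = perf(fit(data), data)
    optimisms: List[float] = []
    while len(optimisms) < n_boot:
        sample = [rng.choice(data) for _ in data]  # n draws with replacement
        try:
            model = fit(sample)  # replay *all* modeling choices here
            optimisms.append(perf(model, sample) - perf(model, data))
        except ZeroDivisionError:
            continue  # degenerate resample (e.g. one distinct x): draw again
    optimism = sum(optimisms) / n_boot
    return apparent, optimism, apparent - optimism


noise_rng = random.Random(42)
noisy = [(float(x), 0.3 * x + noise_rng.gauss(0.0, 3.0)) for x in range(30)]
print(optimism_corrected(noisy, fit_line, r_squared))
```

If predictor selection, coding, or shrinkage were part of the strategy, all of it would have to be inside `fit`. Otherwise the optimism is underestimated.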
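
A non-random split of the data set under study, as suggested for assessing external validity, is defined by a rule on calendar time or treating center rather than by chance. The model is developed in one part, and its predictions are then evaluated, without refitting, in the other part. The record fields `center` and `diagnosis_date`, the cutoff date, and the function name are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def nonrandom_split(
    records: List[Record], in_validation: Callable[[Record], bool]
) -> Tuple[List[Record], List[Record]]:
    """Split records into (development, validation) by a non-random rule."""
    development = [r for r in records if not in_validation(r)]
    validation = [r for r in records if in_validation(r)]
    return development, validation


# Hypothetical records; field names are illustrative only.
patients = [
    {"id": 1, "center": "A", "diagnosis_date": date(2012, 3, 1)},
    {"id": 2, "center": "B", "diagnosis_date": date(2014, 7, 9)},
    {"id": 3, "center": "A", "diagnosis_date": date(2016, 1, 20)},
    {"id": 4, "center": "B", "diagnosis_date": date(2017, 5, 2)},
]

# Temporal validation: develop on earlier patients, validate on later ones.
dev_t, val_t = nonrandom_split(
    patients, lambda r: r["diagnosis_date"] >= date(2015, 1, 1)
)

# Geographic validation: develop in center A, validate in center B.
dev_c, val_c = nonrandom_split(patients, lambda r: r["center"] == "B")
print([r["id"] for r in val_t], [r["id"] for r in val_c])
```
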
