Chapter 8: Statistical Models for Prognostication: Development of Regression Models
        

Preliminary Steps

Suppose that we have individual patient data available in a data set with information on a number of potential predictors and on the outcome of interest. Before the actual modeling starts, the following preliminary data analysis steps need to be taken:

  • Construct frequency tables. The distributions of the covariables and of the outcome give an impression of the data under study. Covariables with a narrow distribution (a small range of observed values) may be discarded from the analysis. Cross-tables between covariables and the outcome are not made at this stage, since what is observed in such cross-tables bears on the selection of predictors for the model (for more, see Development of Regression Models: Selection of Covariables).

  • Study missing values. Missing values in one or more predictors are a common problem. Several methods have been described to handle them. These range from omitting patients with missing values from the analysis, to simple imputation methods (e.g., filling in the mean value, or the predicted value based on correlations with other predictors), to multiple imputation methods (where multiple copies of the data set are made, each imputed with different predicted values). When an important predictor has many missing values, it may be sensible to discard it from the analysis, but no firm criteria are available for when a variable has too many missing values.

  • Decide on the type of predictive model. We focus here on regression models, although certain problems may be better handled with classification techniques or neural networks. Regression models have several advantages: the result can be presented attractively on paper (as opposed to neural networks, where a computer is necessary), insight is obtained into the relative weight of the predictors, and the technique is widely available in statistical software packages.
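
The first two steps above can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration and not part of the chapter: the toy `patients` records, the column names (`age`, `sex`, `pain_score`, `outcome`), and the helper functions `frequency_table`, `missing_fraction`, `complete_cases`, and `impute_mean`. It shows a frequency table per variable, the fraction of missing values per predictor, and the two simplest ways of handling missing values mentioned above (omitting incomplete patients and filling in the mean). Multiple imputation is not shown.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy data set: one dict per patient; None marks a missing value.
patients = [
    {"age": 54, "sex": "F", "pain_score": 3, "outcome": 1},
    {"age": 61, "sex": "M", "pain_score": None, "outcome": 0},
    {"age": None, "sex": "F", "pain_score": 7, "outcome": 0},
    {"age": 47, "sex": "F", "pain_score": 5, "outcome": 1},
]


def frequency_table(rows, column):
    """Count how often each observed (non-missing) value of `column` occurs."""
    return Counter(r[column] for r in rows if r[column] is not None)


def missing_fraction(rows, column):
    """Return the fraction of patients with a missing value in `column`."""
    return sum(r[column] is None for r in rows) / len(rows)


def complete_cases(rows, columns):
    """Omit every patient with a missing value in any of `columns`."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


def impute_mean(rows, column):
    """Simple imputation: replace missing values by the mean of the observed ones."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


if __name__ == "__main__":
    # Distributions of one covariable and of the outcome (no cross-tables yet).
    print(frequency_table(patients, "sex"))
    print(frequency_table(patients, "outcome"))

    # Extent of missingness per predictor.
    for col in ("age", "sex", "pain_score"):
        print(col, missing_fraction(patients, col))

    # Two simple ways of handling the missing values.
    print(len(complete_cases(patients, ["age", "pain_score"])))
    print(impute_mean(patients, "age"))
```

Both helpers return new lists and leave `patients` unchanged, so the complete-case and the imputed versions of the data can be compared side by side. Note that the frequency tables are made per variable only, in line with the advice to postpone cross-tables with the outcome.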

