Continuous covariables are sometimes treated as categorical variables
by applying cut-off values. For example, age as a predictor
of mortality may be dichotomized at 65 years. Cut-off values
are sometimes chosen by searching for the optimum of some statistical
criterion. Such a search biases the estimated regression
coefficient for the categorized predictor: the effect will be overestimated,
because the largest coefficient is selected from the set of
coefficients obtained across all candidate cut-offs.
The p-value for the categorized predictor can
be adjusted by advanced statistical procedures (Mazumdar
and Glassman, 2000). We prefer (transformed)
continuous covariables over categorized covariables, since categorization
discards information; see, for example, the relationship
between age and mortality in Figure 5.1.
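The overestimation caused by searching for an optimal cut-off can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the analysis behind this text: the ages, the 30% mortality, the sample size, and the candidate cut-offs are all made up, and mortality is generated independently of age, so the true coefficient is zero. It uses the log odds ratio from a 2x2 table, which is the logistic regression coefficient for a single binary predictor (here with 0.5 added to each cell to keep it finite), and it takes "optimal" to mean the largest absolute coefficient.

```python
import math
import random


def log_odds_ratio(ages, died, cutoff):
    """Log odds ratio of death for age >= cutoff versus age < cutoff.

    For one binary predictor this is the logistic regression coefficient.
    Adding 0.5 to each cell is a continuity correction that avoids
    division by zero when a cell is empty.
    """
    old_dead = old_alive = young_dead = young_alive = 0.5
    for age, dead in zip(ages, died):
        if age >= cutoff:
            if dead:
                old_dead += 1
            else:
                old_alive += 1
        else:
            if dead:
                young_dead += 1
            else:
                young_alive += 1
    return math.log((old_dead * young_alive) / (old_alive * young_dead))


def simulate(n_sims=500, n=200, seed=1):
    """Average absolute coefficient: prespecified versus 'optimal' cut-off."""
    rng = random.Random(seed)
    cutoffs = range(50, 81, 5)  # hypothetical candidates; includes 65
    fixed, optimal = [], []
    for _ in range(n_sims):
        ages = [rng.uniform(40, 90) for _ in range(n)]
        # Assumption: death is independent of age (true coefficient = 0).
        died = [rng.random() < 0.3 for _ in range(n)]
        coefs = [log_odds_ratio(ages, died, c) for c in cutoffs]
        fixed.append(abs(log_odds_ratio(ages, died, 65)))
        optimal.append(max(abs(b) for b in coefs))
    return sum(fixed) / n_sims, sum(optimal) / n_sims


mean_fixed, mean_optimal = simulate()
print(f"mean |coefficient|, cut-off fixed at 65: {mean_fixed:.3f}")
print(f"mean |coefficient|, 'optimal' cut-off:   {mean_optimal:.3f}")
```

Because 65 is one of the candidates, the selected coefficient can never be smaller in absolute value than the prespecified one, so the second average is at least as large as the first even though age has no effect at all in these simulated data.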
Figure 5.1: Age and 30-day Mortality Relationship

Illustration of the relationship between age and 30-day mortality
after acute myocardial infarction. Data from the GUSTO-I trial
(Lee et al., 1995) were analyzed with age as a linear, continuous variable
(thick line) and with a dichotomized version of age (<65 years versus ≥65
years). With the dichotomized version of age, there is an unnaturally
large step in predicted risk between age 64 and age 65, and no difference
in predicted risk among patients younger than 65 or among those aged 65
years or older.
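The contrast between a continuous and a dichotomized age effect can also be sketched numerically. The example below is a minimal illustration with hypothetical numbers: the intercept and slope are invented for the purpose and are not estimates from the GUSTO-I data, and ages are assumed to be spread evenly over 40 to 90 years. The dichotomized model predicts one average risk below 65 and another from 65 upward.

```python
import math

# Hypothetical coefficients on the log odds scale (NOT GUSTO-I estimates).
INTERCEPT = -8.0
SLOPE_PER_YEAR = 0.08


def risk_continuous(age):
    """Predicted risk when age enters the logistic model linearly."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE_PER_YEAR * age)))


# Assumption: one patient at each whole year of age from 40 to 90.
ages = list(range(40, 91))
young = [risk_continuous(a) for a in ages if a < 65]
old = [risk_continuous(a) for a in ages if a >= 65]

# A dichotomized model can only predict the average risk in each group.
risk_young = sum(young) / len(young)
risk_old = sum(old) / len(old)


def risk_dichotomized(age):
    """Predicted risk when age is dichotomized at 65 years."""
    return risk_old if age >= 65 else risk_young


for age in (50, 64, 65, 85):
    print(
        f"age {age}: continuous {risk_continuous(age):.3f}, "
        f"dichotomized {risk_dichotomized(age):.3f}"
    )
```

Under these assumptions the dichotomized predictions are identical for ages 50 and 64 and for ages 65 and 85, while the jump from 64 to 65 is much larger than the one-year change implied by the continuous model.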