Chapter 8: Statistical Models for Prognostication: Development of Regression Models
        
Advanced Shrinkage Methods

For linear shrinkage, the shrinkage factor can also be estimated with the bootstrap (Harrell et al., 1996; Steyerberg et al., 2000a), as follows:

  1. Take a bootstrap sample of size n, drawn with replacement from the original sample.
  2. Apply the selection and estimation strategy to the bootstrap sample.
  3. Calculate the linear predictor in the original sample, using the regression coefficients from the bootstrap sample.
  4. Calculate the slope of the linear predictor in a regression model, fitted in the original sample, where the linear predictor is the only covariable. The slope indicates the agreement between predictions and observations.

Steps 1 to 4 should be repeated many times (e.g., 100 or 200 times) to obtain a stable estimate of the mean of the slopes from step 4; this mean is the shrinkage factor. The slope will generally be less than one, reflecting the overfitting in the data under study. Empirically, the slope has been found to be similar in pre-specified models and stepwise selected models. Linear shrinkage may be referred to as "shrinkage after fitting", since the regression coefficients are first estimated and subsequently shrunk.
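
The four steps above can be sketched in code. The following is only a minimal illustration under stated assumptions, not the procedure as implemented by the cited authors: it uses an ordinary least-squares linear model instead of a generalized linear model, it assumes a pre-specified model (so step 2 involves no covariable selection), and the function names fit_ols and bootstrap_shrinkage are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

def bootstrap_shrinkage(X, y, n_boot=200, seed=0):
    """Mean calibration slope over bootstrap replicates (assumed OLS model)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        # Step 1: bootstrap sample of size n, drawn with replacement.
        idx = rng.integers(0, n, size=n)
        # Step 2: estimation strategy (assumption: pre-specified model, no selection).
        _, coef = fit_ols(X[idx], y[idx])
        # Step 3: linear predictor in the original sample with bootstrap coefficients.
        lp = X @ coef
        # Step 4: slope of the linear predictor as the only covariable.
        _, slope = fit_ols(lp[:, None], y)
        slopes[b] = slope[0]
    # The mean slope over all replicates is the estimated shrinkage factor.
    return slopes.mean()
```

Under these assumptions, the returned value could be used to multiply the regression coefficients estimated in the original sample, after which the intercept would need to be re-estimated. With a small sample and many covariables the value will typically be clearly below one.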

Ridge regression was proposed in the 1970s to reduce the variability of estimated regression coefficients (Hoerl and Kennard, 1970). For generalized linear models, penalized maximum likelihood estimation has been proposed more recently (Harrell et al., 1996). A penalty factor is included in the estimation that reduces the spread in predictions. The penalty factor may be chosen by fitting the model for varying values of the penalty and comparing an AIC measure, which weighs the adequacy of the fit to the data against the effective degrees of freedom. Ridge regression may be referred to as "shrinkage during fitting", since the regression coefficients are shrunk during the fitting process.
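
The idea of choosing the penalty factor by trading off fit against effective degrees of freedom can be illustrated with a small sketch. It is an assumption-laden simplification: it uses ridge regression for a linear model with a Gaussian AIC, not penalized maximum likelihood for a generalized linear model, and the function name ridge_path_aic and the grid of penalty values are hypothetical.

```python
import numpy as np

def ridge_path_aic(X, y, lambdas):
    """Fit ridge regression for each penalty and return (best, all results).

    Each result is a tuple (penalty, effective d.f., AIC, coefficients),
    with coefficients on the scale of the standardized predictors.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized predictors
    yc = y - y.mean()                           # centered outcome
    n = len(yc)
    U, d, Vt = np.linalg.svd(Xs, full_matrices=False)
    Uty = U.T @ yc
    results = []
    for lam in lambdas:
        shrink = d**2 / (d**2 + lam)            # shrinkage per principal component
        coef = Vt.T @ (d / (d**2 + lam) * Uty)  # ridge coefficients
        fitted = U @ (shrink * Uty)
        rss = np.sum((yc - fitted) ** 2)
        edf = shrink.sum()                      # effective d.f. (trace of hat matrix)
        aic = n * np.log(rss / n) + 2.0 * edf   # Gaussian AIC up to a constant
        results.append((lam, edf, aic, coef))
    best = min(results, key=lambda r: r[2])     # penalty with the lowest AIC
    return best, results
```

With a penalty of zero the effective degrees of freedom equal the number of covariables; as the penalty grows, both the effective degrees of freedom and the size of the coefficients decrease, which is the sense in which the coefficients are shrunk during fitting.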

The Lasso is a more recently proposed variant of ridge regression (Tibshirani, 1997), in which the penalty works on the absolute values of the regression coefficients of standardized predictors. As a consequence, some coefficients are shrunk to exactly zero. Since the covariables with zero coefficients can be omitted, selection of covariables is obtained. For this reason, the Lasso may be referred to as "selection through shrinkage."
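
Why a penalty on absolute values sets some coefficients to exactly zero can be seen in a small sketch. It is illustrative only and rests on assumptions not in the text: a linear model fitted by coordinate descent, a fixed number of iterations, and the hypothetical function names soft_threshold and lasso_cd.

```python
import numpy as np

def soft_threshold(z, t):
    """Move z towards zero by t; values within t of zero become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - Xb||^2 + lam * sum(|b_j|).

    Predictors are standardized and the outcome is centered, so the
    returned coefficients refer to the standardized predictors.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    n, p = Xs.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with the contribution of covariable j added back in.
            r_j = yc - Xs @ b + Xs[:, j] * b[j]
            rho = Xs[:, j] @ r_j / n
            # Standardization gives x_j'x_j / n = 1, so the update is a
            # plain soft-thresholding of rho.
            b[j] = soft_threshold(rho, lam)
    return b
```

Covariables whose association with the residual stays below the penalty receive a coefficient of exactly zero and drop out of the model, while the remaining coefficients are shrunk towards zero.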

 
