Bias-variance trade-off
Models with low bias describe the data under study well. Examples include flexible models such as neural networks, which naturally accommodate non-linear relationships, and regression models with many interaction terms. Such models may, however, have high variance, which means that they may not validate well in new patients (Ennis et al., 1998).
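The contrast between a flexible and a simple model can be made concrete with a small simulation. The sketch below is only an illustration under assumed settings that do not come from the text: a sine-shaped true relationship, a fixed design of 20 observations, Gaussian noise, and polynomial fits of degree 1 (simple) and degree 7 (flexible). The function name `simulate_fits` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_fits(degree, n_obs=20, n_sims=500, noise_sd=0.3, seed=0):
    """Refit a polynomial of the given degree on many simulated data sets.

    Returns (squared bias, variance) of the fitted curve, each averaged
    over a grid of x values. All settings are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n_obs)        # fixed design points
    x_grid = np.linspace(-1.0, 1.0, 50)      # points where fits are compared
    truth = np.sin(np.pi * x_grid)           # assumed true relationship
    preds = np.empty((n_sims, x_grid.size))
    for i in range(n_sims):
        # New noisy outcomes for the same design in every simulated data set
        y = np.sin(np.pi * x) + rng.normal(0.0, noise_sd, n_obs)
        coefs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coefs, x_grid)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - truth) ** 2))  # systematic error
    variance = float(np.mean(preds.var(axis=0)))        # instability across samples
    return bias_sq, variance

simple_bias, simple_var = simulate_fits(degree=1)
flexible_bias, flexible_var = simulate_fits(degree=7)
print(f"degree 1: bias^2={simple_bias:.4f}, variance={simple_var:.4f}")
print(f"degree 7: bias^2={flexible_bias:.4f}, variance={flexible_var:.4f}")
```

Under these assumptions the straight-line fit misses the curvature (larger squared bias) but changes little from one data set to the next, whereas the degree-7 fit follows the true curve closely on average but varies more between data sets.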
In contrast, simple models with main effects only may have substantial bias, but may be rather robust when predicting for new patients. An example is the application of Bayes' rule in a rather naïve way, i.e. without taking correlations between predictors into account (Idiot's Bayes). This method is equivalent to using univariable logistic regression coefficients for prediction. Empirical results, especially for discrimination, were favorable in some case studies (Spiegelhalter, 1986).
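The naïve use of Bayes' rule can be written out on the log-odds scale. The notation below is an assumption introduced for illustration (a binary outcome y and predictors x_1, ..., x_p treated as conditionally independent given the outcome); it is not taken from the text.

```latex
\log \frac{P(y = 1 \mid x_1, \dots, x_p)}{P(y = 0 \mid x_1, \dots, x_p)}
  = \log \frac{P(y = 1)}{P(y = 0)}
  + \sum_{j=1}^{p} \log \frac{P(x_j \mid y = 1)}{P(x_j \mid y = 0)}
```

Each term in the sum is the log likelihood ratio of a single predictor considered on its own, so the contributions are simply added up and correlations between predictors play no role.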
Predictive modeling may be seen as a balancing act of bias versus variance. The sample size of the data set is of paramount importance. Especially in small data sets, information from outside the data under study, such as findings in other studies and clinical knowledge, is important to guide model development.
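For a continuous outcome under squared-error loss, the balancing act can be stated as the standard decomposition of expected prediction error. The setup is an assumption made for illustration: Y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and an estimate \hat{f} fitted on a random training sample, evaluated at a new point x_0.

```latex
E\left[ \left( Y - \hat{f}(x_0) \right)^2 \right]
  = \underbrace{\sigma^2}_{\text{irreducible error}}
  + \underbrace{\left( E[\hat{f}(x_0)] - f(x_0) \right)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\left( \hat{f}(x_0) \right)}_{\text{variance}}
```

The variance term generally shrinks as the sample size grows, which is one way to see why sample size determines how much model flexibility a data set can support.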
QUESTION 8.4
Which model has the lowest bias in describing the data under study?