Internal validity can be studied with a variety of techniques, as described below.
Table 7.1: Techniques for Studying Internal Validity

| Technique | Description |
|---|---|
| Split-sample | A straightforward and fairly popular approach is to randomly split the training data into two parts: one to develop the model and another to measure its performance. With the split-sample approach, model performance is determined on similar, but independent, data. Common splits are 50:50 or 2/3:1/3. |
| Cross-validation | A more sophisticated approach is cross-validation, which can be seen as an extension of the split-sample method. With split-half cross-validation, the model is developed on one randomly drawn half and tested on the other, and vice versa. The average is taken as an estimate of performance. Other fractions of subjects may be left out, e.g. 10% to test a model developed on 90% of the sample. This procedure is repeated 10 times, such that every subject has been used once to test the model. To improve the stability of the cross-validation, the whole procedure can be repeated several times, taking new random sub-samples. The most extreme cross-validation procedure is to leave one subject out at a time, which is equivalent to the jackknife technique (Efron and Tibshirani, 1993; Efron and Tibshirani, 1997). |
| Bootstrap validation | The most efficient validation is achieved by the bootstrap (Efron and Tibshirani, 1997). Bootstrapping replicates the process of sample generation from an underlying population by drawing samples with replacement from the original data set, each of the same size as the original data set. Models may be developed in bootstrap samples and tested in the original sample to replicate validation in new subjects. |
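
The split-sample and cross-validation procedures in Table 7.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the "model" is a toy one-predictor least-squares line, performance is measured as mean squared error, the data are synthetic, and the helper names (`fit_line`, `mse`, `split_sample`, `cross_validate`) are hypothetical names chosen for this example.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Toy model (assumption): least-squares fit of y = a + b*x, returned as (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Performance measure (assumption): mean squared prediction error, lower is better."""
    a, b = model
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def subset(values, indices):
    return [values[i] for i in indices]


def split_sample(xs, ys, dev_fraction=2 / 3, seed=0):
    """Develop the model on a random part of the data, test it on the remainder."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = round(dev_fraction * len(idx))
    dev, val = idx[:cut], idx[cut:]
    model = fit_line(subset(xs, dev), subset(ys, dev))
    return mse(model, subset(xs, val), subset(ys, val))


def cross_validate(xs, ys, k=10, seed=0):
    """k-fold cross-validation: every subject is used exactly once for testing."""
    if not 2 <= k <= len(xs):
        raise ValueError("k must be between 2 and the number of subjects")
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k disjoint test sets covering all subjects
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_line(subset(xs, train), subset(ys, train))
        scores.append(mse(model, subset(xs, test), subset(ys, test)))
    return mean(scores)


# Synthetic demo data (assumption): y = 1 + 2x plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

print("split-sample (2/3:1/3):", split_sample(xs, ys))
print("split-half CV:", cross_validate(xs, ys, k=2))
print("10-fold CV:", cross_validate(xs, ys, k=10))
print("leave-one-out:", cross_validate(xs, ys, k=len(xs)))
# Repeating the whole procedure with new random splits stabilizes the estimate.
print("repeated 10-fold:", mean(cross_validate(xs, ys, k=10, seed=s) for s in range(5)))
```

Setting `k=2` gives split-half cross-validation, `k=10` leaves out 10% at a time, and `k=len(xs)` leaves one subject out at a time.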
Techniques for Validation
In practice, bootstrap validation may be hampered because not all modeling decisions can be automated. For example, categorical variables may be regrouped on subjective grounds, inspired by findings in the data. When not all modeling decisions can be systematically replayed, the split-sample approach may be considered, provided that a large validation sample can be kept out. Alternatively, one might ignore these decisions and calculate a bootstrap estimate as an upper limit of expected performance.
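
To make the idea of "replaying" modeling decisions concrete, the sketch below wraps every automated modeling step in a single function and reruns that function in each bootstrap sample, testing each resulting model in the original sample. It is an illustration under stated assumptions: the toy model, the correlation-based selection rule and its threshold `min_abs_r`, the mean squared error measure, and the names `develop_model` and `bootstrap_validate` are all hypothetical choices for this example.

```python
import random


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def develop_model(xs, ys, min_abs_r=0.3):
    """All automated modeling steps in one replayable function (illustrative).

    Automated decision (assumption): keep the predictor only if its absolute
    correlation with the outcome is at least min_abs_r; otherwise fall back
    to an intercept-only model. Returns (intercept, slope).
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0 or abs(sxy) / (sxx * syy) ** 0.5 < min_abs_r:
        return my, 0.0
    b = sxy / sxx
    return my - b * mx, b


def mse(model, xs, ys):
    """Performance measure (assumption): mean squared prediction error."""
    a, b = model
    return mean((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def bootstrap_validate(xs, ys, n_boot=200, seed=0):
    """Develop the model in bootstrap samples, test each one in the original sample."""
    rng = random.Random(seed)
    n = len(xs)
    scores = []
    for _ in range(n_boot):
        # Draw n subjects with replacement: same size as the original data set.
        draw = [rng.randrange(n) for _ in range(n)]
        model = develop_model([xs[i] for i in draw], [ys[i] for i in draw])
        scores.append(mse(model, xs, ys))
    return mean(scores)


# Synthetic demo data (assumption): y = 1 + 2x plus Gaussian noise.
data_rng = random.Random(7)
xs = [data_rng.uniform(0, 10) for _ in range(100)]
ys = [1.0 + 2.0 * x + data_rng.gauss(0, 1) for x in xs]

print("bootstrap estimate of MSE:", bootstrap_validate(xs, ys))
```

Any decision made outside `develop_model`, such as a subjective regrouping of categories, is not replayed in the bootstrap samples, so the resulting estimate should be read as optimistic.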
QUESTION 7.8

A characteristic of internal validation is that the parts of the data that are kept out of the model development phase (to test models) are kept out: