Adequacy of Sample Sizes
In general, the key values that go into determining sample size are (Wittes, 2002):
- The expected effect size of the treatment (mean X - mean Y, the
difference between the observed means in the treated and control
groups, respectively)
- The variability of the outcome measure in the populations
- The type I error rate, or alpha, defined as the probability
that the trial will declare two equally effective treatments
"significantly" different from each other when they are not
- The type II error rate, or beta, which is the probability of failing
to reject the null hypothesis when the difference between responses
in the two groups is a given effect size
- The number of observations or participants in each group ('n')
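A generic sample size formula is listed below, where Z is the test
statistic, with a normal distribution, sigma is the standard deviation
of the outcome (assumed to be the same in both groups), and n is the
number of participants in each group:
Z = (mean X - mean Y) / (sigma x square root (2/n))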
As the treatment effect
size becomes smaller, the number of needed subjects increases;
as the variability of the sample increases, so does the required
number of subjects.
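The relationship between effect size, variability, and required numbers can be made concrete with a minimal sketch. It rearranges the usual normal-approximation test for two equal-sized groups to solve for n, assuming a two-sided alpha, a standard deviation (sigma) that is the same in both groups, and a true difference between means (delta). The function name n_per_group and the example values are illustrative assumptions and do not come from Wittes (2002).

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, beta=0.20):
    """Approximate participants needed in EACH group (normal approximation).

    delta: assumed true difference between group means (hypothetical input)
    sigma: assumed common standard deviation of the outcome
    alpha: two-sided type I error rate
    beta:  type II error rate (power = 1 - beta)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(1 - beta)        # quantile for the desired power
    # n = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2, rounded up
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


if __name__ == "__main__":
    # Illustrative standardized effects (delta / sigma), not trial data
    for effect in (0.8, 0.5, 0.2):
        print(effect, n_per_group(delta=effect, sigma=1.0))
```

Under these assumptions, a difference of half a standard deviation needs 63 participants per group, while a difference of one fifth of a standard deviation needs 393. Doubling sigma for a fixed delta has the same effect as halving delta, which is why noisier outcomes demand larger trials.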
The prominent epidemiologist
Richard Peto maintains that almost all ongoing clinical trials
are too small, even when considering multi-site trials of about
1000 subjects in each group, since trials smaller than this can
only reliably find moderate to large treatment effects (Peto
et al., 1995). A survey of published TMD clinical trials shows
that this is especially true for this condition.
- A recent systematic
review of RCTs of occlusal treatments showed that most studies
were small (Forssell
et al., 1999b).
- A systematic review
of TMD splint studies that we have performed showed that only
two studies of 23 (List
et al., 1992; Truelove et
al., 1999) had more than 100 total subjects, while over
half had 50 or fewer total subjects.
With such small numbers of subjects, there is a distinct
possibility of making a type I error (i.e., finding a difference
between treatment groups when none actually exists) or, relatedly,
of finding treatment effects that are larger than the real ones.
In addition, approximate calculation of the statistical power for
these TMD splint studies showed that almost all have values below
20%; that is, each of these studies had less than a 1 in 5 chance
of detecting a difference between treatments if a difference truly
was present, leaving a high probability of a type II error.
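To show how such low power figures can arise, here is a rough sketch of an approximate power calculation for a two-group comparison using the normal approximation. It is not the calculation used in the review; the function name approx_power, the standardized effect of 0.2, and the group size of 25 (50 total subjects) are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist


def approx_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-group z-test with n per group.

    Ignores the (usually negligible) chance of rejecting in the wrong direction.
    delta and sigma are assumed inputs, not estimates from any cited trial.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)      # two-sided critical value
    signal = delta / (sigma * math.sqrt(2.0 / n))   # expected value of Z
    return std_normal.cdf(signal - z_crit)


if __name__ == "__main__":
    # Hypothetical small trial: 25 per group, small standardized effect
    print(round(approx_power(delta=0.2, sigma=1.0, n=25), 3))
    # The same effect with roughly 400 per group
    print(round(approx_power(delta=0.2, sigma=1.0, n=393), 3))
```

With these illustrative inputs, the 50-subject trial has roughly a 10% chance of detecting the effect, whereas about 393 subjects per group would be needed to reach the conventional 80%.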
Small sample sizes may be appropriate for treatments with moderate
(25% to 50%) to large (50% +) effect sizes, but such effects have
not been convincingly shown for TMD or chronic musculoskeletal pain
in the larger, more rigorous trials. These larger studies, or ones
with higher quality scores, have commonly shown reductions in the
range of 5 to 30% in self-reported pain or other comparable outcomes
with treatment, compared to the control group (List
et al., 1992; Wright
et al., 1995; Dworkin
et al., 2002a; Truelove et
al., 1999). Small sample sizes may also be used for conditions
that have small variability in outcomes, which is usually not the
case for a complex, self-reported human condition like TMD pain.
In agreement with recent
research, the smallest TMD RCTs evaluating splints with the lowest
quality scores show some of the largest effect sizes, with pain
reductions of up to 80% when compared to controls (Linde
et al., 1995; Lundh
et al., 1985). However, these same interventions, when performed
in larger, longer, and higher quality trials, with appropriate
control groups, show much smaller reductions in pain (Truelove
et al., 1999; Dao
et al., 1994).
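One statistical reason small trials can report inflated effects is selection on significance: in an underpowered study, only unusually large estimates cross the significance threshold. The sketch below computes the average estimated difference among results that reach significance in the favorable direction, using the mean of a truncated normal distribution. It is an illustration under assumed values (a true standardized effect of 0.2, and 25 versus 1000 subjects per group), not an analysis of the cited trials, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def mean_significant_estimate(delta, sigma, n, alpha=0.05):
    """Expected estimated difference, given that it was significant and positive.

    The estimate is modeled as Normal(delta, se^2) with se = sigma * sqrt(2/n).
    All inputs are assumed values for illustration.
    """
    std_normal = NormalDist()
    se = sigma * math.sqrt(2.0 / n)                 # standard error of the difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)      # two-sided critical value
    a = z_crit - delta / se                         # threshold on the standardized scale
    # Mean of a normal truncated from below at the significance threshold
    return delta + se * std_normal.pdf(a) / (1 - std_normal.cdf(a))


if __name__ == "__main__":
    true_effect = 0.2  # hypothetical true standardized difference
    for n in (25, 1000):
        avg = mean_significant_estimate(true_effect, 1.0, n)
        print(n, round(avg, 3), round(avg / true_effect, 2))
```

Under these assumptions, significant results from the 25-per-group trial overstate the true effect more than threefold on average, while the 1000-per-group trial shows almost no inflation.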