|
The meanings of the research
terms "reliability" and "validity" depend on the context
in which they are used. Both terms are used
to judge the quality of several elements of a research project.
For example, we can attempt to assess either the reliability or
validity of a specific dependent variable. Sometimes investigators
refer to the results of an entire investigation as "reliable";
in this context the implication is that the results were "statistically
significant." We can also attempt to assess either the internal
or external validity of a randomized clinical trial.
Most
references to reliability turn on the need to demonstrate the
"repeatability" or "consistency" of a particular
measure, especially one designed to capture variations in subjective
state such as health status or quality of life. Conceptually,
reliability increases as measurement error decreases. Various
ways to quantitatively document the repeatability of a measure
include inter-rater agreement, test-retest reliability, and internal
consistency. Discussions of these procedures are commonplace (e.g.
http://trochim.cornell.edu/KB/reltypes.htm).
Investigators should appreciate not only these procedures but
also the routine and appropriate use of the statistical tools used
to estimate the degree to which a measure is free of measurement error,
such as the correlation coefficient, coefficient alpha, or kappa
(e.g. http://www.geolog.com/msmnt/mrelobj-old.htm).
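As a rough illustration of the three statistics just named, the sketch below computes a test-retest correlation, coefficient alpha, and kappa in plain Python. The function names and the small data sets are hypothetical and chosen only for illustration; the kappa shown is the unweighted two-rater form, which is an assumption, since the text does not say which variant is meant.

```python
import math


def pearson_r(x, y):
    """Pearson correlation, e.g. between test and retest scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def coefficient_alpha(items):
    """Coefficient alpha for internal consistency.

    `items` holds one list of scores per scale item, with respondents
    in the same order in every list.
    """
    k = len(items)
    n_respondents = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n_respondents)]
    item_variance_sum = sum(sample_variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / sample_variance(totals))


def kappa(rater_a, rater_b):
    """Unweighted kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    test = [12, 15, 9, 20, 17]
    retest = [13, 14, 10, 19, 18]
    print("test-retest r:", round(pearson_r(test, retest), 3))

    items = [[3, 4, 2, 5, 4], [2, 4, 3, 5, 3], [3, 5, 2, 4, 4]]
    print("coefficient alpha:", round(coefficient_alpha(items), 3))

    rater_a = ["yes", "yes", "no", "no", "yes", "no"]
    rater_b = ["yes", "no", "no", "no", "yes", "no"]
    print("kappa:", round(kappa(rater_a, rater_b), 3))
```

In each case a value near 1 indicates high repeatability. Kappa differs from simple percent agreement in that it subtracts the agreement two raters would reach by chance.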
A measure
can be reliable without being valid, but to be valid a measure must
be reliable. The validity of a measure refers to the degree to
which it measures what it purports to measure. Whereas reliability
can be established with relative ease, documenting validity
is more difficult, both practically and theoretically. The validity
of a measure is established through various levels of both conceptual
and computational analyses. At the most immediate level, "face"
validity is demonstrated if the questions or scale items seem
to relate directly to the purpose of the measure: assessment of
eye color is not likely to contribute to the valid assessment
of dyspnea.
"Content"
validity refers to how representative or adequate the items on
the measure or instrument are as they relate to the entire domain
or possible universe of the phenomena being measured. Content
validity is established through the individual analysis and rating
of item appropriateness by a panel of at least three to five experts.
Criterion validity refers to the degree to which scores on a measure
can predict current (concurrent validity) or future (predictive
validity) performance or outcomes. For example, one validates
a written test of driving by showing that it accurately predicts
how well someone actually operates an automobile. Scores obtained
on the Scholastic Aptitude Test are validated by showing that they
predict some "criterion" such as success in college. Construct
validity is the most theoretical and difficult level of analysis,
and refers to the degree to which the construct as described is
a valid conceptualization of the phenomenon. The determination
of construct validity usually requires a number of studies conducted
over a period of time.
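The driving-test example of criterion validity can be sketched in a few lines of Python. Here the validity coefficient is taken to be the Pearson correlation between scores on the measure and scores on the criterion, which is a common choice but an assumption on our part; the function name and the scores are hypothetical.

```python
import math


def validity_coefficient(measure_scores, criterion_scores):
    """Pearson correlation between a measure and its criterion.

    Scores must be paired by person: position i in both lists
    belongs to the same individual.
    """
    n = len(measure_scores)
    mean_m = sum(measure_scores) / n
    mean_c = sum(criterion_scores) / n
    cross = sum(
        (m - mean_m) * (c - mean_c)
        for m, c in zip(measure_scores, criterion_scores)
    )
    ss_m = sum((m - mean_m) ** 2 for m in measure_scores)
    ss_c = sum((c - mean_c) ** 2 for c in criterion_scores)
    return cross / math.sqrt(ss_m * ss_c)


if __name__ == "__main__":
    # Hypothetical data: written driving-test scores and later
    # on-road ratings for the same six people.
    written_test = [62, 75, 81, 90, 55, 70]
    road_rating = [3.1, 3.8, 4.0, 4.6, 2.7, 3.5]
    r = validity_coefficient(written_test, road_rating)
    print("criterion validity coefficient:", round(r, 3))
```

If the criterion is collected at the same time as the measure, the coefficient speaks to concurrent validity; if it is collected later, it speaks to predictive validity. The computation is the same in both cases.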
|