Interactive Textbook on Clinical Symptom Research

Chapter 20: Secondary Analysis of Large Survey Database: Cluster and Stratified Samples
          

Another factor that affects variance estimates is the complex sample design used in large, nationally representative datasets. For example, the MEPS sample design includes stratification, clustering, multiple stages of selection, and disproportionate sampling. Surveys such as MEPS, MCBS, and the National Ambulatory Medical Care Survey (NAMCS) collect data on subjects who belong to clusters of sample candidates, such as counties or physicians' practices. Because respondents are drawn in clusters, individuals within a given cluster tend to be more alike than individuals drawn at random from the general population. Each additional respondent from the same cluster therefore adds less independent information, and the variance of survey estimates is larger than it would be under a simple random sample of the same size. If the researcher does not account for the sampling design in the analysis, standard errors will be underestimated and confidence intervals will be too narrow. This can lead to incorrect conclusions from significance testing, such as declaring a difference statistically significant when it is not (Korn and Graubard, 1999).
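To illustrate why ignoring clustering understates uncertainty, the following minimal Python sketch simulates a hypothetical two-stage sample (100 clusters of 20 respondents, with an assumed intraclass correlation of 0.05). It compares the naive simple-random-sample standard error of the mean with one that treats clusters as the sampling units. The cluster count, cluster size, and correlation are illustrative assumptions, not MEPS or MCBS parameters, and the function names are hypothetical. The squared ratio of the two standard errors should land near Kish's approximate design effect, 1 + (m - 1)ρ, where m is the cluster size and ρ the intraclass correlation.

```python
import numpy as np


def simulate_clustered(n_clusters, cluster_size, icc, rng):
    """Simulate outcomes with total variance 1 and intraclass correlation `icc`."""
    # Shared cluster effect: makes respondents in the same cluster more alike.
    cluster_effects = rng.normal(0.0, np.sqrt(icc), size=n_clusters)
    # Individual-level noise, independent across respondents.
    noise = rng.normal(0.0, np.sqrt(1.0 - icc), size=(n_clusters, cluster_size))
    return cluster_effects[:, None] + noise  # shape: (n_clusters, cluster_size)


def naive_se(y):
    """SE of the overall mean assuming a simple random sample: s / sqrt(n)."""
    flat = np.asarray(y, dtype=float).ravel()
    return flat.std(ddof=1) / np.sqrt(flat.size)


def cluster_se(y):
    """SE of the overall mean treating equal-sized clusters as the sampling units."""
    # With equal cluster sizes, the overall mean equals the mean of the cluster
    # means, so its variance can be estimated from the spread of those means.
    cluster_means = np.asarray(y, dtype=float).mean(axis=1)
    return cluster_means.std(ddof=1) / np.sqrt(cluster_means.size)


rng = np.random.default_rng(2024)
m, icc = 20, 0.05  # hypothetical cluster size and intraclass correlation
y = simulate_clustered(n_clusters=100, cluster_size=m, icc=icc, rng=rng)

se_srs, se_clu = naive_se(y), cluster_se(y)
print(f"naive SRS SE:            {se_srs:.4f}")
print(f"cluster-based SE:        {se_clu:.4f}")
print(f"estimated design effect: {(se_clu / se_srs) ** 2:.2f}")
print(f"Kish approximation:      {1 + (m - 1) * icc:.2f}")
```

Under these assumptions the expected design effect is 1 + 19 × 0.05 = 1.95, so even a modest within-cluster correlation nearly doubles the variance of the mean.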

The standard procedures in commonly used software packages assume that the data come from a simple random sample. To analyze complex survey data, you need statistical software with procedures designed for data collected with stratified and cluster sampling methods, such as SUDAAN, WesVar, Stata's survey (svy) commands, or the SAS survey procedures. These procedures use the primary sampling unit (PSU) and stratum indicator variables that the survey producers include in the data files to estimate variance in descriptive and multivariable analyses. Variance estimates may differ somewhat across packages depending on the estimation method used (for example, Taylor series linearization or replicate-weight methods). The table below shows standard deviations of the mean (that is, standard errors) for two MCBS payment variables computed with SUDAAN, Stata, and the non-survey procedures of SAS. The estimates that account for the survey design (SUDAAN and Stata) are close to each other and roughly 30% larger than the SAS estimates, which assume simple random sampling.

MCBS Standard Deviations of the Mean for Inpatient and HHA Payments Using Commonly Used Software Packages

| Payment type | Mean | SD of mean ($), SUDAAN | SD of mean ($), Stata | SD of mean ($), SAS (non-survey) |
|---|---|---|---|---|
| Inpatient | $2,103 | 94.1 | 92.7 | 72.6 |
| Home health | $431 | 30.7 | 30.8 | 23.5 |
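The PSU and stratum variables enter the variance calculation in a fairly direct way. Below is a minimal Python sketch of the textbook with-replacement Taylor series linearization estimator for a weighted mean, which is commonly the default variance method in SUDAAN, Stata's svy commands, and SAS PROC SURVEYMEANS. It is an illustration under stated assumptions, not the code any of those packages actually runs: finite population corrections, singleton-PSU adjustments, and subpopulation estimation are omitted. The function names, the simulated design (10 strata, 4 PSUs per stratum, 25 respondents per PSU), and the weights are all hypothetical. Because respondents share a PSU-level effect, the design-based standard error should come out noticeably larger than the naive one, mirroring the pattern in the MCBS table.

```python
import numpy as np


def weighted_mean(y, w):
    """Survey-weighted mean: sum(w * y) / sum(w)."""
    return np.sum(w * y) / np.sum(w)


def taylor_se_mean(y, w, strata, psu):
    """Linearized SE of a weighted mean (with-replacement PSU approximation).

    Finite population corrections, singleton-PSU adjustments, and domain
    (subpopulation) estimation are deliberately omitted in this sketch.
    """
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    strata, psu = np.asarray(strata), np.asarray(psu)
    mean = weighted_mean(y, w)
    # Linearized values of the ratio estimator sum(w * y) / sum(w).
    z = w * (y - mean) / np.sum(w)
    var = 0.0
    for h in np.unique(strata):
        in_h = strata == h
        psu_ids = np.unique(psu[in_h])  # PSU codes may repeat across strata
        n_h = psu_ids.size
        if n_h < 2:
            raise ValueError(f"stratum {h} has fewer than 2 PSUs")
        # Sum the linearized values within each PSU of this stratum.
        totals = np.array([z[in_h & (psu == p)].sum() for p in psu_ids])
        # Between-PSU variability within the stratum, scaled by n_h / (n_h - 1).
        var += n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)
    return np.sqrt(var)


def naive_se_mean(y):
    """SE of an unweighted mean assuming a simple random sample."""
    y = np.asarray(y, dtype=float)
    return y.std(ddof=1) / np.sqrt(y.size)


# Hypothetical design: 10 strata x 4 PSUs x 25 respondents, shared PSU effect.
rng = np.random.default_rng(7)
n_strata, n_psu, m = 10, 4, 25
strata = np.repeat(np.arange(n_strata), n_psu * m)
psu = np.tile(np.repeat(np.arange(n_psu), m), n_strata)  # codes 0-3 reused per stratum
psu_effect = rng.normal(0.0, 0.5, size=n_strata * n_psu)
y = np.repeat(psu_effect, m) + rng.normal(0.0, 1.0, size=strata.size)
w = rng.uniform(0.5, 2.0, size=strata.size)  # made-up sampling weights

print(f"weighted mean:            {weighted_mean(y, w):.3f}")
print(f"design-based (Taylor) SE: {taylor_se_mean(y, w, strata, psu):.4f}")
print(f"naive SRS SE:             {naive_se_mean(y):.4f}")
```

A stratum with a single PSU provides no within-stratum information about variance, which is why the sketch raises an error there rather than silently applying one of the ad hoc adjustments that survey packages offer.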
