

Chapter 20: Secondary Analysis of Large Survey Database: Avoiding the Pitfalls
        

Researchers designing studies that use secondary survey data must recognize important limitations at the outset, so that these limitations can be addressed and their potential impact on findings considered. Common limitations include the following:

  • Secondary data analyses may use data for a purpose other than the one for which the original data collection was designed. Survey designers must trade off comprehensiveness of items in a given area against respondent burden and costs. Therefore, specific items or factors of interest may not have been assessed, may have been collected in a different manner, or may have been collected in less depth than the investigator would prefer.

  • Although timeliness is an advantage of secondary analysis, there is a variable lag period between data collection and data availability. This lag matters when the area under study is changing rapidly (e.g., clinical practice and health care organization and delivery).

  • Although obtaining some secondary survey data may be as easy as downloading a file from the web, other data sets require specific data use agreements. To protect respondent confidentiality, some data can only be accessed at special data centers.

  • Some investigators may be tempted to find an interesting data source and then explore it for associations of interest (data-dredging). However, findings obtained this way are problematic, since spurious associations related to large sample sizes and large numbers of variables are commonplace, as will be explained on the next page.

  • Although surveys often allow analyses for specific population subgroups, there may be insufficient sample size to study a particular group or condition of interest (e.g. Native Americans, the "oldest old", individuals with rheumatoid arthritis).

  • Nonresponse to the survey itself or to individual items may introduce bias.

  • Although longitudinal data sets can support development of predictive models, creating the analytic files to support these analyses is challenging in most surveys, and limitations such as sample attrition are common.

  • Investigations using survey data are subject to all of the inherent limitations of observational studies. However, observational studies may be the only feasible way to answer the study question, and statistical methods are available to account for and minimize potential bias in these analyses.

  • Differences in survey methods such as sampling frame, item wording, and timing of data collection may result in different estimates for a similar question derived from different data sources. Therefore, the researcher must pay attention to the specifics of survey methodology and understand how these specifics may influence results.
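
The data-dredging pitfall listed above can be illustrated with a small simulation. The sketch below is not drawn from any real survey: it generates an outcome and a set of candidate variables that are all independent random noise, then counts how many of them appear "significantly" correlated with the outcome. The function names, the sample size of 500, the 200 candidate variables, and the Bonferroni adjustment are all assumptions chosen for illustration; Bonferroni is only one common correction and is not necessarily the approach this chapter recommends.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def two_sided_p(r, n):
    """Approximate two-sided p-value for r, using the Fisher z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def count_spurious(n_respondents=500, n_variables=200, alpha=0.05, seed=0):
    """Count noise variables that pass the significance threshold.

    Every variable is independent of the outcome by construction, so each
    "significant" result counted here is a false positive.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_respondents)]
    hits = 0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_respondents)]
        r = pearson_r(outcome, candidate)
        if two_sided_p(r, n_respondents) < alpha:
            hits += 1
    return hits


# Hypothetical settings: 200 unrelated variables screened against one outcome.
naive = count_spurious(alpha=0.05)
# Bonferroni adjustment (an assumed choice): divide alpha by the number of tests.
bonferroni = count_spurious(alpha=0.05 / 200)

print(f"False positives at alpha = 0.05: {naive} of 200")
print(f"False positives after Bonferroni adjustment: {bonferroni} of 200")
```

With an unadjusted threshold of 0.05, roughly 5% of unrelated variables (about 10 of 200) would be expected to pass by chance alone, which is why an association found by unplanned exploration needs a prespecified hypothesis, an adjustment for multiple testing, or replication in independent data before it is taken seriously.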
