Chapter 20: Secondary Analysis of Large Survey Database: Power Calculations
        

As in any study, it is important to make sure that there are enough data to test the hypotheses and to show that the observed results are unlikely to have occurred simply by chance.

It is important to assess the power of the survey to: 1) make reliable estimates about particular subgroups of interest, and 2) test the study hypotheses. For example, MEPS has a large overall sample size. However, if the investigator is interested in assessing the prevalence of functional impairments in African American males aged 65 and older, the survey may not contain enough respondents in that subgroup.
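One rough way to judge whether a subgroup is large enough for a reliable estimate is to look at the margin of error it implies. The Python sketch below is illustrative only: the subgroup sizes, the guessed prevalence of 0.30, and the optional design-effect multiplier `deff` are hypothetical values chosen for the example, not MEPS figures, and the formula is the simple normal approximation rather than a full design-based variance estimate.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p, n, conf_level=0.95, deff=1.0):
    """Approximate half-width of a confidence interval for a proportion.

    p    : guessed prevalence (an assumption supplied by the analyst)
    n    : number of respondents in the subgroup
    deff : design effect; 1.0 corresponds to simple random sampling (assumption)
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z * sqrt(deff * p * (1 - p) / n)


# Hypothetical subgroup sizes, not taken from any real survey.
for n in (150, 1000, 5000):
    print(n, round(margin_of_error(0.30, n), 3))
```

The smaller the subgroup, the wider the interval around the estimated prevalence; a design effect above 1.0, as is common in cluster samples, widens it further.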

By making use of survey codebooks and online exploratory analyses, the investigator can assess the adequacy of the sample size for the study purposes before proceeding further. If the sample size is inadequate, in other words if the study lacks power, the investigator risks a Type II error: no difference is found and the null hypothesis is accepted even though it is false. Power calculations can be done with statistical software applications.

The examples accompanying this section show some of the ways in which sample size and power can be estimated using the Stata software. Similar computations can be carried out with the sample size formulas given in Fleiss, in Pagano and Gauvreau, and in other texts. Our intent is to demonstrate that while calculations may be very precise, their accuracy is often unknown because they depend on assumptions that may or may not be realistic. The Stata commands and output in those examples show how sensitive the sample size estimate is to the initial estimate (or guess) of the population variance, the key component of the sample size estimator. Stata's algorithm uses p*(1-p) (the product of the proportions of events and non-events) to estimate the variance. Note that the value of the variance estimator depends upon the magnitude of p, not upon the difference between the observed proportion and its hypothesized value.
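The sensitivity to p can be seen in a short Python sketch. It uses a common normal-approximation formula for a two-sided one-sample test of a proportion, which is an assumption made for illustration and not necessarily the exact algorithm Stata applies. The function name `one_sample_n` and the baseline proportions are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def one_sample_n(p0, pa, alpha=0.05, power=0.80):
    """Sample size for a two-sided one-sample test of a proportion
    (normal approximation). p0 is the hypothesized value and pa is the
    value the investigator guesses to be true."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(pa * (1 - pa))
    return ceil((numerator / (pa - p0)) ** 2)


# Same 0.10 difference, different baseline proportions (illustrative guesses).
for p0 in (0.10, 0.30, 0.50):
    print(p0, one_sample_n(p0, p0 + 0.10))
```

Although the difference to be detected is 0.10 in every case, the required sample size grows as p moves toward 0.5, where p*(1-p) is largest. A poor initial guess for p therefore translates directly into a poor sample size estimate.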

Adequate sample size is needed to minimize the risk of incorrectly rejecting or accepting a null hypothesis. A Type I error occurs when the null hypothesis is rejected although it is in fact true. Alpha is the probability of committing a Type I error. The investigator chooses Alpha based on how important it is to avoid rejecting the null hypothesis when it is actually true. Commonly, an Alpha of 0.05 is selected. A Type II error occurs when the null hypothesis is accepted although it is false. Beta is the probability of committing a Type II error, and is typically set after Alpha has been selected. Commonly, a Beta of 0.20 is selected. The value 1 - Beta is called the power of a test. It represents the chance of detecting a difference if it actually exists. If the sample size is chosen so that the power of the test is 0.80 (that is, 1.0 - Beta with Beta = 0.20), then there is an 80% chance of detecting the hypothesized difference if it actually exists (Fleiss, 1981).
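The meaning of Alpha and power can be made concrete with a small simulation. The Python sketch below repeatedly draws samples and applies a two-sided z-test of a proportion: once when the null hypothesis is true, and once when it is false. All of the numbers (a hypothesized proportion of 0.30, a true proportion of 0.37, a sample of 400, 2000 replications) are hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate(true_p, p0=0.30, n=400, alpha=0.05, reps=2000, seed=1):
    """Fraction of simulated samples in which a two-sided z-test rejects H0: p = p0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se0 = sqrt(p0 * (1 - p0) / n)  # standard error under the null hypothesis
    rejections = 0
    for _ in range(reps):
        events = sum(rng.random() < true_p for _ in range(n))
        z = (events / n - p0) / se0
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


type_i_rate = rejection_rate(true_p=0.30)  # null is true: estimates Alpha
power = rejection_rate(true_p=0.37)        # null is false: estimates 1 - Beta
print("Type I error rate:", type_i_rate)
print("Power:", power, "Type II error rate:", round(1 - power, 3))
```

When the null hypothesis is true, the rejection rate should land near the chosen Alpha; when it is false, the rejection rate estimates the power, and its complement estimates Beta.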

Once the investigator specifies the magnitude of the difference he or she would like to detect and selects acceptable values of Alpha and Beta, the required sample size can be estimated. In analyzing secondary survey data, however, the size of the available sample is already known, so the investigator only needs to calculate whether the sample has sufficient power to detect a difference of interest (see Example C).
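When the sample size is fixed, the calculation runs in the other direction: power is computed from n. The following Python sketch does this for a two-sided one-sample test of a proportion using the normal approximation. It is an illustration under stated assumptions, not a reproduction of Example C; the function name `one_sample_power`, the proportions 0.50 and 0.60, and the sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_power(p0, pa, n, alpha=0.05):
    """Approximate power of a two-sided one-sample test of a proportion
    with n observations (normal approximation; ignores the negligible
    chance of rejecting in the wrong direction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(pa - p0) * sqrt(n) - z_alpha * sqrt(p0 * (1 - p0))
    return NormalDist().cdf(shift / sqrt(pa * (1 - pa)))


# Hypothetical available subgroup sizes; 0.50 and 0.60 are illustrative guesses.
for n in (50, 200, 800):
    print(n, round(one_sample_power(0.50, 0.60, n), 2))
```

If the power computed for the available subgroup falls well below the conventional 0.80, a non-significant result says little about whether the difference exists.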

Sample size estimation is always an approximation because it depends on values that are unknown. Investigators make educated guesses as to their values, plug them into a formula and compute an estimate of the desired sample size. Various methods of sample size estimation differ in their formulas, initial guesses about standard deviations, assumptions about normality, and other correction factors. It should not be surprising that each will give a different answer. The more complex formulas tend to give similar results because common assumptions are specified explicitly in the complex terms of the formulas.
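As an illustration of how the choice of formula changes the answer, the Python sketch below computes the per-group sample size for comparing two proportions twice: once with a plain normal-approximation formula and once with a widely used continuity correction applied to it. Both formulas are assumptions chosen for the example and are not claimed to match any particular software package; the proportions 0.50 and 0.60 are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_sample_n(p1, p2, alpha=0.05, power=0.80, continuity=False):
    """Per-group sample size for comparing two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    if continuity:
        # Continuity correction applied to the uncorrected estimate.
        n = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n)


print(two_sample_n(0.50, 0.60))
print(two_sample_n(0.50, 0.60, continuity=True))
```

With identical inputs, the corrected formula asks for a somewhat larger sample than the uncorrected one. Neither answer is wrong; they rest on different approximations.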

