Inference for a Proportion

Standard Error of a Proportion

The standard error of a proportion is a statistic indicating how much a particular sample proportion is likely to differ from the population proportion, p. Let p^ represent a proportion observed in a sample. (The "^" symbol is called a hat. It indicates that the proportion is based on sampled data, much like "x bar" denotes a sample mean. Rightfully, the "^" should be directly over the p, but this is difficult to achieve within HTML constraints.) The sample proportion is p^ = X / n, where n is the sample size and X represents the observed number of people in the sample with the characteristic in question. Assuming the observations are independent, X is a binomial random variable with n trials and probability of success p.

If we assume that the normal approximation to the binomial holds, the number of positive outcomes (X) in a sample is approximately normally distributed with mean µ = np and variance σ² = npq (where q = 1 - p). Correspondingly, when the normal approximation holds, the sample proportion p^ is approximately normally distributed with mean p and variance pq/n. The normal approximation can be justified by the central limit theorem, because p^ is the mean of a sample of zeros and ones (each observation is coded 0 for a "failure" and 1 for a "success").
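The two facts above (p^ is the mean of 0/1 outcomes, and p^ has mean p and variance pq/n) can be checked exactly with a short sketch. It assumes only Python 3.8+ (for `math.comb`); the helper names `sample_proportion` and `exact_moments_of_phat` are hypothetical. The moments are computed by summing over the exact binomial probabilities, so no normal approximation is involved in this check.

```python
from math import comb


def sample_proportion(outcomes):
    """Return p^ as the mean of 0/1 outcomes (1 = success, 0 = failure)."""
    return sum(outcomes) / len(outcomes)


def exact_moments_of_phat(n, p):
    """Return the exact mean and variance of p^ = X / n when X ~ Binomial(n, p)."""
    # Binomial probabilities P(X = x) for x = 0, 1, ..., n
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(prob * (x / n) for x, prob in enumerate(pmf))
    variance = sum(prob * (x / n - mean) ** 2 for x, prob in enumerate(pmf))
    return mean, variance


print(sample_proportion([1, 0, 1, 1, 0]))  # 0.6

n, p = 20, 0.5
mean, variance = exact_moments_of_phat(n, p)
print(round(mean, 6), round(variance, 6), p * (1 - p) / n)  # 0.5 0.0125 0.0125
```

The exact variance matches pq/n for any n and p; the normal approximation only concerns the *shape* of the distribution, not these two moments.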

The normal approximation to the binomial can be shown to be very accurate when npq >= 5, and adequately accurate when npq >= 3. For example, if p = .5 and n = 20, then npq = (20)(.5)(.5) = 5, suggesting that the normal approximation can be applied with accuracy. Then, for large n, the standard error of the sample proportion (SEP) is given by:

SEP = sqrt(pq/n)

where p represents the probability of success, q = 1 - p, and n represents the sample size. For example, if p = .5 and n = 20, then SEP = sqrt[(.5)(.5) / (20)] = 0.1118.

When the population proportion p is unknown and cannot be assumed, we can calculate the estimated standard error of the proportion (sep) by substituting the sample proportion p^ for p and q^ = 1 - p^ for q:

sep = sqrt(p^q^/n)
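The formulas above translate directly into a few lines of Python. This is a minimal sketch: the function names `npq_check`, `sep_known_p`, and `sep_estimated` are hypothetical, the cut-offs in `npq_check` simply encode the npq >= 5 and npq >= 3 rules of thumb stated earlier, and the "questionable" label for values below 3 is an assumption of this sketch.

```python
import math


def npq_check(n, p):
    """Classify the normal approximation using the npq rules of thumb above."""
    value = n * p * (1 - p)
    if value >= 5:
        return value, "very accurate"
    if value >= 3:
        return value, "adequately accurate"
    return value, "questionable"  # label for npq < 3 is an assumption


def sep_known_p(p, n):
    """Standard error of the proportion, SEP = sqrt(pq / n), for a known p."""
    q = 1 - p
    return math.sqrt(p * q / n)


def sep_estimated(x, n):
    """Estimated standard error, sep = sqrt(p^ q^ / n), from X successes in n trials."""
    p_hat = x / n
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)


print(npq_check(20, 0.5))               # (5.0, 'very accurate')
print(round(sep_known_p(0.5, 20), 4))   # 0.1118
print(round(sep_estimated(10, 20), 4))  # 0.1118 (p^ = .5, so it equals SEP here)
```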

Since sample proportions drawn from this population will be approximately normally distributed, about 95% of such estimates will fall within ±(2)(SEP) of the population proportion. This ±(2)(SEP) can be viewed as the proportion's margin of error (d). For example, when n = 20, X = 10, and p = .5, SEP = 0.1118 and the margin of error is d = (2)(.1118), or 0.22. We can therefore be reasonably assured that about 95% of sample proportions will fall within ±.22 of their underlying population proportion.
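A small sketch of the margin-of-error calculation, assuming the multiplier 2 used in the text. (2 is a rounded version of 1.96, the standard normal value that leaves 2.5% in each tail.) The helper `margin_of_error` and its `z` parameter are hypothetical names for illustration.

```python
import math


def margin_of_error(p, n, z=2.0):
    """Margin of error d = z * sqrt(pq / n); the text uses z = 2."""
    return z * math.sqrt(p * (1 - p) / n)


p, n = 0.5, 20
d = margin_of_error(p, n)
print(round(d, 2))                              # 0.22
print(round(p - d, 2), round(p + d, 2))         # 0.28 0.72
print(round(margin_of_error(p, n, z=1.96), 2))  # 0.22
```

With these values, using 1.96 instead of 2 changes d only in the third decimal place.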

Application of the Central Limit Theorem and Standard Error of the Proportion

Suppose a political poll shows 55 out of 100 prospective voters in favor of candidate A. Candidate A then predicts imminent victory. However, a thoughtful statistician, using the sample proportion .55 in place of p, calculates the estimated standard error sep = sqrt[(.55)(.45) / 100] = 0.0497 and a margin of error of (2)(.0497) = .0994. The proportion of voters favoring candidate A in the population of prospective voters will therefore probably lie within .55 ± .0994, or between about .45 and .65. Because this range includes values below .5, the result does not confirm a clear majority, and candidate A's prediction seems imprudent.
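The poll calculation can be reproduced as follows. As in the example, this sketch plugs the sample proportion into the standard-error formula and uses a multiplier of 2; the function name `proportion_interval` is hypothetical. The final check asks whether the whole interval lies above .5, which is what a clear majority would require.

```python
import math


def proportion_interval(x, n, z=2.0):
    """Return (p_hat, lower, upper) for p_hat ± z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    d = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - d, p_hat + d


p_hat, lower, upper = proportion_interval(55, 100)
print(round(lower, 2), round(upper, 2))  # 0.45 0.65
print("clear majority" if lower > 0.5 else "no clear majority")  # no clear majority
```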

In estimating a population proportion, the margin of error is d ≈ (2)(SEP) = (2)sqrt(pq/n). Squaring both sides gives d² = 4pq/n, so the required sample size is:

n = 4pq/d²

where p represents the population proportion and q = 1 - p. For example, if p = .25 and we want d = ±0.05, then n = (4)(0.25)(0.75)/(0.05)² = 300. As a general rule, this formula is accurate when p is not very close to 0 or 1 (say, .05 < p < .95) and when the sample size n is small relative to the population size N (so that the sampling fraction n/N < .05).
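A sketch of the sample-size formula n = 4pq/d². Since a fractional number of respondents is not possible, this version rounds up with `math.ceil`, a common convention that the text does not state explicitly; the function name `sample_size` is hypothetical. Rounding to six decimals before taking the ceiling guards against floating-point noise (for example, 0.05**2 is not exactly .0025 in binary floating point).

```python
import math


def sample_size(p, d):
    """Approximate n needed for margin of error d, from n = 4pq / d**2 (rounded up)."""
    q = 1 - p
    n = 4 * p * q / d**2
    # Strip floating-point noise before rounding up to a whole respondent.
    return math.ceil(round(n, 6))


print(sample_size(0.25, 0.05))  # 300
```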

If one does not have an assumed value for p but still wants to estimate a proportion with margin of error d, then assume p = .5. Because pq reaches its maximum value of .25 at p = .5, this choice gives the largest (most conservative) sample size, and the formula simplifies to n = 4(.25)/d² = 1/d². For example, to determine p with a margin of error d no greater than .05, n = 1 / (.05)² = 400. This provides a "ball-park" estimate of your sample size requirement.
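As a numerical check on the p = .5 default, this sketch evaluates n = 4pq/d² for d = .05 over a grid of candidate values p = .05, .10, ..., .95 and reports which p demands the most respondents. The grid and the function name `conservative_sample_size` are assumptions for illustration; results are rounded up as whole respondents.

```python
import math


def conservative_sample_size(d):
    """Worst-case n = 1 / d**2, i.e. n = 4pq / d**2 with p = q = .5 (rounded up)."""
    return math.ceil(round(1 / d**2, 6))


d = 0.05
# Required n = 4pq / d**2 for p = .05, .10, ..., .95 (rounded up)
grid = [k / 100 for k in range(5, 100, 5)]
requirements = {p: math.ceil(round(4 * p * (1 - p) / d**2, 6)) for p in grid}
worst_p = max(requirements, key=requirements.get)

print(conservative_sample_size(d))     # 400
print(worst_p, requirements[worst_p])  # 0.5 400
```

Every other value of p on the grid needs fewer than 400 respondents, so planning with 1/d² never undershoots the requirement for the true p.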