Hypergeometric, Binomial, and Normal Probability

Hypergeometric Distribution

In a Hypergeometric Distribution, we select objects from a population of size N, of which M items are considered "successes". We select a sample of size n without replacement: select an item, do not put it back, select the next item, and so on. Notice that p = M/N, the proportion of successes in the population, is the probability of selecting a success on the first selection. On later selections, however, the probability of a success changes depending on what was selected earlier. The random variable X is the number of successes in the sample.

Selecting without replacement from a finite population is a very common scenario. When we take a sample, we rarely allow the same item to be selected more than once, so this is one of the most realistic probability situations.

In the app, select Hypergeometric and adjust the parameters N, M, and n via the sliders or input boxes. Adjust the values of L and U and select Left Tail, Right Tail, Middle, or Two-tailed for the type of probability. The appropriate area is shaded and the probability is computed. The probability is the sum of the heights of the shaded bars; since every bar has width 1, it is also the sum of the areas of the shaded bars. Notice that there are n+1 bars, one above every whole number from 0 to the sample size n, even if some of the bars are too short to see on this scale. Check the Parameters box to see the values of some of the calculated parameters for this particular Hypergeometric Distribution.

The formula for the Probability Density Function (PDF) for the Hypergeometric Distribution is:

$$P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}$$

for whole numbers x from max(0, n - (N - M)) to min(n, M); the probability is 0 for any other x.
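The hypergeometric probabilities described above can also be computed directly from binomial coefficients. The following is a minimal Python sketch, not the app's own code: the function names and the example values N = 50, M = 20, n = 10, L = 3, U = 5 are chosen here for illustration, and treating "Middle" as including both endpoints L and U is an assumption about how the app counts the shaded bars.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N items, M of which are successes."""
    # Outside this range the count x cannot occur, so its probability is 0.
    if x < max(0, n - (N - M)) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def hypergeom_between(L, U, N, M, n):
    """P(L <= X <= U): the sum of the bar heights from L to U inclusive."""
    return sum(hypergeom_pmf(x, N, M, n) for x in range(L, U + 1))

# Illustrative values only (not taken from the app).
N, M, n = 50, 20, 10
middle = hypergeom_between(3, 5, N, M, n)
total = hypergeom_between(0, n, N, M, n)            # all n + 1 bars together
mean = sum(x * hypergeom_pmf(x, N, M, n) for x in range(n + 1))

print("P(3 <= X <= 5) =", middle)
print("sum of all bars =", total)
print("mean =", mean, "compared with n*M/N =", n * M / N)
```

Summing all n + 1 bars is a quick sanity check that the heights form a probability distribution, and the computed mean can be compared with np.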

Binomial Distribution Approximation

The setup for a Binomial Distribution is exactly the same as for a Hypergeometric Distribution, except that the selections are performed with replacement: each time we select an item, we put it back before selecting the next item, so some items might be selected more than once. In this case the population proportion of successes, p, is the probability of a success on every selection, and each selection is an independent event. The formula for the PDF for a Binomial Distribution is

$$P(X = x) = \binom{n}{x} p^x q^{n-x}$$

where q = 1 - p is the probability of a failure on each selection.

Sampling without replacement is much more realistic, so a Hypergeometric Distribution should usually be used. However, to use the Hypergeometric Distribution we must know the population size, which is sometimes problematic. When the population size is unknown but known to be very large, we can use a Binomial Distribution to approximate the Hypergeometric Distribution. This is often done.

In the app, select Binomial Approximation to show the Binomial Distribution. Select both Hypergeometric Distribution and Binomial Approximation to see how the Binomial Approximation compares to the corresponding Hypergeometric Distribution. The probability is computed both ways, and the error (Binomial - Hypergeometric) is shown. Notice that as the population size gets larger, the two distributions get closer together and the error gets smaller, approaching 0 as the population size approaches infinity.

If you just want to start with a Binomial Distribution, see it, and compare it to a Normal Approximation, convert the given value of p to fractional form, then enter the numerator as M and the denominator as N.
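The shrinking error described in this section can be checked numerically. The sketch below is an illustration under assumed values, not the app's implementation: it fixes p = 0.4, n = 10, L = 3, and U = 5 (all hypothetical choices), grows the population size N, and reports the error as Binomial minus Hypergeometric for P(L <= X <= U).

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """Hypergeometric P(X = x), sampling without replacement."""
    if x < max(0, n - (N - M)) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """Binomial P(X = x), sampling with replacement."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def approximation_error(L, U, N, M, n):
    """Binomial minus Hypergeometric for P(L <= X <= U), with p = M/N."""
    p = M / N
    exact = sum(hypergeom_pmf(x, N, M, n) for x in range(L, U + 1))
    approx = sum(binom_pmf(x, n, p) for x in range(L, U + 1))
    return approx - exact

# Illustrative values only: keep p = M/N = 0.4 while N grows.
n, L, U = 10, 3, 5
populations = (50, 500, 5000, 50000)
errors = [approximation_error(L, U, N, 2 * N // 5, n) for N in populations]

for N, err in zip(populations, errors):
    print(f"N = {N:>6}  error = {err:+.6f}")
```

Keeping M/N constant while N increases isolates the effect of population size, which is the comparison the app shows when both distributions are selected.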

Normal Approximation

We can approximate both Hypergeometric and Binomial Distributions by Normal Distributions. Check Normal Approximations to see them: a Normal Probability Density Function with the same mean and standard deviation is graphed. Notice that the Hypergeometric and Binomial Distributions have the same mean, np, and this is the mean of every Normal Approximation. The most commonly used standard deviation is that of the Binomial Approximation, $\sqrt{npq}$. Alternatively, we can use the more accurate standard deviation of the Hypergeometric Distribution, $\sqrt{npq\,\frac{N-n}{N-1}}$. Check Parameters to see these values. You can select either standard deviation using the checkboxes.

Even for fairly large population sizes, there are significant differences between the Normal Approximation and the actual Hypergeometric Distribution. See the size of the errors for the displayed probabilities. One thing that can partly correct for these problems is a Discrete Correction (usually called a continuity correction). Check that box to see what this is.

If the population size is very large, then the Normal Approximation and the Binomial Approximation are close to the actual probability from the Hypergeometric Distribution. As the population size increases, these approximations get better and the errors approach 0. However, it is the author's opinion that the standard rule found in many texts, that np and nq both be greater than 15, sets a threshold that is much too low for the Normal Approximation to be adequate. Experiment with various values to see what might be a better criterion. In fact, since this application and similar functions built into graphing calculators, spreadsheets, and even cell phones are available, one should never need to use a Normal Approximation. Hypergeometric and Binomial calculations are as easy to perform as Normal Approximations.
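The comparison in this section can be sketched in Python. This is a rough illustration under stated assumptions rather than the app's code: the values N = 50, M = 20, n = 10, L = 3, U = 5 are hypothetical, and the Discrete Correction is assumed to be the usual continuity correction, which widens the interval by 0.5 at each end so that each whole number x covers x - 0.5 to x + 0.5.

```python
from math import comb, sqrt
from statistics import NormalDist

def hypergeom_between(L, U, N, M, n):
    """Exact hypergeometric P(L <= X <= U)."""
    lo = max(L, 0, n - (N - M))   # smallest attainable count in the range
    hi = min(U, n, M)             # largest attainable count in the range
    ways = sum(comb(M, x) * comb(N - M, n - x) for x in range(lo, hi + 1))
    return ways / comb(N, n)

def normal_between(L, U, mean, sd, corrected):
    """Normal approximation to P(L <= X <= U), optionally with the
    continuity correction (interval widened by 0.5 on each side)."""
    half = 0.5 if corrected else 0.0
    dist = NormalDist(mean, sd)
    return dist.cdf(U + half) - dist.cdf(L - half)

# Illustrative values only (not taken from the app).
N, M, n, L, U = 50, 20, 10, 3, 5
p = M / N
q = 1 - p
mean = n * p
sd_binom = sqrt(n * p * q)                      # binomial standard deviation
sd_hyper = sd_binom * sqrt((N - n) / (N - 1))   # hypergeometric standard deviation

exact = hypergeom_between(L, U, N, M, n)
errors = {}
for sd_name, sd in (("binomial sd", sd_binom), ("hypergeometric sd", sd_hyper)):
    for corrected in (False, True):
        approx = normal_between(L, U, mean, sd, corrected)
        errors[(sd_name, corrected)] = approx - exact
        print(f"{sd_name:18} corrected={corrected!s:5} "
              f"approx={approx:.4f} error={approx - exact:+.4f}")
print(f"exact hypergeometric probability = {exact:.4f}")
```

Trying other values of N, M, and n in this sketch is one way to explore how large the population and sample need to be before the errors become acceptable.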