Binomial and Hypergeometric Probability Trees
- Dr. Jack L. Jackson II
Selecting Colored Balls from an Urn
In this activity we select a sample of n balls from an urn. The random variable X represents the number of blue balls (successes) selected. Using either the sliders or the input boxes, enter the total starting number of balls in the urn (population size = N), the number of blue balls in the urn at the beginning (number of successes in the population = M), and the number of balls selected (sample size = n). Start with the bottom slider all the way to the left, then slowly slide it to the right to see the probability tree drawn.
Remember that after the first set of edges, each edge is labeled with a conditional probability, given that the earlier results leading to that branch have occurred. The probability at the end of each path through the tree is the joint probability of that sequence of results, found by multiplying the probabilities along the path. Adding up the joint probabilities at the ends of all paths with the same X value gives the probability of that X value in the table.
The final probability distribution function (pdf) is given in the spreadsheet to the right. The table also includes the mean and standard deviation of the distribution. The graph of the pdf is below the spreadsheet.
We have two different options when selecting the balls. These lead to two standard discrete probability distributions.
Option 1 is to select the balls with replacement. This means that after selecting each ball, we return it to the urn, mix the balls, and then select the next ball. In this option, the total number of balls in the urn is the same at each stage. Therefore, the probability of a blue ball is p = M/N on every draw, and the probability of a non-blue (red) ball is q = 1 - p on every draw. The events are independent. This distribution is called a Binomial Distribution.
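As a rough illustration of Option 1, the Python sketch below computes the binomial probabilities for a small urn. The closed-form expression C(n, k) p^k q^(n-k) and the formulas mean = np and standard deviation = sqrt(npq) are the standard textbook results and are not stated in the activity itself. The function name binomial_pmf and the urn values N = 10, M = 4, n = 3 are assumptions chosen only for this example.

```python
from math import comb, sqrt


def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for n independent draws,
    each with success probability p (sampling with replacement)."""
    q = 1 - p  # probability of a red ball on any single draw
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]


# Hypothetical urn, chosen only for illustration:
# N = 10 balls in total, M = 4 blue, n = 3 balls drawn with replacement.
N, M, n = 10, 4, 3
p = M / N

pmf = binomial_pmf(n, p)
mean = n * p                 # standard binomial mean
sd = sqrt(n * p * (1 - p))   # standard binomial standard deviation

for k, prob in enumerate(pmf):
    print(f"P(X = {k}) = {prob:.4f}")
print(f"mean = {mean:.4f}, sd = {sd:.4f}")
```

Each entry of pmf should match the total of the joint probabilities for the tree paths containing that many blue balls.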
Option 2 is to select the balls without replacement. This means that after selecting each ball, we do NOT return it to the urn. Therefore, the number of balls in the urn (the denominator of each probability) decreases by one at each stage. The numerator of each probability is the number of balls of that color still in the urn, which depends on what has been selected before. The events are NOT independent. This distribution is called a Hypergeometric Distribution.
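The tree procedure for Option 2 can be sketched in Python as follows: walk every path, multiply the conditional probabilities along it, and total the joint probabilities by the number of blue balls. The result is then checked against the standard hypergeometric formula C(M, k) C(N - M, n - k) / C(N, n), which is a textbook result not stated in the activity. The function name tree_distribution and the urn values N = 10, M = 4, n = 3 are assumptions made for this example, and the sketch assumes n is no larger than N.

```python
from fractions import Fraction
from itertools import product
from math import comb


def tree_distribution(N, M, n):
    """Walk every path of the without-replacement tree and total the
    joint probabilities by X = number of blue balls on the path."""
    if n > N:
        raise ValueError("cannot draw more balls than the urn holds")
    dist = {}
    for path in product("BR", repeat=n):  # every sequence of n colors
        blue, red = M, N - M              # balls still in the urn
        prob = Fraction(1)
        for color in path:
            total = blue + red            # denominator shrinks by one per draw
            if color == "B":
                prob *= Fraction(blue, total)  # conditional probability of blue
                blue -= 1
            else:
                prob *= Fraction(red, total)   # conditional probability of red
                red -= 1
            if prob == 0:                 # this path is impossible
                break
        x = path.count("B")
        dist[x] = dist.get(x, Fraction(0)) + prob
    return dist


# Hypothetical urn, chosen only for illustration.
N, M, n = 10, 4, 3
dist = tree_distribution(N, M, n)

# Standard hypergeometric formula, used here only as a cross-check.
closed = {k: Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))
          for k in range(n + 1)}

for k in sorted(dist):
    print(f"P(X = {k}) = {dist[k]}  (closed form: {closed[k]})")
```

Exact fractions are used so that the tree totals and the closed form can be compared without rounding error. Enumerating all 2^n paths is only practical for small n, which is also the range in which the tree can be drawn.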
Distribution of Proportions and Practical Applications
We are often interested in dividing the population into two groups. The group of interest is called Successes (modeled by blue balls) and the complementary group is called Failures (modeled by red balls). We are interested in the number of successes, or more often the proportion of successes, in the sample or in the population. In this setup, the proportion of successes in the population is p = M/N and the proportion of successes in the sample is X/n. The table gives the values of the proportion of successes in the sample. Typically, in real-world applications, selection is done without repeats (i.e., without replacement). Therefore, a Hypergeometric Distribution is the most realistic model. However, as the population size gets larger and larger relative to the sample size, the Hypergeometric Distribution approaches a Binomial Distribution. For this reason, we often use a Binomial Distribution to approximate a Hypergeometric Distribution when the population size is unknown but very large.
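To see the approximation numerically, the Python sketch below holds the population proportion and the sample size fixed and lets the population size grow, recording the largest difference between the hypergeometric and binomial probabilities. The closed-form formulas are the standard textbook ones rather than anything given in the activity. The function names, the proportion p = 0.4, the sample size n = 5, and the list of population sizes are assumptions made for this example.

```python
from math import comb


def hypergeom_pmf(N, M, n):
    """P(X = k), k = 0..n, when drawing n balls without replacement."""
    return [comb(M, k) * comb(N - M, n - k) / comb(N, n) for k in range(n + 1)]


def binom_pmf(n, p):
    """P(X = k), k = 0..n, when drawing n balls with replacement."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]


def max_gap(N, M, n):
    """Largest absolute difference between the two pmfs over all k."""
    h = hypergeom_pmf(N, M, n)
    b = binom_pmf(n, M / N)
    return max(abs(x - y) for x, y in zip(h, b))


# Hypothetical setup: sample size n = 5 and 40% blue balls at every
# population size, so only N changes from one row to the next.
n = 5
gaps = {N: max_gap(N, (2 * N) // 5, n) for N in (20, 200, 2000, 20000)}

for N, gap in gaps.items():
    print(f"N = {N:>6}: largest pmf difference = {gap:.6f}")
```

The printed differences should shrink as N grows, which is the sense in which the binomial model stands in for the hypergeometric one when the population is very large compared with the sample.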