Chapter 7 - Probability and Samples: The Distribution of Sample Means

The topics in this chapter are among the most important for your understanding of inferential statistics. They also happen to be among the most conceptually difficult. Here is where we begin to discuss the theoretical tools that we use to test the significance of our research.

As usual, begin with the Preview and Overview. We are working with samples from the population now, and one of the major issues is how to determine whether our sample is truly representative of the population. The difference between the mean of our sample and the population mean is called sampling error. We want to reduce sampling error as much as possible, and the law of large numbers tells us that we can do this by using a large sample. One of the main goals of statistics is to determine how likely it is that our sample is representative of the population, and a large sample is more likely to be representative than a small one.
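If you would like to see the law of large numbers at work, here is a minimal Python sketch. The population (10,000 scores with a mean near 100 and a standard deviation near 15), the seed, the number of trials, and the function name are all assumptions made for illustration; none of them come from the textbook.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, trials, rng):
    """Average |sample mean - population mean| over many samples of size n."""
    mu = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        sample = rng.choices(population, k=n)  # sampling with replacement
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 scores with mean near 100 and SD near 15
population = [rng.gauss(100, 15) for _ in range(10_000)]

small_n_error = mean_abs_sampling_error(population, n=4, trials=2000, rng=rng)
large_n_error = mean_abs_sampling_error(population, n=100, trials=2000, rng=rng)

print(f"average sampling error, n = 4:   {small_n_error:.2f}")
print(f"average sampling error, n = 100: {large_n_error:.2f}")
```

The average sampling error for n = 100 should come out several times smaller than for n = 4, which is the point of using large samples.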

Section 7.2 begins the discussion of sample means (the mean of a sample). It should be clear that if you repeatedly take a sample from a population and calculate the means of those samples, the means will likely not all be the same (unless your population has no variability). If you take all the sample means and plot them on a histogram, you will have created what is called a distribution of sample means, sometimes just referred to as a sampling distribution. In other words, if you repeatedly took samples of size n from the population, computed the means of the samples, then made a frequency histogram of the means, you would have a distribution of sample means.
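The procedure just described can be sketched in a few lines of Python. The tiny population of four scores and the sample size of 2 are hypothetical values picked so that every possible sample can be listed; this is an illustration, not a reproduction of anything in the text.

```python
from collections import Counter
from itertools import product
from statistics import fmean

population = [2, 4, 6, 8]  # hypothetical tiny population, chosen for illustration
n = 2                      # sample size

# Every possible sample of size n, drawn with replacement (order matters)
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

# Frequency table of the sample means, printed as a text histogram
counts = Counter(sample_means)
for m in sorted(counts):
    print(f"M = {m:.0f}  {'#' * counts[m]}")

print("number of samples:", len(samples))
print("mean of the sample means:", fmean(sample_means))
print("population mean:", fmean(population))
```

With these values there are 16 possible samples. The sample means run from 2 to 8 with frequencies 1, 2, 3, 4, 3, 2, 1, so they pile up in the middle, and the mean of the sample means (5) equals the population mean.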

Look closely at Example 7.1. We will do this in class as well.

The distribution of sample means has some interesting characteristics. First, if your samples are big enough (a large n), then the sampling distribution will approximate a normal distribution, which, as you know, is handy for computing probabilities. Second, the mean of your sampling distribution, which is sometimes designated $\mu_M$, will be the same as the population mean. Together, these two properties of sampling distributions comprise the central limit theorem. Third, as you also know, to compute probabilities from a normal distribution, you have to know the standard deviation of the distribution. In this case, the standard deviation of the sampling distribution is called the standard error of means, designated $\sigma_M$, and is calculated by dividing the population standard deviation by the square root of n. In other words, $\sigma_M = \sigma / \sqrt{n}$.
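One way to convince yourself of the standard error formula is to simulate it. The sketch below assumes a normal population with a mean of 50 and a standard deviation of 10, and samples of size 25; these numbers, the seed, and the variable names are hypothetical choices for illustration only.

```python
import math
import random
import statistics

rng = random.Random(11)     # fixed seed so the run is repeatable
mu, sigma, n = 50, 10, 25   # hypothetical population parameters and sample size

# Draw 5,000 samples of size n from a normal population and keep each sample mean
sample_means = [
    statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(5000)
]

predicted_se = sigma / math.sqrt(n)             # population SD divided by sqrt(n)
observed_se = statistics.pstdev(sample_means)   # SD of the simulated sample means
observed_mean = statistics.fmean(sample_means)  # mean of the simulated sample means

print(f"predicted standard error: {predicted_se:.2f}")
print(f"observed standard error:  {observed_se:.2f}")
print(f"mean of sample means:     {observed_mean:.2f} (population mean is {mu})")
```

The predicted standard error is 10 / 5 = 2, and the standard deviation of the simulated means should land very close to that, with the mean of the means close to 50.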

Read and understand everything in Section 7.2 (try the Learning Check) before moving on to Section 7.3.

Section 7.3 reintroduces probability. If you have a distribution of sample means, and you know that it is approximately normally distributed, you can find the probability of obtaining any particular sample mean using the same techniques that we used in the last chapter for an individual score from a population of scores. First you have to convert the sample mean into a z score, then you look that z score up in your normal distribution table. To compute the z score, you calculate how far the sample mean is from the population mean, then divide that difference by the standard error:

$z = \frac{M - \mu}{\sigma_M}$

where $M$ is the sample mean, $\mu$ is the population mean, and $\sigma_M = \sigma / \sqrt{n}$ is the standard error.
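Here is a short Python sketch of that two-step procedure, using `statistics.NormalDist` from the standard library in place of the printed normal table. The numbers (a population mean of 100, a standard deviation of 15, a sample of 9 with a mean of 110) and the function name are hypothetical, chosen only to keep the arithmetic easy.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(sample_mean, mu, sigma, n):
    """z score of a sample mean: (M - mu) divided by the standard error."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

# Hypothetical values: mu = 100, sigma = 15, n = 9, sample mean M = 110
z = z_for_sample_mean(110, mu=100, sigma=15, n=9)

# Proportion of the standard normal distribution above z (the upper tail)
p_above = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}")                    # z = 2.00
print(f"p(M > 110) = {p_above:.4f}")     # p(M > 110) = 0.0228
```

The standard error here is 15 / 3 = 5, so a sample mean of 110 sits 2 standard errors above the population mean, and only about 2.3% of sample means would be that high or higher.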

Again, the Learning Check can be very helpful.

These two sections, 7.2 and 7.3, are very important. Although some passages are cumbersome, read these sections closely and make sure you understand them. I will put it in my own words here. When we take a sample from a population, we have to understand that it is just one of many samples that we could have taken, and the mean that we compute is one of many possible means. Hypothetically speaking, we could take an infinite number of samples of the same size and compute all their means. If we plotted all the means on a frequency distribution, we would see that the distribution of the means is approximately normal (for sufficiently large samples), with the mean of these means (the mean of the means!) exactly equal to the population mean. In other words, most of the time our sample means would be close to the population mean, but occasionally they would be far from it (extreme values have a low probability).

If we did this, the shape of the sampling distribution of the means would allow us to make some statements about the probability of getting any particular sample mean. Obviously, sample means that are far from the population mean would be unlikely, whereas sample means that are close to the population mean would be more probable.

It is interesting to note that the distribution of sample means is shaped approximately like a normal distribution (provided your sample size n is sufficiently large), no matter what the original distribution looks like. Furthermore, the bigger your sample, the smaller the variability in the sampling distribution. This is our law of large numbers again. The bigger the samples, the closer the means of the samples will tend to be to the population mean.
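Both claims can be checked with a rough simulation. The sketch below assumes a strongly skewed population (an exponential distribution with a mean of 1 and a standard deviation of 1) and compares samples of size 1, 5, and 50. The population, the sample sizes, the seed, and the `skewness` helper are hypothetical choices for illustration.

```python
import random
import statistics

def skewness(values):
    """Skewness: near 0 for a symmetric distribution, positive for a right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(23)  # fixed seed so the run is repeatable
results = {}
for n in (1, 5, 50):
    # 5,000 samples of size n from an exponential population (mean 1, SD 1)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(5000)
    ]
    results[n] = (statistics.pstdev(means), skewness(means))
    print(f"n = {n:2d}: SD of sample means = {results[n][0]:.3f}, "
          f"skewness = {results[n][1]:.2f}")
```

With n = 1 the "sample means" are just the raw scores, so the skew of the population shows through. As n grows, the skewness should shrink toward 0 (the shape becomes more symmetric and bell-like) and the standard deviation of the sample means should shrink as well.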

Now, if we want to know the probability of getting a particular sample mean, given that we know the population mean, all we have to do is find out how many standard deviations (now called "standard errors") our sample mean is away from the population mean. In other words, we have to compute the z score for our sample mean.

I cannot stress enough that this is one of the most important chapters of the entire text.


Many of the end-of-chapter problems will be useful tests of your understanding. Once again, I think that you should be able to do every problem in this chapter, but if your time is limited, try the odd-numbered ones.