6. Sample Distributions

 

Distribution of Sample Means

  • Sampling with Replacement

    • Example 1: The population from which samples are selected is {1,2,3,4,5,6}.

      This population has a mean of 3.5 and a standard deviation of 1.70783. The next display shows a histogram of the population.

Histogram of Population {1,2,3,4,5,6}

A computer was programmed to take all samples of size 4 (there are 1296) with replacement from this population.  A few of the samples are {1,1,1,1}, {1,1,1,2}, {1,1,1,3}, {1,1,1,4},...,{6,6,6,3}, {6,6,6,4}, {6,6,6,5}, and {6,6,6,6}.

For each of these samples a statistic, the sample mean (i.e. the average of the numbers in the sample), was computed. The sample means for the first few samples shown above are 1, 1.25, 1.5, 1.75,...,5.25, 5.5, 5.75, and 6.  A histogram of all 1296 sample means is shown next.

Histogram of All Sample Means for Samples of Size 4 with Replacement Taken from Population {1,2,3,4,5,6}

The mean of these 1296 sample means is 3.5 and the standard deviation of these 1296 sample means is 0.853913.

From the histogram of sample means it appears that the sample means for samples of size 4 taken with replacement from the population {1,2,3,4,5,6} are normally distributed, at least approximately.
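The enumeration described in Example 1 can be reproduced with a short script. This is a minimal sketch, not the program the author used; it assumes samples are treated as ordered draws, so that there are 6^4 = 1296 of them, and it uses Python's standard `statistics` module with the population (divide-by-count) standard deviation.

```python
from itertools import product
from statistics import fmean, pstdev

population = [1, 2, 3, 4, 5, 6]
n = 4  # sample size

# Every ordered sample of size n drawn with replacement: 6**4 = 1296 of them.
samples = list(product(population, repeat=n))
sample_means = [fmean(s) for s in samples]

print(len(samples))                               # number of samples
print(fmean(population), pstdev(population))      # population mean and SD
print(fmean(sample_means), pstdev(sample_means))  # mean and SD of the sample means
```

Replacing `population` with another list of six values applies the same enumeration to a different population.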

 

 

  • Example 2: The population from which samples are selected is {1,2,3,3,3,10}. 

The observations made in Example 1 may have been true because the population had a uniform symmetric shape.  This example shows a population that is neither uniform nor symmetric.

Histogram of Population {1,2,3,3,3,10}

 

 

This population has a mean of 3.66667 and a standard deviation of 2.92499.  Then a computer found all 1296 samples of size 4 with replacement from this population and calculated the mean of each of these samples.  The mean of these 1296 sample means is 3.66667 and the standard deviation is 1.46249.

A histogram of these sample means is shown next.

Histogram of All 1296 Sample Means for Samples of Size 4 Taken with Replacement from Population {1,2,3,3,3,10}

This histogram resembles a normal curve, but it has some gaps and is skewed to the right.  If a larger sample size had been used, the histogram would look more like a normal curve.  This is suggested by the following histogram showing 400 sample means for samples of size 36 taken with replacement from the same population.  There are 6^36 possible samples altogether, far too many to enumerate, which is why only 400 random samples were taken and their means computed.

Histogram of 400 Sample Means for Samples of Size 36 Taken with Replacement from Population {1,2,3,3,3,10}

The mean of the 400 sample means is 3.65278 and the standard deviation of them is 0.498121.  The mean of these sample means is very close to the population mean, 3.66667, and the standard deviation is close to 2.92499/Sqrt[36] = 2.92499/6 = 0.487498.
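A simulation along these lines can be sketched as follows. The seed and the use of `random.Random.choices` are assumptions made for illustration, so the numbers printed will differ from the 3.65278 and 0.498121 reported above, which came from the author's own random draws.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

population = [1, 2, 3, 3, 3, 10]
n = 36             # sample size
num_samples = 400  # number of random samples to draw

rng = random.Random(0)  # fixed seed (an arbitrary choice) so the run is repeatable

# Draw 400 random samples of size 36 with replacement and record each mean.
sample_means = [fmean(rng.choices(population, k=n)) for _ in range(num_samples)]

predicted_sd = pstdev(population) / sqrt(n)  # population SD / Sqrt[n]

print(fmean(sample_means))   # should land near the population mean, 3.66667
print(pstdev(sample_means))  # should land near predicted_sd
print(predicted_sd)          # 2.92499/6, about 0.4875
```

Different seeds give slightly different results, but the mean and standard deviation of the 400 sample means should stay close to the predicted values.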

These few examples suggest the following concerning the collection of sample means from all random samples of size n taken from a population, the sampling distribution of sample means:

  • In sampling with replacement, the mean of all sample means equals the mean of the population: $\mu_{\bar{x}} = \mu$.
  • In sampling with replacement, the standard deviation of all sample means equals the standard deviation of the population divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
  • Whatever the shape of the population distribution, the distribution of sample means is approximately normal with better approximations as the sample size, n, increases.
 

This link takes you to a page which discusses the sampling distribution of sample means.  When you reach the page click the red die in front of exercise 1 to run a simulation showing the distribution of sample means.

  • Sampling without Replacement

    • Example 1: The population from which samples are selected is {1,2,3,4,5,6}.

      A computer selected all samples of size 4 without replacement from this population.  There are 360 such samples.  Then the mean of each sample was taken.  The mean of all of these sample means is 3.5, and the standard deviation is 0.540062.  So the mean of the sample means equals the mean of the population from which the samples are selected.  However, the standard deviation does not follow the rule expressed above.  Dividing the population standard deviation (found in example 1 in the section on sampling with replacement), 1.70783, by the square root of the sample size, 2, results in the number 0.853915, which is not the standard deviation of the sample means, 0.540062.

      In sampling without replacement, the formula for the standard deviation of all sample means for samples of size n must be modified by including a finite population correction.  The formula becomes $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$, where N is the population size (N=6 in this example) and n is the sample size (n=4 in this case).  The finite population correction is the second square root in this formula.  Using this formula, you get the correct standard deviation for the population of 360 sample means, namely 0.540062.
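The claim can be checked by enumerating the 360 samples directly. The sketch below assumes ordered samples (6 × 5 × 4 × 3 = 360) and takes the corrected formula to be (sigma/Sqrt(n)) Sqrt((N-n)/(N-1)), the form used in the worked arithmetic for Example 2 of this section.

```python
from itertools import permutations
from math import sqrt
from statistics import fmean, pstdev

population = [1, 2, 3, 4, 5, 6]
N = len(population)  # population size
n = 4                # sample size

# Ordered samples without replacement: 6 * 5 * 4 * 3 = 360 of them.
samples = list(permutations(population, n))
sample_means = [fmean(s) for s in samples]

sigma = pstdev(population)
uncorrected = sigma / sqrt(n)                      # rule for sampling with replacement
corrected = uncorrected * sqrt((N - n) / (N - 1))  # with finite population correction

print(len(samples))
print(pstdev(sample_means))  # SD of the 360 sample means
print(uncorrected)           # does not match the line above
print(corrected)             # matches the SD of the sample means
```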

      Most of the time sampling is done without replacement.  However, when n, the sample size, is no more than 0.05 times the population size, N, the finite population correction is close to 1 and can be dropped.  For example, if N = 1000, then 0.05N = 0.05 × 1000 = 50, so if the sample size is 50 or less, the finite population correction can be dropped.
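To see why the correction matters little for small samples, the factor Sqrt((N-n)/(N-1)) can be tabulated for a few sample sizes. The helper name `fpc` and the sample sizes tried are illustrative choices, not part of the original discussion.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor for population size N, sample size n."""
    return sqrt((N - n) / (N - 1))

N = 1000
for n in (10, 50, 200, 500):
    # Columns: sample size, whether n is within 5% of N, correction factor.
    print(n, n <= 0.05 * N, round(fpc(N, n), 4))
```

At n = 50 the factor is about 0.975, so dropping it changes the standard deviation of the sample means by less than 3 percent; for larger fractions of the population the factor falls further below 1.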

      The histogram of the 360 sample means is shown next:

      Histogram of all 360 Sample Means for Samples of Size 4 Taken without Replacement from Population {1,2,3,4,5,6}


      The distribution of sample means is still approximately normal.

  • Example 2: The population from which samples are selected is {1,2,3,3,3,10}

    As shown in Example 2 under Sampling with Replacement, this population has a mean of 3.66667 and a standard deviation of 2.92499.  A computer found all 360 samples of size 4 without replacement from this population and calculated the mean of each of these samples.  The mean of these 360 sample means is 3.66667 and the standard deviation is 0.924962.  This standard deviation is related to the standard deviation of the population by the finite population correction formula: (2.92499/Sqrt(4)) Sqrt((6-4)/(6-1)) = (2.92499/2) Sqrt(2/5) = 0.924962.

    A histogram of these 360 sample means is shown next.


    Histogram of all 360 Sample Means for Samples of Size 4 Taken without Replacement from Population {1,2,3,3,3,10}


    This distribution is clearly not normal, but it can be shown that when larger samples are taken without replacement from a population, the distribution of sample means more closely approximates a normal distribution.  This is illustrated by the next two graphs.  The first shows the histogram of a population of size 300 that is clearly non-normal.

    Four hundred samples of size 40 were taken from this population (population mean = 1.21986 and population standard deviation = 1.2654), and the mean of each sample was calculated.  A histogram of these 400 sample means is shown next.


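    The mean of these sample means is 1.21703 (near the population mean, 1.21986) and the standard deviation is 0.196691 (near the population standard deviation divided by the square root of the sample size, 1.2654/Sqrt(40) = 0.20008).  Note that the finite population correction factor is Sqrt((300-40)/(300-1)) = 0.9325; applying it lowers the predicted standard deviation to about 0.1866.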
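An experiment of this kind can be sketched as below. The population of 300 values used in the text is not given, so the script generates a hypothetical right-skewed population from an exponential distribution; the seed and that choice of distribution are assumptions, and the printed numbers will not match those quoted above.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(1)  # arbitrary fixed seed for repeatability

# Hypothetical skewed population of 300 values; NOT the population from the text.
N = 300
population = [rng.expovariate(1.0) for _ in range(N)]

n = 40             # sample size
num_samples = 400  # number of random samples

# rng.sample draws each sample without replacement.
sample_means = [fmean(rng.sample(population, n)) for _ in range(num_samples)]

sigma = pstdev(population)
uncorrected = sigma / sqrt(n)
corrected = uncorrected * sqrt((N - n) / (N - 1))

print(fmean(population), fmean(sample_means))        # the two means should be close
print(pstdev(sample_means), uncorrected, corrected)  # observed SD vs. both predictions
```

With only 400 random samples the observed standard deviation fluctuates from run to run, so it may fall nearer either prediction; across all possible samples the corrected value is the exact one.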