Statistics that Measure Central Tendency
                      
                        Mean
You have probably used the mean since elementary school, where it was called the average.  The mean (or average) of a collection of numbers is computed by adding the numbers and dividing by the number of numbers.  For example, the mean of the numbers 2,3,3,4,5,6 is 23/6=3.8 rounded to the nearest tenth.  In formula form, the mean of n numbers, x1, x2, ..., xn, is the sum of the numbers (the x's) divided by n, the number of numbers:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
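The formula translates directly into a few lines of Python. This is a minimal sketch; the function name `mean` is our own choice and is not part of Webstat or any tool mentioned in these notes.

```python
def mean(numbers):
    """Sum the numbers and divide by how many there are: x-bar = sum(x) / n."""
    if not numbers:
        raise ValueError("the mean of an empty collection is undefined")
    return sum(numbers) / len(numbers)


# The example from the text: 23/6, which is 3.8 to the nearest tenth.
print(round(mean([2, 3, 3, 4, 5, 6]), 1))  # prints 3.8
```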
                        
                  For a data set presented as numbers
                  together with the frequency of occurrence of each number, as
                  in the next table, the computation of the mean is slightly
                  modified. 
                  
                    
                    
                      
| Number | Frequency |
|--------|-----------|
| 2 | 2 |
| 3 | 6 |
| 4 | 7 |
| 5 | 3 |
| 7 | 3 |
| 9 | 2 |
                       
                     
                    
                   
Add to the table a third column containing each number multiplied by its frequency of occurrence, and then find the sum of this column as shown:
                  
                    
                    
                      
| Number | Frequency | Number*Frequency |
|--------|-----------|------------------|
| 2 | 2 | 4 |
| 3 | 6 | 18 |
| 4 | 7 | 28 |
| 5 | 3 | 15 |
| 7 | 3 | 21 |
| 9 | 2 | 18 |
| Sum | 23 | 104 |
                       
                     
                    
                   
The mean is (Sum of Numbers*Frequencies)/(Sum of Frequencies).  In the example the sum of the frequencies is 23, so the mean is 104/23=4.5 rounded to the nearest tenth.  In formula form, the mean of numbers x that occur with frequencies f is

$$\bar{x} = \frac{\sum xf}{\sum f}$$
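A minimal Python sketch of the frequency formula follows; the name `frequency_mean` is hypothetical. Weighting each number by its frequency gives the same result as writing every number out as many times as it occurs and taking the ordinary mean.

```python
def frequency_mean(numbers, frequencies):
    """Mean of numbers x occurring with frequencies f: sum(x*f) / sum(f)."""
    total_f = sum(frequencies)
    if total_f == 0:
        raise ValueError("frequencies must not all be zero")
    total_xf = sum(x * f for x, f in zip(numbers, frequencies))
    return total_xf / total_f


# The frequency table from the text: sum(x*f) = 104 and sum(f) = 23.
numbers = [2, 3, 4, 5, 7, 9]
frequencies = [2, 6, 7, 3, 3, 2]
print(round(frequency_mean(numbers, frequencies), 1))  # prints 4.5
```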
                       
The mean is easy to compute, and as mentioned above, you have probably used it before, but it has one major drawback: an extremely large or small number can change the mean by more than is desirable.  For example, the mean of 2,3,4,5, and 6 is 4.  However, if another number, say 20, is added to the set, the mean of the new set of numbers, 2,3,4,5,6, and 20, is 40/6=6.7.  The mean should certainly increase, but an increase from 4 to 6.7 might be considered too large a change.
For this reason, newspapers reporting housing prices usually do not use the mean price of a home: the relatively few expensive homes in a typical community make the mean too high.  Median home prices are used instead.  The next section discusses the median.
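The effect of one extreme value can be checked with Python's standard `statistics` module. This sketch compares the mean with the median (defined in the next section) on the example above; the variable names are our own.

```python
from statistics import mean, median

before = [2, 3, 4, 5, 6]
after = before + [20]  # add one unusually large value

# The mean moves from 4 to 40/6 (about 6.7) ...
print(mean(before), round(mean(after), 1))  # prints 4 6.7
# ... while the median moves only from 4 to 4.5.
print(median(before), median(after))        # prints 4 4.5
```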
At the bottom of this page is a link to the FOCUS dataset.  Open it, and under the STAT menu you will find a choice called Summary Stats.  Use that to find the mean of each variable in the FOCUS dataset; the other descriptive statistics introduced below are also computed for each variable.  Make a histogram of each variable and see how the descriptive statistics relate to the shape of the histogram.
You can also verify the example computations in these notes with Webstat: push the orange button at the bottom of this page, select the Clear Data choice under the Data menu, and type the numbers for which you want the mean or other statistics into a column.  Once the numbers are in a column, you can make any of the Webstat graphs and compute any numerical statistics for them, for example by selecting Histogram under the Graphs menu and Summary Stats under the Stats menu.
                         
                        Median
The median of a collection of numbers is, in a certain sense, the 'middle' number of that set.  For example, the median of the numbers 2,3,4,5,8 is 4 because 4 is the 'middle' number.  The numbers 2,3,4,5,8,10 don't have a single middle value.  What is their median?  In this case the median is defined as the average of the two middle numbers, 4 and 5, so the median is (4+5)/2=4.5.
The process for computing the median of a set of n numbers is:

- Sort the numbers and arrange them from smallest to largest.
- Consider the smallest number to be in position 1, the next number in the sorted list to be in position 2, the next in position 3, etc.
- Compute the position (n+1)/2.  If (n+1)/2 is a whole number, the median is the number lying in that position.  If (n+1)/2 is a fraction, say 7.5, the median is the average of the two numbers in positions 7 and 8.
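The three steps above can be sketched in Python as follows. The function name `median_by_position` is hypothetical; the code follows the (n+1)/2 position rule literally, converting the 1-based positions used in these notes to Python's 0-based list indices.

```python
def median_by_position(numbers):
    """Median via the position rule (n+1)/2 described above."""
    if not numbers:
        raise ValueError("the median of an empty collection is undefined")
    data = sorted(numbers)          # step 1: smallest to largest
    position = (len(data) + 1) / 2  # step 3: 1-based position (n+1)/2
    if position.is_integer():
        # Whole-number position: take that number (shift to a 0-based index).
        return data[int(position) - 1]
    # Fractional position such as 5.5: average the numbers in positions 5 and 6.
    lower = int(position)
    return (data[lower - 1] + data[lower]) / 2


print(median_by_position([2, 3, 4, 5, 8]))      # prints 4
print(median_by_position([2, 3, 4, 5, 8, 10]))  # prints 4.5
```

Applied to the worked example that follows, `median_by_position([2, 3, 1, 4, 4, 5, 7, 2, 3, 8])` returns 3.5.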
                         
                   
Example: Find the median of the numbers 2,3,1,4,4,5,7,2,3, and 8.

- In sorted order the numbers are 1,2,2,3,3,4,4,5,7,8.
- The numbers with their positions are:

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|----------|---|---|---|---|---|---|---|---|---|----|
| Number | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 7 | 8 |

- The median is the number in position (10+1)/2=5.5.  Since 5.5 is not a whole number, the median is the average of the numbers in positions 5 and 6, that is, the average of 3 and 4, which equals 3.5.  The median is 3.5.
                         
                   
                         
                        Mode
The mode is the number that occurs most frequently.  For the set of numbers 2,3,4,5,5,6, the mode is 5.  The set of numbers 2,3,4,5,5,6,6 has two modes, 5 and 6; it is bimodal.  When all numbers in a set occur with the same frequency, the set of numbers has no mode.  For example, the numbers 2,2,3,3,4,4,5,5 have no mode.
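A sketch of the mode rules above, using `collections.Counter` to tally frequencies. The function `modes` is hypothetical; it returns a list so that bimodal sets can be represented, and an empty list when every value occurs equally often. Note that the standard library's `statistics.multimode` does not follow the "no mode" convention: when all values tie, it returns all of them.

```python
from collections import Counter


def modes(numbers):
    """Return all most-frequent values, or [] when every value is equally frequent."""
    counts = Counter(numbers)
    if not counts:
        return []
    highest = max(counts.values())
    # Following the convention above: if every value ties, there is no mode.
    if all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([2, 3, 4, 5, 5, 6]))        # prints [5]
print(modes([2, 3, 4, 5, 5, 6, 6]))     # prints [5, 6]  (bimodal)
print(modes([2, 2, 3, 3, 4, 4, 5, 5]))  # prints []      (no mode)
```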
                      
                         
                        Quartiles and Percentiles
The median divides a set of numbers into halves.  Quartiles divide a set of numbers into quarters, and percentiles divide a set of numbers into hundredths.  You may have taken achievement tests in school and received your result in the form of a percentile score.  If you were told that you were at the 92nd percentile, then 92% of the test scores were equal to or lower than your score and 8% of the test scores were higher than your score.
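One simple way to compute a percentile rank, following the "equal to or lower" reading above, is sketched below. The function name and the scores 1 to 100 are hypothetical; textbooks and software use several slightly different percentile definitions, so other tools may report a slightly different value.

```python
def percentile_rank(scores, your_score):
    """Percentage of scores that are equal to or lower than your_score."""
    if not scores:
        raise ValueError("need at least one score")
    at_or_below = sum(1 for s in scores if s <= your_score)
    return 100 * at_or_below / len(scores)


scores = list(range(1, 101))        # hypothetical test scores 1, 2, ..., 100
print(percentile_rank(scores, 92))  # prints 92.0
```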
There are three quartiles for a set of numbers: the 1st quartile, denoted by Q1, the 2nd quartile, denoted by Q2, and the 3rd quartile, denoted by Q3.  The 2nd quartile is usually called the median, and you have already seen how to compute it.  To compute the 1st quartile, Q1, find the median of all numbers in the dataset that are less than or equal to the median.  To compute the 3rd quartile, Q3, find the median of all numbers in the dataset that are greater than or equal to the median.
                           
                  
                    
                    
                          
| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|----------|---|---|---|---|---|---|---|---|---|----|
| Number | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 7 | 8 |
                           
                         
                    
                   
The median of the numbers in the table just above was found to be the average of the numbers in positions 5 and 6, that is, (3+4)/2=3.5.  The 1st quartile is then the median of the numbers that are less than or equal to 3.5, that is, the median of 1,2,2,3,3.  These numbers are already sorted, and their positions are the same as in the table.  Since there are 5 numbers, the median is the number in position (5+1)/2=3, which is 2, so Q1=2.  The 3rd quartile is the median of the numbers greater than or equal to 3.5, that is, the median of 4,4,5,7,8.  Again there are 5 numbers, so the median of this set is the number in position 3, which is 5, so Q3=5.
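The quartile rule above can be sketched in Python. The `quartiles` name is our own; each inner median is computed with the standard library's `statistics.median`. Other software (including the standard library's own `statistics.quantiles`) uses different quartile conventions, so its answers may differ slightly from these notes.

```python
from statistics import median


def quartiles(numbers):
    """Q1, Q2, Q3 using the method described above (split at the median by value)."""
    data = sorted(numbers)
    q2 = median(data)
    q1 = median([x for x in data if x <= q2])  # numbers <= the median
    q3 = median([x for x in data if x >= q2])  # numbers >= the median
    return q1, q2, q3


print(quartiles([2, 3, 1, 4, 4, 5, 7, 2, 3, 8]))  # prints (2, 3.5, 5)
```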
                    
                         
                        Resources
                         
                       
                     
                   
  
  A demonstration page for descriptive statistics showing the relationship
  between the histogram of a set of numbers and the corresponding descriptive statistics is
  found by following this link
  to a page designed by Eric Scheide.  The following display shows the
  page. 
   
    
  
    
   
 
                  
                    Statistics that Measure Variability
                      
                        Range
The range of a set of numbers equals the largest number minus the smallest number.  The range of the numbers 3,5,9,9,10,13 is 13-3=10.  Like the mean, the range has the disadvantage of changing by too much when an extremely large or small value is added to a dataset.  The next statistic, the interquartile range, does not have this drawback.
                      
                         
                        Interquartile Range (IQR)
The interquartile range is the third quartile minus the first quartile, IQR=Q3-Q1.  For the set of numbers 1,2,2,3,3,4,4,5,7,8 used in the examples above, Q1 was found to be 2 and Q3 was found to be 5.  Thus the interquartile range is 5-2=3.  Compare this with the range, 8-1=7.
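The sketch below computes the range and the IQR, using the same median-split quartile rule as these notes, and shows the drawback described under Range. The function names are hypothetical (`value_range` avoids shadowing Python's built-in `range`), and the second dataset is the 2,3,4,5,6 example from the Mean section with 20 added.

```python
from statistics import median


def quartiles(numbers):
    """Q1, Q2, Q3 by the median-split method used in these notes."""
    data = sorted(numbers)
    q2 = median(data)
    return (median([x for x in data if x <= q2]), q2,
            median([x for x in data if x >= q2]))


def value_range(numbers):
    return max(numbers) - min(numbers)


def iqr(numbers):
    q1, _, q3 = quartiles(numbers)
    return q3 - q1


example = [1, 2, 2, 3, 3, 4, 4, 5, 7, 8]
print(value_range(example), iqr(example))  # prints 7 3

# Adding one extreme value: the range jumps, the IQR barely moves.
small = [2, 3, 4, 5, 6]
print(value_range(small), iqr(small))                # prints 4 2
print(value_range(small + [20]), iqr(small + [20]))  # prints 18 3
```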
                      
                         
                        Standard Deviation
The measure of variability used most often is called the standard deviation.  The standard deviation is roughly the square root of the average of the squared deviations from the mean.  The formula for the standard deviation of x1, x2, ..., xn is

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where x-bar is the mean of the numbers.
As an example, consider the numbers 2,3,4,5,6.  The mean is 4.  The differences between each of the numbers and the mean are (2-4)=-2, (3-4)=-1, (4-4)=0, (5-4)=1, and (6-4)=2, respectively.  The formula indicates that these differences must be squared and added.  The squares are 4,1,0,1, and 4, and their sum is 10.  Finally, the formula directs you to divide this sum by the number of numbers minus 1, i.e., n-1=4, and take the square root.  This results in the square root of 10/4, or the square root of 2.5, which is approximately 1.58.
The square of the standard deviation is called the variance of the set of numbers.  The variance has the drawback that its units are the square of the units of the numbers used to compute it.  For example, if the units of the numbers in the last example are inches, the units of the variance are square inches, while the standard deviation is in inches.
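The worked example and the variance can be reproduced with a short Python sketch of the definition formula (divide by n-1). The function name `sample_sd` is our own; the standard library's `statistics.stdev` and `statistics.variance` also use the n-1 divisor and should agree.

```python
import math


def sample_sd(numbers):
    """Sample standard deviation: sqrt(sum((x - x_bar)**2) / (n - 1))."""
    n = len(numbers)
    if n < 2:
        raise ValueError("need at least two numbers")
    x_bar = sum(numbers) / n
    # Squared deviations from the mean (4, 1, 0, 1, 4 in the example).
    squared_deviations = [(x - x_bar) ** 2 for x in numbers]
    return math.sqrt(sum(squared_deviations) / (n - 1))


s = sample_sd([2, 3, 4, 5, 6])
print(round(s, 2))       # prints 1.58
print(round(s ** 2, 2))  # variance: prints 2.5
```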
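An easier formula for computing the standard deviation is

$$s = \sqrt{\frac{\sum x^2 - (\sum x)^2/n}{n - 1}}$$

and the easier formula for computing the standard deviation of numbers x given along with frequencies f is

$$s = \sqrt{\frac{\sum x^2 f - (\sum xf)^2/n}{n - 1}}$$

where n is the sum of the frequencies.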
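Both shortcut formulas can be sketched in Python; the function names are hypothetical. For the frequency table from the Mean section, the sums are Σxf=104, Σx²f=558, and n=23. One design caution: with large values, subtracting two nearly equal sums can lose floating-point precision, so computer software usually prefers the deviation form; the `max(0.0, ...)` guard only protects against tiny negative rounding errors.

```python
import math


def sample_sd_shortcut(numbers):
    """Shortcut form: sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1))."""
    n = len(numbers)
    sum_x = sum(numbers)
    sum_x2 = sum(x * x for x in numbers)
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(0.0, (sum_x2 - sum_x ** 2 / n) / (n - 1)))


def sample_sd_frequencies(numbers, frequencies):
    """Same shortcut with frequencies; n is the sum of the frequencies."""
    n = sum(frequencies)
    sum_xf = sum(x * f for x, f in zip(numbers, frequencies))
    sum_x2f = sum(x * x * f for x, f in zip(numbers, frequencies))
    return math.sqrt(max(0.0, (sum_x2f - sum_xf ** 2 / n) / (n - 1)))


print(round(sample_sd_shortcut([2, 3, 4, 5, 6]), 2))  # prints 1.58, as before
xs, fs = [2, 3, 4, 5, 7, 9], [2, 6, 7, 3, 3, 2]
print(round(sample_sd_frequencies(xs, fs), 2))         # about 2.0
```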
                           
  
                       
           
  
                     
                   
                  
                    Other Statistics and Displays
                      
Boxplots (Also called Box and Dot or Box and Whisker Plots)
                          A boxplot displays the center (as
                          given by the median) of a dataset, the range, and the
                          quartiles.  The next picture shows two boxplots,
                          one of the SAT Verbal and the other the SAT Math
                          scores from the FOCUS dataset. 
                            
The white line in the box lies above the median value for that variable.  You can see that the median SAT Verbal score is around 460 and the median SAT Math score is about 540.  The left side of the box lies above the 1st quartile and the right side of the box lies above the 3rd quartile of the variable.  So for SAT Math the first quartile is about 460, while the third quartile is approximately 590.  Since 25% of the data values are less than the first quartile and 25% of the data values are greater than the third quartile, each box shows the range of values in which the middle 50% of the numbers lie.  From the above graph you can see that the middle 50% of the SAT Math values are more spread out than the middle 50% of the SAT Verbal scores.  The horizontal line extending from the right of each box stops at a short vertical line positioned above the largest value for that variable, and the horizontal line extending from the left of each box stops at a short vertical line above the smallest value.  The distance from the smallest value to the largest value, the range, is therefore also shown in the graph.
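The five values a boxplot of this kind draws (smallest value, Q1, median, Q3, largest value) can be computed with the sketch below. The function name is hypothetical, and because the FOCUS SAT data are not reproduced in these notes, it is applied to the earlier 10-number example. Some plotting software instead ends the whiskers at 1.5 times the IQR beyond the box and marks more distant values as separate points, so its boxplots can look different from the ones described here.

```python
from statistics import median


def five_number_summary(numbers):
    """(minimum, Q1, median, Q3, maximum): the values this boxplot draws."""
    data = sorted(numbers)
    q2 = median(data)
    q1 = median([x for x in data if x <= q2])  # left side of the box
    q3 = median([x for x in data if x >= q2])  # right side of the box
    return data[0], q1, q2, q3, data[-1]       # whisker ends are min and max


print(five_number_summary([2, 3, 1, 4, 4, 5, 7, 2, 3, 8]))  # prints (1, 2, 3.5, 5, 8)
```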
The boxplot displays the variability, center, and shape of a dataset.  In the above graph of SAT Math and Verbal scores, you can see that both variables have approximately the same amount of variability, the center of the SAT Math scores is greater than the center of the Verbal scores, and both variables have an approximately symmetric shape.  The next boxplot, of the billionaires92 wealth variable, shows a dataset that is strongly skewed to the right.  Even the position of the median within the box, closer to the left side than to the right, shows a right skew for the middle 50% of the wealth data.
                            
                            
                          What is the relationship between the
                          histogram and the boxplot of a set of numbers? 
To experiment with histograms and the corresponding boxplots, open this link.
                          When the link opens select Relative Frequency in the
                          left dropdown menu and Boxplot from the right dropdown
                          menu.  Then, by pointing at the axis with your
                          mouse cursor and clicking, you can add numbers. 
                          The vertical red bars show the histogram of the
                          numbers that you have added and the horizontal red
                          display below the histogram shows the boxplot that
                          goes with the histogram.  Try various shaped
                          histograms and see how the boxplot corresponds with
                          the histogram. 
                           
                         
                       
                      
                        Standard Scores
Suppose you and a friend are both taking Statistics 1 but are in different sections.  You both take a midterm examination and wish to compare your performances on the exam.  You received a score of 80 in a section that had a mean of 76 and a standard deviation of 5, while your friend received a score of 76 in a section that had a mean of 66 and a standard deviation of 8.  Who performed better?  To decide, the scores need to be placed on the same footing, that is, modified as if they both came from a test with the same mean and standard deviation.  This is done by subtracting the mean of the section and dividing by the standard deviation of the section; that is, (x-mean)/(standard deviation) is computed for each score.  For your score of 80 this gives (80-76)/5=0.8, while for your friend's score it gives (76-66)/8=1.25.  This means that, relative to their sections, your friend performed better.
The standard score corresponding to a number x, denoted by z, is given by the formula

$$z = \frac{x - \bar{x}}{s}$$

where x is the actual score, x-bar is the mean of the set of numbers, and s is the standard deviation of the numbers.  The standard score indicates how many standard deviations above (if z is positive) or below (if z is negative) the mean the number x falls.
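The midterm comparison can be written as a short Python sketch of the z formula; the function name `standard_score` is hypothetical.

```python
def standard_score(x, mean, sd):
    """z = (x - x_bar) / s: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


yours = standard_score(80, mean=76, sd=5)   # 0.8
friend = standard_score(76, mean=66, sd=8)  # 1.25
print(yours, friend)                        # prints 0.8 1.25
print("friend did better" if friend > yours else "you did better")
```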
                     
                
                         
Sample and Population Statistics
All of the statistics used above apply to samples: they are called sample statistics.  The related statistics for populations are slightly different.  The following notations and differences in formulas apply:

- Descriptive measures for a population are called parameters of the population, while related measures for a sample are called statistics of the sample.
- The size of a sample is usually denoted by n, while the size of the population is denoted by N.
- The sample mean is written as x-bar, while the population mean is usually denoted by µ.
- The sample standard deviation is called s, and the population standard deviation is called sigma (σ).
- The formula for sample standard deviation is

  $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

  but the formula for population standard deviation is

  $$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

  There are two differences.  First, the sample mean is replaced by the population mean; this isn't surprising.  The second difference, that the divisor for the population standard deviation is N while the divisor for the sample standard deviation is n-1, is harder to explain.  There is a good statistical reason for the difference, but that reason will be left to another statistics course.  You should simply use the formula that is appropriate for the situation: if you are told that you have a population, use the second formula, and for a sample use the first formula.  An easier-to-use formula for population standard deviation is

  $$\sigma = \sqrt{\frac{\sum x^2}{N} - \mu^2}$$

  If the numbers are given along with frequencies, the formula to use is

  $$\sigma = \sqrt{\frac{\sum x^2 f}{N} - \mu^2}$$

  where N is the sum of the frequencies and $\mu = \sum xf / N$.
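The population formulas can be compared with the sample formula in a short Python sketch; all function names are hypothetical. On 2,3,4,5,6 the squared deviations sum to 10, so dividing by N=5 gives the square root of 2 (about 1.41), while dividing by n-1=4 gives the square root of 2.5 (about 1.58). The standard library's `statistics.pstdev` and `statistics.stdev` compute the population and sample versions, respectively.

```python
import math


def population_sd(values):
    """Divide by N and use the population mean mu."""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)


def population_sd_shortcut(values):
    """Easier form: sqrt(sum(x^2)/N - mu^2), guarded against rounding below zero."""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(max(0.0, sum(x * x for x in values) / N - mu ** 2))


def population_sd_frequencies(numbers, frequencies):
    """Frequency form; N is the sum of the frequencies and mu = sum(x*f) / N."""
    N = sum(frequencies)
    mu = sum(x * f for x, f in zip(numbers, frequencies)) / N
    mean_of_squares = sum(x * x * f for x, f in zip(numbers, frequencies)) / N
    return math.sqrt(max(0.0, mean_of_squares - mu ** 2))


def sample_sd(values):
    """Divide by n - 1 and use the sample mean x-bar."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


data = [2, 3, 4, 5, 6]
print(round(population_sd(data), 2))           # prints 1.41
print(round(population_sd_shortcut(data), 2))  # prints 1.41
print(round(sample_sd(data), 2))               # prints 1.58
```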
                         
                         
                         
                        Resources 
              See Section 3.5 in the Weiss textbook.  
                 
                          To work with the entire Focus Database
                          from within WebStat use the next link.
                            
  
  
 
                           
                       
                     
                   