CALIFORNIA STATE UNIVERSITY, SACRAMENTO
Department of Economics
Chapter 3 Methods of Data Analysis
4.1. Measurement of Data and Data Representation
Data are quantitative information on various aspects of economics and
business: the facts or pieces of information that we deal with in any data analysis. Consider
the set of annual observations on income and consumption expenditure below:
Year Income Expenditure
1980 1000 800
1981 1100 875
1982 1200 950
1983 1300 1025
1984 1400 1100
1985 1500 1175
Each data point is called an observation.
There are three types of data: time series data, cross-section data, and pooled data.
When observations are for the same entity in different periods of time, they are called time
series data. They are usually denoted as Xt, where t = 1, 2, ..., N. The
above data on income and consumption expenditure are an example of time series data. The
Economic Report of the President 1995, published by the Council of Economic
Advisers, carries various economic time series for the years 1958 through 1994.
Time series data may be recorded at annual, quarterly, monthly, weekly,
or even daily frequency. When observations are for different entities (such as persons, firms, or
nations) measured at a single point in time, they are called cross-section
data. They are denoted as Xi, where i = 1, 2, ..., n. Finally, when observations
vary both across entities and over time, the data are called pooled data.
4.1.1 Measurement of Data
Various ways to measure data:
- Level, first difference, percentage change or growth rate
- Indexes and shares
- Nominal versus real
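As a minimal sketch of the first two items, the snippet below computes first differences, percentage changes, and an index for the income series in the table above. The base year (1980 = 100) and the variable names are assumptions made for illustration, not part of the course material.

```python
# Income series from the table in Section 4.1 (annual, 1980-1985).
years = [1980, 1981, 1982, 1983, 1984, 1985]
income = [1000, 1100, 1200, 1300, 1400, 1500]

# First difference: X_t - X_(t-1). One observation is lost at the start.
first_diff = [curr - prev for prev, curr in zip(income, income[1:])]

# Percentage change (growth rate): 100 * (X_t - X_(t-1)) / X_(t-1).
growth_pct = [100 * (curr - prev) / prev for prev, curr in zip(income, income[1:])]

# Index with an assumed base year of 1980 (= 100).
index_1980 = [100 * x / income[0] for x in income]

for year, diff, growth in zip(years[1:], first_diff, growth_pct):
    print(year, diff, round(growth, 2))
```

Note that a constant first difference (100 each year) implies a falling growth rate, because the base of the percentage grows over time.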
4.1.2 Presentation of Data
Cross-Tabulations: arrays and frequency distribution
Graphs (Scatter diagram, bar & lines, pie chart, etc.)
4.1.3 Adjustments to Data
Exponential Smoothing: Moving Average
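The outline names these two adjustments without giving formulas. The sketch below uses the standard textbook forms, which is an assumption about what the course intends: a simple k-period moving average, and simple exponential smoothing s_t = a*x_t + (1 - a)*s_(t-1) started at the first observation. The window length 3 and the smoothing weight 0.5 are illustrative choices.

```python
def moving_average(series, window):
    """Simple moving average; returns len(series) - window + 1 values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, initialized at the first observation."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Expenditure series from the table in Section 4.1.
expenditure = [800, 875, 950, 1025, 1100, 1175]

ma3 = moving_average(expenditure, window=3)           # assumed window of 3 years
es = exponential_smoothing(expenditure, alpha=0.5)    # assumed weight of 0.5
print(ma3)
print(es)
```

A larger window or a smaller alpha gives a smoother series that reacts more slowly to new observations.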
4.1.4 Data Sources
Electronic databases: Citibank, McGraw-Hill, DRI, IMF, OECD, FRB database,
databases on the Internet
4.1.5 Reliability of Data
4.2 Statistical Analysis of Data
4.2.1 Basic Concepts in Statistical Analysis
Random variables: A random variable is a variable whose value is a number
determined by the outcome of an experiment. For example, if we ask a group of individuals
their incomes, the variable income is a random variable. A discrete random
variable is one with a definite distance between each of its possible values.
A continuous random variable is one whose values are measured on a continuous scale.
Examples of quantities that might be represented by continuous random variables are
temperature, gas mileage, and stock prices. A well-known example of a continuous random variable
is one that follows the normal distribution.
The set of all possible outcomes of an experiment is called the population
or sample space, and each member of the sample space is called a sample point.
In the experiment of tossing two coins, the sample space consists of the four possible
outcomes: HH, HT, TH, and TT.
An event is a subset of the sample space. In our earlier example of tossing two
coins, suppose that we designate the occurrence of one head and one tail as an event X.
Then only two sample points belong to X, namely HT and TH. Events are said to be mutually exclusive
if the occurrence of one event precludes the occurrence of another event. Events are said
to be exhaustive if they exhaust all the possible outcomes of an experiment.
Probability: Let X be an event in a sample space. By P(X), the probability of
the event X, we mean the proportion of times the event X will occur in repeated trials of
an experiment. If an experiment has a total of n equally likely possible outcomes
and m of them are favorable to event X, then we define the ratio m/n as the
relative frequency of event X. Example: Let X be the occurrence of one head and one tail
in a toss of two coins; the relative frequency of the event X is 2/4 = 1/2.
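The two-coin example can be checked by enumeration. The following is a small sketch; the variable names are illustrative and the outcomes are treated as equally likely, as in the definition above.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two coins: every ordered pair of H and T.
sample_space = ["".join(pair) for pair in product("HT", repeat=2)]

# Event X: exactly one head and one tail.
event_x = [point for point in sample_space if point.count("H") == 1]

# Relative frequency m/n with equally likely sample points.
prob_x = Fraction(len(event_x), len(sample_space))

print(sample_space)  # ['HH', 'HT', 'TH', 'TT']
print(event_x)       # ['HT', 'TH']
print(prob_x)        # 1/2
```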
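Probability density function (PDF): Let X be a random variable taking distinct values x1,
x2, ..., xn. Then the function
$$f(x) = \begin{cases} P(X = x_i) & \text{for } x = x_i,\ i = 1, 2, \ldots, n \\ 0 & \text{for } x \neq x_i \end{cases}$$
is called the probability density function of X.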
For example, consider the toss of a coin. The two possible outcomes
are head (H) and tail (T). A random variable of interest could be defined
as X = number of heads on a single coin toss. Then X will assign the number 1 to the
outcome H and the number 0 to the outcome T. The probability distribution of the random variable X
assigns a probability p(x) to each of these two values.
Here the notation x is the outcome of the random variable and p(x) [often
expressed as fi] refers to the frequency distribution, probability
distribution, or more commonly "probability density function".
Characteristics of probability distributions
The expected value of a discrete random variable X is defined as the sum, over all
values x of X, of each value times the probability associated with that value:
$$E(X) = \mu_X = \sum_x x\,p(x)$$
The variance, a measure of the random variable's variation or dispersion around its
expected value, is defined as
$$\mathrm{Var}(X) = \sigma_X^2 = \sum_x (x - \mu_X)^2\,p(x) = \sum_x x^2\,p(x) - \mu_X^2$$
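As a worked check of these two definitions, the sketch below applies them to the single coin toss example (X = number of heads). The notes do not state the probabilities, so a fair coin, p(0) = p(1) = 1/2, is assumed here purely for illustration.

```python
from fractions import Fraction

# Assumed PDF of X = number of heads on one toss of a fair coin.
pdf = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Probabilities must sum to one.
total = sum(pdf.values())

# Expected value: sum of x * p(x).
mean = sum(x * p for x, p in pdf.items())

# Variance, computed both ways given in the text.
var_definition = sum((x - mean) ** 2 * p for x, p in pdf.items())
var_shortcut = sum(x ** 2 * p for x, p in pdf.items()) - mean ** 2

print(mean, var_definition, var_shortcut)  # 1/2 1/4 1/4
```

Exact fractions are used so that the two variance expressions agree exactly rather than up to rounding.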
Joint PDF, marginal PDF, and conditional PDF:
Some Important Probability Distributions:
Normal Distribution; Chi-square Distribution; Student's t Distribution;
The F Distribution
4.2 Regression Analysis
Mathematical Vs. Statistical Relationship: When there exists a relationship
between one variable, say consumption expenditure, and another variable, income, the
relationship can be specified as a mathematical model of the functional relationship:
Y = b1 + b2X, 0 < b2 < 1 (4.1)
where Y = consumption expenditure and X = income, and where b1
and b2 are the intercept and slope coefficients.
Equation (4.1) is also known as an exact or deterministic relationship between the two
variables.
To introduce randomness into the dependent variable, consumption
expenditure, one can modify the mathematical model by adding a random (or stochastic)
disturbance or error term as follows:
Y = b1 + b2X + u (4.2)
where u is a disturbance term. Equation (4.2) is known as an econometric or stochastic
model.
Correlation Vs. Regression: Regression and correlation analysis are closely
related but conceptually very different. In correlation analysis, the primary objective is
to measure the strength or degree of linear association between two variables.
Since there is no distinction between the dependent variable and independent variable in
correlation analysis, causality is not an issue. In contrast, regression analysis is
concerned with the study of dependence of one variable, the dependent variable, on
one or more variables, the independent or explanatory variables. The primary objective in
regression analysis is to estimate and/or predict the (population) mean or expected value
of the dependent variable in terms of the known or fixed values of the independent
variable(s). But dependence of one variable on other variable(s) in regression analysis
does not necessarily mean causation. As Kendall and Stuart point out, "A statistical
relationship, however strong and however suggestive, can never establish causal
connection: our ideas of causation must come from outside statistics, ultimately from some
theory or other."
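The difference in treatment of the two variables can be seen numerically. The sketch below, using the income and expenditure table from Section 4.1, shows that the correlation coefficient is the same whichever variable comes first, while the regression slope depends on which variable is treated as dependent. The helper functions are illustrative and written out by hand rather than taken from a statistics library.

```python
income = [1000, 1100, 1200, 1300, 1400, 1500]
expenditure = [800, 875, 950, 1025, 1100, 1175]


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance of two equally long series."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def correlation(a, b):
    """Pearson correlation coefficient; symmetric in its arguments."""
    return covariance(a, b) / (covariance(a, a) * covariance(b, b)) ** 0.5


def slope(dependent, explanatory):
    """Least-squares slope from regressing `dependent` on `explanatory`."""
    return covariance(dependent, explanatory) / covariance(explanatory, explanatory)


r_xy = correlation(income, expenditure)
r_yx = correlation(expenditure, income)
slope_y_on_x = slope(expenditure, income)   # expenditure as dependent variable
slope_x_on_y = slope(income, expenditure)   # income as dependent variable

print(r_xy, r_yx)
print(slope_y_on_x, slope_x_on_y)
```

Because the table happens to be exactly linear, the correlation is 1 (up to rounding), yet the two slopes still differ: 0.75 for expenditure on income and about 1.33 for income on expenditure.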
4.2.1 Basic Ideas for Regression Analysis
Regression Analysis: Regression analysis is primarily concerned with
estimating the population mean or average value of the dependent variable on the basis of
the known and fixed values of the explanatory variable(s).
To understand how this can be done, we first have to understand two concepts: the
population regression function (PRF) and the sample regression function (SRF).
Specification of the (true) relationship in the population: the PRF. The PRF expresses the
relationship between the conditional mean of Y and the given values xi of X: $E(Y \mid X = x_i) = \sum f_i x_i$
Graphical illustration: $E(Y \mid X_i) = b_1 + b_2 X_i$
$Y_i = b_1 + b_2 X_i + u_i$: PRF with stochastic term $u_i$.
Sample regression function (SRF) based on one random sample: $\hat{Y}_i = \hat{b}_1 + \hat{b}_2 X_i$
Our primary objective in regression analysis is to estimate the PRF on the basis
of the SRF, because our analysis is often based on a single sample from some population.
Graphical illustration of the relationship between PRF and SRF.
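In place of the graph, a small simulation can illustrate the idea. The sketch below assumes a hypothetical population with b1 = 50, b2 = 0.75 and normally distributed disturbances with standard deviation 20; none of these values come from the notes. Each random sample gives a different SRF, but the estimated slopes scatter around the PRF slope.

```python
import random

# Hypothetical PRF parameters (assumed for illustration only).
B1, B2 = 50.0, 0.75
X_VALUES = [1000, 1100, 1200, 1300, 1400, 1500]  # fixed values of X


def draw_sample(rng, sigma=20.0):
    """One random sample of Y: PRF plus a normal disturbance u."""
    return [B1 + B2 * x + rng.gauss(0, sigma) for x in X_VALUES]


def ols(x, y):
    """Least-squares intercept and slope for a two-variable model."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b2_hat = sxy / sxx
    b1_hat = mean_y - b2_hat * mean_x
    return b1_hat, b2_hat


rng = random.Random(0)  # seeded so the run is reproducible
estimates = [ols(X_VALUES, draw_sample(rng)) for _ in range(1000)]

slopes = [b2_hat for _, b2_hat in estimates]
avg_slope = sum(slopes) / len(slopes)
print(min(slopes), max(slopes), avg_slope)
```

In practice only one such sample is available, which is why the SRF is treated as an estimate of the PRF rather than as the PRF itself.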
Estimator and estimate:
Regression analysis is primarily concerned with estimating the population mean or
average value of the dependent variable on the basis of the known and fixed values of the
explanatory variable(s). To understand why we estimate the average or mean value of the dependent
variable, rather than simply the value of the dependent variable, one first has to
understand the two concepts of PRF and SRF; the primary objective of regression
analysis is to estimate the PRF on the basis of the SRF. That is the basis of inferential
statistics for the testing of hypotheses.
Concepts of PRF and SRF
4.2.2 Regression Analysis: The Problem of Estimation
The Method of Least Squares: Basic Principles
1. Two-Variable Linear Regression Model
2. Multiple Linear Regression Model
3. Regression on Transformed Variables: Other Functional Forms
4. Regression with Dummy Variables
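For the two-variable case, the least-squares principle chooses the intercept and slope that minimize the sum of squared residuals. The sketch below applies the usual closed-form solution to the income and expenditure table from Section 4.1; the function name `least_squares` is illustrative, and the formulas are the standard textbook ones rather than anything specific to these notes.

```python
# Data from the table in Section 4.1: X = income, Y = consumption expenditure.
income = [1000, 1100, 1200, 1300, 1400, 1500]
expenditure = [800, 875, 950, 1025, 1100, 1175]


def least_squares(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope: sum of cross-deviations over sum of squared deviations of X.
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    # Intercept: the fitted line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


b1_hat, b2_hat = least_squares(income, expenditure)

# Residuals and residual sum of squares for the fitted line.
residuals = [yi - (b1_hat + b2_hat * xi) for xi, yi in zip(income, expenditure)]
rss = sum(e ** 2 for e in residuals)

print(b1_hat, b2_hat, rss)
```

For this table the fit is exact, with an intercept of 50 and a slope of 0.75, so the residual sum of squares is zero. Real data would leave nonzero residuals, corresponding to the disturbance term u in equation (4.2).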
4.2.3 Regression Analysis: Hypothesis Testing
3.1 Basic Ideas for Hypothesis Testing
1) Interval Estimation: Basic Ideas
2) Confidence Intervals for b1 and b2.
3) Null and alternative hypotheses, accepting or rejecting the hypothesis, level of
significance, statistical significance, type I and type II errors.
3.2 Two Methods of Hypothesis Testing:
Confidence Interval Approach
The Test of Significance Approach
4.3 Time Series Analysis
4.3.1 Analysis of Trend: Secular trend, trend as a linear regression, broken trends
4.3.2 Tests of Stationarity, Unit Root, and Co-integration & Error-correction model
4.3.3 Vector Auto Regression (VAR)