Economics 145
Prof. Yang

Chapter 3 Methods of Data Analysis

**4.1. Measurement of Data and Data Representation**

**Data** are the quantitative information on various aspects of economics and business:
the facts or pieces of information that we deal with in any data analysis. Consider the set
of annual observations on income and consumption expenditure below:

| Year | Income | Consumption Expenditure |
|------|--------|-------------------------|
| 1980 | 1000 | 800 |
| 1981 | 1100 | 875 |
| 1982 | 1200 | 950 |
| 1983 | 1300 | 1025 |
| 1984 | 1400 | 1100 |
| 1985 | 1500 | 1175 |

Each data point is called an *observation*.

There are three types of data: time series data, cross-section data, and pooled data.
When observations are for the same entity in different periods of time, they are called
*time series* data. They are usually denoted as X_{t}, where t = 1, 2, ..., N. The
above data on income and consumption expenditure are an example of time series data. The
*Economic Report of the President 1995*, published by the Council of Economic
Advisers, carries various economic time series for the years 1958 through 1994.
Time series data may be annual, quarterly, monthly, weekly, or even daily. When
observations are for different entities (such as persons, firms, or nations) measured at a
single point in time, they are called *cross-section* data. They are denoted as X_{i},
where i = 1, 2, ..., n. Finally, when observations on each entity also vary over time,
the data are called *pooled* data. The
pooled

**4.1.1 Measurement of Data**

Various ways to measure data:

-Level, first difference, percentage change or growth rate

-Indexes and shares

-Standardized variables

-Nominal versus real
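
As a minimal sketch of the first few measures listed above, the income and consumption series from the table in Section 4.1 can be expressed as levels, first differences, growth rates, an index, and a share. The variable names and the choice of 1980 as the base year are assumptions made for illustration; they are not part of the course notes.

```python
# Levels: annual income and consumption, 1980-1985 (from the table in 4.1).
income = [1000, 1100, 1200, 1300, 1400, 1500]
consumption = [800, 875, 950, 1025, 1100, 1175]

# First difference: change from the previous year.
first_diff = [curr - prev for prev, curr in zip(income, income[1:])]

# Percentage change (growth rate) relative to the previous year.
growth_pct = [100 * (curr - prev) / prev for prev, curr in zip(income, income[1:])]

# Index with the first year as base (1980 = 100, an assumed base year).
index_1980 = [100 * x / income[0] for x in income]

# Share: consumption as a fraction of income in each year.
consumption_share = [c / y for c, y in zip(consumption, income)]

print(first_diff)
print([round(g, 2) for g in growth_pct])
print(index_1980)
print([round(s, 3) for s in consumption_share])
```

The level rises by the same amount every year, so the first difference is constant while the growth rate declines, which is one reason the choice of measure matters.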

**4.1.2 Presentation of Data**

Cross-tabulations: arrays and frequency distributions

Graphs (scatter diagrams, bar and line charts, pie charts, etc.)

**4.1.3 Adjustments to Data**

Exponential Smoothing: Moving Average

Seasonal Adjustments
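
The two smoothing adjustments named above can be sketched in a few lines. This is only an illustration under stated assumptions: the window length of 3, the smoothing weight `alpha = 0.5`, the choice to start the smoothed series at the first observation, and the function names are all hypothetical choices, not taken from the course notes.

```python
def moving_average(series, window):
    """Simple moving average: one value per complete window of observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1).

    The smoothed series is started at the first observation (an assumption).
    """
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Consumption expenditure, 1980-1985, from the table in Section 4.1.
consumption = [800, 875, 950, 1025, 1100, 1175]

ma3 = moving_average(consumption, 3)
es = exponential_smoothing(consumption, 0.5)
print(ma3)
print(es)
```

A larger window or a smaller `alpha` gives a smoother series that reacts more slowly to recent observations.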

**4.1.4 Data Sources**

Library Source

Electronic databases: Citibank, McGraw Hill, DRI, IMF, OECD, FRB database

Databases on the Internet

**4.1.5 Reliability of Data**


**4.2 Statistical Analysis of Data**

**4.2.1 Basic Concepts in Statistical Analysis**

**Random variables**: A *random variable* is a variable whose value is a number
determined by the outcome of an experiment. For example, if we ask a group of individuals
their incomes, the variable *income* is a random variable. A *discrete* random
variable is one with a definite distance between each of its possible values, such as the
number of heads in two tosses of a coin.

A *continuous* random variable is one whose values are measured on a continuous scale.
Examples of quantities that might be represented by continuous random variables are
temperature, gas mileage, or stock prices. A well-known example of a continuous random
variable is one that follows the normal distribution.

The set of all possible outcomes of an experiment is called the *population*
or *sample space*, and each member of the sample space is called a *sample point*.
In the experiment of tossing two coins, the sample space consists of the four possible
outcomes: HH, HT, TH, and TT.

An *event* is a subset of the sample space. In our earlier example of tossing two
coins, suppose that we designate the occurrence of one head and one tail as an event X.
Then only two sample points belong to X, namely HT and TH. Events are said to be
*mutually exclusive* if the occurrence of one event precludes the occurrence of another
event. Events are said to be *exhaustive* if they exhaust all the possible outcomes of an
experiment.

**Probability**: Let X be an event in a sample space. By P(X), the probability of
the event X, we mean the proportion of times the event X will occur in repeated trials of
an experiment. If an experiment has a total of *n* equally likely possible outcomes and
*m* of them are favorable to the event X, then we define the ratio *m/n* as the
relative frequency of event X. Example: Let X be the occurrence of one head and one tail
in a toss of two coins. The relative frequency of the event X is 2/4 = 1/2.
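
The two-coin example can be checked by enumeration. The sketch below lists the sample space, picks out the event X (one head and one tail), and computes m/n. It also checks that three events (two heads, one of each, two tails) are mutually exclusive and exhaustive. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two coins: every ordered pair of H and T.
sample_space = ["".join(toss) for toss in product("HT", repeat=2)]

# Event X: exactly one head and one tail.
event_x = [point for point in sample_space if point.count("H") == 1]

# With n equally likely outcomes, m of them in X, P(X) = m / n.
prob_x = Fraction(len(event_x), len(sample_space))

# Three events that together partition the sample space.
two_heads, one_each, two_tails = {"HH"}, set(event_x), {"TT"}
mutually_exclusive = (two_heads.isdisjoint(one_each)
                      and two_heads.isdisjoint(two_tails)
                      and one_each.isdisjoint(two_tails))
exhaustive = (two_heads | one_each | two_tails) == set(sample_space)

print(sample_space, event_x, prob_x, mutually_exclusive, exhaustive)
```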

**Probability Distributions**:

Probability Density Function (PDF): Let X be a discrete random variable taking distinct
values x_{1}, x_{2}, ..., x_{n}. Then the function

f(x) = P(X = x_{i}) for i = 1, 2, ..., n

= 0 for x ≠ x_{i}

is called the probability density function of X.

For example, consider the toss of a coin. The two possible outcomes are head (H) and
tail (T). A random variable of interest could be defined as X = number of heads on a
single coin toss. Then X will assign the number 1 to the outcome H and the number 0 to the
outcome T. The probability distribution of the random variable X would be

| x | p(x) |
|---|------|
| 0 | ½ |
| 1 | ½ |

Here x denotes an outcome of the random variable X, and p(x) [often
expressed as f_{i}] refers to the frequency distribution, probability
distribution, or more commonly the "probability density function".

Characteristics of probability distributions

The expected value of a discrete random variable X is defined as the sum of each
value x of X times the probability associated with that value:

E(X) = μ_{X} = Σ x p(x)

The variance, a measure of a random variable’s variation or dispersion, is defined as

Var(X) = σ_{X}^{2} = Σ (x - μ_{X})^{2} p(x) = Σ x^{2} p(x) - μ_{X}^{2}
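
As a worked check of these two formulas, the sketch below applies them to the coin-toss variable X (number of heads on a single toss) whose distribution is tabulated above. Exact fractions are used so that the two expressions for the variance can be compared without rounding. The dictionary name `pmf` is an illustrative choice.

```python
from fractions import Fraction

# Probability distribution of X = number of heads on a single coin toss.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 2)}

# Expected value: sum of each value times its probability.
mean = sum(x * p for x, p in pmf.items())

# Variance from the definition: sum of squared deviations times probability.
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Variance from the shortcut: sum of x^2 p(x) minus the squared mean.
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

print(mean, var_definition, var_shortcut)
```

Both variance expressions give 1/4 here, and the expected value is 1/2.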

Joint PDF, marginal PDF, and conditional PDF:

Statistical Independence:

^{
Some Important Probability Distributions:

Normal Distribution; Chi-square Distribution; Student’s t Distribution; The F Distribution

**4.2 Regression Analysis**

**Mathematical vs. Statistical Relationship**: When there exists a relationship
between one variable, say consumption expenditure, and another variable, income, the
relationship can be specified as a mathematical model of the functional relationship:

Y = b_{1} + b_{2}X, 0 < b_{2} < 1 (4.1)

where Y = consumption expenditure, X = income, and b_{1} and b_{2} are the intercept and
slope coefficients. Equation (4.1) is also known as an exact or deterministic relationship
between the two variables.

To introduce randomness in the dependent variable, consumption expenditure, one can
modify the mathematical model by adding a random (or stochastic) disturbance or error term
as follows:

Y = b_{1} + b_{2}X + u (4.2)

where u is a disturbance term. Equation (4.2) is known as an econometric or stochastic
model.
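
The difference between (4.1) and (4.2) can be sketched numerically. The income and consumption figures in the table of Section 4.1 happen to lie exactly on a line with intercept 50 and slope 0.75 (values worked out from the table, not stated in the notes), so they behave like the deterministic model. Adding a simulated disturbance turns the same line into a stochastic model. The normal disturbance, its standard deviation of 20, and the seed are assumptions for illustration only.

```python
import random

# Data from the table in Section 4.1.
income = [1000, 1100, 1200, 1300, 1400, 1500]
consumption = [800, 875, 950, 1025, 1100, 1175]

# Intercept and slope implied by the table (derived here, not given in the notes).
b1, b2 = 50.0, 0.75

# Deterministic model (4.1): Y is an exact function of X.
deterministic = [b1 + b2 * x for x in income]

# Stochastic model (4.2): the same line plus a random disturbance u.
rng = random.Random(0)                      # fixed seed so the run is repeatable
u = [rng.gauss(0, 20) for _ in income]      # assumed normal disturbance, sd = 20
stochastic = [y + e for y, e in zip(deterministic, u)]

print(deterministic == consumption)         # the table fits (4.1) exactly
print([round(y, 1) for y in stochastic])
```
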

**Correlation vs. Regression**: Regression and correlation analysis are closely
related but conceptually very different. In correlation analysis, the primary objective is
to measure the strength or degree of linear association between two variables.
Since there is no distinction between the dependent variable and the independent variable
in correlation analysis, causality is not an issue. In contrast, regression analysis is
concerned with the study of the dependence of one variable, the dependent variable, on
one or more other variables, the independent or explanatory variables. The primary
objective in regression analysis is to estimate and/or predict the (population) mean or
expected value of the dependent variable in terms of the known or fixed values of the
independent variable(s). But the dependence of one variable on other variable(s) in
regression analysis does not necessarily mean causation. As Kendall and Stuart point out,
"A statistical relationship, however strong and however suggestive, can never establish
causal connection: our ideas of causation must come from outside statistics, ultimately
from some theory or other."
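
One way to see the conceptual difference is that the correlation coefficient treats the two variables symmetrically, while a regression slope depends on which variable is taken as dependent. The sketch below uses the income and consumption data from Section 4.1 and the standard formulas for the sample correlation and the least-squares slope. The helper names are hypothetical.

```python
from math import sqrt

income = [1000, 1100, 1200, 1300, 1400, 1500]
consumption = [800, 875, 950, 1025, 1100, 1175]


def mean(values):
    return sum(values) / len(values)


def cross_dev(a, b):
    """Sum of products of deviations from the respective means."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


s_xy = cross_dev(income, consumption)
s_xx = cross_dev(income, income)
s_yy = cross_dev(consumption, consumption)

# Correlation: the same number whichever variable is listed first.
r_xy = s_xy / sqrt(s_xx * s_yy)
r_yx = cross_dev(consumption, income) / sqrt(s_yy * s_xx)

# Regression: the slope changes when the dependent variable changes.
slope_y_on_x = s_xy / s_xx    # consumption regressed on income
slope_x_on_y = s_xy / s_yy    # income regressed on consumption

print(r_xy, r_yx)
print(slope_y_on_x, slope_x_on_y)
```

Because the table data are exactly linear, the correlation is 1 in both directions, yet the two slopes are 0.75 and about 1.33.
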

**4.2.1 Basic Ideas for Regression Analysis**

**Regression Analysis**: Regression analysis is primarily concerned with estimating the
population mean or average value of the dependent variable on the basis of the known and
fixed values of the explanatory variable(s). To understand why it is the mean value of
the dependent variable, rather than simply its value, that is estimated, we first have to
understand two concepts: the population regression function (PRF) and the sample
regression function (SRF).

Specification of the (true) relationship in the population: the PRF. The PRF expresses
the conditional mean of Y as a function of the given values of X. The conditional mean
E(Y | X = x_{i}) is the average of the Y values in the population at X = x_{i}, weighted
by their conditional probabilities f_{i}.

Numerical example.

Graphical illustration: E(Y | X_{i}) = b_{1} + b_{2}X_{i}

PRF with stochastic term u_{i}: Y_{i} = b_{1} + b_{2}X_{i} + u_{i}

Sample regression function (SRF) based on one random sample:
Ŷ_{i} = b̂_{1} + b̂_{2}X_{i}, where the hats mark values estimated from the sample.

Our primary objective in regression analysis is to estimate the PRF on the basis
of the SRF, because our analysis is often based on a single sample from some population.
That is the basis of inferential statistics for the testing of hypotheses.

Graphical illustration of the relationship between PRF and SRF.

Estimator and estimate:
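
The relationship between the PRF and the SRF, and between an estimator and an estimate, can be illustrated by simulation. In the sketch below the PRF is assumed to be E(Y | X) = 50 + 0.75X with a normal disturbance of standard deviation 20; these values, the seed, and the function names are hypothetical choices for illustration. The least-squares formulas in `ols` are the estimator (a rule), and the numbers they return for a particular sample are the estimates.

```python
import random


def ols(x, y):
    """Least-squares intercept and slope for a two-variable regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - slope * mean_x, slope


# Hypothetical PRF: E(Y | X) = 50 + 0.75 X (assumed for this sketch).
B1, B2 = 50.0, 0.75
income = [1000, 1100, 1200, 1300, 1400, 1500]


def draw_sample(rng):
    """One random sample of Y: the PRF plus a normal disturbance (sd 20, assumed)."""
    return [B1 + B2 * x + rng.gauss(0, 20) for x in income]


rng = random.Random(145)                 # fixed seed so the run is repeatable
srf_a = ols(income, draw_sample(rng))    # estimates from one sample
srf_b = ols(income, draw_sample(rng))    # estimates from a second sample

print("SRF from sample A (intercept, slope):", srf_a)
print("SRF from sample B (intercept, slope):", srf_b)
```

Each sample gives a different SRF, and neither coincides with the PRF. In practice only one such sample is available, which is why the estimates have to be accompanied by statistical inference.
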

**4.2.2 Regression Analysis: The Problem of Estimation**

The Method of Least Squares: Basic Principles

1. Two-Variable Linear Regression Model
2. Multiple Linear Regression Model
3. Regression on Transformed Variables: Other Functional Forms
4. Regression with Dummy Variables

**4.2.3 Regression Analysis: Hypothesis Testing**

3.1 Basic Ideas for Hypothesis Testing

1) Interval Estimation: Basic Ideas
2) Confidence Intervals for b_{1} and b_{2}
3) Null and alternative hypotheses; accepting or rejecting a hypothesis; level of
significance; statistical significance; Type I and Type II errors

3.2 Two Methods of Hypothesis Testing:

Confidence Interval Approach

The Test of Significance Approach

**4.3 Time Series Analysis**

4.3.1 Analysis of Trend: secular trend, trend as a linear regression, broken trends,
exponential trend.

4.3.2 Tests of Stationarity, Unit Root, and Co-integration; Error-Correction Model

4.3.3 Vector Autoregression (VAR)
}