Department of Economics

Economics 145                                                                                                                 Prof. Yang

Chapter 3 Methods of Data Analysis

4.1. Measurement of Data and Data Representation

Data are the quantitative information on various aspects of economics and business: the facts or pieces of information that we deal with in any data analysis. Consider the following set of annual observations on income and consumption expenditure:


Year    Income    Expenditure
1980    1000      800
1981    1100      875
1982    1200      950
1983    1300      1025
1984    1400      1100
1985    1500      1175

Each data point is called an observation.

There are three types of data: time series data, cross-section data, and pooled data.

When observations are for the same entity in different periods of time, they are called time series data. They are usually denoted as Xt, where t = 1, 2, ..., N. The above data on income and consumption expenditure are an example of time series data. The Economic Report of the President 1995, published by the Council of Economic Advisers, carries various economic time series covering the years 1958 through 1994. Time series data may be recorded at annual, quarterly, monthly, weekly, or even daily frequency.

When observations are for different entities (such as persons, firms, or nations) measured at a single point in time, they are called cross-section data. They are denoted as Xi, where i = 1, 2, ..., n.

Finally, when observations on each entity also vary over time, the data are called pooled data. The pooled

4.1.1 Measurement of Data

Various ways to measure data:

-Level, first difference, percentage change or growth rate
-Indexes and shares
-Standardized variables
-Nominal versus real
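
As a minimal sketch of the first two measurement methods, the snippet below computes first differences, percentage changes, and an index for the income column of the table in Section 4.1. The variable names and the choice of 1980 as the index base year are illustrative assumptions, not part of the original notes.

```python
# Income series from the table in Section 4.1 (annual, 1980-1985).
years = [1980, 1981, 1982, 1983, 1984, 1985]
income = [1000, 1100, 1200, 1300, 1400, 1500]

# First difference: X_t - X_{t-1}.
first_diff = [income[t] - income[t - 1] for t in range(1, len(income))]

# Percentage change (growth rate): 100 * (X_t - X_{t-1}) / X_{t-1}.
growth_pct = [100 * (income[t] - income[t - 1]) / income[t - 1]
              for t in range(1, len(income))]

# Index with 1980 = 100 (the base year is an assumption for illustration).
index_1980 = [100 * x / income[0] for x in income]

for year, diff, growth in zip(years[1:], first_diff, growth_pct):
    print(year, diff, round(growth, 2))
print(index_1980)
```

Because the level rises by the same amount every year, the first difference is constant while the growth rate falls as the base grows.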

4.1.2 Presentation of Data

Cross-Tabulations: arrays and frequency distribution
Graphs (Scatter diagram, bar & lines, pie chart, etc.)

4.1.3 Adjustments to Data

Exponential Smoothing: Moving Average
Seasonal Adjustments
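
The two smoothing adjustments named above can be sketched as follows, again using the income series from Section 4.1. This is only an illustration: the window length of 3, the smoothing weight of 0.5, and the choice to start the exponentially smoothed series at the first observation are all assumptions.

```python
# Income series from the table in Section 4.1.
income = [1000, 1100, 1200, 1300, 1400, 1500]


def moving_average(series, window):
    """Unweighted moving average over `window` consecutive observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [series[0]]  # start at the first observation (an assumption)
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(moving_average(income, 3))
print(exponential_smoothing(income, 0.5))
```

A moving average weights the observations in the window equally, whereas exponential smoothing gives geometrically declining weights to older observations.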

4.1.4 Data Sources

Library Source
Electronic databases: Citibank, McGraw Hill, DRI, IMF, OECD, FRB

Databases on the Internet

4.1.5 Reliability of Data


4.2 Statistical Analysis of Data

4.2.1 Basic Concepts in Statistical Analysis

Random variables: A random variable is a variable whose value is a number determined by the outcome of an experiment. For example, if we ask a group of individuals their incomes, the variable income is a random variable. A discrete random variable is one with a definite distance between each of its possible values.

A continuous random variable is one whose values are measured on a continuous scale. Examples of quantities that might be represented by continuous random variables are temperature, gas mileage, and stock prices. A well-known example of a continuous distribution is the normal distribution.

The set of all possible outcomes of an experiment is called the population or sample space, and each member of the sample space is called a sample point. In the experiment of tossing two coins, the sample space consists of the four possible outcomes: HH, HT, TH, and TT.

An event is a subset of the sample space. In our earlier example of tossing two coins, suppose that we designate the occurrence of one head and one tail as an event X. Then only two outcomes belong to X, namely HT and TH. Events are said to be mutually exclusive if the occurrence of one event precludes the occurrence of another. Events are said to be exhaustive if together they cover all the possible outcomes of an experiment.

Probability: Let X be an event in a sample space. By P(X), the probability of the event X, we mean the proportion of times the event X will occur in repeated trials of an experiment. If, out of a total of n possible equally likely outcomes of an experiment, m are favorable to the event X, then we define the ratio m/n as the relative frequency of X. Example: let X be the occurrence of one head and one tail in a toss of two coins; the relative frequency of the event X is 2/4 = 1/2.
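
The two-coin example can be checked by enumerating the sample space directly. The snippet is a small sketch that assumes the four outcomes are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two coins: HH, HT, TH, TT.
sample_space = ["".join(outcome) for outcome in product("HT", repeat=2)]

# Event X: one head and one tail.
event_x = [point for point in sample_space if point.count("H") == 1]

# With equally likely outcomes, the relative frequency is m / n.
prob_x = Fraction(len(event_x), len(sample_space))

print(sample_space)
print(event_x)
print(prob_x)
```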

Probability Distributions:

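Probability Density Function (PDF): Let X be a discrete random variable taking distinct values $x_1, x_2, \ldots, x_n$. Then the function

$$f(x) = \begin{cases} P(X = x_i) & \text{for } x = x_i,\ i = 1, 2, \ldots, n \\ 0 & \text{for } x \neq x_i \end{cases}$$

is called the probability density function of X.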

For example, consider the toss of a coin. The two possible outcomes are head (H) and tail (T). A random variable of interest could be defined as X = number of heads on a single coin toss. Then X assigns the number 1 to the outcome H and the number 0 to the outcome T. The probability distribution of the random variable X would be

x    p(X)

Here x denotes an outcome of the random variable X, and p(x) [often written as f_i] refers to the frequency distribution, probability distribution, or, more commonly, the "probability density function".

Characteristics of probability distributions

The expected value of a discrete random variable X is defined as the sum, over all possible values x of X, of x times the probability associated with that value.

$$E(X) = \mu_X = \sum_x x \, p(x)$$

The variance, a measure of a random variable's variation or dispersion around its mean, is defined as

$$\mathrm{Var}(X) = \sigma_X^2 = \sum_x (x - \mu_X)^2 \, p(x) = \sum_x x^2 \, p(x) - \mu_X^2$$
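
As a worked illustration of the two formulas, the sketch below uses a hypothetical random variable, X = number of heads in two tosses of a fair coin. The fairness of the coin and the resulting probabilities 1/4, 1/2, 1/4 are assumptions made for the example. It also confirms that the definitional and shortcut forms of the variance agree.

```python
from fractions import Fraction

# Hypothetical example: X = number of heads in two tosses of a fair coin.
pdf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Expected value: sum of x * p(x).
mean = sum(x * p for x, p in pdf.items())

# Variance from the definition: sum of (x - mean)^2 * p(x).
var_definition = sum((x - mean) ** 2 * p for x, p in pdf.items())

# Variance from the shortcut: sum of x^2 * p(x), minus mean^2.
var_shortcut = sum(x ** 2 * p for x, p in pdf.items()) - mean ** 2

print(mean, var_definition, var_shortcut)
```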

Joint PDF, marginal PDF, and conditional PDF:

Statistical Independence:

Some Important Probability Distributions:

Normal Distribution; Chi-square Distribution; Student’s t Distribution

The F Distribution

4.2 Regression Analysis

Mathematical vs. Statistical Relationship: When a relationship exists between one variable, say consumption expenditure, and another variable, say income, it can be specified as a mathematical model of a functional relationship:

$$Y = b_1 + b_2 X, \qquad 0 < b_2 < 1 \tag{4.1}$$

where Y = consumption expenditure, X = income, and $b_1$ and $b_2$ are the intercept and slope coefficients. Equation (4.1) is also known as an exact or deterministic relationship between the two variables.

To introduce randomness into the dependent variable, consumption expenditure, one can modify the mathematical model by adding a random (or stochastic) disturbance or error term as follows:

$$Y = b_1 + b_2 X + u \tag{4.2}$$

where u is a disturbance term. Equation (4.2) is known as an econometric or stochastic model.

Correlation Vs. Regression: Regression and correlation analysis are closely related but conceptually very different. In correlation analysis, the primary objective is to measure the strength or degree of linear association between two variables. Since there is no distinction between the dependent variable and independent variable in correlation analysis, causality is not an issue. In contrast, regression analysis is concerned with the study of dependence of one variable, the dependent variable, on one or more variables, the independent or explanatory variables. The primary objective in regression analysis is to estimate and/or predict the (population) mean or expected value of the dependent variable in terms of the known or fixed values of the independent variable(s). But dependence of one variable on other variable(s) in regression analysis does not necessarily mean causation. As Kendall and Stuart point out, "A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other."
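
The asymmetry between the two analyses can be illustrated with the income and expenditure table from Section 4.1. In this sketch (the helper function names are illustrative), the correlation coefficient is the same whichever variable is listed first, while the least-squares slope depends on which variable is treated as dependent.

```python
from math import sqrt

# Data from the table in Section 4.1.
income = [1000, 1100, 1200, 1300, 1400, 1500]
expenditure = [800, 875, 950, 1025, 1100, 1175]


def mean(values):
    return sum(values) / len(values)


def cross_dev(a, b):
    """Sum of products of deviations from the respective means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def correlation(a, b):
    """Pearson correlation coefficient; symmetric in its arguments."""
    return cross_dev(a, b) / sqrt(cross_dev(a, a) * cross_dev(b, b))


def slope(dependent, explanatory):
    """Least-squares slope from regressing `dependent` on `explanatory`."""
    return cross_dev(explanatory, dependent) / cross_dev(explanatory, explanatory)


print(correlation(income, expenditure), correlation(expenditure, income))
print(slope(expenditure, income), slope(income, expenditure))
```

Because the table happens to lie exactly on a straight line, the correlation here equals 1 up to floating-point rounding; with real data it would be smaller in absolute value.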

4.2.1 Basic Ideas for Regression Analysis

Regression Analysis: Regression analysis is primarily concerned with estimating the population mean, or average value, of the dependent variable on the basis of the known and fixed values of the explanatory variable(s).

To understand how this can be done, we first have to understand two concepts: the population regression function (PRF) and the sample regression function (SRF).

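Specification of the (true) relationship in the population: PRF. The PRF expresses the conditional mean of Y as a function of the given values of X:

$$E(Y \mid X = x) = \sum_i f_i \, y_i$$

where the $y_i$ are the possible values of Y and the $f_i$ are their conditional probabilities given X = x.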

Numerical example.

Graphical illustration: the PRF is $E(Y \mid X_i) = b_1 + b_2 X_i$; with the stochastic term $u_i$, an individual observation is $Y_i = b_1 + b_2 X_i + u_i$.

Sample Regression Function (SRF), based on one random sample: $\hat{Y}_i = \hat{b}_1 + \hat{b}_2 X_i$, where the hats denote estimates computed from the sample.

Our primary objective in regression analysis is to estimate the PRF on the basis of SRF, because our analysis is often based on a single sample from some population.

Graphical illustration of the relationship between PRF and SRF.

Estimator and estimate:

As stated above, regression analysis is primarily concerned with estimating the population mean, or average value, of the dependent variable on the basis of the known and fixed values of the explanatory variable(s). To understand why it is the mean value of the dependent variable that is estimated, rather than simply its value, one first has to understand the two concepts of PRF and SRF; the primary objective of regression analysis is to estimate the PRF on the basis of the SRF. That is the basis of inferential statistics and of hypothesis testing.

Concepts of PRF and SRF


4.2.2 Regression Analysis: The Problem of Estimation

The Method of Least Squares: Basic Principles

1. Two-Variable Linear Regression Model
2. Multiple Linear Regression Model
3. Regression on Transformed Variables: Other Functional Forms
4. Regression with Dummy Variables
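
For the two-variable case, the least-squares principle can be sketched in a few lines. The example below applies the textbook closed-form estimators (slope = sum of cross-deviations divided by sum of squared deviations of X; intercept = mean of Y minus slope times mean of X) to the income and expenditure table from Section 4.1. The function and variable names are illustrative assumptions.

```python
# Data from the table in Section 4.1.
income = [1000, 1100, 1200, 1300, 1400, 1500]       # X
expenditure = [800, 875, 950, 1025, 1100, 1175]     # Y


def ols_two_variable(x, y):
    """Return (b1_hat, b2_hat) that minimise the sum of squared residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2_hat = s_xy / s_xx
    b1_hat = y_bar - b2_hat * x_bar
    return b1_hat, b2_hat


b1_hat, b2_hat = ols_two_variable(income, expenditure)

# Residuals: observed Y minus fitted Y.
residuals = [yi - (b1_hat + b2_hat * xi) for xi, yi in zip(income, expenditure)]

print(b1_hat, b2_hat)
print(residuals)
```

For this particular table the fit is exact (intercept 50, slope 0.75, all residuals zero up to rounding), so the data behave like the deterministic model (4.1) rather than the stochastic model (4.2).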

4.2.3 Regression Analysis: Hypothesis Testing

3.1 Basic Ideas for Hypothesis Testing

1) Interval Estimation: Basic Ideas
2) Confidence Intervals for b1 and b2.
Null and alternative hypotheses; accepting or rejecting a hypothesis; level of significance; statistical significance; Type I and Type II errors.

3.2 Two Methods of Hypothesis Testing:

Confidence Interval Approach
The Test of Significance Approach
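
Both approaches can be sketched for the slope coefficient of a two-variable regression. The expenditure figures below are HYPOTHETICAL: they are made up for illustration so that the points scatter around a line, since the table in Section 4.1 fits a line exactly and would give a zero standard error. The 5% significance level is also an assumption; 2.776 is the two-sided 5% critical value of Student's t with 4 degrees of freedom.

```python
from math import sqrt

# HYPOTHETICAL data: incomes from Section 4.1 paired with made-up
# expenditures (not taken from the notes).
income = [1000, 1100, 1200, 1300, 1400, 1500]
expenditure = [810, 860, 955, 1045, 1090, 1165]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(expenditure) / n
s_xx = sum((x - x_bar) ** 2 for x in income)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, expenditure))

# Least-squares estimates of slope and intercept.
b2_hat = s_xy / s_xx
b1_hat = y_bar - b2_hat * x_bar

# Residual variance with n - 2 degrees of freedom, then se(b2_hat).
rss = sum((y - b1_hat - b2_hat * x) ** 2 for x, y in zip(income, expenditure))
sigma2_hat = rss / (n - 2)
se_b2 = sqrt(sigma2_hat / s_xx)

# Two-sided 5% critical value of Student's t with n - 2 = 4 d.f.
T_CRIT = 2.776

# Confidence interval approach: 95% interval for b2.
ci_lower = b2_hat - T_CRIT * se_b2
ci_upper = b2_hat + T_CRIT * se_b2

# Test of significance approach: H0: b2 = 0 against H1: b2 != 0.
t_stat = (b2_hat - 0) / se_b2
reject_h0 = abs(t_stat) > T_CRIT

print(b2_hat, se_b2)
print(ci_lower, ci_upper)
print(t_stat, reject_h0)
```

The two approaches lead to the same decision: the null hypothesis is rejected at the 5% level exactly when the hypothesized value (here 0) falls outside the 95% confidence interval.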

4.3 Time Series Analysis

4.3.1 Analysis of Trend: secular trend; trend as a linear regression; broken trends; exponential trend.
4.3.2 Tests of Stationarity, Unit Root, and Co-integration & Error-correction model
4.3.3 Vector Auto Regression (VAR)
