CHAPTER 3: METHODOLOGY AND MODELS

I. Introduction

Do specific cultural, economic, educational, social, and public input factors affect teenage birthrates in California counties? The literature reviewed in the previous chapter suggests that a number of such factors do indeed affect teenage birthrates. The causal factors found to be significant range from an individual's attitude about school (Plotnick, 1992) to a state's level of AFDC payments (Zimmerman and Gage, 1997). The theories underlying researchers' selection of possible causal factors included economic models (King, Myers, and Byrne, 1992; and Leibowitz, Eisen, and Chow, 1986) and social/behavior models (Plotnick, 1992; and Yamaguchi and Kandel, 1987). Some of the previous studies also considered the effects of public policies and public inputs on teenage birthrates (Lundberg and Plotnick, 1995; Kane and Staiger, 1996; and Zimmerman and Gage, 1997).

The literature reviewed also indicates that a study of teenage birthrates is enhanced by an understanding of high school dropout rates. Some of the factors found to affect teenage birthrates were also found to affect high school dropout rates. In addition, teenage birthrates can directly influence high school dropout rates and vice versa. The relationship between teenage birthrates and high school dropout rates is complex. There is evidence that the relationship is reciprocal or endogenous, thus requiring careful attention when developing a theoretical model (Rindfuss and St. John, 1983).

In this chapter I discuss my methodology for testing the hypothesis that specific cultural, economic, educational, social, and public input factors affect teenage birthrates in a California county. The remainder of this chapter is divided into three sections. In Section II, I briefly explain my method of analysis, which is multivariate regression analysis. In Section III, I present my theoretical regression models for testing the hypothesis in terms of broad causal factors, and I briefly explain my choice of dependent variables and the broad causal factors for each model. Section IV focuses on the specific variables used as proxies for the broad factor categories and on predictions about the direction of their effects. This section includes three tables. Table 3.1 lists the specific variables, with a short definition and the source of each variable used in the analysis. The variables' descriptive statistics are reported in Table 3.2. Table 3.3 is a correlation matrix that provides the simple correlation coefficients of the variables, which estimate the strength and direction of the relationship between each pair of variables.

II. Multivariate Regression Analysis

Regression analysis is a statistical method used to estimate the effect of an explanatory variable on a dependent variable and to test whether a significant relationship exists between the dependent and explanatory variables (Studenmund, 1997). For example, a regression model for the number of cookies sold at a bake sale, the dependent variable, might have as its explanatory variable the number of chocolate chips in each cookie. In other words, the number of cookies sold depends on the number of chips in the cookies. Assuming that the average person prefers more chips rather than fewer chips, the appropriate hypothesis would be that the more chips in the cookies, the more cookies will be sold. I can test this hypothesis using regression analysis. If I gathered information about several different bake sales, including the number of cookies sold at each and the average number of chips in the cookies, I could use regression analysis to estimate how many cookies would be sold based on the average number of chips in the cookies at the sale. I could also tell the cookie sellers how confident I was that the average number of chips has an effect on cookie sales.
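The bake sale example can be sketched in a few lines of Python. This is only an illustration: the five bake sales and their numbers are invented, and the calculation is ordinary least squares done by hand rather than the SPSS procedure used in this study. The t-statistic at the end is one common way to express how confident one can be that chips affect sales.

```python
import math

# Hypothetical bake-sale data (illustrative only, not from this study).
chips = [2, 4, 6, 8, 10]       # average chips per cookie at each sale
sold = [21, 28, 42, 48, 61]    # cookies sold at each sale

n = len(chips)
mean_x = sum(chips) / n
mean_y = sum(sold) / n

# Ordinary least squares slope and intercept for one explanatory variable.
sxx = sum((x - mean_x) ** 2 for x in chips)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chips, sold))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error of the slope and its t-statistic (n - 2 degrees of freedom).
residuals = [y - (intercept + slope * x) for x, y in zip(chips, sold)]
sse = sum(r ** 2 for r in residuals)
se_slope = math.sqrt(sse / (n - 2) / sxx)
t_stat = slope / se_slope

print(f"cookies sold = {intercept:.1f} + {slope:.1f} * chips")
print(f"t-statistic for chips: {t_stat:.2f}")
```

With these invented numbers the estimated slope is 5, meaning each additional chip is associated with five more cookies sold.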

Multivariate regression models are equations with more than one explanatory variable, or causal factor. By using multivariate regression models, the impact of each explanatory variable on the dependent variable can be distinguished from the impact of all the others. Multivariate regression analysis is a technique used to estimate the change in the dependent variable caused by the change in one of the explanatory variables, holding all the other explanatory variables constant. Returning to the cookie sale example, a multivariate regression model might have the number of cookies sold as the dependent variable and the average number of chips in the cookies, the price per cookie, and the price of other baked goods as its explanatory variables. Multivariate regression analysis would allow me to estimate the effect on cookie sales of putting more chips in the cookies if the price per cookie and the price of other baked goods remained the same at every bake sale.
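A minimal sketch of the multivariate version of the cookie example follows. The six bake sales are invented, and the sales figures were constructed to follow an exact linear rule (50 + 4 x chips - 20 x price + 5 x other price) so that the recovered coefficients are easy to check. The helper function `ols` is a hypothetical name introduced here; it solves the least-squares normal equations directly and is not the SPSS routine used in this study.

```python
def ols(columns, y):
    """Least-squares coefficients of y on the columns (intercept first)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                factor = A[r][p] / A[p][p]
                A[r] = [a - factor * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]

# Hypothetical bake-sale data (illustrative only).
chips = [4, 6, 8, 10, 12, 6]                    # average chips per cookie
price = [1.0, 1.5, 1.0, 2.0, 1.5, 2.0]          # price per cookie
other_price = [2.0, 2.0, 3.0, 2.5, 3.0, 3.5]    # price of other baked goods
sold = [56.0, 54.0, 77.0, 62.5, 83.0, 51.5]     # cookies sold

intercept, b_chips, b_price, b_other = ols([chips, price, other_price], sold)
print(f"chips: {b_chips:.2f}, price: {b_price:.2f}, other: {b_other:.2f}")
```

The coefficient on chips is the estimated change in cookies sold for one more chip per cookie, holding both prices constant.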

In this study of teenage birthrates I used SPSS, a statistical analysis computer program, to estimate the effects of the explanatory variables on the dependent variable in multivariate regression models and determine whether the relationship between the variables is statistically significant. The magnitude, or strength, of the effect is also estimated by SPSS. The results of the regression are reported in the next chapter.

The multivariate theoretical models for the causes of teenage birthrates and of high school dropout rates each have many explanatory variables. One of the explanatory variables for teenage birthrates is high school dropout rate, and one of the explanatory variables for high school dropout rates is teenage birthrate. This endogenous relationship between teenage birthrates and high school dropout rates requires special consideration when using regression analysis to estimate the impact of the explanatory variables. A special multivariate regression technique, Two-Stage Least Squares, is utilized to control for the endogenous nature of the relationship. Two-Stage Least Squares, or 2SLS, is a method for creating a new variable, called an instrumental variable, to replace the endogenous variable in the regression model (Studenmund, 1997, p. 541). The results of the 2SLS regression are reported in the next chapter.
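The two-stage idea can be illustrated with simulated data. In the sketch below, everything is an assumption made for illustration: the data are randomly generated rather than drawn from California counties, the "true" effects (0.5 and 0.4) are arbitrary, `x` stands for an exogenous variable that appears only in the birthrate equation, and `z` stands for one that appears only in the dropout equation. The helper `ols` is a hypothetical name, and the code is a hand-rolled stand-in for the SPSS 2SLS procedure.

```python
import random

def ols(columns, y):
    """Least-squares coefficients of y on the columns (intercept first)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                factor = A[r][p] / A[p][p]
                A[r] = [a - factor * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]

random.seed(0)
b_true, a_true = 0.5, 0.4   # assumed effects, chosen only for the simulation
birthrate, dropout, x, z = [], [], [], []
for _ in range(5000):
    xi, zi = random.gauss(0, 1), random.gauss(0, 1)
    e1, e2 = random.gauss(0, 1), random.gauss(0, 1)
    # Reduced form of the simultaneous system:
    #   birthrate = b_true * dropout   + x + e1
    #   dropout   = a_true * birthrate + z + e2
    denom = 1 - a_true * b_true
    birthrate.append((b_true * (zi + e2) + xi + e1) / denom)
    dropout.append((a_true * (xi + e1) + zi + e2) / denom)
    x.append(xi)
    z.append(zi)

# Naive OLS ignores the feedback from birthrate to dropout.
b_ols = ols([dropout, x], birthrate)[1]

# Stage 1: regress the endogenous variable on all exogenous variables.
c0, cx, cz = ols([x, z], dropout)
dropout_hat = [c0 + cx * xi + cz * zi for xi, zi in zip(x, z)]

# Stage 2: replace the endogenous variable with its fitted values.
b_2sls = ols([dropout_hat, x], birthrate)[1]

print(f"true effect {b_true}, OLS {b_ols:.3f}, 2SLS {b_2sls:.3f}")
```

In this simulation the naive estimate overstates the effect of the dropout rate because it absorbs part of the reverse effect, while the two-stage estimate lands close to the assumed value. A manual second stage like this one gives usable coefficients but not correct standard errors, which is one reason to rely on a statistical package for the actual analysis.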

III. The Regression Models

An essential first step in developing a theoretical model for hypothesis testing is to decide on the dependent variable, or an appropriate measure of the topic under study. The dependent variable for my first model is the birthrate per 1,000 women aged 15-19 for each of the 57 counties in California in 1990. Birthrates for teens 15-19 are appropriate for the purposes of this study because the vast majority of teenage births occur within this age group (Powell, 1994, p. 8). There were no teenage birthrates available for four counties: Mariposa, Modoc, Mono, and Sierra. According to the source I used for other counties' birthrates, these counties lacked either sufficient numbers of teenage women aged 15-19 or of births to women in that age group to calculate statistically reliable teenage birthrates (Powell, 1994, p. 8). Rather than omit these small counties from my analysis, I chose to calculate the teenage birthrate for these counties by dividing the total number of births to teens in the county by the number of 13-19 year old women in the county and then multiplying the result by 1,000. I utilized this calculated birthrate as the birthrate for 15-19 year olds in these four counties.
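The calculation for the four small counties can be written as a short function. The function name and the example numbers are hypothetical; the actual counts for Mariposa, Modoc, Mono, and Sierra are not reproduced here.

```python
def birthrate_per_1000(births, women):
    """Births per 1,000 women in the reference age group."""
    if women <= 0:
        raise ValueError("women must be positive")
    return 1000 * births / women

# Hypothetical small county: 12 births to teens, 400 women aged 13-19.
rate = birthrate_per_1000(births=12, women=400)
print(rate)  # 30.0
```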

I chose aggregate county level data because I was specifically interested in teenage birthrates in California and sufficient individual level data was not available for my study. If individual level data had been available, I would have preferred to use individual data. One reason for this preference is that the models for this thesis are based on the previous studies, most of which are individual level studies. It was sometimes difficult to find county level explanatory variables that were equivalent to individual level explanatory variables. However, the teenage birthrates in California counties and the character of the counties themselves are varied enough to warrant analysis and to justify a county level study. In addition, there is ample county level data available from which to select possible causal factors to include as explanatory variables in the regression model.

Teenage birthrate regression model

After the dependent variable is chosen, the next step in a regression study is to determine broad categories of factors expected to influence the dependent variable. The broad factors expected to affect teenage birthrates are culture, economic status, educational status, home environment, community environment, and public inputs. Specifically, the model is shown as the following function:

Teenage Birthrate = ƒ(Culture, Economic Status, Educational Status, Home Environment, Community Environment, Public Inputs),

where,

Culture = ƒ(Hispanic Population, Black Population, Asian Population, Spanish Speaking),

Economic Status = ƒ(Families in Poverty, Teens in Poverty, Median Family Income),

Educational Status = ƒ(High School Dropout Rate, Female SAT Scores, Adults with Bachelor's Degrees, Public School Attendance),

Home Environment = ƒ(Female-Headed Households, Working Mothers),

Community Environment = ƒ(Rural Population, Suburban County, Urban County),

Public Inputs = ƒ(Community Clinics, Physicians)

The broad categorical factors in this regression model were chosen based on the studies and literature reviewed in the previous chapter. For example, I included cultural factors based on studies that reported differences in the birthrates of white non-Hispanic teens and Hispanic teens (Leibowitz, Eisen, and Chow, 1986). Because California has larger populations of ethnic minorities than most other states, it would be important for policy purposes to know if there is any variation in birthrates due to cultural factors.

The economic characteristics of counties are included in the model based on the theory that teens consider opportunity costs when they decide whether to continue a pregnancy. In a county with less poverty and higher median family incomes, teens face higher opportunity costs in terms of delayed entry into the workforce, for example, than teens in counties with more poverty. Factors correlated with poverty may also have an effect on birthrates. Mayer (1997, pp. 92-93) notes that unobserved and unmeasured parental characteristics correlated with poverty are important influences on a teen's behavior, including teenage childbearing.

Opportunity costs are also a reason for considering educational factors in the regression model for teenage birthrates. In counties with more college graduates and fewer high school dropouts, the opportunity costs of delaying education might be higher than in counties with fewer college graduates. Studies based on social/behavior theories, which found that attitudes towards school experiences and educational expectations are significant causal factors for teenage childbearing (Plotnick, 1992; and Yamaguchi and Kandel, 1987), provide additional reasons to include educational factors.

Home environment factors are included in the regression model because they too are expected to influence a teen's behavior, based on the findings of several studies in the literature review (for example, see Evans, Oates, and Schwab, 1992). One way that home environment factors could influence behavior is through unobservable characteristics, factors that may have indirect effects on teenage birthrates. The home environment is also a source of role models, with the female parent expected to have a considerable impact on her daughter.

The community environment is also expected to affect teenage birthrates. California's counties range from 100 percent rural to zero percent rural, and their populations range from less than two thousand to more than eight million. These differences are reflected in the distinct qualities of counties and may account for some of the variation in teenage birthrates between counties.

Finally, I included public input factors to measure the effect of public resources on teenage birthrates. Other studies have also considered the availability of reproductive health services as causal factors (Kane and Staiger, 1996; and Zimmerman and Gage, 1997). For policy purposes it is important to know if public input factors are significantly related to teenage birthrates. These factors are more easily manipulated to effect changes in teenage birthrates than other kinds of factors. An understanding of the effects of these factors can aid policymakers who are attempting to lower teenage birthrates.

High school dropout rate regression model

In order to utilize the Two-Stage Least Squares regression technique to control for the endogenous relationship between teenage birthrates and high school dropout rates, I also had to develop a regression model for dropout rates. The dependent variable in this model is the one-year high school dropout rate for the 1990-91 school year as calculated by the California Department of Education (CDE). This rate is calculated for students who would have been in 9th through 12th grades if they had not dropped out. By definition, a dropout was previously enrolled in grades 7 through 12 and has left school for 45 consecutive days without re-enrolling in that school or another school. In addition, to be classified as a dropout an individual must be under 21 and not have received a high school diploma or its equivalent, such as passing the GED. Recently, the reliability of the CDE's dropout data has been questioned. However, no other measure of dropout rates is without its flaws (Schrag, 1999). Given that no uncontroversial measure of the dropout rate is available, I use the CDE data for this study.

The broad factors expected to affect high school dropout rates are culture, economic status, educational status, home environment, community environment, and public inputs. Specifically, the model is shown as the following function:

High School Dropout Rate = ƒ (Culture, Economic Status, Educational Status, Home Environment, Community Environment, Public Inputs, Teenage Birthrates),

where,

Culture = ƒ(Hispanic Population, Black Population, Asian Population, Spanish Speaking),

Economic Status = ƒ(Families in Poverty, Teens in Poverty, Median Family Income),

Educational Status = ƒ(Female SAT Scores, Adults with Bachelor's Degrees, Public School Attendance),

Home Environment = ƒ(Female-Headed Households, Working Mothers),

Community Environment = ƒ(Rural Population, Suburban County, Urban County, Agricultural Employment, Manufacturing Employment),

Public Inputs = ƒ(Per Pupil Expenditure, Pupil to Teacher Ratio, Average Class Size),

Teenage Birthrates = ƒ(Teenage Birthrate)

In this regression model, as in the model for teenage birthrates, the causal factors were selected based on the studies and literature reviewed in Chapter 2. The reasoning behind their selection is also similar. Once again, cultural factors are considered because other studies have found differences in dropout rates between ethnic groups (Kaufman, McMullen, and Sweet, 1996) and because of the large minority population in California that could make cultural differences even more significant.

The opportunity costs of dropping out of high school may be different in counties with more poverty and less educated workers than in counties with less poverty and more educated workers, thus economic and educational factors are included. Since the dependent variable in this model is concerned with educational outcomes, I would have been remiss if I had not included educational factors.

Home and community environment factors are included in this model for the same reasons they are included in the teenage birthrate model. The home environment may affect dropout rates through unobservable characteristics that have indirect effects and as a source of role models. The differences in the total population and rural population of counties, which contribute to their distinct qualities, may account for some of the variation in high school dropout rates between counties.

One study of public resources for education found that how the dollars were spent had a strong, indirect impact on high school dropout rates (Fitzpatrick and Yoels, 1992). I included public input factors in the model to control for differences in the allocation of resources for education. These factors are policy factors that can be altered if they are found to be significant in determining dropout rates. Policymakers could benefit from knowing the true nature of their effect.

The broad causal factors in both of these models are general categories. It is difficult to predict the direction of the effect that these broad factors will have on the dependent variables. The causal factors become more meaningful when they are defined in terms of specific explanatory variables. Once I describe the explanatory variables I chose, I predict their effects and describe how I gathered the data and tested the appropriate hypotheses.

IV. The Specific Explanatory Variables

In choosing the explanatory variables, I once again relied on the studies in the literature review. There are several difficulties in choosing the best variables to explain the variation in teenage birthrates and high school dropout rates between California's counties. First, there are many possible variables to consider using and each variable used must be justified. Second, the conflicting results of different studies necessitate careful thinking about which results are the most sound. Third, many of the studies in the literature review focus on individual level data and not aggregate county level data. Often there is no comparable county level data for individual level variables that were found to be significant. Finally, the choice of specific variables is limited to data that has already been gathered or that can be calculated from available data.

Once the specific explanatory variables are selected, the next step is to predict the nature of their relationship to the dependent variable. If the relationship is positive, as the explanatory variable increases the dependent variable also increases. In a positive relationship, the two variables are said to move in the same direction. On the other hand, if the relationship is negative, as the explanatory variable increases the dependent variable decreases. The variables move in the opposite direction in a negative relationship.
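The sign convention described above can be illustrated with the simple correlation coefficient, the statistic reported in a correlation matrix. The numbers below are invented for illustration and the function name `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(xs, ys):
    """Simple (Pearson) correlation coefficient between two variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values (illustrative only).
explanatory = [1, 2, 3, 4, 5]
moves_with = [2, 4, 5, 4, 6]      # tends to rise as the explanatory variable rises
moves_against = [9, 7, 6, 4, 2]   # tends to fall as the explanatory variable rises

r_pos = pearson_r(explanatory, moves_with)
r_neg = pearson_r(explanatory, moves_against)
print(f"positive relationship: r = {r_pos:.2f}")
print(f"negative relationship: r = {r_neg:.2f}")
```

A coefficient above zero indicates that the two variables move in the same direction, and a coefficient below zero indicates that they move in opposite directions.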

In this section I report my predictions about the nature of the relationships between the dependent variables and the explanatory variables for both regression models, beginning with the teenage birthrate model. The explanatory variables within the categories of broad causal factors will not all be related to the dependent variable in the same way: some will be negatively related, some positively related, and some will have uncertain relationships.

Table 3.1 is a list of all the dependent and explanatory variables in both models. A short definition and the source are provided for each variable. The predictions about the variables are reported in the order in which the variables are listed in the table.

Table 3.1: List of Variables with Sources

Variable | Description | Source
Teenage Birthrate | Birthrate per 1,000 teenage women aged 15-19, 1990 | Teen Pregnancy in California: Effective Prevention Strategies, Sacramento: California State Library Foundation, December 1994, p. 30
Hispanic Population | Hispanic population, by percent, 1990 | California Statistical Abstract 1997, Department of Finance, p. 19
Black Population | Black population, by percent, 1990 | California Statistical Abstract 1997, Department of Finance, p. 19
Asian Population | Asian population, by percent, 1990 | California Statistical Abstract 1997, Department of Finance, p. 19
Spanish Speaking | Percent of population who speak Spanish at home, 1990 | County and City Data Book 1994, U.S. Department of Commerce, Bureau of the Census, pp. 47, 61
Families in Poverty | Percent of families with incomes below poverty level, 1990 | County and City Data Book 1994, U.S. Department of Commerce, Bureau of the Census, pp. 51, 67
Teens in Poverty | Percent of teens living in poverty, 1990 | Teen Pregnancy in California: Effective Prevention Strategies, Sacramento: California State Library Foundation, December 1994, p. 29
Median Family Income | Median family income earned in 1989 | County and City Data Book 1994, U.S. Department of Commerce, Bureau of the Census, pp. 50, 66
High School Dropout Rate | Rate per 100 students, grades 9-12, who have dropped out of high school, 1990 | Office of Financial Accountability and Information Services, School Fiscal Services Division, California Department of Education
Female SAT Scores | Percent of female students with SAT scores above the state average, 1989-90 school year | College Bound Report, 1985-86 to 1991-92, Research, Evaluation and Technology Division, California Department of Education
Adults with Bachelor's Degrees | Percent of adults over 25 with bachelor's degrees or higher | County and City Data Book 1994, U.S. Department of Commerce, Bureau of the Census, pp. 50, 66
Public School Attendance | Percent of children attending public (rather than private) schools, 1990 | County and City Data Book 1994, U.S. Department of Commerce, Bureau of the Census, pp. 50, 66
Female-Headed Households | Percent of female-headed households (no spouse present), 1990 | 1997 County and City Extra, Annual Metro, City and County Data Book, pp. 53, 67
Working Mothers | Percent of women in the labor force whose children are under 18, 1990 | The Source Book of County Demographics, Census Edition, Volume Two, 1992, p. 5-B
Rural Population | Percent of population living in rural areas, 1990 | The Source Book of County Demographics, Census Edition, Volume Two, 1992, p. 5-A
Suburban County | The 22 counties falling between the 10 most populous and the 26 least populous counties, 1990 | 1990 Membership Roster, California State Association of Counties
Urban County | The 10 counties with the highest total population, 1990 | 1990 Membership Roster, California State Association of Counties
Agricultural Employment | Percent of workers employed in the agricultural industry | County and City Data Book 1994, U.S. Department of Commerce, Bureau of the Census, pp. 54, 68
Manufacturing Employment | Percent of workers employed in the manufacturing industry | County and City Data Book 1994, U.S. Department of Commerce, Bureau of the Census, pp. 54, 68
Community Clinics | The number of community clinics divided by the number of women aged 13-44 | Community Clinic Fact Book, 1990, p. 10, and Women at Risk of Unintended Pregnancy, 1990 Estimates, pp. 22, 24
Physicians | The number of active physicians per 100,000 residents | County and City Data Book 1994, U.S. Department of Commerce, Bureau of the Census, pp. 49, 63
Per Pupil Expenditure | Current expenditures of education per Average Daily Attendance, 1990-91 | Office of Financial Accountability and Information Services, School Fiscal Services Division, California Department of Education
Pupil to Teacher Ratio | Ratio of K-12 students to teachers, California public schools, 1990-91 | CBEDS Data Collection, October 1990, Educational Demographics, California Department of Education
Average Class Size | Average number of pupils in K-12 classes | CBEDS Data Collection, October 1990, Educational Demographics, California Department of Education
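The Suburban County and Urban County definitions in Table 3.1 are based on population rank. A minimal sketch of that classification follows, under the assumption that each variable enters the regression as a 0/1 indicator, with the least populous counties serving as the comparison group. The function name, county labels, and populations are all hypothetical.

```python
def classify_counties(populations, n_urban=10, n_suburban=22):
    """Return {county: (urban, suburban)} 0/1 indicators by population rank."""
    ranked = sorted(populations, key=populations.get, reverse=True)
    dummies = {}
    for rank, county in enumerate(ranked):
        urban = 1 if rank < n_urban else 0
        suburban = 1 if n_urban <= rank < n_urban + n_suburban else 0
        dummies[county] = (urban, suburban)
    return dummies

# Toy example with five made-up counties: 1 "urban", 2 "suburban", 2 neither.
toy = {"A": 900_000, "B": 50_000, "C": 300_000, "D": 8_000, "E": 120_000}
dummies = classify_counties(toy, n_urban=1, n_suburban=2)
for county in sorted(dummies):
    print(county, dummies[county])
```

With the default cutoffs of 10 and 22, the same function would reproduce the grouping described in the table when given actual 1990 county populations.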

Predictions for teenage birthrate variables

Just as selecting variables is hampered by the conflicting results of previous studies, so is predicting the nature of the relationship between the dependent variable and each explanatory variable. In addition to relying on previous studies, I also relied on the theories behind the studies and on common sense to predict the direction of the effect a variable would have on teenage birthrates. In some cases, I am uncertain of the nature of the relationship and only predict that the relationship is significant.

The percent of Hispanic, black, and Asian populations and the percent of the population that speaks Spanish at home are all proxies for culture. Numerous studies reported variation in birthrates by race and ethnicity, indicating culture plays a role in determining birthrates. California-specific studies suggest that Hispanic women who were born in Mexico differ from their American-born counterparts, so I included the percent of people who speak Spanish in their home to capture that cultural difference (Powell, 1994, pp. 10-14). Both the percent of Hispanic population and percent of Spanish-speakers are expected to be positively associated with birthrates because Hispanic teens are assumed to have cultural and religious values that discourage contraceptive use and support birth over abortion. Because California-specific studies have shown that African-American teens in California are less likely to give birth than teens of other ethnicities, the variable Black Population is expected to negatively affect birthrates. The direction of the effect of the variable Asian Population is uncertain; anti-immigrant groups have claimed that Asians give birth at higher rates than whites, but their claims are suspect due to their political motivation.

The variables for the broad factor of economic status are the percent of families in poverty, the percent of teens in poverty, and median family income. All are appropriate because they are widely used measures of economic well-being. Poverty is frequently reported to be positively related to teen birthrates, and I include these variables to test that conclusion. Birthrates are predicted to increase as teen poverty and family poverty increase, while birthrates are expected to decrease as median family income increases. I expect these outcomes because the opportunity cost of teen birth has been found to be a significant factor. Poorer teens have less to lose than teens in better economic circumstances, so poorer teens are more likely to give birth than abort.

For educational status I used high school dropout rates, the percent of female students with higher than average SAT scores, the percent of adults with bachelor's degrees, and public school attendance as explanatory variables. Two of these variables, the dropout rate and SAT scores, have the advantage of representing the same age group as the dependent variable, teenage birthrates for 15-19 year old women. Failure at school is positively related to teen birthrates, and the high school dropout rate captures that failure. Birthrates are predicted to be positively related to the dropout rates because teenage mothers often have low educational expectations and poor school performance. High future aspirations are expected to decrease birthrates, and female high school students scoring well on the SAT are likely to have such high aspirations. Higher than average SAT scores are therefore expected to be negatively related to birthrates. Female students with high SAT scores are likely to have expectations of higher future wages than students who score poorly or do not take the SAT, expectations that increase the opportunity cost of giving birth. The young women who achieve higher than average SAT scores may also be more capable of making rational decisions about pregnancy prevention and may more fully understand the opportunity costs of giving birth.

The third explanatory variable in the educational status category is the percent of adults over 25 who have earned at least a bachelor's degree. I expect this variable to be negatively related to teenage birthrates. Several studies indicated that the level of parental education had a negative effect on birthrates. This effect may be due to the high educational expectations that such parents transmit to their children and parental support for their children's educational efforts. The last variable in this category is the percent of students attending public school. I expect this variable to be negatively related to birthrates. In counties where a higher percentage of students attend public school, there will be more high-achieving students with high educational expectations in public high schools. These students are less likely to become teenage parents.

The variables I selected to proxy for home environment were the percent of female-headed households and the percent of working mothers in each county. Some studies have shown teens are more likely to give birth if they were raised in a mother-only household than if they were raised in a two-parent family. The percent of female-headed households is predicted to have a positive relationship to a county's teenage birthrate. The percent of female-headed households may increase teenage birthrates through some unobserved characteristics of single-parent families. According to some researchers, teenage women with working mothers are less likely to give birth than teenage women whose mothers do not work. If this is true, the percent of working mothers is negatively related to teenage birthrates. There are a couple of reasons why this relationship is expected to be negative. Working mothers may serve as role models for their teenage daughters, showing them that there are economic opportunities for women in the workforce and thus higher opportunity costs for early childbearing. Another reason could be the unmeasured or unobserved characteristics of the household.

For community environment, I selected variables that would capture some of the differences in urban, rural, and suburban counties. One researcher noted that California’s more rural counties had higher birthrates than more urban counties (Powell, 1994), but social scientists often remark on the higher birthrates in inner cities. I am uncertain as to the effects of the rural population of a county on teenage birthrates. One argument is that more rural counties have fewer of the social problems associated with urban areas. On the other hand, rural counties may lack reproductive health services and job opportunities, conditions that are associated with higher birthrates. I predict that suburban counties will be negatively related to teenage birthrates because of their more stable, better-off households. I am also uncertain of the relationship between birthrates and urban counties. Such counties may provide more reproductive services to teens, which I would expect to reduce the birthrate, but the level of poverty and other social ills associated with urban areas may be factors that increase the birthrate.

Finally, I chose two variables to proxy for public inputs: the number of community clinics per woman aged 13-44 and the number of physicians per 100,000 county residents. These variables are indicators of health services available in the county, including reproductive health services. Because reproductive health services lower the rate of unintended pregnancy, I would expect both of these variables to be negatively related to teenage birthrates.

These public input variables serve another important purpose. In order to use the Two-Stage Least Squares regression technique, certain conditions must be satisfied (Studenmund, 1997, pp. 551-556). The regression equations for teenage birthrate and high school dropout rate must be "identified" so that each equation can be distinguished from the other. To meet the condition of identification, there must be unique variables in each regression equation. In other words, some of the explanatory variables in the teenage birthrate model must not be included in the high school dropout rate model and vice versa. The public input variables are unique to each model: Community Clinics and Physicians appear only in the teenage birthrate model, while Per Pupil Expenditure, Pupil to Teacher Ratio, and Average Class Size appear only in the high school dropout rate model. These unique variables allow the order condition, a necessary condition for identification, to be satisfied. The order condition is met for an equation when the number of exogenous variables in the system that are excluded from that equation is greater than or equal to the number of endogenous explanatory variables included in it. In the teenage birthrate model, there is one endogenous explanatory variable, the dropout rate. Because at least one exogenous variable in the high school dropout rate model is excluded from the teenage birthrate model, the order condition is met for the teenage birthrate equation. The same reasoning applies to the high school dropout rate equation, whose one endogenous explanatory variable is the teenage birthrate. Since the order condition is met for both equations, the Two-Stage Least Squares technique may be utilized.
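The order condition can be checked by counting variables. The sketch below uses the variable lists from the two models in Section III; the function name and the set-based bookkeeping are illustrative assumptions, and the check covers only the order condition, which is necessary but not sufficient for identification.

```python
# Exogenous explanatory variables that appear in both models.
shared = {
    "Hispanic Population", "Black Population", "Asian Population",
    "Spanish Speaking", "Families in Poverty", "Teens in Poverty",
    "Median Family Income", "Female SAT Scores",
    "Adults with Bachelor's Degrees", "Public School Attendance",
    "Female-Headed Households", "Working Mothers",
    "Rural Population", "Suburban County", "Urban County",
}
birthrate_exogenous = shared | {"Community Clinics", "Physicians"}
dropout_exogenous = shared | {
    "Agricultural Employment", "Manufacturing Employment",
    "Per Pupil Expenditure", "Pupil to Teacher Ratio", "Average Class Size",
}
all_exogenous = birthrate_exogenous | dropout_exogenous

def order_condition(included_exogenous, n_endogenous_rhs):
    """Return (number of excluded exogenous variables, condition met?)."""
    excluded = all_exogenous - included_exogenous
    return len(excluded), len(excluded) >= n_endogenous_rhs

# Each equation has one endogenous explanatory variable (the other rate).
birthrate_check = order_condition(birthrate_exogenous, n_endogenous_rhs=1)
dropout_check = order_condition(dropout_exogenous, n_endogenous_rhs=1)
print("birthrate equation:", birthrate_check)  # (5, True)
print("dropout equation:", dropout_check)      # (2, True)
```

Five exogenous variables are excluded from the teenage birthrate equation and two from the high school dropout rate equation, so each count is at least as large as the single endogenous explanatory variable in that equation.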

Predictions for high school dropout rate variables

Many of the specific explanatory variables for the high school dropout rate regression model are the same explanatory variables as those in the teenage birthrate regression model. As much of the literature reviewed in Chapter 2 suggests, the two rates are driven by the same broad causal factors. In addition, each dependent variable is an explanatory variable in the other's regression model.
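Written out, the reciprocal structure described here takes roughly the following form. The notation is an illustrative assumption and does not appear elsewhere in this chapter: $TB_i$ and $DR_i$ are the teenage birthrate and high school dropout rate in county $i$, $\mathbf{x}_i$ collects the explanatory variables that appear in both models, $\mathbf{p}_i$ collects the variables that appear only in the teenage birthrate model, and $\mathbf{q}_i$ collects the variables that appear only in the dropout rate model.

```latex
\begin{align}
TB_i &= \alpha_0 + \alpha_1\, DR_i + \mathbf{x}_i'\boldsymbol{\alpha}_2
        + \mathbf{p}_i'\boldsymbol{\alpha}_3 + \varepsilon_i \\
DR_i &= \beta_0 + \beta_1\, TB_i + \mathbf{x}_i'\boldsymbol{\beta}_2
        + \mathbf{q}_i'\boldsymbol{\beta}_3 + \nu_i
\end{align}
```

The prediction that each rate raises the other corresponds to $\alpha_1 > 0$ and $\beta_1 > 0$.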

As in the teenage birthrate model, the percent of Hispanic, black, and Asian populations and the percent of people who speak Spanish at home are proxies for culture. Based on studies of high school dropout rates, I predict that each of these variables will be positively related to dropout rates. Minority students, and especially non-native English speakers, may be hampered by attending poorly performing schools. Also, some percentage of the Hispanic, Asian, and Spanish-speaking population is likely to be recent immigrants who are at risk of school failure due to language barriers.

The variables I chose to proxy for economic status are families in poverty, teens in poverty, and median family income. The first two variables are expected to be positively related to dropout rates. Poverty is associated with school failure in many studies. This relationship could be due to fewer resources in the home, living conditions that are not conducive to learning, or unobserved family characteristics. Median family income is likely to be negatively related to dropout rates. More resources in the home, living conditions that support learning, less parental stress, and unobserved family characteristics are all possible reasons for the negative relationship between median family income and dropout rates.

The educational status variables I chose to include in the dropout rate model are: the percent of female students with higher than average SAT scores (to be consistent with the teenage birthrate model), the percent of adults with bachelor's degrees or higher, and the percent of K-12 students who attend public schools. I predict that both the SAT scores and the percent of adults with bachelor's degrees will be negatively related to high school dropout rates. Above-average SAT scores for female students are indicators of the quality of the students and of the education available in a county, at least for students with high educational aspirations who take the exam. The percent of adults with bachelor's degrees could affect dropout rates because of the number of college-educated adult role models in the county, or because the opportunity costs of dropping out of high school are higher in counties with many educated workers competing for jobs. I am not sure of the nature of the relationship between the percent of students in public, as opposed to private, schools and the dropout rate in the county. However, private schools on average have lower dropout rates than public schools, which would suggest a positive relationship between public school attendance and the dropout rate.

The percent of female-headed households and working mothers are the specific explanatory variables for the home environment category. A number of researchers have reported that teens in female-headed households are more likely to drop out of high school than teens in two-parent families. This difference may be due to the instability some researchers associate with such families or to other family or parental characteristics that cannot be measured. I expect the percent of female-headed households to be positively related to the dropout rate. I am less certain about the effects of the working mothers variable. Its relationship could be a negative one if the role model aspect associated with working mothers decreases the dropout rate. It could be positive if working mothers provide less supervision and support for their children's educational activities.

For the community environment factor, I chose five variables. Rural population, suburban county, and urban county were chosen to represent differences in "lifestyles" in the counties. Agricultural employment and manufacturing employment are measures of the job market in a county. I am uncertain about the effects of rural population, suburban county, and urban county. The larger rural population and suburban nature of some counties could have a negative effect on dropout rates if those counties are less plagued by the problems associated with large urban populations, such as poorly performing schools. With that reasoning, urban counties would be positively related to dropout rates. However, a more rural area may have other qualities that increase the dropout rate, such as fewer job opportunities for more educated workers. To capture some of the effects of the job market on dropout rates, I included the variables for agricultural and manufacturing employment. Because many jobs in these fields generally do not require much education, I predict that both variables will be positively related to high school dropout rates. The more jobs available for people with lower levels of education, the lower the opportunity costs of dropping out of school. These job market variables were not included in the regression model for teenage birthrates for two reasons. One, I have not seen a study that found the types of jobs available in a county to be an important factor in determining teenage birthrates. Two, I believe that in the teenage birthrate model other economic factors sufficiently account for the effects of the economic well-being of a county.

The public input variables in this model are all specific to education. They include the per pupil expenditures, the pupil to teacher ratio, and the average class size for K-12 classes. In this way I account for not only differences in the amount of money spent on education, but also differences in how that money is spent. Because the per pupil expenditures include salaries, benefits, books, supplies, services, and direct support, some of the effects of how much money is spent will be indirect. Still, I predict that the relationship between expenditures and high school dropout rates will be negative. I also predict that the relationship between dropout rates and both pupil to teacher ratio and average class size will be positive. Each of these variables is a measure of the resources in the classroom, where they have the most impact on students' performance. As the pupil to teacher ratio and average class size increase, so too will the rate of high school dropouts.

Just as the public input variables in the teenage birthrate regression model helped satisfy the conditions necessary for a Two-Stage Least Squares regression, these public input variables are needed to satisfy the same conditions. The public input variables, along with the variables for agricultural and manufacturing employment, are unique to the dropout rate regression model. The uniqueness of these five variables satisfies the identification condition discussed above. The order condition is satisfied for the teenage birthrate model because the number of exogenous variables that are excluded from it, five, is greater than the number of endogenous explanatory variables it contains, one. The two public input variables in the teenage birthrate model are excluded from the high school dropout rate model, which also has one endogenous explanatory variable, so the order condition is met for that model as well.
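The counting behind the order condition can be sketched in a few lines. The variable labels below are shorthand for the variables named in the text; the short `SHARED` set is only a placeholder (an assumption), since the exact list of shared variables does not change the count of excluded variables.

```python
# Exogenous variables that appear in only one of the two equations.
UNIQUE_TO_BIRTHRATE = {"community_clinics", "physicians"}
UNIQUE_TO_DROPOUT = {
    "agricultural_employment",
    "manufacturing_employment",
    "per_pupil_expenditure",
    "pupil_to_teacher_ratio",
    "average_class_size",
}
# Placeholder for the exogenous variables both equations share (assumption).
SHARED = {"hispanic_population", "families_in_poverty", "female_headed_households"}

BIRTHRATE_EXOG = SHARED | UNIQUE_TO_BIRTHRATE
DROPOUT_EXOG = SHARED | UNIQUE_TO_DROPOUT
ALL_EXOG = BIRTHRATE_EXOG | DROPOUT_EXOG


def order_condition(own_exog, all_exog, n_endogenous_regressors):
    """Return (number of excluded exogenous variables, whether the condition holds)."""
    excluded = all_exog - own_exog
    return len(excluded), len(excluded) >= n_endogenous_regressors


# Each equation has one endogenous explanatory variable (the other rate).
print("birthrate model:", order_condition(BIRTHRATE_EXOG, ALL_EXOG, 1))
print("dropout model:  ", order_condition(DROPOUT_EXOG, ALL_EXOG, 1))
```

For the teenage birthrate model the function counts five excluded variables against one endogenous regressor, and for the dropout rate model it counts two against one, matching the counts given above.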

The last variable in the high school dropout rate regression model is the teenage birthrate. The teenage birthrate variable is included because of the suspected endogenous relationship between teenage birthrates and high school dropout rates. I predict that teenage birthrates will be positively related to dropout rates. In this dual-causality relationship, the dropout rate increases as the birthrate increases, whether teens drop out of high school because they have a baby or drop out and then have a baby. There is some controversy about which variable has the greater impact on the other. I hope to shed some light on this issue with my regression results.

Specific variable data

Tables 3.2 and 3.3 provide additional information about the variables in the regression models. Table 3.2 lists the mean, the standard deviation, and the minimum and maximum value of each variable. The mean is a measure of the average value of a variable. The standard deviation is a measure of variation in each variable. For a variable that is approximately normally distributed, about 68 percent of the observations will be within one standard deviation of the mean and about 95 percent will be within two standard deviations of the mean. The minimum and maximum give the range of values for each variable, which provides a measure of the variation between counties.
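As a worked example of reading these statistics, the short sketch below computes the one and two standard deviation bands for the teenage birthrate from the mean (66.316) and standard deviation (24.796) reported in Table 3.2. The function name `sd_band` is an illustrative choice, and the bands are only a rough guide because nothing here checks how closely the variable follows a normal distribution.

```python
def sd_band(mean, sd, k):
    """Interval from k standard deviations below the mean to k above it."""
    return (round(mean - k * sd, 3), round(mean + k * sd, 3))


# Teenage birthrate, values taken from Table 3.2.
MEAN, SD = 66.316, 24.796

one_sd = sd_band(MEAN, SD, 1)
two_sd = sd_band(MEAN, SD, 2)

print("within one standard deviation:", one_sd)   # (41.52, 91.112)
print("within two standard deviations:", two_sd)  # (16.724, 115.908)
```

The reported minimum (16.00) and maximum (117.0) fall just outside the two standard deviation band, so nearly the full range of county birthrates lies within two standard deviations of the mean.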

Table 3.2: Descriptive Statistics

Variable                        Mean       Standard Deviation  Minimum   Maximum
Teenage Birthrate               66.316     24.796              16.00     117.0
Hispanic Population             17.689     12.688              3.30      65.80
Black Population                3.3947     3.5975              0.20      17.40
Asian Population                4.8000     5.2769              0.20      28.40
Spanish Speaking                13.149     10.909              1.90      61.70
Families in Poverty             9.7789     3.8904              3.00      20.80
Teens in Poverty                19.070     6.4417              8.00      36.00
Median Family Income            34876.00   9303.90             3622.00   59160.00
High School Dropout Rate        4.4386     1.8392              1.20      9.800
Female SAT Scores               16.132     6.1728              6.10      38.60
Adults with Bachelor's Degrees  18.683     7.727               9.00      44.00
Public School Attendance        93.2544    3.9814              77.50     99.40
Female-Headed Households        10.102     1.9884              5.70      15.00
Working Mothers                 63.425     4.9495              51.90     76.00
Rural Population                34.853     28.781              0.00      100.0
Suburban County                 0.38596    0.49115             0.00      1.00
Urban County                    0.17544    0.38372             0.00      1.00
Agricultural Employment         7.7088     6.3930              0.60      32.30
Manufacturing Employment        12.1842    4.7798              3.20      31.60
Community Clinics               1.650E-04  2.774E-04           0.00      0.001471
Physicians                      167.7193   100.4141            24.00     658.00
Per Pupil Expenditure           4125.28    5122.89             3594.36   6824.29
Pupil to Teacher Ratio          21.5719    2.0584              14.10     24.60
Average Class Size              25.8070    2.8317              14.90     29.00

Table 3.3 reports the simple correlation coefficients. Each coefficient is a measure of the strength and the positive or negative nature of the relationship between two variables (Studenmund, 1997). If two variables are highly correlated, with a coefficient of 0.80 or higher in absolute value, the variables may be too closely related for the SPSS program to separate their individual effects on the dependent variable. This condition is called multicollinearity. It raises the standard errors of the regression coefficients calculated for the two highly correlated variables and may falsely indicate that they have no statistically significant effect on the dependent variable. One way to correct the problem of multicollinearity is to remove one of the highly correlated variables from the regression model. Any changes made in the regression model based on the existence of multicollinearity will be explained and reported in the next chapter.

Once a regression model is specified, meaning that all the variables have been chosen and defined, the data are entered into a statistical analysis computer program such as SPSS and the regression is estimated. I report the results of the regressions of the models for teenage birthrates and high school dropout rates in Chapter 4. Chapter 4 also includes a discussion of the tools with which to evaluate the regression results. In addition to the problem of multicollinearity, I discuss some of the other common problems in regression analysis and explain how they are identified and corrected. Once the corrections are made and the corrected models are estimated, I calculate and report the magnitude of the effects of the variables that affect teenage birthrates.
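The 0.80 screening rule for multicollinearity can be applied mechanically to the correlation matrix. The sketch below uses a handful of coefficients copied from Table 3.3, not the full matrix, and compares the absolute value of each coefficient to the cutoff. Treating strong negative correlations the same as strong positive ones is an assumption of this sketch.

```python
THRESHOLD = 0.80  # cutoff named in the text

# Selected simple correlation coefficients copied from Table 3.3.
correlations = {
    ("Hispanic Population", "Spanish Speaking"): 0.989,
    ("Adults with Bachelor's", "Physicians"): 0.804,
    ("Public School Attendance", "Physicians"): -0.860,
    ("Pupil to Teacher Ratio", "Average Class Size"): 0.878,
    ("Families in Poverty", "Teens in Poverty"): 0.791,
    ("Adults with Bachelor's", "Public School Attendance"): -0.785,
    ("Teenage Birthrate", "Families in Poverty"): 0.740,
}

# Keep only the pairs whose correlation is at least as strong as the cutoff.
flagged = {pair: r for pair, r in correlations.items() if abs(r) >= THRESHOLD}

for (first, second), r in sorted(flagged.items(), key=lambda item: -abs(item[1])):
    print(f"{first} / {second}: r = {r:+.3f}")
```

Of the seven pairs listed, four meet the cutoff: Hispanic population with Spanish speaking, pupil to teacher ratio with average class size, public school attendance with physicians, and adults with bachelor's degrees with physicians. A flag of this kind only identifies candidates for closer inspection.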

Table 3.3: Correlation Matrix

                          Teenage Birthrate  Hispanic Population  Black Population  Asian Population  Spanish Speaking
Hispanic Population        .479
Black Population           .169               .164
Asian Population          -.014               .161                 .639
Spanish Speaking           .441               .989                 .122              .127
Families in Poverty        .740               .398                -.039             -.122              .424
Teens in Poverty           .455               .316                -.102             -.011              .319
Median Family Income      -.498              -.091                 .233              .406             -.118
High School Dropout Rate   .379               .179                 .279              .309              .146
Female SAT Scores         -.702              -.439                 .050              .290             -.427
Adults with Bachelor's    -.630              -.147                 .267              .540             -.159
Public School Attendance   .341               .002                -.443             -.677              .009
Female-Headed Households   .684               .617                 .484              .381              .592
Working Mothers           -.662              -.259                 .064              .250             -.278
Rural Population          -.100              -.401                -.527             -.668             -.353
Suburban County           -.063               .242                -.071              .079              .226
Urban County               .000               .110                 .601              .568              .086
Agricultural Employment    .442               .466                -.291             -.304              .488
Manufacturing Employment   .052               .108                 .074              .263              .066
Community Clinics         -.203              -.297                -.194             -.296             -.275
Physicians                -.396              -.129                 .367              .689             -.134
Per Pupil Expenditure     -.329              -.229                -.077             -.067             -.205
Pupil to Teacher Ratio     .405               .439                 .312              .235              .398
Average Class Size         .279               .386                 .377              .410              .341

Table 3.3: Correlation Matrix (Continued)

                          Families in Poverty  Teens in Poverty  Median Family Income  High School Dropout Rate  Female SAT Scores
Teens in Poverty           .791
Median Family Income      -.679                -.521
High School Dropout Rate   .331                 .332             -.131
Female SAT Scores         -.628                -.394              .609                 -.075
Adults with Bachelor's    -.662                -.330              .748                 -.062                      .784
Public School Attendance   .448                 .293             -.667                 -.100                     -.584
Female-Headed Households   .594                 .451             -.100                  .421                     -.381
Working Mothers           -.763                -.508              .594                 -.210                      .599
Rural Population           .110                -.102             -.489                 -.210                     -.192
Suburban County            .009                 .206              .101                  .052                      .035
Urban County              -.195                -.142              .424                  .284                      .230
Agricultural Employment    .587                 .394             -.574                 -.131                     -.569
Manufacturing Employment  -.180                -.175              .346                  .077                      .125
Community Clinics          .016                -.001             -.242                 -.144                      .079
Physicians                -.397                -.178              .567                  .131                      .611
Per Pupil Expenditure     -.094                 .025             -.059                  .021                      .403
Pupil to Teacher Ratio     .058                -.041              .210                  .246                     -.261
Average Class Size        -.065                -.064              .386                  .196                     -.089

Table 3.3: Correlation Matrix (Continued)

                          Adults with Bachelor's Degrees  Public School Attendance  Female-Headed Households  Working Mothers  Rural Population
Public School Attendance  -.785
Female-Headed Households  -.138                           -.081
Working Mothers            .708                           -.517                     -.428
Rural Population          -.556                            .606                     -.577                     -.247
Suburban County            .233                           -.202                      .275                      .109            -.470
Urban County               .416                           -.438                      .262                      .140            -.508
Agricultural Employment   -.588                            .468                      .193                     -.435             .283
Manufacturing Employment   .169                           -.166                      .130                      .039            -.248
Community Clinics         -.248                            .352                     -.373                     -.184             .501
Physicians                 .804                           -.860                      .040                      .486            -.562
Per Pupil Expenditure      .062                            .110                     -.368                      .050             .329
Pupil to Teacher Ratio     .049                           -.184                      .527                     -.078            -.555
Average Class Size         .292                           -.442                      .548                      .086            -.708

Table 3.3: Correlation Matrix (Continued)

                          Suburban County  Urban County  Agricultural Employment  Manufacturing Employment  Community Clinics
Urban County              -.366
Agricultural Employment   -.008            -.427
Manufacturing Employment  -.088             .386         -.219
Community Clinics         -.233            -.208          .180                    -.068
Physicians                 .175             .466         -.553                     .086                     -.210
Per Pupil Expenditure     -.290             .014         -.110                    -.079                      .638
Pupil to Teacher Ratio     .244             .300         -.174                     .310                     -.677
Average Class Size         .314             .401         -.295                     .378                     -.739

Table 3.3: Correlation Matrix (Continued)

                          Physicians  Per Pupil Expenditure  Pupil to Teacher Ratio
Per Pupil Expenditure      .060
Pupil to Teacher Ratio     .071       -.718
Average Class Size         .278       -.741                   .878

