*************************
From the Washington Post, Tuesday, January 27, 2004. See
http://www.washingtonpost.com/wp-dyn/articles/A52280-2004Jan27.html
*************************
What the Media Are Missing
Reports of Average Test Scores Mask Improvements Made by Minorities


By Jay Mathews

Mention Gerald W. Bracey's name in any assemblage of educational pundits and you will often hear an awkward silence. Since his first foray into corrective journalism led to his forced resignation as senior policy analyst at the National Education Association 12 years ago, Bracey has often offended self-appointed experts like me by exposing us to the truth, and he is rarely invited to any of our parties.

This makes Bracey, an associate with the High/Scope Educational Research Foundation and an associate professor of education at George Mason University, testy at times. Some of his e-mails to people he thinks are wrong may use words our mothers told us never to repeat in polite company. But like a stinging cold shower on a languid summer day, he has invigorated the debate over schools. Just look at what he did in the February issue of the American School Board Journal.

His article, "Simpson's Paradox and Other Statistical Mysteries," exposes a great gap in our coverage of test score results. With great regularity, mainstream newspapers like mine, as well as popular magazines and the big networks, report on the lack of improvement in our public schools. We use words like "stagnant" or "sluggish" or "static" or "flat" to describe the achievement levels as measured by the National Assessment of Educational Progress (NAEP), the federal government's most important and most respected measure of U.S. schools. The NAEP (rhymes with "tape") reading scores for students aged 9 gained only four points -- from 208 to 212 -- from 1971 to 1999. Thirteen-year-olds gained only four points and 17-year-olds only three. The change in the average verbal SAT score between 1981 and 2002 is even less impressive. It appears to have gone nowhere. It was 504 in 1981, and 21 years later it was still 504.

Pretty disappointing, huh? But here comes Bracey to explain that we are being deceived by Simpson's Paradox. A statistician named Edward Hugh Simpson came up with this a half century ago. It works on all kinds of phenomena. Bracey defined it for me this way: "Simpson's Paradox occurs when the aggregate group score shows one pattern but subgroups show a different pattern."

When you break down the NAEP and SAT data into ethnic subgroups, for instance, you find that minorities have improved their averages markedly, which is exactly what our increased spending on schools had been designed to achieve. On the NAEP reading test, for instance, non-Hispanic white 17-year-olds had only a small improvement. They went from 291 points to 295 points, while the overall average went from 285 to 288 points. But African Americans in that same period jumped 26 points, from 238 to 264, and Hispanics increased 19 points, from 252 to 271.

The same thing happened with the SAT. Non-Hispanic whites showed a modest increase of 8 points, from 519 in 1981 to 527 in 2002, while African Americans were up 19 points, from 412 to 431, Puerto Rican Americans were up 18 points, from 437 to 455, and Mexican Americans were up 8 points, from 438 to 446. Asian Americans increased 27 points, from 474 to 501. To the math-challenged among us, this makes no sense. How could almost every ethnic group increase significantly while the overall average went up barely, or not at all?

As Bracey explains, we are overlooking two important factors: (1) minorities make up a much larger portion of the total testing population than they did before, and (2) although they have shown significant improvement, their averages are still relatively low. When you add more low scorers, even if they improve over time, you are not going to see much improvement in the overall average.

Here is my version; call it Mathews's Paradox. If you managed to clone me five times, had short, clumsy me and the five copies of me join one of those recreation basketball teams for geezers, and then sent us to enough basketball clinics to raise our scoring average from 2 to 6 points a game per Mathews, that would be a definite improvement for us. But the rest of the team members would undoubtedly be much better, and adding us to the team statistics would likely cause the team scoring average to drop, or at least not get any better.
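For readers who want to see the arithmetic, here is a minimal sketch of the basketball example. The size of the original team (5 players) and their scoring average (12 points a game) are hypothetical figures chosen for illustration; only the six Mathewses and their post-clinic 6-point average come from the paragraph above.

```python
def team_average(groups):
    """Average points per player, given (player_count, points_per_player) pairs."""
    total_points = sum(count * points for count, points in groups)
    total_players = sum(count for count, _ in groups)
    return total_points / total_players

# Hypothetical assumptions: the original roster and its scoring average.
REGULAR_PLAYERS = 5
REGULAR_AVG = 12.0

# From the example: the author plus five clones, after the clinics.
MATHEWS_PLAYERS = 6
MATHEWS_AVG_AFTER = 6.0

# Team average before the Mathewses join.
before = team_average([(REGULAR_PLAYERS, REGULAR_AVG)])

# Team average once the improved Mathewses are counted in the statistics.
after = team_average([
    (REGULAR_PLAYERS, REGULAR_AVG),
    (MATHEWS_PLAYERS, MATHEWS_AVG_AFTER),
])

print(f"Team average before: {before:.2f}")
print(f"Team average after:  {after:.2f}")
```

Under these assumed numbers, no player got worse and six of them tripled their scoring, yet the team average falls, because the newcomers still score below the players already on the roster.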

In the NAEP calculations, from 1971 to 1999 the proportion of non-Hispanic whites in the sample dropped from about 80 to 70 percent, while the African-American percentage increased from 14 to 16 percent and the Hispanic portion from 5 to 10 percent. The same thing happened with the mix of students taking the SAT. The portion of non-Hispanic whites dropped from 85 to 65 percent between 1981 and 2002, while the percentage of African Americans and Hispanics increased from 15 to 31 percent.
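The effect of the changing mix can be checked with a rough weighted average. The sketch below uses only the NAEP reading scores for 17-year-olds and the approximate sample shares quoted in this article. It covers only the three groups mentioned and uses rounded shares, so it is an illustration and will not reproduce the published overall averages exactly.

```python
# Figures quoted in the article for NAEP reading, 17-year-olds.
# Shares are approximate sample percentages; groups not quoted in the
# article are left out, so the shares do not sum to 100.
SCORES_1971 = {"white": 291, "african_american": 238, "hispanic": 252}
SCORES_1999 = {"white": 295, "african_american": 264, "hispanic": 271}
SHARES_1971 = {"white": 80, "african_american": 14, "hispanic": 5}
SHARES_1999 = {"white": 70, "african_american": 16, "hispanic": 10}

def weighted_average(scores, shares):
    """Average score across groups, weighting each group by its share."""
    total_weight = sum(shares.values())
    return sum(scores[group] * shares[group] for group in scores) / total_weight

avg_1971 = weighted_average(SCORES_1971, SHARES_1971)
avg_1999 = weighted_average(SCORES_1999, SHARES_1999)

# Counterfactual: 1999 scores, but with the 1971 demographic mix held fixed.
avg_1999_old_mix = weighted_average(SCORES_1999, SHARES_1971)

print(f"1971 average (1971 mix):        {avg_1971:.1f}")
print(f"1999 average (1999 mix):        {avg_1999:.1f}")
print(f"1999 average (1971 mix, fixed): {avg_1999_old_mix:.1f}")
print(f"Gain with actual mix: {avg_1999 - avg_1971:.1f}")
print(f"Gain with fixed mix:  {avg_1999_old_mix - avg_1971:.1f}")
```

With these rounded inputs, the combined gain is roughly 6 points under the actual mix and roughly 8 if the 1971 mix had held. Either way it is far smaller than the 26-point and 19-point gains of the two minority groups, because whites, with their small gain, still make up most of the sample.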

You can argue that the failure of the white students to improve significantly is a matter of concern, but it is also clear that we have been obscuring the good news about minority score improvements by focusing so much on lack of change in the aggregate scores.

This, Bracey says, is particularly important now that there is so much focus on minority achievement and subgroup statistics under the No Child Left Behind law. "School board members in districts with changing demographics should be aware of the potential impact of Simpson's Paradox," he said. "It is not just changing ethnicity that can affect totals, however; so can changing socioeconomic makeup of the population. The dot.com debacle has undoubtedly had an impact on some school districts as highly educated unemployed people have moved elsewhere in search of jobs. Since the children of well-educated people tend to score well on tests, the bursting of the dot.com bubble could have resulted in lower test scores that have nothing to do with the quality of instruction in the schools."

The NEA forced Bracey to resign in 1991 because he was being what they called too "entrepreneurial" in his many sharply written critiques of sloppy education journalism, featured in such journals as the Phi Delta Kappan, where he is the research columnist. I think this was a mistake by the NEA people. Bracey has proved to be one of this country's most authoritative defenders of the work of public school teachers, and he could have done the association a great deal of good. But the incident nudged him into a career of frequent speeches and articles, and that has been good for the rest of us.

We are, with the new federal No Child Left Behind law, rushing into a new era in which these test numbers will determine how we help our children learn, and how we spend what is now the most money ever spent on public schools. Jerry Bracey may be annoying, telling us how often we are wrong, but I would rather suffer that humiliation now than choke on it later, when the consequences of our ignorance have become much worse.
********************************************