A testing expert speaks out

I received the following email from Professor Russell T. Warne of Texas A&M, soon to be Professor Warne at Utah Valley University.

“I just read your blog posting about ‘The (mixed) virtues of teaching to the test.’ . . . [It offered] a point of view rarely espoused in the public press.

Allow me to introduce myself… I am an adjunct faculty member at Texas A&M University who will be taking a professorship in the Department of Behavioral Science at Utah Valley University this fall. My specialty is standardized testing (in education and psychology) and educational research.”

I wrote back and asked him to tell me more. Here is the first part of his response (there’s more to come).

“Testing is a broad topic and passions run high among parents, teachers, principals, legislators, and anyone who has ever been to school.

One of the important issues I wish people understood is the difference between the test itself and what people do with the test score. Having had dozens (if not hundreds) of discussions with teachers about standardized tests, I’ve learned that most anti-testers take issue with the stupid ways that scores are sometimes used, not with the test itself.

Thus, we come to the crux of NCLB. NCLB-mandated tests are often the best-designed tests that a child will encounter during the entire school year. And most state tests (such as Texas’s TAKS) are extremely good at evaluating most students’ academic achievement. But when that score is used to label campuses as “commended,” “acceptable,” “not acceptable,” or “failing” (or whatever terminology they use in Utah), things start getting muddy. They get even muddier if we try to tie student performance to teacher pay, promotions, job security, etc. But the test itself is usually extremely well designed.

As far as the NAEP results go, the gains are modest (except for 12th grade, as I’m sure you’re aware), and, as the table on p. 3 of the new report shows, many of the gains since 2006 aren’t outside the margin of error. I personally believe that the larger gains among low-income, Hispanic, and Black students are largely a result of the policy changes associated with NCLB. Under the law, schools must report these groups’ academic achievement separately, and those groups’ progress is used as a criterion for judging the schools. This forces school personnel to give these groups more attention than they have traditionally received. These results are in line with similar NAEP results (and the results of other tests) since roughly 2005.

One thing I find interesting about the report is the item maps (see pp. 16, 30, and 44). The number on the item map is a score, and the description next to it explains what the corresponding question asks. The item mapped at a given score is the one that a student at that score level has a 50% chance of answering correctly (after correcting for right answers that come from guessing). Full item maps are available on the NAEP website (the ones in the report are abbreviated). For example, the average 8th grader in the U.S. obtained a score of 288 on the NAEP U.S. History exam. This means that such a student has about a 50-50 chance of answering the item labeled “Identify an advantage held by American forces during the American Revolution.” (That item is actually mapped at 287.) The typical student is likely to answer the items below it correctly and unlikely to answer the items above it correctly. This ties students’ scores to specific abilities, not just to the vague descriptions given in the report of what “basic,” “proficient,” etc., mean.”
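To make the “correcting for guessing” idea a little more concrete, here is a minimal Python sketch. It assumes a three-parameter logistic (3PL) item response model, one common way of modeling multiple-choice items that have a guessing floor; the item parameters below are hypothetical, and NAEP’s actual scaling procedure and its conversion to the reporting scale are not shown.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability of a right answer, including lucky guesses (floor c)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def guess_corrected(p, c):
    """Remove the share of right answers attributable to guessing."""
    return (p - c) / (1 - c)

# Hypothetical item: discrimination a, difficulty b, guessing floor c
# (c = 0.25 stands in for a four-option multiple-choice item).
a, b, c = 1.2, 0.3, 0.25

# Compare a student below, at, and above the item's difficulty.
for theta in (b - 1.0, b, b + 1.0):
    p = p_correct(theta, a, b, c)
    print(f"theta={theta:+.1f}  raw={p:.2f}  corrected={guess_corrected(p, c):.2f}")
```

Under these assumptions, a student whose ability equals the item’s difficulty b gets the item right more than half the time once lucky guesses are counted, but exactly half the time after the guessing correction; that point is where the item would sit on a 50% item map like the one described above.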

I asked Professor Warne for further reactions to the NAEP history results and Diane Ravitch’s critique. Stay tuned for the next post.


DeseretNews.com encourages a civil dialogue among its readers. We welcome your thoughtful comments.