Teacher evaluations and test scores: Let’s start debating how to do it right

Since I don’t have any kids in the Chicago public school system, I can afford to be grateful to striking teachers and a recalcitrant mayor for giving needed publicity to a very important education reform issue: using student results to evaluate teachers.

One of the comments on this blog really captured my own sentiments: Teachers and administrators need to stop fighting about whether data about student performance should be one of the tools used to evaluate teachers and schools, and start working together to improve these tools and to figure out how they can help teachers strengthen their performance.

Yes, of course the teacher and school “scores” must be analyzed in light of student demographics, including poverty and language barriers. Yes, of course teachers should be evaluated by how far they have brought the students who actually landed in their classroom, not some hypothetical student who faces fewer challenges. We don’t want to repeat the mistakes of No Child Left Behind.

Still, we now have much better statistical tools to assess the value added by teachers and schools. These tools are far from perfect, and they shouldn’t be used as the sole basis for evaluating performance (I don’t know of any school districts that insist they should). But please, let’s move beyond the whether to the how!

In that spirit, I’d like to share a very sensible article that appeared in Education Sector earlier this week. I’m posting most of it here:

Teachers are, of course, right to expect that the measures by which they are evaluated be fair and equitable, and what little research there has been on the subject shows that existing value-added models suffer from a number of flaws. In Tennessee, for instance, teachers in non-testing grades were judged heavily by the test scores for their entire schools – something over which the teachers had little control. In the District of Columbia, where teachers can be fired on the basis of the three-year-old IMPACT evaluation system, teachers complain that test scores are weighted too heavily and don’t correlate well with scores from observations. Both systems are being tweaked.

The leaders of the Chicago Teachers Union and Chicago Public Schools need to work together to get the metrics right. That means agreeing on a holistic model that combines impartial classroom observation and test scores, that accounts for student background factors, and that uses a formula that isolates and accurately captures the value a teacher actually adds. Coming up with such a system isn’t easy, and it seems reasonable to ask that it be tested for at least a year before being deployed in a high-stakes way. Teachers should also demand that the results of the evaluations be used to significantly improve professional development. If districts are going to hold teachers to higher standards – as well they should – they need to give those teachers the resources they need to meet them.

But fighting the whole idea of test-based evaluation is a losing battle. The Obama administration has called for districts to include test scores in evaluations as part of its Race to the Top grant program. And 24 states now require districts to include some measure of student growth in teacher evaluations. In virtually every other field of endeavor, employees are rated and reviewed, and their jobs rise and fall on the outcomes of their performance. Teachers grade their students, too, with equally consequential results.

At the very least, resisting test-based evaluation is bad PR at a time when labor unions can least afford it. As Michael Petrilli of the Thomas B. Fordham Institute has helpfully noted, unemployment is over 8 percent, and the average Chicago teacher makes $76,000 a year. It’s time for Chicago teachers to get with the program – and for Chicago kids to get back to school.




  1. Yak_Herder

    For the sake of argument, okay…

    Step #1-
    Let’s return to the era in which teachers had some input into the content of the tests.

    Teachers in Utah were once given the opportunity to join committees organized with the sole purpose of developing a bank of test questions from the accepted standards.

    In the absence of that kind of input, how does anyone expect a teacher to take any ownership in the test or even respect the content? That kind of resentment is going to be a significant barrier to accepting the students’ scores as a factor in an evaluation.

    Step #2-
    Do not use a “general score”. As a teacher, the only scores I would want affecting my evaluations are those of my students – only my students. So, some kind of trick tracking system would have to be invented that adjusted things for students transferring in late, students who miss a significant amount of class due to suspension, and factors such as those that are out of a teacher’s realm of influence.

    Step #3-
    Somebody is going to have to balance the socio-economic disparities or there is going to be a war that the students (the non-combatants in the fight) will lose. Competition to have the highest performing students (actually, the students with the highest potential for improvement) in class will get serious. THAT must not be allowed to happen.

    Step #4-
    We need to understand the difference between standardized tests. SATs and ACTs are designed to evaluate a student’s potential for success in college, not what they know in a given subject. CRTs are intended to evaluate whether or not a student has learned the material outlined in the standards for that course (Go back to Step #1 before proceeding any further on this one). Using a test score for something other than its intended purpose is a mistake. Using a poorly designed test is an even bigger problem.

    • Mary McConnell

      This is a terrific response. Yes, I absolutely agree that teachers should be involved in designing tests, and that we need to match the type of test to its purpose. Let me just add that this kind of exchange is just what I was advocating in my last post: debating HOW we use data wisely, not whether we should use data.

      It’s my understanding that value-added methodologies have evolved to the point where they can help identify the contributions made by an individual teacher with the students he or she actually has to teach – in other words, taking demographics into account, and not rewarding teachers for winning the school lottery and getting a class of high achievers. These methodologies are not flawless, however, and they are much more reliable over multiple years than over a single year. Most problematically, they work better for some grade levels and subjects than others.

      To me this suggests that tests should be carefully constructed and that administrators need flexibility to interpret the information. It may also suggest that possibly misleading data should not be externally published, for example in a newspaper, although I still believe that it should be available to parents.

      But let’s face it. Performance data is never perfect, and it always needs to be balanced by more subjective observations. In a world where school districts are regularly finding 99% of teachers proficient, however, a little outside data could add a valuable incentive and corrective. I can’t help wondering how many striking Chicago teachers are more worried that they’ll flunk the tests than they are that the tests are inaccurate and unfair. As teachers, let’s face it, we rely on fear of failure even as we seek to spark love of knowledge. I think that logic applies to grown-ups as well.

