Thursday, August 13, 2015

It’s All About the Bell Curve: Sheri Lederman’s Day in Court

I traveled up to Albany this morning to hear the oral arguments in the Lederman v. King case, presented to Acting Supreme Court Justice Roger McDonough by Bruce Lederman and by Colleen Galligan, representing the State Education Department. This is the first time in my life I have sat in a courtroom proceeding. I don’t even watch Law and Order. Let’s just say I was most definitely not in my element. But I’m a pretty good observer of human behavior, a decent note-taker, and I had personal reasons for caring deeply about the outcome of this case (see full disclosure below), above and beyond all the reasons we all should care about a case that may have far-reaching implications for the misguided reforms of Race to the Top. What I witnessed was a masterful takedown of the we-need-objectivity rhetoric that is plaguing education. So I should begin by saying that I am hopeful, because it seems someone with the power to make a difference gets it. Judge McDonough gets that it’s all about the bell curve, and the bell curve is biased and subjective.

In case you need a refresher on how test scoring works these days (and who doesn’t?), I suggest you start with the excellent fact sheets from FairTest, first on norm-referenced tests, or NRTs, and then on criterion-referenced tests, or CRTs, and tests used to measure performance against state standards. In particular, note the following important points:

“NRTs are designed to sort and rank students 'on the curve,' not to see if they met a standard or criterion. Therefore, NRTs should not be used to assess whether students have met standards. However, in some states or districts a NRT is used to measure student learning in relation to standards. Specific cut-off scores on the NRT are then chosen (usually by a committee) to separate levels of achievement on the standards. In some cases, a CRT is made using technical procedures developed for NRTs, causing the CRT to sort students in ways that are inappropriate for standards-based decisions.”

As you may notice, we’ve come a long way from getting a 91 out of 100 on a test and knowing that was an A-. Testing today is opaque and confusing by design. In New York State, we boil it down to a ranking from one to four. That’s right, there’s even jargon for “ones and twos,” which is particularly heinous when you learn that politicians have an interest in making more than 50% of students fall into those “failing” categories. Today the state released the test score results for students in grades 3-8, and fewer than 40% are reported as reaching the so-called “proficiency” levels. By design the public is meant to read this as miserable failure.

The political narrative of public education failure extends next to the teachers, who must demonstrate student learning based on these faulty tests, even if they don’t teach the subjects tested, and even if they teach students who face hurdles and hardships that have a tremendous impact on their ability to do well on the tests. In Sheri’s case, her rating plunged from 14 out of 20 points to 1 out of 20 points on student growth measures. Yet her students perform exceedingly well on the exams, and once you are a “four” you can’t go up to a “four plus” because you’ve hit the ceiling. In fact, one wrong answer could unreasonably mark you as a “three” and you would never know. Similarly, the teacher receives a student growth score that is also based on a comparison to other teachers. When it emerged in the hearing today that the model, known as a value-added model, or VAM, predetermined that 7% of the teachers would be rated “ineffective,” Judge McDonough caught on to the injustice that lies at the heart of the bell curve logic: where you rank in the ratings is SUBJECTIVE.
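To make the bell curve logic concrete, here is a minimal Python sketch. It is not the state’s actual growth model: the function name, the made-up teacher scores, and the simple sort are all my own assumptions for illustration. Only the 7% figure comes from the hearing. The point it tries to show is that when a fixed share is labeled “ineffective” by rank, the number of people flagged depends on the quota, not on how well anyone actually did.

```python
# Hypothetical illustration only; this is NOT the state's actual model.
def rate_by_quota(scores, ineffective_share=0.07):
    """Label the lowest-ranked share of teachers 'ineffective', whatever their scores."""
    n_ineffective = round(len(scores) * ineffective_share)
    # Rank teacher names from lowest to highest score.
    ranked = sorted(scores, key=scores.get)
    bottom = set(ranked[:n_ineffective])
    return {
        name: ("ineffective" if name in bottom else "not ineffective")
        for name in scores
    }


# 100 made-up teachers whose made-up scores all fall between 90.0 and 99.9.
strong_pool = {f"teacher_{i}": 90 + i / 10 for i in range(100)}
ratings = rate_by_quota(strong_pool)

# Count how many were flagged even though every score is 90 or above.
n_flagged = sum(1 for label in ratings.values() if label == "ineffective")
print(n_flagged)
```

In this toy pool every teacher scores 90 or higher, yet the quota still flags the bottom seven. Raising every score by the same amount would not change who is flagged, because only the ranking matters.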

In his affidavits, Professor Aaron Pallas of Teachers College brilliantly explains the many flaws with this misuse of student test scores to evaluate and rank teachers’ effectiveness. Predetermining a set percentage of ineffective teachers regardless of their actual “effectiveness” and their students’ achievements was the first major flaw. The second is that the model is not grounded in scientific definitions of teacher quality or effectiveness, as there are many factors beyond a teacher’s control that contribute to student performance on standardized tests and other measures of their knowledge and skills. Third, the model is not transparent about what “needs to be done to achieve effective or highly effective ratings,” which is a requirement of the law. The model also violates the law’s definition of student growth as “change in student achievement for an individual student between two or more points in time.” Judge McDonough seemed to have picked up on this idea, and asked if a better model would test the student at the start and end of a given academic year. Pallas gives a far more nuanced explanation of the need for a different model of testing to measure growth over time, but suffice it to say, the model that produced Sheri’s absurd score is not measuring student growth as defined by the law. Pearson, the corporate entity behind the testing enterprise, even noted, “It is inappropriate to compare scale scores across grades as they neither measure the same content, nor are they on the same scale.” Yet that is what the growth model does.

The lame explanation from Colleen Galligan was that the model may not be perfect but the state tries to compare each student to similar students. The goal, she offered, is to find outliers in the teaching pool who consistently have a pattern of ineffectiveness, to either give them additional training or fire them. At this point Judge McDonough offered her a chance to explain the dramatic drop in Sheri’s score. “On its face it must mean students bombed the test (speaking as one who has bombed tests),” he said, and this produced laughter in the courtroom. For who hasn’t bombed at least one test in their life? Who has not experienced that dread and fear of being labeled a failure? Then Judge McDonough asked rhetorically, “Did they learn nothing?” The only answer she could come up with was that in this case Dr. Lederman’s students, although admittedly performing well compared to other students, did worse than 98% of students across the state in growth. At this point it was pretty clear to everyone present that this made absolutely no sense whatsoever.
 
Sheri Lederman speaking to reporters outside the courtroom in Albany

Full disclosure:
Sheri Lederman is my high school classmate and she is a highly regarded elementary teacher in the Great Neck Public Schools, which we both attended in our childhoods. She got her doctorate at Hofstra University, where my mother is a professor emerita, and where I know many of the faculty as personal friends. They confirm the high regard I have for Sheri’s intelligence and insights into education. I think she is absolutely heroic to be pursuing a lawsuit, with the expert guidance of her lawyer husband, Bruce Lederman, against the New York State Department of Education, to expose the irrational and illegal practices of evaluating teacher performance using “arbitrary and capricious” student growth models based on flawed science. I have previously written in my blog about Sheri’s hope that her lawsuit would prove to be a “tipping point” in halting the use of these erroneous student growth models. A bit of background on the case from last October can be found here.

On June 1st, the New York State Supreme Court ruled that Sheri’s case could go forward despite the State Education Department’s claim that her lawsuit was baseless since Sheri’s overall evaluation was “effective” despite the “ineffective” label on the student growth portion, worth 20% of the total.

Today’s news has been covered so far here, here and here. The local CBS station covered it here and WNYT here.