I traveled up to Albany this morning to hear the oral arguments in the Lederman v. King case, presented to Acting Supreme Court Justice Roger McDonough by Bruce Lederman and by Colleen Galligan, who represented the State Education Department. This is the first time in my life I have sat in on a courtroom proceeding. I don’t even watch Law & Order. Let’s just say I was
most definitely not in my element. But I’m a pretty good observer of human
behavior, a decent note-taker, and I had personal reasons for caring deeply
about the outcome of this case, above and beyond all the reasons we all should
care about a case that may have far-reaching implications for the misguided
reforms of Race to the Top (see full disclosure below). What I witnessed was a
masterful takedown of the we-need-objectivity rhetoric that is plaguing
education. So I should begin by saying that I am hopeful, because it seems
someone with the power to make a difference gets it. Judge McDonough gets that
it’s all about the bell curve, and the bell curve is biased and subjective.
In case you need a refresher on how test scoring works these days (and who doesn’t?), I suggest you start with the excellent fact sheets from FairTest, first on norm-referenced tests, or NRTs, and then on criterion-referenced tests, or CRTs, and tests used to measure performance against state standards. In particular, note the following important points:
“NRTs are designed to
sort and rank students 'on the curve,' not to see if they met a
standard or criterion. Therefore, NRTs should not be used to assess whether
students have met standards. However, in some states or districts a NRT is used
to measure student learning in relation to standards. Specific cut-off scores
on the NRT are then chosen (usually by a committee) to separate levels of
achievement on the standards. In some cases, a CRT is made using technical
procedures developed for NRTs, causing the CRT to sort students in ways that
are inappropriate for standards-based decisions.”
As you may notice, we’ve come a long way from getting a 91 out of 100 on a test and knowing that was an A-. Testing today is opaque and confusing by design. In New York State, we boil it down to a ranking from one to four. That’s right, there’s even jargon for “ones and twos,” which is particularly heinous when you learn that politicians have an interest in making more than 50% of students fall into those “failing” categories. Today the state released test score results for students in grades 3-8; fewer than 40% achieved the so-called “proficient” levels. By design, the public is meant to read this as miserable failure.
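To see just how arbitrary this is, here is a toy simulation in Python (all numbers are invented; this is not NYSED’s actual procedure). If the cut score is placed at a chosen percentile of the score distribution, the “failure” rate is locked in before any student picks up a pencil:

```python
import random

# Purely illustrative: simulate scale scores for 1,000 students.
# The distribution, cut placement, and target rate are invented,
# not NYSED's actual numbers or procedure.
random.seed(1)
scores = [random.gauss(300, 40) for _ in range(1000)]

# Place the cut score at a chosen percentile of the distribution:
# the "failure" rate is fixed in advance, no matter how much
# every student actually knows.
target_failure_rate = 0.55  # e.g., "more than half below proficient"
cut = sorted(scores)[int(target_failure_rate * len(scores))]
below = sum(s < cut for s in scores) / len(scores)
print(f"cut score: {cut:.1f}, share labeled ones and twos: {below:.0%}")

# Now imagine genuine learning gains: shift every score up 50 points.
# Re-deriving the cut from the new distribution reproduces the same
# failure rate -- ranking, not mastery, drives the label.
better = [s + 50 for s in scores]
cut2 = sorted(better)[int(target_failure_rate * len(better))]
below2 = sum(s < cut2 for s in better) / len(better)
print(f"new cut score: {cut2:.1f}, share below: {below2:.0%}")
```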
The political
narrative of public education failure extends next to the teachers, who must
demonstrate student learning based on these faulty tests, even if they don’t
teach the subjects tested, and even if they teach students who face hurdles and
hardships that have a tremendous impact on their ability to do well on the
tests. In Sheri Lederman’s case, her rating plunged from 14 out of 20 points to 1 out of 20 on the student growth measures. Yet her students perform exceedingly well on the exams; once you are a “four” you can’t go up to a “four plus,” because you’ve hit the ceiling. In fact, one wrong answer could unreasonably mark you as a “three,” and you would never know. The teacher’s own student growth score is likewise based on a comparison to other teachers. When it emerged in the hearing today that the model, known as VAM, or value-added modeling, predetermined that 7% of teachers would be rated “ineffective,” Judge McDonough caught on to the injustice that lies at the heart of the bell curve logic: where you rank in the ratings is SUBJECTIVE.
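Here is a minimal sketch of that forced-distribution logic (the 7% figure comes from the hearing; every other number is invented). Rank 100 teachers by a growth score and label the bottom 7% “ineffective,” and seven teachers get the label no matter how well all of their students actually did:

```python
import random

# Illustrative only: invented growth scores for 100 teachers, all of
# whose students show strong absolute growth by construction
# (every score falls between 85 and 99).
random.seed(7)
growth = {f"teacher_{i}": random.uniform(85, 99) for i in range(100)}

# Rank-based labeling: the bottom 7% are "ineffective" by fiat.
ranked = sorted(growth, key=growth.get)
cutoff = int(0.07 * len(ranked))
ineffective = ranked[:cutoff]
print(f"{len(ineffective)} of {len(ranked)} teachers labeled ineffective")
# -> 7 of 100, regardless of how well anyone actually taught.
```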
In his affidavits, Professor Aaron Pallas of Teachers College brilliantly explains the many flaws in this misuse of student test scores to evaluate and rank teachers’
effectiveness. Predetermining a set percentage of ineffective teachers
regardless of their actual “effectiveness” and their students’ achievements was
the first major flaw. The second is that the model is not grounded in
scientific definitions of teacher quality or effectiveness, as there are many
factors beyond a teacher’s control that contribute to student performance on
standardized tests and other measures of their knowledge and skills. Third, the
model is not transparent about what “needs to be done to achieve effective or highly effective ratings,” which the law requires. The model also
violates the law’s definition of student growth as “change in student
achievement for an individual student between two or more points in time.”
Judge McDonough seemed to pick up on this idea and asked whether a better model would test the student at the start and end of a given academic year.
Pallas gives a far more nuanced explanation of the need for a different model
of testing to measure growth over time, but suffice it to say, the model that
produced Sheri’s absurd score is not measuring student growth as defined by the
law. Pearson, the corporate entity behind the testing enterprise, even noted,
“It is inappropriate to compare scale scores across grades as they neither
measure the same content, nor are they on the same scale.” Yet that is what the growth model does.
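A toy example makes Pearson’s caveat concrete (the scale parameters and scores below are invented for illustration, not New York’s actual values). Subtracting a grade 3 scale score from a grade 4 scale score can show a “gain” even while the student’s standing on each test’s own scale drops:

```python
# Illustrative numbers only: two grade-level tests that sit on
# different scales (different content, mean, and spread). These are
# hypothetical values, not New York's actual scale parameters.
grade3_mean, grade3_sd = 300, 35
grade4_mean, grade4_sd = 320, 50

g3_score, g4_score = 335, 345  # one student's two scale scores

# A naive "growth" subtraction treats the two scales as comparable:
print(f"naive gain: {g4_score - g3_score}")  # +10, looks like growth

# Relative to each test's own scale, the student's standing fell:
z3 = (g3_score - grade3_mean) / grade3_sd  # +1.0 SD above the mean
z4 = (g4_score - grade4_mean) / grade4_sd  # +0.5 SD above the mean
print(f"standing: {z3:+.1f} SD -> {z4:+.1f} SD")
```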
The lame explanation
from Colleen Galligan was that the model may not be perfect but the state tries
to compare each student to similar students. The goal, she offered, is to find
outliers in the teaching pool who consistently have a pattern of
ineffectiveness, to either give them additional training or fire them. At this
point Judge McDonough offered her a chance to explain the dramatic drop in
Sheri’s score. “On its face it must mean students bombed the test (speaking as one who has bombed tests),” he said, and this produced laughter in the courtroom. For who
hasn’t bombed at least one test in their life? Who has not experienced that
dread and fear of being labeled a failure? Then Judge McDonough asked
rhetorically, “Did they learn nothing?” The only answer she could come up with was that in this case Dr. Lederman’s students, although admittedly performing
well compared to other students, did worse than 98% of students across the
state in growth. At this point it was pretty clear to everyone present that
this made absolutely no sense whatsoever.
Full disclosure:
Sheri Lederman is my high school classmate and she is a
highly regarded elementary teacher in the Great Neck Public Schools, which we both attended as children. She got her doctorate at Hofstra University,
where my mother is a professor emerita, and where I know many of the faculty as
personal friends. They confirm the high regard I have for Sheri’s intelligence
and insights into education. I think she is absolutely heroic to be pursuing a
lawsuit, with the expert guidance of her lawyer husband, Bruce Lederman,
against the New York State Department of Education, to expose the irrational
and illegal practices of evaluating teacher performance using “arbitrary and
capricious” student growth models based on flawed science. I have previously written in my blog about Sheri’s hope that her lawsuit would prove to be a
“tipping point” in halting the use of these erroneous student growth models. A bit of background on the case from last October can be
found here.
On June 1st, the New York State Supreme Court ruled that Sheri’s case could go forward, rejecting the State Education Department’s claim that her lawsuit was baseless because her overall evaluation came out “effective” even with the “ineffective” label on the student growth portion, which is worth 20% of the total.
So far, today’s news has been covered here, here, and here. The local CBS station covered it here and WNYT here.