Tests seem so reasonable at first -- teachers teach, students learn, and
demonstrate mastery by passing a test. But as Daniel Koretz says at the start
of his 2008 book, Measuring Up: What Educational Testing Really Tells Us, “Achievement testing is a very complex
enterprise, and as a result, test scores are widely misunderstood and misused.”
Now that is what I call an understatement. And despite Common Core claims that better standards and tests leave fewer reasons to worry about misuse, an earlier observation by Vito Perrone of Harvard University still applies: “Most items on these various standardized tests remain well within the longstanding technology of testing, primarily to support the mechanical scoring procedures. They still seem to be limited instruments with too much influence” (1999, p. 152).
The testing “enterprise” is poised for a warp drive
record-breaker of misuse insanity. In a nutshell, here’s how they plan to
connect the dots.
A tiny fraction of what a student knows and can do is
hypothetically captured, with some modicum of so-called scientific accuracy, by
converting the number of correct answers out of the total number of questions
on a standardized test to a raw score. Keep in mind that this single raw score is still an error-prone measure of what the student knows: the student may have guessed at answers, correctly or not, or the performance may reflect other contextual factors such as illness, distraction, or nerves. The test itself is also imperfect by design and is likely biased in some ways.
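To see how much noise guessing alone injects, here is a minimal sketch (my illustration, using an invented 50-item, four-option test -- nothing from any real exam) of the same student producing different raw scores:

```python
import random

def simulated_raw_score(n_items=50, n_options=4, prop_known=0.6, seed=None):
    """Simulate one student's raw score on a multiple-choice test.

    Illustrative assumption: the student answers a fixed fraction of
    items from knowledge and guesses blindly on the rest.
    """
    rng = random.Random(seed)
    known = round(n_items * prop_known)   # items answered from knowledge
    guessed = n_items - known             # items answered by blind guessing
    lucky = sum(rng.random() < 1 / n_options for _ in range(guessed))
    return known + lucky

# The same "knows 60% of the material" student, ten sittings.
print([simulated_raw_score(seed=s) for s in range(10)])
# e.g. raw scores scattered in the low-to-high 30s out of 50
```

Same knowledge, different scores -- and that is before illness, nerves, or a biased item enters the picture.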
Now that raw score goes through some psychometric process: it is normed to a scale that compares it to other test scores, or ranked somewhere between unacceptable and excellent based on someone’s judgment of what students should know and be able to do, or both. This is where all hell breaks loose, because that converted score gets used.
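For the curious, here is roughly what that conversion looks like in miniature; the norm group and the cut scores below are invented for illustration, not drawn from any actual test:

```python
from bisect import bisect_right

def percentile_rank(raw_score, norm_group):
    """Percent of the norm group scoring at or below this raw score."""
    ordered = sorted(norm_group)
    return 100 * bisect_right(ordered, raw_score) / len(ordered)

def judgment_label(raw_score, cut_scores):
    """Map a raw score to a label using someone's chosen cut scores."""
    for cutoff, label in cut_scores:
        if raw_score >= cutoff:
            return label
    return "does not meet standard"

# Invented norm group and cut scores, purely for illustration.
norm_group = [28, 31, 33, 35, 35, 36, 38, 40, 42, 45]
cut_scores = [(42, "exceeds standard"), (35, "meets standard")]

print(percentile_rank(36, norm_group))  # 60.0
print(judgment_label(36, cut_scores))   # meets standard
```

Note who sets the cut scores: someone. That judgment call is where “meets standard” comes from.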
How might it get used? For one, to tell the students and the parents or guardians how “well” they did, which can involve labeling the converted score with a percentile rank, a grade-level equivalent, or just a descriptive meaning such as “meets standard.” However, it will likely also be used in what is called a “high stakes” way: to assign students to special education, to hold them back a year, or to track them into homogeneous groups.
The most pernicious use is to group the scores to make
claims about the quality of individual teachers. From there, it’s easy to see
how tempting it is to make a claim about the quality of a school, and then a
whole district. While we’re at it, let’s compare counties, states, regions,
countries.
The cold hard truth, in Koretz’s words, is this:
Scores on a single
test are now routinely used as if they were a comprehensive summary of what
students know or what schools produce (pp. 44-45).
He goes on later to add:
Simply attributing differences in
scores to school quality or, similarly, simply assuming that scores themselves
are sufficient to reveal educational effectiveness, is unrealistic. And more
generally, simple explanations of performance differences are usually naïve.
All of this is established science (p. 142).
Things get really tricky when hierarchical linear modeling kicks in to provide a “value-added” way to compare actual scores to a prediction and to use the difference to rate teachers’ effectiveness. Ignoring warnings from experts, policymakers have misused these value-added models, or VAMs, by weighing them heavily in the annual evaluation of teachers. Carol Burris, an outspoken principal who opposed this misuse of standardized test scores, recently wrote of a lawsuit filed in New York State by my friend, Sheri Lederman, a teacher who hopes her case can become “a tipping point” in bringing this damaging, unreliable practice to a grinding halt.
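To make the mechanics visible, here is a deliberately stripped-down sketch of the value-added idea -- predict each student’s current score from a prior score with a simple regression, then average the leftover differences by teacher. Real VAMs use hierarchical linear models and more variables; the nine students and three teachers below are fabricated for illustration:

```python
import numpy as np

# Fabricated data: prior-year and current-year scores for nine
# students, and the teacher each student was assigned to.
prior   = np.array([55, 60, 72, 48, 66, 80, 52, 70, 63], dtype=float)
current = np.array([58, 64, 75, 50, 70, 83, 51, 76, 65], dtype=float)
teacher = np.array(["A", "A", "A", "B", "B", "B", "C", "C", "C"])

# Fit a one-variable regression predicting current score from prior score.
slope, intercept = np.polyfit(prior, current, 1)
predicted = slope * prior + intercept

# The "value added" attributed to each teacher is just the mean gap
# between actual and predicted scores for that teacher's students.
residual = current - predicted
for t in ["A", "B", "C"]:
    print(t, round(residual[teacher == t].mean(), 2))
```

Every source of error already described -- guessing, illness, imperfect tests -- flows straight into those per-teacher averages, which is precisely what the experts warned about.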
That may be wishful thinking, because now the dots are being connected to the colleges and universities that educate teachers. They too are to be evaluated and ranked based on how their candidates for teacher certification perform on standardized tests -- more than four of them in some cases. New federal regulations, currently open for public comment until February 2nd, would require these institutions of higher education to also track their teacher graduates and collect their annual evaluation ratings, including the VAMs measure, in order to be considered eligible for the TEACH grant program. (I have previously written about how similar perverse incentives plague the new CAEP accreditation standards for these institutions.)
Here’s a test question for Arne Duncan, our Secretary of
Education:
TRUE OR FALSE?
“A program’s ability to train future teachers who produce
positive results in student learning [as measured by standardized testing] is a
clear and important standard of teacher preparation program quality.” (from p. 63 of the proposed regulations document)
Here’s a hint, provided by Benjamin Campbell of Richmond, Virginia, in the public comments on the Federal Register: “Current research indicates that no more than 14% -- and often far less -- of a student’s learning as measured by standard tests -- the only standardized measure -- can be attributed to the teacher.”
The bad news is that Arne Duncan, and a whole slew of
politicians and policymakers in line behind him, think the correct answer to
this question is TRUE. They actually believe harsh punitive consequences work
and lead to improvement. They think closing schools and teacher education
programs is a good idea. They don’t care if any of their plans are based on
faulty data, junk science, or illogical statistics. They blithely ignore extant
research, recommendations from experts, and, to put it bluntly, common sense.
The question remains -- what are we going to do about it?
As Captain Jean-Luc Picard would say, “Engage.”