It's Gonna Be Much Harder Than People Think to Change the Narratives About Testing and Grades
Morgan Polikoff
The hottest essay going around education circles yesterday was this piece in the New York Times by economists Ariel Kalil and Derek Rury, highlighting the growing gap between kids’ grades and their test scores. They rehash some existing arguments about this gap but also present novel data from their own research showing that parents privilege grades over test scores when judging how their kids are doing.
I could tell right away that it struck a nerve, because I was seeing it shared by seemingly every group that I follow. I saw it shared by the “reform” crowd, who have been trying for quite a while now to make the “honesty gap” framing of kids’ post-COVID performance stick. I saw it shared by the test skeptics who think that the standardized tests on which the essay relies are basically useless, measuring students’ socioeconomic status and nothing else. And I shared it myself with several of my thought-partner colleagues to get their take on the argument and on why it was getting so much buzz. It’s the sort of piece that everyone is likely to have a reaction to, based on their own specific lens.
I absolutely agree with some of the central arguments of the piece. I think it’s undeniable that there has been grade inflation up and down the education system, from kindergarten through the PhD. Both NAEP and other tests make clear that, at the same time grades have been soaring, test scores have been stagnant or declining. I think parents plainly are not as worried about kids’ test scores as “experts” are. In fact, I wrote a whole report on this parent-expert disconnect about two years ago, and I had been concerned about it since before that (not that these authors cited anything I have previously said on the topic).
But after a reasonable diagnosis of the problem, the piece falls short both in explaining how the problem arose and in proposing solutions that have any real chance of solving it. The recommendations section at the end, for instance, lacks any useful links and offers very little in the way of specific steps that parents or schools might take. These shortcomings may reflect a lack of interest in literature from the field of education, a problem of academic silos that is all too familiar.
Test Scores Just Don’t Matter To Many People
While the authors are well-known experts on parenting, I’m not sure their finding that parents downweight test scores relative to grades is surprising. For one thing, my own prior research found the same pattern: parents aren’t that concerned about COVID-era learning loss because the whole learning loss concept derives from declines in student test scores. Indeed, parents in our interviews barely ever mentioned test scores when we asked them how their kids were doing. When they did, it was often to question the tests’ validity. These scores simply do not have salience for them.
Now, one can argue that test scores should matter to parents: that these tests measure important things and that they predict other important outcomes we care about (above and beyond grades and diplomas). But that does not seem to be what most parents currently believe, so changing the narrative on these tests is going to be a tall order. Advocates who think this is a hill worth dying on should get to work on it, conveying the message that standardized test results matter for individual children, and therefore that parents should care about their own child’s results and intervene if those results are not good enough.
And we haven’t even talked about teachers, let alone the teacher educators who prepare them. These groups are probably even more likely to believe that the tests don’t really matter, if not that they’re outright harmful, and to convey that lack of importance (directly or indirectly) to parents and students.
Technical Details of Tests Also Matter
The commentary also demonstrates a rather naive understanding of standardized tests in general, and of the role of proficiency standards in particular, perhaps because the authors don’t really seem to engage with any literature in educational measurement. For instance, the essay holds up NAEP results as the most accurate measure of student performance, calling them “actual proficiency rates,” and it casts the decision of a few states to lower their proficiency thresholds as inherently wrong. The implication is that the old, higher standards were “correct” and the new, lower standards are a political move to water down expectations. That ignores the fact that at least one of the states they cited moved some standards up and others down. So which was the right choice in that state?
But these claims are quite questionable. For starters, while there is of course some science behind setting performance standards on state tests and NAEP (see here and chapter 12 here for a couple of different summaries of how this is done), the measurement science is much messier than, say, calibrating a thermometer or a scale. There is no accepted, universal definition of grade-level proficiency that exists out in the universe and against which we can compare all tests. Different standard-setting methods can reach different conclusions, and the question of whether one proficiency standard is better than another is probably not even answerable. The problems with excessive reliance on proficiency thresholds are also extremely well documented, and have been for a quarter century. Beyond that, NAEP’s proficiency standards in particular are quite contested and seem to be set above grade-level proficiency, so it’s not at all clear that they are the criterion we should compare everything else to.
I do believe that the way standardized test results are reported contributes to parents thinking their kids are doing fine. At the level of both the individual student report card and the school report card, parents receive unclear messages about how kids are doing, and they receive these messages too late and in unhelpful formats. One immediate and practical solution is for states to do a better job of reporting results at the aggregate level and to make specific recommendations to districts for how to report both grades and test scores. There are plenty of guidelines and best-practice documents out there that states can draw on.
Solutions Should Follow From Problems
To take a step back, I think we need clarity on what the real problem is here. I would argue that there are at least two specific problems to focus on in this area:
1. Standardized test scores (that is, the best measures we have of what students know and can do) have declined considerably for over a decade. Beyond that, there are longstanding gaps in scores across demographic groups, and there are widening gaps between high- and low-performing students. These trends matter because standardized tests actually do measure important skills above and beyond what’s captured by grades and diplomas, at both the individual and the societal level. We need people in the system to believe that these trends matter and to intervene to try to turn them around. Solving this problem will take awareness and mindset interventions, better test score reporting, and reforms to our assessment systems, coupled with policies that actually improve the quality of teaching and learning, such as curriculum and professional learning.
2. Grades send unclear, and perhaps inaccurate, messages to parents about how kids are doing. They conflate, and probably oversimplify, multiple dimensions of student performance. Grading standards appear to have become more lax over time, and we now think too many kids are getting higher grades than they “should” based on their actual skills and performance. Solving this problem will probably require rethinking grading and report card policies, training teachers in new and better approaches to grading, and addressing the root causes that encourage poor grading practices (such as the tremendous social pressure to give children good grades because people believe grades matter so much).
Overall, I would simply conclude that there are lots of complex factors operating here, and while the authors of this piece touched on some of them, the essay fell short in some of its claims and some of its recommendations. There are many causes of both test score declines and grade inflation. Those causes are not necessarily the same, and both will need attention if we’re going to change the narrative (and, more importantly, make education better for kids).
The views expressed in this article are solely those of the authors and do not necessarily reflect the views of any affiliated institutions or organizations.

