Welcome Back: It’s time to slash the SWoRD program
August 18, 2014
The members of an incoming freshman class have very little in common with one another, let alone with the rest of the student body. But in the Dietrich School of Arts and Sciences, at least, all students will form one kind of symbiotic relationship in which each depends on the others.
I learned this when taking classes for my general education requirements — during my second semester at Pitt, I chose to take Cognitive Psychology as one of the three required natural science courses.
At the start of the class, each student was given three options for how he or she would be assessed: writing a series of papers, taking a midterm and a final exam, or doing both. I, like most students as I recall, chose the first option.
But the writing option was different from those in other classes: all of our papers would be submitted online to a system called SWoRD (Scaffolded Writing and Rewriting in the Discipline), a “web-based reciprocal peer review system.”
A product of the Learning Research and Development Center (LRDC) here at Pitt, SWoRD aims to mimic the journal publishing process in academia by having students both grade and provide feedback on one another’s papers.
In class, we were tasked with writing short papers, modeled after articles in the science section of The New York Times, that reported on a cognitive psychology journal article we had read.
That was simple enough but, in the end, my grade on each paper was determined by three things: the scores my peers gave it, how well the grades I assigned matched those given by other students and how helpful my own feedback was on other students’ papers.
I did fairly well on the first assignment, but the grades I had given other students were, unfortunately, not in line with those that the rest of my peers gave.
Scores were awarded on a scale from one to seven, and the accuracy of each student’s grading was itself judged on three criteria: the bias (or lack thereof) in the reviewer’s grading, how consistently the reviewer graded papers accurately and how well the range of scores awarded matched that of the class.
To relieve frustration with an otherwise mysterious grading system, students were given graphs for all three components, explaining why each student received the accuracy grade he or she did.
Wanting a better grade, I made adjustments to improve the “accuracy” of my scores: I only awarded grades in the four-to-six range. Consequently, my accuracy score improved immediately and significantly on the second assignment.
Clearly, using a crude heuristic didn’t reflect an improvement in my grading technique, so why had SWoRD failed to elicit accurate student ratings?
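Part of the answer is mechanical. SWoRD never showed us the formulas behind the three accuracy components, but plausible stand-ins are enough to reproduce the effect. The sketch below is a toy simulation, and everything in it is an assumption made for illustration: bias as the gap between my average score and the peer average, consistency as my average paper-by-paper distance from the peer consensus and spread as the mismatch between my score spread and the class’s. None of this is SWoRD’s actual implementation.

```python
import random
import statistics

random.seed(0)

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def accuracy_components(mine, consensus):
    """Hypothetical stand-ins for the three criteria (lower is better).

    bias:        systematic over- or under-grading relative to peers
    consistency: average paper-by-paper distance from the peer consensus
    spread:      mismatch between my score spread and the class's
    """
    bias = abs(statistics.mean(mine) - statistics.mean(consensus))
    consistency = statistics.mean(abs(m - c) for m, c in zip(mine, consensus))
    spread = abs(statistics.stdev(mine) - statistics.stdev(consensus))
    return round(bias, 2), round(consistency, 2), round(spread, 2)

# Peer consensus on 40 papers clusters near the middle of the 1-to-7 scale.
consensus = [clamp(round(random.gauss(5, 0.9)), 1, 7) for _ in range(40)]

# An honest reviewer responds to paper quality, with personal noise.
honest = [clamp(round(c + random.gauss(0, 1.6)), 1, 7) for c in consensus]

# The gaming heuristic never leaves the 4-to-6 band, quality be damned.
gamed = [random.choice([4, 5, 6]) for _ in consensus]

print("honest (bias, consistency, spread):", accuracy_components(honest, consensus))
print("gamed  (bias, consistency, spread):", accuracy_components(gamed, consensus))
```

Under these made-up measures, the narrow-band strategy typically matches or beats the honest reviewer on all three components, which is precisely the loophole I had stumbled into. But the deeper problem with grading-by-agreement was identified decades ago.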
In “The General Theory of Employment, Interest and Money,” economist John Maynard Keynes draws an analogy between the stock market and a fictitious newspaper competition in which readers must choose the six prettiest women out of a selection of 100. The readers whose choices best align with popular opinion are then awarded a prize. This competition was termed a “Keynesian beauty contest.”
Keynes observed that “each competitor has to pick, not those faces which he himself finds prettiest, but those which he thinks likeliest to catch the fancy of other competitors.”
Keynes’ beauty contest is analogous to assigning grades in SWoRD. It is not a matter of grading others’ papers accurately. Rather, it is a matter of correctly predicting and matching the grades that one suspects other students will give.
Proponents of SWoRD might respond by citing research concluding that the program produces “valid and reliable” grading: Valid peer grading is responsive to the “true” quality of the paper, whereas reliable peer grading is consistent over time.
However, there are problems with such reasoning. First, students adopting simple heuristics in response to the accuracy grading scheme can manufacture statistical reliability: scores that never leave a narrow band will look consistent no matter what the papers deserve. Second, if students are predicting peer opinion rather than responding to paper quality, validity suffers by definition.
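The first point is easy to see with another toy sketch, again built entirely on made-up numbers rather than on the statistics from any actual SWoRD study. Here, two reviewers who both use the four-to-six heuristic agree closely with each other, the hallmark of reliability, while neither reviewer’s scores bear any relation to the papers’ true quality, the hallmark of invalidity.

```python
import random
import statistics

random.seed(1)

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def pearson(xs, ys):
    """Pearson correlation; 0.0 when either side has no variance."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# The true quality of 50 papers spans the whole one-to-seven scale.
quality = [random.randint(1, 7) for _ in range(50)]

# Two reviewers both game the system: a vague middle-of-the-road
# impression, clamped into the four-to-six band, regardless of quality.
rater_a = [clamp(round(random.gauss(5, 0.7)), 4, 6) for _ in quality]
rater_b = [clamp(round(random.gauss(5, 0.7)), 4, 6) for _ in quality]

# Reliable: the two raters rarely disagree by more than a point...
gaps = [abs(a - b) for a, b in zip(rater_a, rater_b)]
print("mean disagreement between raters:", statistics.mean(gaps))

# ...but invalid: neither rater's scores track true quality at all.
print("rater A vs true quality:", round(pearson(rater_a, quality), 2))
print("rater B vs true quality:", round(pearson(rater_b, quality), 2))
```

Agreement between raters, in other words, proves nothing about whether anyone actually read the papers.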
According to research conducted by Kwangsu Cho, then at the University of Missouri-Columbia, and Christian D. Schunn and Roy W. Wilson of the University of Pittsburgh, the validity of peer grading in SWoRD is justified chiefly by the observation that “the validity [of aggregate student ratings] appears … at least as high as the validity of single instructor ratings” when both are compared to scoring by writing experts. But it very well may be the case that both student and instructor grading are invalid. In fact, the researchers begin their paper by discussing several reasons why both instructor and student grading might not be valid.
Thus, it appears that, in order to be effective, SWoRD must address the severe incentive-compatibility problems created by its accuracy grading scheme. Until then, I would certainly think twice about taking a class that uses the program.
Write to Thomas at [email protected]