Tell the whole story: Quantitative evaluations lacking in academia

By Rohith Palli / Columnist

Finals week has come. Evaluations are in the air. Perhaps some students want their grades to be as objective as possible, reflecting the precise proportion of the course material they acquired over the semester. Most, though, probably care more about maximizing their GPA.

Since the era of Galileo — who measured time by the fall of water — sophisticated methods of measurement and evaluation have been developed. With this increased precision has come a willingness to benchmark. For instance, we can now reliably measure fuel efficiency, and more fuel-efficient cars are considered superior to their gas-guzzling counterparts, all other factors held constant.

These measures, however, often do not reflect priorities. In fact, they are all too often over-weighted, considering what they can actually tell us. We default to what we can measure, rather than what we really care about.

Nonetheless, we are now so good at evaluating and quantifying that we often evaluate people in the same way, especially on their performance in tasks such as teaching, learning or retrieving warehouse items. In college, finals probe performance based on course objectives, while standardized tests measure performance against a set curriculum.

Even if we think these tests do, in fact, measure mastery of some set of material, we have to ask the question: What is education actually for? If the answer suggests that we should not teach the curriculum, but rather develop the learner or facilitate a broader process of maturation, then perhaps these tests do not reflect the larger priorities of education as a whole.

This is a critical point. A property of an object or process does not become the most important simply because we can measure it. When hidden, unmeasurable factors influence measurable known quantities, we can use large and repeated measurements of these known quantities to predict the influence of the underlying, unmeasurable factors. When it comes to tests, especially standardized tests, we are using a single measurement — rather than a large set — to predict underlying knowledge.
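
To see the difference, consider a toy sketch in Python (mine, not drawn from any study): a coin's hidden bias stands in for a student's underlying knowledge, and individual flips stand in for test scores. A single measurement is a noisy estimate of the hidden quantity; a large, repeated set is not.

```python
import random

random.seed(0)

# A hidden quantity (the coin's true bias) influences each measurable
# outcome (a flip). One flip reveals almost nothing about the bias;
# averaging many flips pins it down.
true_bias = 0.62  # the underlying, not-directly-observable factor
flips = [random.random() < true_bias for _ in range(1000)]

single_estimate = int(flips[0])            # a single measurement
pooled_estimate = sum(flips) / len(flips)  # a large, repeated set
print(f"single flip: {single_estimate}, 1000 flips: {pooled_estimate:.3f}")
```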

The Atlantic and other news media like to make figures they claim explain the world “in one simple graph.” Rarely, however, can one simple graph accurately paint even a piece of the picture; reality is too complicated. Sometimes processes just aren’t suited for measurement. More often, even the most careful quantitative evaluation misses a key variable. A freshman statistics class might point out that ice cream sales are correlated with homicide rates, but not because ice cream causes homicide: hot weather drives both.
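
A small simulation with made-up numbers (a Python sketch assuming temperature as the hidden driver) shows how such a confounder manufactures a strong correlation between two variables that never touch each other:

```python
import random

random.seed(0)

# Hypothetical monthly data: a hidden third variable (temperature) drives
# both ice cream sales and homicide counts, so the two correlate strongly
# even though neither causes the other.
temps = [random.uniform(0, 35) for _ in range(120)]              # deg C
ice_cream = [100 + 10 * t + random.gauss(0, 25) for t in temps]  # sales
homicides = [5 + 0.3 * t + random.gauss(0, 2) for t in temps]    # counts

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print(f"corr(ice cream, homicides) = {pearson(ice_cream, homicides):.2f}")
```

Controlling for temperature would make the apparent relationship vanish, which is exactly the context a single graph omits.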

Early reports of the top 1 percent of the income distribution receiving an outsize share of income gains missed that an even smaller group, the top 0.1 or 0.01 percent of earners, actually absorbed most of that growth. Numbers can tell us something about the world, but only if treated with caution and extreme skepticism.

The use of GDP per capita as a measure of a nation’s economic success is an example of such a measurement. GDP per capita gives the average “productivity” or wealth creation of each member of a nation, but it reflects only how much stuff we make, not how it is divided or how it affects our quality of life.

The measure doesn’t reflect the status of the environment, security, health, connectedness or happiness, all of which should be higher societal priorities than simply more stuff. After all, the traditional goal of accumulating material wealth is to secure a subset of these very things: health, physical security and happiness.

Even measures designed to capture something like the status of the environment often tell only part of the story — because empirical, scientific measures are meant only to tell part of the story. Measuring carbon emissions is useful, but what if a toxic chemical spill takes out your access to water?

On a similar note, surveys that collect only quantitative information risk seeing only what they are seeking. For example, a survey that asks about the quality of an instructor might see a teacher who is not particularly clear, but grades leniently, receive artificially high marks, whereas a free-response question asking why could reveal the difference.

Fortunately, Pitt is generally more clever than the average survey-maker. OMETs and other tools include free-form response space where students can provide information in a non-quantitative form. The comments left there help to contextualize and lend credence to the numeric results. Unfortunately, none of this free-form data is made publicly available, so students are left to glean this information by word of mouth or via Ratemyprofessors.com.

Grades function much like the survey questions: The instructor is asked, “What is this student’s performance in your course: A, B, C, D or F? 0 to 4?” They then provide an answer based on some sort of evaluation, but this information completely lacks context. Students don’t learn what they need to improve or what they are good at. They merely learn how they compare to the all-important grading scale. Narrative evaluation solves this problem by allowing faculty to describe their experience with students in the same way that students can on the OMET.

This same contextualization is critical in the large-scale study of correlations among real-world variables. When we look hard enough for statistically significant mathematical relationships in the world, we are guaranteed to find them. If you roll a 100-sided die once, there is only a 5 percent chance you get a number of five or lower, but if you roll 100 such dice at once, the odds that at least one lands that low are better than 99 percent. Those dice aren’t weighted; it’s simply chance. If we keep looking for correlations, we will inevitably find some that don’t exist. To avoid this, correlations should only be given full credence when we understand the underlying relationship between the variables and how it informs the numerical correlation between them.
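
The dice arithmetic is easy to check. A few lines of Python (a sketch of my own, not anyone’s published analysis) compute the probability exactly and simulate one roll of 100 hundred-sided dice:

```python
import random

random.seed(0)

# One roll of a 100-sided die lands on 5 or lower only 5% of the time,
# but with 100 such dice the chance that at least one does is
# 1 - 0.95**100, roughly 99.4%. Screening many variable pairs for
# "significant" correlations at the 5% level runs the same risk.
print(f"P(at least one low roll) = {1 - 0.95 ** 100:.3f}")

rolls = [random.randint(1, 100) for _ in range(100)]
low_rolls = [r for r in rolls if r <= 5]
print(f"{len(low_rolls)} of 100 dice landed on 5 or lower: {low_rolls}")
```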

Sometimes, a correlation doesn’t exist. Not everything is connected. Some things can’t be measured.

How do you measure the unmeasurable? You don’t. The key to sorting through the age of “big data” is to realize its limits. Numbers don’t lie, but they never tell the whole story, either.

Write to Rohith at [email protected].