The Educational Testing Service, which designs and grades the GMAT and other widely used standardized tests, said its e-rater program comes within one point of a human grader 98 percent of the time, using the six-point scale that is now a common approach to grading essays on standardized tests.
If there is a difference of more than one point between the scores of the computer and a human evaluator, the essay is read by another person and the three scores are averaged.
ETS, which began using e-rater to grade the test two years ago, has cut its GMAT costs by U.S. $1.7 million a year because graders now have to read fewer essays. The organization can also return scores to test takers in ten days, instead of the four weeks it used to take.
But Sam Graziano, who took the GMAT last month, wasn't thrilled to learn that a computer would evaluate his writing, and thereby help decide whether he is admitted to a top business school.
"I'm a computer science major, and it's kind of hard for me to understand an algorithm that could grade an essay," said Graziano. "At this time, I wouldn't really trust it."
Another essay-grading program, called IntelliMetric, is muscling its way into the standardized testing industry. And Accuplacer is a new program that decides the appropriate course level for incoming college students.
The programs take different approaches to their task. But they all use a database of essays that have been graded by humans. The programs are smart enough, according to their inventors, to recognize what characteristics correspond to higher scores.
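The shared recipe described here, learning from a database of human-graded essays which measurable characteristics track higher scores, can be sketched in a few lines. Everything below is invented for illustration (the single word-count feature, the tiny training set, the linear fit); real systems use far richer features and models.

```python
# Toy sketch: fit a line from one essay feature (word count) to the
# human scores in a training database, then score new essays with it.

def features(essay):
    """Map an essay to one crude numeric feature: its word count."""
    return len(essay.split())

def fit_line(xs, ys):
    """Ordinary least squares for a single feature (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Essays already scored by human graders (the "database").
graded = [
    ("short answer", 2),
    ("a somewhat longer answer here", 4),
    ("a much longer and more fully developed answer to the question", 6),
]

xs = [features(e) for e, _ in graded]
ys = [s for _, s in graded]
slope, intercept = fit_line(xs, ys)

def predict(essay):
    """Score a new essay on the six-point scale, clamped to 1..6."""
    raw = slope * features(essay) + intercept
    return max(1, min(6, round(raw)))
```

With a fit like this, a longer essay simply earns a higher predicted score, which is exactly the kind of shallow correlation critics of these systems worry about.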
ETS's e-rater focuses mostly on how an essay is written, not its meaning. For example, it looks for cue words, such as "however," "because," and "therefore," that are key to framing an argument. It also looks for variety in the arrangement of phrases, clauses, and sentences. And to recognize whether an essay is on topic, it looks for certain words based on the previously graded essays in its database.
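Two of the surface features attributed to e-rater above, cue-word counts and sentence-length variety, are easy to illustrate. The feature names and the three-word cue list below are assumptions made for this sketch, not ETS's actual implementation.

```python
# Toy surface-feature extractor in the spirit of the e-rater
# description: count argument cue words and measure variety in
# sentence length.
import re
from statistics import pstdev

CUE_WORDS = {"however", "because", "therefore"}

def surface_features(essay):
    words = re.findall(r"[a-z']+", essay.lower())
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return {
        "cue_words": sum(w in CUE_WORDS for w in words),
        # Standard deviation of sentence length as a crude proxy
        # for "variety in the arrangement of ... sentences".
        "length_variety": pstdev(lengths) if len(lengths) > 1 else 0.0,
    }
```

Note that a counter like this has no notion of placement, which is precisely the weakness Farnum describes later: "therefore" earns credit even as the first word of the essay.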
The Intelligent Essay Assessor is geared more toward the content of a composition. The program is primed by feeding it a batch of essays already graded by humans, or text that serves as the basis for the essays, such as a history or science book.
The program analyzes the relationships between the words, looking for patterns. It recognizes how the words fit together; for example, it recognizes that "the doctor operated on the patient" is similar to "the surgeon wielded the scalpel." In that way, its creators say, the Intelligent Essay Assessor comes to understand the words. It can then compare that meaning with the essays to be graded.
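The comparison step can be sketched as cosine similarity between word-count vectors. This toy deliberately omits the dimensionality-reduction stage (Landauer's published technique, latent semantic analysis) that lets the real system treat "doctor" and "surgeon" as related; with raw counts, only literal word overlap registers.

```python
# Toy bag-of-words similarity: a crude stand-in for the deeper
# word-pattern analysis the article describes.
import math
from collections import Counter

def bag(text):
    """Word-count vector for a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

reference = bag("the doctor operated on the patient")
essay_a = bag("the doctor operated carefully on the patient")
essay_b = bag("the surgeon wielded the scalpel")
```

Raw counts rate `essay_a` far closer to the reference than `essay_b`, even though the article's point is that both are about the same event; closing that gap is exactly what the "much deeper process" Landauer mentions is for.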
"It isn't as simple as looking at which words occur together," said Thomas Landauer, a University of Colorado professor who has done research on the technology. "It's a much deeper process than that."
The Intelligent Essay Assessor, Landauer said, is best at evaluating answers in fact-filled subjects, such as science and history. The program can look at a student's essay and decide what points are missing.
A study that compared essays written under the program's tutelage with those written without such help concluded that the computer-aided essays were consistently better.
The programs do have their limits. They can't deal with creativity, such as metaphors or unconventional writing styles. If confronted by quirks, the computer is supposed to alert its handlers that the essay is unusual and needs to be read by a human.
The e-rater also can be fooled. For example, if the word "therefore" is one of the words it's looking for, it will probably give the writer credit for using it even if it's the first word in the essay, said Marisa Farnum, a writing assessment specialist at ETS. A teacher, on the other hand, might consider such a use of "therefore" completely inappropriate and penalize the student for it.
Some professors, such as William Dowling at Rutgers, think the programs will be unable to process students' more complex and original writing. Dennis Baron, the head of the English department at the University of Illinois in Urbana, has the opposite fear: that the software won't be able to see past a student's weaknesses.
"I've been reading student writing for 35 years, and you just get a feel for what the student is trying to say," Baron said. "They don't always hit the nail right on the head, and it's those times, when you know they're on the right track, they've almost got it, and they haven't quite said it, that you want to give them some credit for this. I don't know that you can program a machine to do that kind of gray area."
Copyright 2001, The Record (Hackensack, N.J.)