Wednesday, May 2, 2012


An Interview with Will Fitzhugh: Robot Graders Behaving Badly?

Posted by Michael Shaughnessy EducationViews Senior Columnist; Houston, Texas, on May 1, 2012


Michael F. Shaughnessy

1) Will, I understand that some of these testing corporations are using robots to grade real live human writing. Can you briefly tell us what this is all about?

Mike Winerip reports in the New York Times that “The automated reader developed by the Educational Testing Service, e-Rater, can grade 16,000 essays in 20 seconds, according to David Williamson, a research director for E.T.S.” In Concord, Massachusetts, there was a print shop that had a sign: Good, Fast, Cheap: CHOOSE TWO, the point being you could not have all three.

It seems clear to me that the Deeper Learning Project of the Hewlett Foundation is looking for writing assessment that is fast and cheap. It is hard to beat 16,000 “scores” in 20 seconds. The National Writing Board, which provides a unique assessment for high school student history research papers, takes three hours on each paper to provide a four-page report. I doubt if we could do even one of those assessments in 20 seconds, no matter how much foundation funding we were offered.

2) I guess the first thing that comes to mind is creativity—are these R2D2 graders going to be able to recognize a real creative writing sample?

As you know, my primary interest is in academic expository writing, but when it comes to creative writing, these programs don’t care if you are writing an “Ode on a Grecian Urn” or an Ode to an iPhone. The content is of no interest to the robo-graders. They are programmed only to “worry” about a small, circumscribed set of writing skills, and the subject of your composition counts for nothing. You can write a dull composition which amply displays ignorance, and still get a good score from the computers.

3) Let’s move on to grammar, or better, grandma. Here are two sentences: Let’s eat, Grandma. Let’s eat Grandma. In the first sentence, we have an apparently hungry kid ready to gobble some apple pie. In the second sentence, it sounds like the kid is a cannibal, plotting to have a tasty repast. What is the computer going to think?

Or in the old example, “It’s not what you think you are, but what you think, you are.” Again, the computer is not “interested” in the meaning of anything it is scoring. At the New York World’s Fair years ago they showed a computer translator that took English and turned it into Russian and then back into English. One example was “The Spirit is Willing but the Flesh is Weak.” After going into Russian, it came back into English as “The Ghost is Ready, but the Meat is Bad.”

4) I know a bit about how these testing services work. Here is an example of a question: Discuss three books that you have recently read and how they impacted you. Now, how will the computer differentiate between one response listing Harry Potter, Twilight and The Hunger Games, and another response listing Great Expectations, The Three Musketeers and Don Quixote?

By the way, I just read Great Expectations for the second time a month ago, and I was surprised to find how much wonderful stuff there is about friendship in it. I had not noticed that the first time around. Again, the computer doesn’t care whether you are talking about Peter the Great or Peter, Peter, Pumpkin Eater. By removing any interest in knowledge or meaning, they can really speed up the scoring process to a very remarkable level, and save a lot of money, too. That is, after they finish paying out all the $100,000 grants and paying for all the new consultants, professional developers, computers, software, etc.

5) There are these two words that seem to have been forgotten: quality and quantity. Certainly, the robo-evaluator can count the number of words. However, how can it ascertain the veridicality of the robust statements in the student’s rhetoric?

Back in the day, many courses in college taught us that it was important to try to find truth, to separate appearance from reality, to avoid being misled by the surface of things, but to go deeper to find lasting meaning and value. All of this is of no interest to the computer, or perhaps to those who fund these robo-scoring studies either. Otherwise they could not be satisfied with and even proud of so much speed at so little cost, with so much absence of quality or meaning.

6) Will, you have published QUALITY high school papers for years. I put QUALITY in capital letters, but I am not sure if the robo-graders would understand the concept. Can you tell us a little about The Concord Review, and the consistent quality that you have recognized?

In the last 25 years, I have found nearly one thousand high school students from 46 states and 38 other countries who were willing to write exemplary history research papers, averaging 6,000 words, on a huge variety of topics that computers have no interest in. The longest paper was 22,000 words on the Mountain Meadows Massacre, but I have published a 15,000-word paper on the Soviet-Afghan War, a 13,000-word paper on the Needham Question by a student from Hong Kong, and so on. I have found that many high school students, presented with the challenge of the exemplary work of their peers, will rise to the challenge and write longer and more serious papers than I thought possible when I started The Concord Review in 1987. I am happy to send examples to anyone who sends me an email at, and there are a number of them on our website at

7) Will, you and I used to use typewriters, and probably white out when we made a mistake, and often re-typed our high school and college papers. Nowadays, they have these word processors that automatically correct and identify grammar and spelling errors. Yet, high school kids seem to write less and less. What’s going on?

Most teachers in high schools now either do not have, or are unwilling to take, the time to read serious academic papers by their students, so they do not assign them. With the exception of the few students who seek out the opportunity of The Concord Review, and the ones who must write an Extended Essay for the IB Diploma, students do not do much academic expository writing. Even some elite private schools are cutting back the academic expository writing they ask students to do.

Most writing competitions ask for very short personal stories, or very short pieces on set topics, or what is called “creative nonfiction” which is a form of self-centered diary writing. If nobody is asking students to do writing that requires reading of nonfiction books, or knowledge of history, or which is the length of an IB Extended Essay, then they won’t do it. So they don’t write. Everyone complains about their writing, and students still aren’t asked to write much in high school.

8) Sadly, the spell-check machine, like the robo-graders, probably does not recognize the difference between to, two, and too, or between there and their. Or can they?

I am not sure. It is possible that computers can be programmed to look for the probable correct position of various parts of speech, but again, they have been designed to have no interest in, or understanding of, what the student is writing about. Writing for a computer may yield a “score” of some sort, but as far as meaning goes, the student might as well be talking into a dead phone. After all, coming to understand the meaning of what someone is saying takes time, and that would mess up the path to polishing off 16,000 writing “samples” in 20 seconds, right?

Marc Bousquet of The Chronicle of Higher Education makes the excellent point that: “It’s reasonable to say that the forms of writing successfully scored by machines are already mechanized forms: writing designed to be mechanically produced by students…”

9) What have I neglected to ask?

It is a great enduring puzzle to me why, when we know that nearly 90% of college professors say their new students are not very well prepared in reading, doing research or academic writing, we continue to give our high school students mostly fiction to read and ask them to do only personal writing, the five-paragraph essay and/or the 500-word “college essay.” In general I feel sure we don’t hate our students and we are not determined to have them fail, but when 47% of freshmen in the California State College System need to enroll in remedial reading courses, it is hard to see how we could serve them much worse if we did hate them.


“Teach by Example”
Will Fitzhugh [founder]
The Concord Review [1987]
Ralph Waldo Emerson Prizes [1995]
National Writing Board [1998]
TCR Institute [2002]
730 Boston Post Road, Suite 24
Sudbury, Massachusetts 01776-3371 USA
978-443-0022; 800-331-5007;
Varsity Academics®