Can Artificial Intelligence Help Mitigate Grading Bias?

The ed-tech platform Copyleaks has developed an AI-assisted tool to eliminate human bias and discrepancies in the grading process, aiming to provide consistency in grading while helping teachers save time.

Much of the recent focus on artificial intelligence-driven ed-tech tools has been on ChatGPT and keeping students honest, but could AI keep graders on track too? The ed-tech platform Copyleaks is betting so, touting a new AI-assisted tool to help educators grade exams and address concerns about how human biases can affect the grading process.

According to CEO and co-founder Alon Yamin, Copyleaks’ anti-plagiarism and grading platform was among the first to develop and launch AI-detection functions this year to catch students using generative AI to complete assignments, given widespread concerns about AI's potential to enable academic dishonesty. However, he said, the company recently started noticing the need for additional AI-driven functions to address discrepancies in the grading process.

“I think generally once humans are in the process, you will always have bias. Think about a human grader grading thousands of assignments. The solution we provided is focused on high volumes of exams … Naturally, humans are affected by external factors, and there could be a day where I am waking up a bit tired or angry, and it can affect the way I am grading the exams,” he said. “That led us to think about how we can solve this bias, as well as how we can automate this whole process away where we’re talking about thousands, tens of thousands and sometimes millions of exams.”

According to Yamin, Copyleaks users scan and grade about 10 million documents a month, and most of the company's clientele is in the U.S. education market. Kinsey Rawe, senior vice president and general manager of Courseware and Instructional Services for Imagine Learning, noted that tools like Copyleaks' new AI Grader can lighten the workload of educators tasked with grading scores of assignments and exams and providing feedback.

“Copyleaks gives us an opportunity to help teachers save precious time that they can utilize to provide more personalized instruction to students while helping them understand the importance of academic integrity,” Rawe wrote in an email to Government Technology.

In a 2020 research article in Education Next, David Quinn, an associate professor of education at the University of Southern California, reported that in his own experiment teachers were about 4.7 percent more likely to rate writing from white students as being at or above grade level when compared to “identical writing from a Black child.” What’s more, Quinn wrote, anti-bias training has generally proven less than promising when it comes to addressing and reducing such grading biases.

At the same time, concerns have been raised about algorithmic bias in AI and its impact on higher education, such as with enrollment-management algorithms that some say have filtered out low-income, female and non-white applicants who are more likely to need larger aid packages. And there are examples of past mishaps involving AI-driven grading: In August 2020, the BBC reported that an algorithm used to grade A-level exams had to be scrapped after it gave far lower scores than expected.

But because one or a handful of exams can alter a student’s entire future, Copyleaks Chief Operating Officer Shouvik Paul said some education systems have sought to address grading biases by hiring more than one grader for each exam, which he called “very time-consuming and extremely expensive” for institutions tasked with grading large numbers of exams.

“You’re really hoping that at the end of the day, they’re giving the attention, time and diligence required to truly put that effort into grading it accurately without bias and with the same level of consistency,” he said. “The way other countries tackle that is they’ll hire two, sometimes three individuals to grade the same paper, and they take a sort of median score. That gets really complicated.”

Paul said that for big tests administered to large numbers of students, there is often a 6 percent difference between the scores that human graders give to the same work, according to company data on inconsistencies attributed to grading bias. He said that in A/B testing, results were within a 2 percent margin when using the AI Grader, which he said shows that the tool is effective in mitigating grading bias.

“With us the accuracy is extremely high,” he said. “We tend to be in that 1 to 2 percent sort of delta between what the humans are grading and what the AI gave it.”

Brandon Paykamian is a staff writer for Government Technology. He has a bachelor's degree in journalism from East Tennessee State University and years of experience as a multimedia reporter, mainly focusing on public education and higher ed.