At Penn State University, the Center for Socially Responsible AI aimed to answer some of these questions with a contest. The university's first-ever "Cheat-a-thon" asked faculty members to submit questions and assignments that would be difficult for students to answer with AI. It's a problem instructors at Penn State were already curious about.
Ihab Ragai, a professor of engineering and winner of the contest, had been experimenting with how to stump AI. He joked that AI only got between 20 and 40 percent right on his exams, so he submitted questions straight from his assignments to the contest.
Students used AI tools of their choosing to answer the questions, then shared their processes. The contest rewarded students for correctly answering the most questions, and faculty for submitting questions that were the hardest for students to answer.
The 66 questions spanned 17 different subject areas, from computer science to theology, and brought 451 responses from students. Student winners said the experience showed them just how powerful AI tools have become, underscoring the need for training on how to ethically and effectively harness them. It also revealed that questions in certain subject areas, like engineering — and certain question styles involving visual elements, like graphs — are more difficult to answer with AI tools.
“Across the board, by and large, it did fairly well and better than I was expecting without additional guidance,” Asa Reynolds, a third-year student studying cybersecurity, analytics and operations, who placed second in the contest, said. “So, these models are very strong tools, and I think it's even more important to use them responsibly because of that.”
HOW STUDENTS ARE USING AI
The contest gave cash prizes to 10 students, and Government Technology spoke with four of them. Most took a simple approach: copy and paste the prompt into a large language model (LLM) like ChatGPT, then see whether further action was needed. Since faculty selected questions to stump AI, students were surprised by how many answers were reasonable on the first pass — about half, they said. Students took a few approaches to gauge this.
“I used my experience from learning throughout college to confirm what the right answers were with the LLMs,” Kyle Ketterer, a graduating senior in computer science who came in first place, said.
Ketterer’s baseline knowledge helped him with most of the questions. For those he wasn’t sure of, he pitted several models against each other, a tactic shared by his peers.
“I gave one copy-and-paste text, and I gave the other the screenshot, and then I told both to solve it,” David Zhu, a freshman pursuing Penn State’s AI major, who came in third, said. “If they had different answers, I would take the answer and reasoning of one and give it to the other and say, ‘What do you think about this answer?’”
Bryan Shabroski, the other first-place winner, created a program to do this for him. It searched a folder with PDFs of the contest questions and prompted a Claude model to answer each one. Then, it took those answers and started a new prompt asking Claude for verification. Lastly, the program documented the process in a Word file to submit later. Shabroski's code took about five minutes to write, he said, and reviewing and completing all the questions took him a total of five hours. Of the 66, there were five that he had to complete manually.
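Shabroski's actual code was not published, but based on his description, a minimal sketch of such a pipeline might look like the following. It assumes the Anthropic Python SDK plus the pypdf and python-docx libraries; the model name, folder path, and prompt wording are placeholders for illustration, not details from the contest.

```python
from pathlib import Path

import anthropic                 # Anthropic's official Python SDK
from pypdf import PdfReader      # pull text out of the question PDFs
from docx import Document        # write the results to a Word file

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-sonnet-latest"   # placeholder; the article only says "a Claude model"


def ask(prompt: str) -> str:
    """Send a single prompt to the model and return its text reply."""
    reply = client.messages.create(
        model=MODEL,
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.content[0].text


doc = Document()  # the Word document that will hold every answer

# Loop over every contest question stored as a PDF in a local folder.
for pdf_path in sorted(Path("contest_questions").glob("*.pdf")):
    question = "".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)

    # First pass: ask the model to answer the question.
    answer = ask(f"Answer the following exam question:\n\n{question}")

    # Second pass: feed the answer back in a new prompt and ask for verification.
    verified = ask(
        "Check the answer below against the question. Correct it if needed, "
        f"otherwise restate it.\n\nQuestion:\n{question}\n\nAnswer:\n{answer}"
    )

    # Record the question title and the verified answer for later review.
    doc.add_heading(pdf_path.stem, level=2)
    doc.add_paragraph(verified)

doc.save("cheatathon_answers.docx")
```

The verification pass in this sketch automates what the other students described doing by hand: taking one answer and handing it back to a model to check.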
Students said the contest highlighted the need for balance in incorporating AI into education.
“If someone makes a program like this, they theoretically could just put in all of their homework, and it would do it for them,” Shabroski said. “I think it's bad, because people aren’t going to be able to critically think if everyone in the world is doing this. I think it’s very important to actually know how to do things and not just rely on some external tool.”
WHAT MAKES A QUESTION HARD FOR AI TO ANSWER?
Of the five winning faculty members, three were from the engineering department. Nasibeh Zohrabi is one of them. In her classes, she said, she does not ban AI, and she largely doesn't need to.
“I focus more on open-ended or multistep problems where students have to show their thought process to analyze the circuit. That’s where I think the real understanding shows up,” she said. “AI usually falls short for that.”
Visualization and design work is more difficult for AI to take on, students said, while conceptual questions are easier.
For example, AI fared poorly with the "nine dots question": With nine dots in a three-by-three grid, draw four continuous, straight lines so that each dot has at least one line running through it. The solution requires the lines to run outside of the three-by-three grid, which the AI tools struggled to do.
"The LLMs had a lot of trouble reasoning because they couldn't think outside the box," Ketterer said.
When questions involved graphs, the AI often misread numbers within the graphs. Ketterer said that if a user explained the information in the graph to the model, it fared better — which requires the user to read the graph themselves.
“These things really excel at just direct, 'Here’s a textual prompt, here’s a textual response,'” Reynolds said.
Zohrabi said that when she does give students text-to-text conceptual questions like this, they might be on an in-person quiz or exam, rather than an assignment students can take home. Instructors said they hope the results of the contest can help inform how assessments are shaped in the future, but Zohrabi said those creating AI models are also interested in which questions AI is struggling with. It’s a race she hopes educators can win. Either way, AI isn’t going anywhere, she said.
IT’S FUN!
Students and faculty agreed that a nonpunitive approach will be important for fostering conversations about best practices and preparing students for an AI-enabled workforce.
Students said monetary prizes like the ones the Cheat-a-thon offered are a good start. Recognizing students for their achievements, even if AI helped them get there, is another way to encourage ethical and transparent use. The four students Government Technology spoke to each attempted all 66 problems.
“If you can get students excited to do homework assessments they're not even taking, that’s wonderful,” Eric Hudson, a physics professor and contest winner, said.