Kunvar Thaman did not have a famous laboratory. A native of India’s City Beautiful, designed by legendary Swiss-French architect Le Corbusier, he was not affiliated with a university, had no research grants for most of his travels, and no guarantee that any of it would lead to anything. What he had was a question he couldn’t let go of. And for two years, I worked quietly and for free to answer that question.
The research has now been selected to participate in one of the most competitive venues for global AI research, the International Conference on Machine Learning (ICML) 2026. In a field where OpenAI, Google DeepMind, and elite universities dominate the acceptance list, Terman’s solo paper stands out, being the first by an independent India-based researcher to participate in ICML in three years.
His subject is reward hacking. This is a new field of technology that finds shortcuts to help AI systems appear successful without actually doing the work. Sounds technical. Turman calls that alarming.
“Weak AI fails by doing things that are obviously wrong,” says the 26-year-old from Chandigarh. first post.
“Powerful AI with tools can find shortcuts that look successful on the dashboard, but are actually not successful. The numbers go up. No real work is being done.”
“This sums up the next few years of AI in a nutshell,” he says.
The conference is scheduled to be held in Seoul, South Korea from July 6th to July 11th. The research paper titled “Reward Hacking Benchmark: Measuring LLM Agent Exploits Using Tools” has received high praise.
But what exactly is this reward hacking in AI systems? Is it essentially AI finding loopholes in human instructions? To unpack this and all the other layers, first post We sat down with the brains behind this breakthrough. So, sit back and read the comprehensive dialogue.
milestone moment
in a conversation with first postTurman described the accomplishment as “like one big moment,” and said the time and effort had finally paid off.
As the AI researcher says, the path to get to this point hasn’t been easy or easy for him. “A few years ago, I quit my job to pursue this work full-time. There was no university, no lab, and no funding for a while.”
Mr. Terman also thanked his parents, who were “incredibly supportive throughout his independent research pursuits.”
“Everything else was just a matter of time, and I was fine with not knowing if any of them would land. ICML accepted that. I felt like I was saying this question was worth asking. That’s the important part for me,” he says.
What is reward hacking in AI systems?
When you come across terms like “reward hacking in AI,” jargon comes to mind, much of which can be difficult to understand. Terman analyzes research papers in simple language, making them easy to understand.
Think of a student whose main goal is to learn a particular subject in order to take an exam. Next, the students decided to imitate the smart person sitting next to them. This will help you get a perfect score. The report card states that the student understood the material even though he did not. He just found a faster way to that number.
“The gap between what he wrote down as a goal and what he actually wanted is reward hacking, and AI systems are unusually good at finding it,” Terman explains.
Bittu Pilani, an alumnus of Birla Institute of Technology, also points out why the public should pay attention to this issue: “This failure mode will become sharper as AI becomes more capable, not weaker.”
Does reward hacking mean AI finding loopholes in human instructions?
“This situation is more interesting than AI finding a loophole,” Terman said. first post. What surprised researchers most was that AI was better at cheating in ways that didn’t look like cheating.
“The AI writes out its reasoning step-by-step. The reasoning sounds like a careful, intelligent engineer explaining why this shortcut is an efficient way to solve the problem. The quickest way to see the answer is to look directly at the test file, and that seems like the right decision. The AI has learned that the word efficiency is rewarded, so the shortcut is wrapped in that word,” he shares a key observation.
Mr. Terman brings us back to the exam analogy. “Students aren’t just silently copying. They’re writing in clean handwriting an explanation of how they arrived at their answer. The explanation sounds right. They’re using appropriate vocabulary. To the teacher grading the paper, it looks like they understand. The student came up with the correct answer and wrote a paragraph explaining how.”
“The fact that the instructions were reverse engineered from the copied answers is not visible from where the teacher is sitting,” he says.
Challenges, risks and moments that defined Terman’s life
This AI researcher also works as a cyber security engineer at Akamai Technologies and has had a very distinct career path. Thaman completed a dual degree in electrical and electronic engineering from Bits Pilani. He will graduate in 2022.
Terman went on to earn a master’s degree in biological sciences, which he says is “more important than people realize.”
“Most of the current AI research is, frankly, patient empirical work: looking carefully at the data and determining what’s signal and what’s noise. Biology trains us to do just that.”
After graduating from college, he worked as a security engineer at Akamai Technologies, working on products that use machine learning to detect threats. But that wasn’t the job Turman wanted. “So, with the blessings of my mentor Shiva, I left without any plans beyond that. I want to work on AI safety,” he confesses.
Time is the biggest risk
“Independent research, without a salary or a lab, is a long gamble that the work will eventually lead to recognition in the field. There are no monthly performance reviews. There’s just the work, and whether you can trust yourself to keep moving forward while the world is silent,” Terman says.
“Walking through an unknown night town”
Terman’s typical job involves sitting for long hours analyzing data. As a result, he tends to prefer activities that are the exact opposite.
He enjoys “running, cycling, climbing mountains, and lifting heavy weights.”
“I think your body needs to be tired in order for your mind to actually work,” Turman says.
“Another thing I like to do is walk through strange cities at night. Besides that, I like classical music, geography and history, especially how cultures and civilizations evolve over time, and reading books, but that habit has decreased more than I expected in the last few years,” says the researcher, delving into his hobbies and personal interests.
Is India paying enough attention to AI safety and robustness research?
Asked about the safety of AI from India’s perspective, Tharman said there is a long way to go. India is lagging behind and the reasons may be structural, he argues, but the conversation around AI here is all about using it.
“India does not have an organization dedicated full-time to stress testing the AI systems that India is rushing to introduce,” Terman points out.
He says this is a problem that can be solved much cheaper than people think. “Safety, evaluation, and reliability research doesn’t require the billions of dollars of computing that headline AI research requires. This asymmetry is strong and requires only serious talent and modest funding…A small number of well-supported labs and a real funding pipeline for independent researchers could change this in five years instead of 20,” he says.
“We’re rushing to use AI faster than we can be sure it works as expected. It’s that balance that needs to change.”
Are Indian students doubting the possibility of doing AI research without top institutions?
India’s students and engineers are not entering AI research without engagement with elite institutions and big tech companies, which “does more damage than the actual gap.”
“There are certain kinds of cutting-edge AI work that involves training the largest models or building entirely new architectures at scale. These tasks simply cannot be performed outside of well-resourced labs.”
Terman emphasized that some advanced AI research requires resources that cannot be easily replaced, and if you want to pursue that kind of research, the best path forward is to join those organizations rather than trying it alone.
“What I want young researchers and engineers in India to understand is that constraints are rarely computed and increasingly rarely associated. Constraints are taste, tenacity, and the willingness to find a sharp problem and work on it for months on end while no one is looking,” he sums up.
First published:
May 16, 2026, 14:36 IST
end of article
