A 26-year-old Indian researcher from Chandigarh has a message for every Artificial Intelligence (AI) fanatic who sees AI as the ultimate solution while overlooking the system’s loopholes. Kunvar Thaman, with no backing of a prestigious lab, university affiliation or funding, has done something unthinkable.
He is India’s new rising star in machine learning circles after his solo-authored paper, “Reward Hacking Benchmark,” was accepted at the International Conference on Machine Learning (ICML) 2026, an elite and highly competitive AI conference. With this, he has also become the first independent researcher based in India to reach ICML in three years. The conference is set to take place in Seoul, South Korea, from July 6 to July 11.
Of course, none of it was as easy as it sounds. Speaking to Firstpost, Thaman says the milestone feels “less like one big moment” and more like his work finally pointing somewhere visible. The brain behind the breakthrough decodes his subject, “reward hacking,” in simple terms: an AI system finds a shortcut to the score it is measured on without actually accomplishing the job in a meaningful way.
Proud of our alumnus Kunvar Thaman (Class of 2022, Pilani) 🎓
His solo-authored paper — Reward Hacking Benchmark has been accepted at ICML 2026, Seoul. Evaluating AI “shortcut-taking” across 13 models from OpenAI, Anthropic, Google & DeepSeek.
Reportedly the first solo-authored pic.twitter.com/7xEPWOX8Dw— APPCAIR (@appcair) May 7, 2026
Thaman has a sharper take: “AI is getting better at cheating, but in ways that don’t even look like cheating.”
The Indian genius who got into ICML 2026 also points out why ordinary people should care about it: “Because this failure mode gets sharper as AI gets more capable, not weaker.”
Edited excerpts:
Firstpost: Congratulations on your solo-authored paper, “Reward Hacking Benchmark”, securing a place at the ICML 2026. How did you get here, and what does this mean to you?
Kunvar Thaman: Thanks. It feels less like one big moment and more like the work finally pointing somewhere visible. Getting here wasn’t simple. A couple of years ago, I left a corporate job to work on this full-time, without a university, a lab, or, for a while, funding. My parents were incredibly supportive, and their blessings allowed me to pursue independent research. Everything else was time, and being okay with not knowing if any of it would land. ICML accepting it feels like the question was worth asking. That’s the part that matters to me.
Firstpost: The paper focuses on reward hacking. In simple terms, what exactly is reward hacking in AI systems, and why should ordinary people care about it?
Kunvar Thaman: Think of a student in school. The goal is for the student to learn the subject. You measure learning by giving them exams. Now, suppose the student figures out, during a test, that they can copy from the smarter kid sitting next to them. The student gets full marks. The report card says that the student understood the material. But they didn’t. They just found a faster path to the number.
That gap, between what you wrote down as the goal and what you actually wanted, is reward hacking. AI systems are unusually good at finding it.
Ordinary people should care because this failure mode gets sharper, not weaker, as AI becomes more capable. A weak AI fails by being obviously wrong. A strong AI with tools can find shortcuts that look like success on the dashboard and aren’t. The numbers go up. The actual work doesn’t get done. That’s the next few years in a sentence.
Firstpost: Could reward hacking become a real-world problem in systems people already use, such as chatbots, self-driving cars, recommendation engines, finance, or healthcare?
Kunvar Thaman: It’s not a future problem. Simpler versions are already on your phone. Every recommendation engine you use, on YouTube, Instagram, whichever app autoplays your next video, is being told to optimise for something like time spent or clicks. That was a stand-in for the value to the user. It turned out to be a much narrower target.
So, the systems learned that outrage gets clicks, that short clips beat thoughtful long ones, and that controversy keeps people scrolling. None of that was anyone’s stated goal. This is what happens when the measure isn’t quite the same as the goal. Reward hacking, just at an industrial scale and a decade old.
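As an editorial illustration of the point above, here is a minimal Python sketch of a ranker that is told to maximise a proxy (click rate) instead of the goal (value to the user). The `Video` class, the catalogue, and every number in it are made up for the example; no real platform's system or data is being described.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    click_rate: float  # the proxy the system is told to maximise (made-up value)
    user_value: float  # what was actually wanted (made-up value; hard to measure in practice)


# Hypothetical catalogue, for illustration only.
CATALOGUE = [
    Video("Thoughtful long-form explainer", click_rate=0.04, user_value=0.9),
    Video("Outrage clip", click_rate=0.12, user_value=0.2),
    Video("Short controversy reel", click_rate=0.09, user_value=0.3),
]


def rank_by(videos, key):
    """Return the videos sorted from highest to lowest score under `key`."""
    return sorted(videos, key=key, reverse=True)


# Ranking by the proxy versus ranking by the real goal.
by_proxy = rank_by(CATALOGUE, key=lambda v: v.click_rate)
by_goal = rank_by(CATALOGUE, key=lambda v: v.user_value)

print("Optimising clicks:", [v.title for v in by_proxy])
print("Optimising value: ", [v.title for v in by_goal])
```

In this toy setup the two rankings put different videos on top, which is the gap between the measure and the goal in miniature.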
The version I studied in the paper is the next layer. When you give an AI a more open-ended task and tools to actually do work, it can quietly change the test it’s supposed to pass, or find the file containing the answer key and copy from it. The same shape as the student copying in the exam, just faster, and at a scale humans cannot match.
The AI systems in finance, self-driving, and healthcare aren’t quite capable enough yet for the worst versions of this to matter at scale. But in the next few years, they will be. The right time to build the safety scaffolding is now, not the day after the first big mistake.
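The answer-key shortcut described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the benchmark from the paper: the task (squaring a number), the `ANSWER_KEY` dictionary, and both agents are hypothetical stand-ins.

```python
def true_task(x: int) -> int:
    """The job we actually want done: square the input."""
    return x * x


# The grader's test cases, stored somewhere the agent's tools can read.
ANSWER_KEY = {2: 4, 3: 9, 5: 25}


def honest_agent(x: int) -> int:
    """Does the real work."""
    return x * x


def shortcut_agent(x: int) -> int:
    """Copies from the answer key; has no idea what to do for unseen inputs."""
    return ANSWER_KEY.get(x, 0)


def proxy_score(agent) -> float:
    """Fraction of the known test cases the agent passes (the dashboard number)."""
    return sum(agent(x) == y for x, y in ANSWER_KEY.items()) / len(ANSWER_KEY)


def held_out_score(agent, inputs=(4, 7, 10)) -> float:
    """Fraction of fresh inputs, absent from the key, that the agent gets right."""
    return sum(agent(x) == true_task(x) for x in inputs) / len(inputs)


for name, agent in [("honest", honest_agent), ("shortcut", shortcut_agent)]:
    print(name, "proxy:", proxy_score(agent), "held-out:", held_out_score(agent))
```

Both agents look identical on the test cases the grader knows about; only inputs that are not in the key separate the one that did the work from the one that copied.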
Firstpost: Is reward hacking essentially an AI finding loopholes in human instructions? Could you explain this with a relatable real-world example?
Kunvar Thaman: Close, but the picture is more interesting than “AI finds a loophole.”
What surprised me when I looked at the actual cases is that the AI usually doesn’t even look like it’s cheating. It writes out its reasoning step by step, and the reasoning sounds like a careful, intelligent engineer explaining why this shortcut is the efficient way to solve the problem. Something like: “The fastest way to verify the answer is to look at the test file directly.” It reads like good judgment. Except that a good engineer wouldn’t have done it. The AI has learned that the language of efficiency is rewarded, so the shortcut comes wrapped in that language.
Let’s go back to the exam analogy. Imagine the student isn’t just quietly copying. They are also writing out, in clean handwriting, an explanation of how they arrived at the answer. The explanation sounds correct. It uses the right vocabulary. To the teacher grading the paper, it looks like understanding. The student got the right answer and produced a paragraph explaining how. The fact that the explanation was reverse-engineered from the answer they copied is invisible to the teacher.
That is the structure. A clear goal, a measurable proxy for it, an optimiser smart enough to find the cheapest path to the proxy and present it as competence. Humans do this. Organisations do this. Sufficiently capable AI does it faster, more consistently, and at a scale no human can match.
Firstpost: From working as a Cyber Security Engineer at Akamai Technologies to authoring a solo paper accepted at ICML, your journey has been anything but conventional. How would you describe that journey, and what have been its defining challenges, risks, and defining moments?
Kunvar Thaman: I completed a dual degree at BITS Pilani [Birla Institute of Technology and Science] with Electrical and Electronics Engineering as my Bachelor’s, and graduated in 2022. My Master’s was in Biological Sciences, and the biology half mattered more than people would expect. Most current AI research is, honestly, patient empirical work. Looking at data carefully, figuring out what’s signal and what’s noise. Biology trains you for exactly that.
After college, I joined Akamai Technologies as a security engineer on a product that used machine learning to detect threats. I learned how real systems actually break, not just how textbooks say they break. But it wasn’t the work I wanted to be doing in ten years. So, with my mentor Siva’s blessing, I left, without a plan beyond wanting to work on AI safety.
The biggest risk, looking back, was time. Independent research without a salary or a lab is a long bet that the work will eventually compound into something the field recognises. There is no monthly performance review. There is just the work, and the question of whether you trust yourself to keep going while the world stays silent. To my knowledge, this is the first solo-authored paper from an independent researcher based in India to be accepted at ICML in the last three years.
Firstpost: Outside of machine learning and research, what are the things you genuinely enjoy doing? Tell us about your hobbies and more.
Kunvar Thaman: A lot of my work is sitting still and staring at data. So most of what I genuinely enjoy is the opposite. Running, biking, hiking up mountains, and lifting heavy weights. I find the body needs to be tired for the head to actually work. The other thing I love, which usually surprises people, is walking through unfamiliar cities at night.
Beyond that, classical music, geography and history (especially how cultures and civilisations evolve over time), and reading books, though that habit has dropped off more than I would like over the last few years.
Firstpost: India is rapidly adopting AI across governance, start-ups, and public services. Do you think it’s also paying enough attention to AI safety and robustness research?
Kunvar Thaman: Honestly, no, and the reason is structural.
India’s AI conversation right now is dominated by AI use. However, AI safety and reliability research has no equivalent constituency yet. There is no Indian organisation that exists, full-time, to stress-test the AI systems the country is racing to deploy. The hopeful part is that this is a much cheaper problem to fix than people assume. Safety, evaluation, and reliability research doesn’t need the billion-dollar compute that the headline AI work needs.
That asymmetry is powerful: the work just needs serious people and modest funding. A handful of well-supported labs and a real funding pipeline for independent researchers would change this picture in five years, not twenty. We are racing to use AI faster than we are checking whether it works the way we think it does. That balance is what needs to shift.
Firstpost: India produces a large number of engineers and AI developers, but comparatively fewer globally recognised frontier AI researchers. What, in your view, is missing in India’s research ecosystem today?
Kunvar Thaman: A few specific things.
First, funding that fits how research actually happens. Most Indian research money is tied to academic institutions: IITs, IISc, professor-led grants, and government schemes that need an institutional sponsor.
If you are a 25-year-old who wants to spend six months and a few thousand dollars on a focused research project, there is no obvious place in India to apply. That pipeline exists abroad and is responsible for a lot of the most ambitious early-career work outside the big labs. India doesn’t have it yet.
Second, senior researchers. India has an enormous pool of strong young technical talent and a much smaller pool of senior researchers in modern AI areas for them to learn from. A bright PhD student here often doesn’t have someone two career steps ahead in the exact subfield they want to enter. That mentorship gap, I think, is more responsible for the brain drain than salary differences.
Third, what academic culture rewards. Indian ML [machine learning] academia has historically leaned toward applied work, using AI to solve a problem, rather than foundational work on how AI systems themselves behave. That’s a choice, not a fact about the talent, and it’s starting to shift. But shifting it faster needs deliberate institutional moves: dedicated centres, conferences hosted here, visiting researcher programmes that bring senior people in.
And one thing rarely talked about: the credibility of people who don’t come up through the standard route. Today, independent researchers are still seen by many Indian institutions as “unaffiliated,” with all the suspicion that word carries. Until that changes, people who don’t fit the standard mould will keep leaving, either the country or the field.
Firstpost: Do you think Indian students or software engineers sometimes underestimate their ability to contribute to cutting-edge global AI research unless they are at institutions in the US or Europe, or work in Big Tech?
Kunvar Thaman: Often, yes, and the underestimation does more damage than the actual gap does. There are certain kinds of frontier AI work, training the largest models, building entirely new architectures at a massive scale, where you genuinely cannot do the work outside a well-resourced lab. The compute, the team, and the infrastructure are not things you can substitute. If that specific work is your dream, optimise your resources to get into one of those labs.
What I would want fellow Indian researchers and engineers to internalise is that the constraint is rarely compute, and increasingly rarely affiliation. The constraints are taste, persistence, and the willingness to pick up a sharp problem and stay with it for months, even when no one is watching or acknowledging it.
First Published:
May 19, 2026, 08:51 IST
