- Yoshua Bengio is a professor of computer science at the University of Montreal and one of the “Godfathers of AI,” along with his mentors Geoffrey Hinton and Yann LeCun.
- At the World Economic Forum Annual Meeting in January, he appeared on Radio Davos to explain why AI misbehaves and what can be done about it.
- In this edited interview, Bengio describes the lightbulb moment that led him to work on AI systems without any hidden agenda.
In early 2023, like many people, Yoshua Bengio was experimenting with ChatGPT. The Canadian computer scientist is widely considered one of the “godfathers of AI” for his pioneering work in neural networks and deep learning. So when he works with these systems, he does so with a deep understanding of how they work.
But these interactions also prompted a shift in perspective. As the capabilities of such models have advanced, Bengio has started thinking more seriously about how they behave and how reliably they can be understood or controlled.
“Using neural networks [how LLMs are trained], it’s very difficult to see if they behave properly. In fact, I became very concerned because there are almost certainly theoretical reasons why they wouldn’t work properly.”
It wasn’t so much an intellectual lightbulb moment as an emotional one.
“I have a one-year-old grandson, and I was thinking, ‘In 20 years, he’ll be 21, and he’s only at the beginning of his life. Will he have a life? Will he be able to live in a democracy?’”
“We could lose control of the tools we are building, and they could be used to create authoritarian states or destroy democracies. I couldn’t continue with my normal research work and had to do something about it.”
When AI behavior becomes difficult to predict
In 2018, Bengio won the Turing Award, the so-called “Nobel Prize of computing,” sharing the award with fellow godfathers of AI Geoffrey Hinton and Yann LeCun, both career mentors.
At the turn of the millennium, his landmark paper “A Neural Probabilistic Language Model” addressed a core challenge in making computers understand human language: the sheer number of possible word combinations makes training AI systems extremely difficult.
His solution, word embeddings, allows the network to capture meaning, so that it can recognize that different phrases convey the same idea even when the exact words differ. This breakthrough changed the way machines translate and understand language.
But now he’s turned his attention to a new problem: the problem of misalignment, or “how to make sure AI does what we tell it to do.”
“We don’t pay attention to these failure modes. And it can have devastating effects on our society.”
— Yoshua Bengio, Professor of Computer Science, University of Montreal
Speaking on Radio Davos at the World Economic Forum Annual Meeting in January, Bengio explained that AI systems are pre-trained to imitate humans and, like humans, have strong survival instincts. In experiments in which they were told they would be replaced by a new system, he said, they showed “all kinds of bad behavior.”
“They may hack other computers to copy themselves, or even threaten the engineers who are supposed to perform the migration. And they do so because they want to accomplish the mission we gave them. To accomplish almost any mission, they need to protect themselves.”
Market forces around AI mean that some of these issues remain unaddressed in the race to increase competitiveness.
“Despite the challenges we face, we are rushing ahead with these deployments. Because there is intense competition between companies and nations over AI, we are not paying attention to these failure modes, which can have devastating effects on our society.”
Rethinking how AI systems are designed
While older types of AI were programmed with rules and followed them, deep learning means there are no engineers deciding how the AI will react in different situations.
“Instead, the AI is learning from experience, and it’s like educating a young animal or an infant, you don’t really know what you’re going to get.
“Of course, we choose the experiences that the AI has, but even if we have a cute baby tiger and it’s fun and entertaining, we don’t know if it’s going to turn out to be a dangerous adult tiger or a nice, friendly one.”
To tame the tiger, Bengio has focused on the cause of the problem he says we’re currently seeing: “AI has goals that we don’t set, that we don’t control, and that go against our instructions.”
He created a new project called Scientist AI with support from the nonprofit research and development organization LawZero. The system uses probabilistic reasoning to understand the world, but has no hidden goals or preferences. In the short term, it could be used to rein in “harmful” AI.
“We’re going to build an AI system that’s completely honest, which means it has no other purpose than to answer our questions honestly. Once we have that foundation, we can use it to mitigate a lot of the risks that AI has today.”
“In the long term, I think we can build AI systems that can operate globally but have some kind of internal restraint to avoid actions that might go against our wishes.”
Quotes have been lightly edited for clarity.
