Mike – Artificial intelligence means many different things to many different people, but for me, it's about getting computers to do things that, at the moment, only human beings can do. Take understanding ordinary language, like the conversation we're having now: we don't think of that as a remarkable capability. We're not surprised when somebody understands us when we talk to them. But getting computers to understand a conversation like this, which we're only just at the point of doing, turned out to be phenomenally, phenomenally difficult.
Chris – What got you into this in the first place?
Mike – So I was of that generation that grew up with the Apollo space programme in the background. I can remember my parents waking me up to see one of the splashdowns of the Apollo spacecraft, and that idea of space travel and technology was very, very much in the air. So I was always scientifically minded. I very much wanted to be an astronaut until I discovered just how difficult it was to become one. But a big moment for me came in early 1980. I grew up in Hereford, a rural town where the main industry was making cider and the main leisure activity was drinking it. I met some friends there one Saturday afternoon, and they said, there's a shop down the road that sells computers. This seemed completely outlandish to me, because in my head, computers were things that cost millions of pounds. The idea that you'd go into a shop in Hereford and buy one just seemed ridiculous. But we went down to this shop, a Tandy store, and indeed, there in the window was a computer. We went in and chatted to the guys behind the counter, and they literally said, go ahead. They gave us some programmes to type in. We had no clue what we were doing, but we typed them in. Over the next couple of weeks and months, I went back to that shop a lot. I suspect they must have regretted being so generous to me. But I sat in the window of that shop, and I learned to programme. The computer we used, a TRS-80 Model I, was very, very crude. But it was the first time that an ordinary family could afford a home computer.
Chris – Did you buy one, though? Did their investment in you, letting you sit there in the shop window, pay off? Did you buy one?
Mike – I would have loved a TRS-80, but we couldn't afford one. I mean, they were 400-odd pounds at the time, well beyond my pocket-money range. But for a birthday, my parents bought me a Sinclair ZX80. We bought it secondhand for 70 pounds, and I remember being so excited when I got it home. Much of the following year or so was wasted with me learning how to programme it: typing in computer games from magazines, running them, debugging them, and so on. I remember it being a very, very happy time. You could think up a programme in your head, type it in, and then get the machine to do something. It would follow your instructions, and you would create something, and that was really, really great fun. That set me on the path to computing, and I knew that that's what I wanted to do.
Chris – Did you know that you could do computing at university, though? And did you know that that was the logical step?
Mike – Nobody in my family had a degree, and honestly, they had only the dimmest understanding of what degrees were. Both of my parents had left school by 16 at the latest; in my dad's case, I think he left at 14. I'm afraid that, despite my passion for computing, I wasn't a very diligent student at school or in sixth form; I didn't work terribly hard. But I did get a place on a computing degree, and that just confirmed for me that computing was what I wanted to do. This was the mid-80s, when IBM PCs became a thing, and all of a sudden, ordinary businesses could start to think about having computers and using them for word processing and spreadsheets. That's when all that stuff took off, so it was one of the big expansions in computing.
Chris – This is obviously the pre-internet era; the internet was 10 years away from the mainstream at that time, wasn't it? But networks still existed. I remember that at my school we used BBC Micros with Econet and all that kind of thing, and we also had dial-up modems. We would dial in to this network and share academic data among schools and universities and things. There was a little club where we would make pages that would go up, and that was really a prelude. It's what lit my fire. Were you doing that kind of thing as well?
Mike – Yeah. My story is that I did what we'd now call an internship, but what back in the day was called an industrial placement, with a group called the Joint Network Team, JNT, based at Rutherford Appleton Laboratory. They basically managed the UK branch of what became the internet. It wasn't really called the internet at that time; I think it was still called ARPANET, and the UK branch was called JANET, the Joint Academic Network. I don't think every university in the UK was even connected to it at that point, and virtually nothing outside universities was connected, apart from a couple of government and military establishments. But the Joint Network Team were busy thinking about the future of computer networks. After that period with them, I absolutely understood that the future of computing was going to be networked. This was not a big thing at the time, but I really, really understood that this is where it was going. I can remember getting my first electronic mail message, sometime in 1987 it would have been, and being so excited about it. But as I say, the point is that by the end of that period, I really, really got it: the future of computing was going to be networked.
Chris – It's a bit like Bill Gates at Microsoft coming back from that conference and saying, it's all about the internet now. That was a sort of epiphany for you, albeit earlier. But how did that become an interest in AI?
Mike – In my final year, I got to specialise, and I absolutely wanted to specialise in networks. But I also found AI really, really interesting, and in my head, these two things came together. I realised, look, if the future of computing is going to be networked, then that must be the future of AI as well. It won't just be a chatbot having a conversation with a human being. It will be AI programmes talking to each other. I had that idea as an undergraduate, at the beginning of 1989 or thereabouts, and that's what I went on to do my PhD in. But it was an uphill struggle to convince people that we were going to build these AI programmes, called agents, that would communicate with one another. People would say, well, why? What's the point? And we'd say, because it's inevitable. It's the future. The idea was that you would have AI programmes operating on behalf of users: you would have your agent, Chris, and I would have my agent. Say we want to arrange a meeting. Why doesn't my agent just talk directly to yours? So we had this idea of AI programmes operating on behalf of their users and interacting with each other, and that became the very substantial research field called multi-agent systems, and what's now called agentic AI.
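The meeting example can be made concrete with a toy Python sketch of two agents negotiating on their users' behalf. Everything here is a hypothetical illustration rather than anything from the interview: the `Agent` class, the slot strings and the "accept the first shared slot" rule are assumptions, and real multi-agent systems research uses far richer negotiation protocols.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """A hypothetical agent acting on behalf of one user's diary."""
    owner: str
    free_slots: List[str] = field(default_factory=list)

    def propose(self) -> List[str]:
        # Offer every slot the owner has free, in order of preference.
        return list(self.free_slots)

    def respond(self, proposals: List[str]) -> Optional[str]:
        # Accept the first proposed slot that the owner is also free for.
        for slot in proposals:
            if slot in self.free_slots:
                return slot
        return None


def arrange_meeting(initiator: Agent, responder: Agent) -> Optional[str]:
    """One round of agent-to-agent negotiation, with no human in the loop."""
    agreed = responder.respond(initiator.propose())
    if agreed is not None:
        # Both agents book the slot so neither diary offers it again.
        initiator.free_slots.remove(agreed)
        responder.free_slots.remove(agreed)
    return agreed


if __name__ == "__main__":
    mike = Agent("Mike", ["Mon 10:00", "Tue 14:00", "Wed 09:00"])
    chris = Agent("Chris", ["Tue 14:00", "Wed 09:00"])
    print(arrange_meeting(mike, chris))
```

The point of the sketch is the shape of the interaction: the humans only state their availability, and the agents exchange proposals and replies directly.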
Chris – I think you're pointing towards one of the big problems the industry has: a sort of image problem, in the sense that AI seems to be omnipotent, as though it can do everything. But it's lots of different things, lots of different platforms, lots of different technologies. I'm going to ask you an almost impossible question: can you break it down a bit for us, and explain what these different elements of AI are and why each of them matters?
Mike – Sure. When I was an undergraduate in the 80s, the big thing was what's called symbolic AI. The question is, suppose you want to build AI that could, I don't know, translate from French to English: how are you going to do that? Well, the big idea of symbolic AI is that you model the cognitive processes that a human being doing that task would carry out. In other words, you would explicitly model the kind of mental conversation they would have with themselves. It was called symbolic AI because the things you were working with were more or less like words in a language. They were symbols. Symbolic AI was good for some things, like mathematics, but it was hopeless at anything that requires perception: vision, understanding spoken words, driving a car, riding a bicycle, anything that involves understanding the world around you. It was just really, really hopeless at that, and it kind of ground to a halt; by the late 80s, it was very rapidly going out of fashion. There weren't, I have to say, big ideas that replaced it at the time, but lurking in the background was an idea that had been around since the 1940s, called neural networks. So remember, in symbolic AI, what you're trying to do is model the mind. With neural networks, what you're trying to do instead is kind of model the brain. We can look at a brain under a microscope and see what's going on there. What we see is enormous numbers of nerve cells connected to one another in vast networks, and we know that all of human intelligence somehow reduces to the 90-odd billion neurones in a human brain, arranged in an enormous network. So the idea of neural networks is: can we do, in software, what those natural neural networks do in wetware, the stuff that's in our brains? Theoretically, the answer was known a long, long time ago: in principle, yes, you can. But people just didn't know how to actually build these things.
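As a rough illustration of "doing in software what neurones do in wetware", here is a single artificial neuron in Python: it takes a weighted sum of its inputs, adds a bias, and squashes the result through a sigmoid. The weights and bias are hand-picked (a hypothetical choice for illustration) so that the neuron behaves like logical AND; in a real neural network they would be learned from training data, and vast numbers of such units would be wired together.

```python
import math


def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of inputs plus bias, then a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The sigmoid squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-total))


# Hand-picked, illustrative weights that make the neuron act like logical AND.
WEIGHTS, BIAS = [10.0, 10.0], -15.0

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(neuron([a, b], WEIGHTS, BIAS), 3))
```

Nothing in the neuron is a symbol with a meaning: its behaviour lives entirely in the numbers, which is the contrast with symbolic AI drawn above.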
And one of the really interesting things is that all of the key scientific ideas underneath the current generation of AI, which is all based on neural networks, were in fact invented as far back as the 1980s. Many of your listeners will have heard of Geoff Hinton, and I know you have too; he's often described as the godfather of AI. Geoff, his students and his colleagues, and there was a large number of people working in this space, invented all of the key techniques to make neural networks work. What they didn't have were computers powerful enough, or the training data you need in order to build these things.
Chris – So that was the bottleneck, then: it was the computing grunt that we just didn't have. We had the aspiration, we had the insight, but we didn't have the technology to power it.
Mike – Yeah. It's hard to contemplate just how much computer power is used to build AI programmes at the moment; the quantities are astronomical. And my honest belief is that if Geoff and his colleagues had had that at their disposal in the 1980s, the AI revolution we've seen over the last 15 years or so would have happened then. But they didn't, and I think Geoff is pretty much on record as still being really quite bitter about this. Neural networks were unfashionable. At the turn of this century, just 25 years ago, I had colleagues working in this space whose grant proposals were rejected, and the reason people gave was: this neural network stuff's not going anywhere. It's pseudoscience. It's not proper science. We're not going to fund this. That's how unpopular neural networks were. It's really remarkable to think about it now.
Chris – Well, what's also remarkable, though, don't you think, Mike, is that we've got computers with extraordinary power and massive power consumption, and they're mimicking what's running in your head and my head, which uses about 20 watts.
Mike – 20 watts. And what that demonstrates to us is that current machine learning algorithms are far too power-intensive, and they require far, far too much training data. A human being learns to read with much less training data than an AI programme requires. How many hours does a typical human driver spend on the road learning how to drive? 20 hours or something. Driverless car companies have spent thousands, almost certainly millions of hours on the road. And we don’t have driverless cars yet.
Chris – So what's wrong? Is it that our model is wrong? Is it that the computers are just not powerful enough yet? What do you think the problem is?
Mike – No, I think fundamentally it's that the basic way we train neural networks at the moment, a technique called backpropagation that Geoff pioneered in this space, is just not a very efficient algorithm. If somebody could invent an algorithm for training artificial neural networks that required as little energy as a human brain does, that would be a completely transformational moment, but I don't believe anybody's seen anything yet that will do that. As for how the space did get transformed, it started to happen around 2005, and then it became supercharged in 2012, because people realised that if you used GPUs, graphics processing units, to build neural networks, you got 10 times as much bang for your buck: you could build neural networks that were 10 times bigger, or train them 10 times faster. You got 10 times more neural network for your money, and that supercharged the whole space.
Chris – You say Geoff Hinton's big insight was this backpropagation approach. Can you explain how it actually works? How do these networks work, albeit slightly inefficiently at the moment? What's the basic premise of their function?
Mike – So the key point is that you want to automatically configure the neural network. You present it with an input, maybe a picture of Chris Smith, and the desired output, the name Chris Smith, and you want to automatically adjust the network so that it gets closer to the right output for that input. Geoff and his colleagues came up with the backpropagation technique, which is basically based on calculus. It's not very advanced calculus. Leibniz would have understood the calculus required for backpropagation; for those who know about calculus, it's the chain rule. Basically, it involves working backwards from the output of the network towards the inputs, automatically adjusting the network as you go. Now, the mathematics is not very sophisticated, but there's a heck of a lot of it that you have to do. That was the challenge: the sheer amount of computing you have to do to reconfigure a neural network using backpropagation. And you have to do this training process repeatedly. You can't just show a neural network one picture of Chris Smith labelled with the name. You've got an input, a picture of Chris, and an output, the name of the person who appears in it, and you have to train the network repeatedly with many pictures of you labelled with the same name; in fact, you have to show it the same picture over and over again. Throughout that process, every time you show it an input and its output, you adjust the network using backpropagation so that it gets closer to the desired output for that input. And the process repeats until what's called the loss, that is, the error it's making, is small enough that you're happy with its capabilities.
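To make the training loop Mike describes concrete, here is a minimal pure-Python sketch of backpropagation for a tiny two-layer network. It is an illustration under assumptions, not how production systems are built: the network size, learning rate, squared-error loss, sigmoid activations and the toy logical-OR task (standing in for "picture in, name out") are all illustrative choices, and `train` and `predict` are hypothetical helper names. Real networks are vastly larger and are trained with dedicated libraries on GPUs.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x):
    """Forward pass: input -> hidden layer -> single output."""
    W1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(len(b1))]
    return h, sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)


def train(data, hidden=2, lr=0.5, epochs=2000, seed=0):
    """Full-batch gradient descent with backpropagation; returns params and per-epoch loss."""
    rng = random.Random(seed)
    n_in, n = len(data[0][0]), len(data)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1, w2, b2 = [0.0] * hidden, [rng.uniform(-1, 1) for _ in range(hidden)], 0.0
    losses = []
    for _ in range(epochs):  # show the network every example again, every epoch
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1, gw2, gb2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, target in data:
            h, y = predict((W1, b1, w2, b2), x)
            loss += 0.5 * (y - target) ** 2
            # Backward pass (chain rule), starting at the output and working back.
            d_out = (y - target) * y * (1 - y)
            gb2 += d_out
            for j in range(hidden):
                gw2[j] += d_out * h[j]
                d_hid = d_out * w2[j] * h[j] * (1 - h[j])  # error pushed back to hidden unit j
                gb1[j] += d_hid
                for i in range(n_in):
                    gW1[j][i] += d_hid * x[i]
        losses.append(loss / n)
        # Nudge every weight a little in the direction that reduces the average loss.
        for j in range(hidden):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / n
        b2 -= lr * gb2 / n
    return (W1, b1, w2, b2), losses


# Logical OR: a toy stand-in for "picture in, name out".
OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

if __name__ == "__main__":
    params, losses = train(OR_DATA)
    print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Each epoch shows the network the same examples again, echoing the point about showing the same picture over and over, and the recorded loss should fall as training proceeds. Deciding when the loss is "small enough" is left to whoever runs the loop.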
