How an intern helped build the AI that shocked the world

AlphaGo’s victory televised

Im Hoon-jeong/Yonhap/AP Photo (via Getty Images)

In March 2016, Google DeepMind’s artificial intelligence system AlphaGo shocked the world. In a stunning five-game series of the ancient Chinese board game Go, the AI defeated the world’s best player, Lee Sedol. The match was broadcast to millions of people and hailed by many as a historic moment in the development of artificial intelligence.

Chris Maddison, now a professor of artificial intelligence at the University of Toronto, was a master’s student at the time and helped launch the project. It all started when he was contacted by Ilya Sutskever, who would go on to co-found OpenAI.

Alex Wilkins: How did the idea for AlphaGo come about?

Chris Maddison: Ilya [Sutskever] made the following argument for why we should work on Go. He said, “Chris, do you think a skilled player could look at a Go board and choose the best move within half a second? If you think they can, it means you can use neural nets to learn a very good policy for choosing the best move.”

That’s because half a second is about the time it takes the visual cortex to do one forward pass [a round of processing]. We already knew from ImageNet [an important AI image-recognition competition] that neural networks are pretty good at approximating things that take just one forward pass of the visual cortex.

I agreed with the argument, so I decided to take part. I joined [Google Brain] as an intern in the summer of 2014.

How did AlphaGo evolve from there?

When I joined, there was another small team at DeepMind that I was supposed to work with: Aja Huang and David Silver, who were starting to work on Go. It was basically my responsibility to start building the neural networks. It was a dream.

We tried different approaches and many of the first ones we tried failed. In the end I got frustrated and tried the stupidest and simplest thing. The idea was to train a neural network on a large corpus of expert games and try to predict the next move an expert would make at a given board position. And that turned out to be the approach that really got us off the ground.

By the end of the summer, we held a small match against DeepMind’s Thore Graepel, who considered himself a decent Go player, and my network beat him. DeepMind then became convinced that this could really work, and began putting resources into it and building a large team around it.

How difficult a challenge do you think it was to defeat Lee Sedol?

I remember in the summer of 2014, there was a portrait of Lee Sedol on the desk next to us. I’m not a Go player, but Aja [Huang] is. Every time I built a new network and it got a little better, I would turn to Aja and say, “Okay, it’s gotten a little better. How close are we to Lee Sedol?” And Aja would turn to me and say, “Chris, you don’t understand. Lee Sedol is a stone of God.”

You left the AlphaGo team before the big event. Why?

David [Silver] said he wanted to keep me around and really push this project to the next level. And in hindsight, this may have been one of the more stupid decisions I’ve made: I turned him down. I said that, since I’m an academic at heart, I thought I needed to focus on getting my PhD. I went back to my PhD and consulted loosely on the project from that point on. I’m a little proud to say that it took them a while to beat my neural network. But ultimately, the artifact that played Lee Sedol was the result of a massive engineering effort and a large team.

What was the atmosphere like in Seoul when AlphaGo won?

Being in Seoul at that moment is difficult to describe in words. It was moving. It was intense. I felt anxious. Even if you go in with confidence, you never know. It’s like a sports game: statistically speaking, you may be the better player, but you don’t know how it will play out on the day. I remember being in the hotel where we played the match and looking out the window. We were high enough to see one of the city’s major intersections. I noticed that a big screen, like the ones in Times Square, was showing our game. And I looked along the sidewalk and saw people just standing in rows looking at screens. I had heard that hundreds of millions of people watched the first game in China, but I remember thinking at that moment, wow, we have really brought East Asia to a standstill.

How important is AlphaGo to AI more generally?

Although the world of large language models (LLMs) looks very different from AlphaGo on the surface, there are actually technical threads that have not fundamentally changed.

The first part of the AlphaGo algorithm was to train a neural network to predict the next move. Today’s LLMs start with something called pre-training, in which the model learns to predict the next word from a large corpus of human text, mostly found on the internet.
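The next-move prediction step can be illustrated with a toy sketch. This is not AlphaGo's network or data: the "board" is just an integer state, the "expert" is a made-up rule (`state % n_moves`) that errs some of the time, and the function names `make_expert_corpus` and `train_next_move_predictor` are hypothetical. A table of logits stands in for the neural network and is fitted with the cross-entropy loss that next-move and next-word prediction both rely on.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert_corpus(n_states, n_moves, n_samples, error_rate, rng):
    """Build a toy corpus of (state, expert_move) pairs.

    Assumption for illustration only: the best move in a state is
    state % n_moves, and the expert plays a random move with
    probability error_rate.
    """
    corpus = []
    for _ in range(n_samples):
        state = rng.randrange(n_states)
        if rng.random() < error_rate:
            move = rng.randrange(n_moves)  # expert mistake
        else:
            move = state % n_moves         # expert plays the best move
        corpus.append((state, move))
    return corpus


def train_next_move_predictor(corpus, n_states, n_moves, epochs=50, lr=0.1):
    """Fit a tabular softmax policy by gradient descent on cross-entropy."""
    logits = [[0.0] * n_moves for _ in range(n_states)]
    for _ in range(epochs):
        for state, move in corpus:
            probs = softmax(logits[state])
            for a in range(n_moves):
                # Gradient of -log p(move | state) with respect to logit a.
                grad = probs[a] - (1.0 if a == move else 0.0)
                logits[state][a] -= lr * grad
    return logits


rng = random.Random(0)
corpus = make_expert_corpus(n_states=6, n_moves=3, n_samples=600,
                            error_rate=0.2, rng=rng)
logits = train_next_move_predictor(corpus, n_states=6, n_moves=3)
predicted = [max(range(3), key=lambda a: logits[s][a]) for s in range(6)]
print(predicted)
```

Because the policy only imitates the corpus, it also reproduces the expert's error rate as probability mass on weaker moves, which is the gap that the later reinforcement learning stage is meant to close.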

AlphaGo’s second step took the information from the human corpus, now compressed into the neural network, and used reinforcement learning to refine it and adjust the system’s behavior toward the goal of winning the game.

When you learn to predict an expert’s next move, the expert is trying to win, but that’s not the only thing that explains their move. Perhaps they didn’t know what the best move was, or perhaps they made a mistake. Therefore, the entire system must be aligned with the true goal (in AlphaGo’s case, winning).

The same is true of large language models after pre-training. The network is not yet aligned with how we want to use it, so we perform a series of reinforcement learning steps to adapt it to our goals.
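The refinement stage can be sketched in the same toy spirit. The example below is an assumption-laden illustration, not AlphaGo's or any LLM's actual training procedure: it uses a single position with three moves, invented expert frequencies (`expert_freq`) and invented win rates (`win_prob`), and a basic REINFORCE policy-gradient update with a running-average baseline. The policy starts from what imitation would give it and is then adjusted using only a win/loss reward.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index according to the given probabilities."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1


def reinforce(start_logits, win_prob, rng, steps=20000, lr=0.05):
    """Refine a policy with REINFORCE using a win (1) / loss (0) reward."""
    logits = list(start_logits)
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        move = sample(probs, rng)
        reward = 1.0 if rng.random() < win_prob[move] else 0.0
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for a in range(len(logits)):
            # Gradient of log p(move) with respect to logit a.
            grad_log_prob = (1.0 if a == move else 0.0) - probs[a]
            logits[a] += lr * advantage * grad_log_prob
    return logits


# Hypothetical numbers: experts usually play move 0, but move 2 wins most.
expert_freq = [0.7, 0.2, 0.1]
win_prob = [0.4, 0.5, 0.9]

pretrained = [math.log(p) for p in expert_freq]  # imitation starting point
rng = random.Random(0)
tuned = reinforce(pretrained, win_prob, rng)

before = softmax(pretrained)
after = softmax(tuned)
print("before:", [round(p, 2) for p in before])
print("after: ", [round(p, 2) for p in after])
```

The imitation policy favors the move experts play most often; the reward signal, rather than the corpus, is what moves probability toward the move that actually wins more.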

In some ways, not much has changed.

What does this tell us about the areas where AI is likely to succeed?

It has implications for what we choose to focus on. If you care about making progress on an important problem, you need to consider whether you have enough data to pre-train on and reward signals to post-train with. Without these ingredients, no matter how cleverly you combine this algorithm with that one, you won’t get off the ground.

Did you have any sympathy for Lee Sedol?

In the summer of 2014, Lee Sedol was this idol, this impossible milestone. To suddenly be there in person, watching his stress, his anxiety and his realization that AlphaGo was probably a much more worthy opponent than he had thought going into the match, was extremely stressful. You wouldn’t want to put someone in that position. When he lost the match, he apologized to humanity and said, “This is my failure, not yours.” It was tragic.

In Go, there is also a tradition of reviewing the game with your opponent. Someone wins and someone loses, but at the end you reflect on the match together, unravel the game and explore variations with each other. Since AlphaGo is not a human being, Lee Sedol couldn’t do that, so he asked a friend to come over and review the game instead, but it’s not the same. There was something heartbreaking about that.

However, since AlphaGo was developed by a team of people, I never fully understood the human vs. machine narrative surrounding this match. It was a tribal effort to build an artifact that could achieve excellence in a human game. It was ultimately an artifact into which all of our blood, sweat, and tears were poured.

As AI accomplishes more human-like thinking tasks, do you think there is still a place for humans in the world?

We’re learning more about the game of Go. If we think it’s beautiful, and we do, and AI can teach us more about that beauty, then there’s a lot of inherent good in that as well. Objectives and purposes are different things. The objective of the game of Go is to win, but that is not its only purpose; one of its purposes is to have fun. Board games aren’t destroyed by the presence of AI. Chess is a thriving industry. We still appreciate the intrigue of the game and the human achievement in it.
