ChatGPT Architect and Berkeley Alum John Schulman’s AI Journey




John Schulman discussed recent advances in reinforcement learning and truthfulness at the EECS Colloquium Special Lecture Series on Wednesday, April 19. (UC Berkeley photo by Jim Block)

John Schulman co-founded the ambitious software company OpenAI in December 2015, shortly before finishing his Ph.D. in electrical engineering and computer sciences at UC Berkeley. At OpenAI, he leads the reinforcement learning team that developed ChatGPT, a chatbot based on the company’s generative pre-trained (GPT) language models that has become a global sensation thanks to its ability to generate remarkably human-like responses.

During a campus visit on Wednesday, Berkeley News spoke with Schulman about why he chose Berkeley for graduate school, the allure of towel-folding robots, and what he sees as the future of artificial general intelligence.

This interview has been edited for length and clarity.


Berkeley News: You studied physics as an undergraduate at Caltech and originally came to UC Berkeley for a Ph.D. in neuroscience before switching to machine learning and robotics. Can you talk about your interests and what led you from physics to neuroscience to artificial intelligence?

John Schulman: I was interested in understanding the universe, and physics seemed like the field to study for that, and I admired great physicists like Einstein. But then I did some physics research projects over the summer and realized I wasn’t that excited about them and was more interested in other topics. Neuroscience seemed interesting, and I was also somewhat interested in AI, but I couldn’t see a path I wanted to follow [in AI] as much as in neuroscience.

When I came to Berkeley for the neuroscience program and did my lab rotations, I did my final rotation with Pieter Abbeel. During that rotation, I got so excited about the work that I felt like I was spending all my time on it. So I asked him about transferring to the EECS (electrical engineering and computer sciences) department.

Why did you choose Berkeley for graduate school?

I had a good feeling about it and liked the professor I talked to during my visit.

I also remember going for a run the day after I arrived. I followed the path up toward Berkeley Lab, and there was a small herd of deer, including a fawn. It was about 7:30 in the morning, and no one else was out. So it was a great moment.

What were some of the early projects in Pieter Abbeel’s lab?

[Abbeel’s] lab had two main threads: surgical robotics and personal robotics. I don’t remember whose idea it was, but I decided to work on tying knots with the PR2 [short for personal robot 2]. I believe [the project] was motivated by tying knots for suturing, but we didn’t have a surgical robot, so I guess we just thought we’d try out some ideas with the PR2. It’s a robot with wheels, two arms and a head with all sorts of gizmos. It’s still in Pieter’s lab, but it’s no longer in use. It looks like an antique.

As a Berkeley graduate student, you became one of the pioneers of a type of artificial intelligence called deep reinforcement learning, which combines deep learning (training complex neural networks on large amounts of data) with reinforcement learning (in which machines learn by trial and error). How did this idea come about?

After doing a few projects in robotics, I was starting to think the methods weren’t robust enough for the particular demo we were trying to make.

Around that time, people were getting some good results using deep learning in vision, and everyone in AI was starting to think about those results and what they meant. Deep learning seemed to be able to build these very robust models by training on large amounts of data. So I started wondering: how do we apply deep learning to robotics? And the conclusion I came to was reinforcement learning.


Schulman with his Ph.D. advisor, Pieter Abbeel, professor of electrical engineering and computer sciences at UC Berkeley, before the EECS Colloquium on Wednesday, April 19. (UC Berkeley photo by Jim Block)

You became one of the co-founders of OpenAI in late 2015, while you were still finishing your Ph.D. work at Berkeley. What made you want to join this new venture?

I wanted to do AI research. OpenAI was ambitious in its mission and was already thinking about Artificial General Intelligence (AGI). At the time, talking about AGI seemed crazy, but I thought it made sense to start thinking about it.

What is artificial general intelligence?

Well, it has become a little vague. It can be defined as AI that can match or exceed human capabilities in essentially all areas. Seven years ago, it was pretty clear what the term referred to, because the systems at the time were so narrow. Now I think it’s a little less clear, because AI already surpasses human ability in some respects.

In the old days, people talked about the Turing test as a big goal for the field. And now, I think we have quietly passed the point where AI can hold a multi-step conversation at a human level. But we don’t want to build models that pretend to be human, so that’s actually no longer the most meaningful goal to aim for.

As I understand it, one of the main innovations behind ChatGPT is a new technique called reinforcement learning from human feedback (RLHF). In RLHF, humans help direct the AI’s behavior by rating how it responds to various queries. How did you come up with the idea to apply RLHF to ChatGPT?

Well, there had been papers on this for a while, but I think the first version that looked similar to what we’re doing now was actually an OpenAI paper, “Deep Reinforcement Learning from Human Preferences.” Its first author is actually Paul Christiano, another Berkeley alum, who had just joined the OpenAI safety team. The OpenAI safety team undertook this effort because the goal was to align our models with human preferences, to get [the models] to actually listen to us and try to do what we want.

That first paper was not in the language domain; it was about Atari and simulated robotics tasks. They then followed it up with work using language models for summarization. Around the time GPT-3 finished training, I saw the potential in that whole research direction and decided to jump on the bandwagon.

What was your reaction the first time you used ChatGPT? Were you surprised at how well it worked?

I would say that I saw the model change and improve gradually. One interesting detail is that GPT-4 finished training before the release of ChatGPT. ChatGPT is based on GPT-3.5, which is a weaker model. At the time, no one at OpenAI was that excited about ChatGPT, because a much more powerful, smarter model had already been trained. We had also been beta testing the chat model with a group of maybe 30 to 40 friends and family.

That’s why I was really surprised that it was so well received by the public. I think that’s because it was easier to use than the models of similar quality that people had interacted with before. [ChatGPT] may also have been slightly above a threshold in terms of having fewer hallucinations. In addition, I think there is a positive feedback effect, where people show each other how to use it effectively and get ideas by watching how others use it.


The audience for Schulman’s EECS Colloquium Special Lecture filled the Banatao Auditorium at Sutardja Dai Hall. (UC Berkeley photo by Jim Block)

The success of ChatGPT has rekindled concerns about the future of AI. Are you concerned about the safety of the GPT models?

I think there are different types of risks that should be distinguished. First, there is the risk of misuse. People can use the model to get new ideas about how to do harm, or use it as part of a malicious system. And then there is the risk of a treacherous turn, where the AI has some goals that don’t align with ours, and it waits until it is powerful enough and then tries to take over.

As for the risk of misuse, it’s certainly not an existential risk, but it’s definitely at the stage of being a concern. If we released GPT-4 without any safeguards, it could cause a lot of trouble by giving people new ideas on how to do various bad things, and I think it could also be used for various kinds of scams and spam. Some of this has already been observed, and it is not specific to GPT-4.

As for the risk of a takeover or a treacherous turn, I would be very cautious, but I think the chances of that happening are extremely low. The model itself has no long-term goals; it is just trained to produce responses that human raters approve of. Therefore, there is no reason for the model to have a desire to change anything about the outside world. There are some arguments that this might be dangerous anyway, but I think they’re a bit exaggerated.

Now that ChatGPT has passed the Turing test in many respects, what do you think is the next frontier for artificial general intelligence?

I think AI will continue to get better at harder tasks, and tasks that were previously done by humans will gradually fall to models that can do them perfectly well, perhaps even better. Then there is the question of what humans should be doing: What are the parts of a task where humans can have more leverage and do more with the help of models? I think it will be a step-by-step process of shifting what people do.


Audience members took selfies with Schulman after the talk. (UC Berkeley photo by Jim Block)




