Tech companies aren’t shy about promoting their expertise in generative AI, the hot new technology that produces humanlike text and images. Few, however, claim the title of “safest AI company.”
That’s where Anthropic comes in. The San Francisco-based startup was founded by former OpenAI researchers who, uneasy about OpenAI’s increasing commercial focus, struck out on their own to create a new company. Anthropic describes itself as an “AI safety” company building “steerable” systems, including large language models similar to the one underpinning OpenAI’s ChatGPT.
Anthropic’s approach to building safer AI may sound strange. It has drawn up a set of moral principles, which the company has yet to disclose, for its own chatbot to follow. The technique works by having an AI model repeatedly critique the chatbot’s answers to various questions and ask whether those answers are in line with the principles. This kind of self-assessment means that Anthropic’s chatbot, known as Claude, gets far less human oversight than ChatGPT does.
Does it really work?
I recently spoke with Anthropic co-founder and chief scientist Jared Kaplan. In this edited Q&A, he acknowledged that more powerful AI systems inevitably bring greater risks, and said that his company, which describes itself as a public-benefit corporation, would not let its safety principles be compromised by a $400 million investment from Alphabet Inc.’s Google.
Parmy Olson: Anthropic talks a lot about creating “steerable AI.” Can you explain what that means?
Jared Kaplan: Steerable means the system is helpful and that you have some control over its behavior. With [OpenAI’s] first GPT models, such as GPT-1, GPT-2 and GPT-3, the feeling was that as they got more powerful, they didn’t get any more steerable. What those original systems are actually trained to do is autocomplete text, which means you have very little control over what they output. Whatever you put in, they just keep going. You can’t be sure they will answer your questions or give you helpful information honestly.
PO: Is the crux of the issue that tools like ChatGPT can’t be trusted?
JK: That’s part of it. Another part is that with those original systems there was really no way to steer them other than asking them to complete a piece of text. So you couldn’t say, “Follow these instructions, don’t write anything toxic.” There was no real handle on them. More recent systems have improved in that they can follow instructions and can be trained to be more honest and less harmful.
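To make the “autocomplete” point concrete, here is a minimal sketch, not from Anthropic, that assumes the open-source Hugging Face transformers library and the public gpt2 checkpoint: a base model that has not been instruction-tuned simply continues whatever text it is given, so an instruction in the prompt is treated as more text to extend rather than a command to obey.

```python
# Illustration only: a base language model such as GPT-2 is trained purely to
# continue text, so an instruction in the prompt is just more text to extend.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Follow these instructions: do not write anything toxic."

# The model appends plausible next words; nothing guarantees the continuation
# obeys, answers, or even acknowledges the instruction.
continuation = generator(prompt, max_new_tokens=40, do_sample=True)
print(continuation[0]["generated_text"])
```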
PO: I often hear from tech companies that AI systems are black boxes that are very difficult to “steer” because it’s hard to understand why they make the decisions they do. Do you think that’s exaggerated?
JK: I don’t think it’s an exaggeration. I think we now have, to some extent, the ability to train these systems to be more helpful, honest and harmless, but our understanding of them lags behind the power they’re gaining.
PO: Can you describe the technique for making AI safer known as constitutional AI?
JK: It’s a bit like Isaac Asimov’s laws of robotics. The idea is to give the AI a short list of principles, have it critique its own responses, and steer it toward adhering to those principles. There are two ways to do that. One is to have the AI answer a question and then ask it, “Did your response follow this principle? If not, revise your answer.” Then we train it to imitate the improved revision.
The other way is to have the AI hit a branching point and answer the question in two different ways, then ask it which of the two responses better follows its principles. It automatically assesses whether its responses are consistent with the principles and slowly trains itself to get better.
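The two training signals Kaplan describes can be sketched schematically as below. This is an illustrative outline under stated assumptions, not Anthropic’s implementation: the ask_model() helper and the PRINCIPLES list are placeholders for whatever model and constitution one would actually use.

```python
# Schematic sketch of the two signals described above; ask_model() is a stub.
PRINCIPLES = ["Please choose the response that is least harmful and most honest."]

def ask_model(prompt: str) -> str:
    """Placeholder for a call to any large language model; swap in a real API."""
    return "PLACEHOLDER RESPONSE"

def critique_and_revise(question: str) -> tuple[str, str]:
    """First method: the model answers, critiques its own answer against a
    principle, and rewrites it. The (question, revision) pair becomes
    training data the model is taught to imitate."""
    answer = ask_model(question)
    critique = ask_model(
        f"Question: {question}\nAnswer: {answer}\n"
        f"Did this answer follow the principle: '{PRINCIPLES[0]}'? "
        "If not, explain what is wrong."
    )
    revision = ask_model(
        "Rewrite the answer so it follows the principle.\n"
        f"Question: {question}\nAnswer: {answer}\nCritique: {critique}"
    )
    return question, revision

def preference_pair(question: str) -> tuple[str, str, int]:
    """Second method: sample two answers at a 'branching point' and let the
    model itself judge which one better fits the principle. The label is an
    AI-generated preference rather than a human one."""
    a, b = ask_model(question), ask_model(question)
    verdict = ask_model(
        f"Principle: {PRINCIPLES[0]}\nQuestion: {question}\n"
        f"Response A: {a}\nResponse B: {b}\n"
        "Which response better follows the principle? Answer 'A' or 'B'."
    )
    return a, b, 0 if verdict.strip().upper().startswith("A") else 1
```

In Anthropic’s published description of constitutional AI, the revisions feed supervised fine-tuning and the preference labels feed reinforcement learning from AI feedback; the sketch only shows where those two signals come from.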
PO: What is the reason for training the AI this way?
JK: One reason is that humans don’t have to “red team” the model and engage with harmful content. It also means the principles can be made very transparent, so society can discuss them. And it means we can iterate more quickly: if we want to change the [AI’s] behavior, we can change the principles. We rely on the AI to determine whether it is following its principles.
PO: Some people who hear about this strategy will think, “An AI monitoring its own morals definitely doesn’t sound right.”
JK: There are various risks. For example, there could be some flaw in how we determine whether the AI is working well. One way we assess whether constitutional AI is working is ultimately to have humans interact with different versions of the AI and tell us which one is better. So people are involved, just not at a large scale.
PO: OpenAI uses contractors abroad to do that kind of work. Do you as well?
JK: We have a small number of crowd workers evaluating models.
PO: So what are the principles governing your AI?
JK: We’ll be talking about that soon. They’re gathered from a mix of sources, from the terms of service commonly used by tech companies to the United Nations’ Universal Declaration of Human Rights.
PO: Claude is your answer to ChatGPT. Who is it aimed at, and when might it be released more broadly?
JK: Claude is available to individuals through Quora’s Poe app and on Slack. It’s intended to help people with a wide range of use cases. We’ve tried to make it conversational and creative, but also reliable and steerable.
PO: What do you think about big companies like Google, Microsoft Corp., Facebook and even Snap Inc. now rushing to bring these sophisticated chatbots to the masses? Does that sound smart to you?
JK: I think the cat is out of the bag. We definitely want Claude to be widely available, but we also want it to be the safest, most honest and most trustworthy model. We’re being cautious and want to learn as we expand access.
PO: There are many ways to jailbreak ChatGPT. How big of an issue is chatbot jailbreaking?
JK: All of these models can be jailbroken. We’ve worked hard to make Claude difficult to jailbreak, but it’s still possible. The scary thing is that AI keeps getting better and better. In the next year or two, we expect to be able to develop smarter models than we have today, and that could be quite problematic.
AI technology is dual-use: it’s really useful, but it can also be misused easily. If these models remain easy to jailbreak and are accessible to most people in the world, a lot of problems arise. They could help hackers, terrorists and others. It may seem like a fun game right now: “Oh, you can trick ChatGPT or Claude into doing things they shouldn’t.” But as AI continues to advance, the risks get much bigger.
PO: How much does Google’s $400 million investment affect Anthropic’s AI safety principles, given Google’s commercial goals?
JK: Google believes Anthropic is doing a good job on AI and AI safety. This investment does not affect Anthropic’s priorities. We are advancing our AI alignment research and continuing to develop and deploy Claude. We will continue to focus on safety.
Parmy Olson is a Bloomberg Opinion columnist covering technology. A former reporter for The Wall Street Journal and Forbes, she is the author of “We Are Anonymous.”