MIT CSAIL researchers discuss the frontiers of generative AI. MIT News

The emergence of generative artificial intelligence has sparked a deep philosophical inquiry into the nature of consciousness, creativity, and copyright. As we witness new advances in this field, it becomes increasingly apparent that these synthetic agents have an astonishing ability to create, iterate, and challenge conventional notions of intelligence. But what does it really mean to say that AI systems are “generative,” now that the lines of creative expression between humans and machines are blurring?

For those who feel as if “generative artificial intelligence” (a type of AI that can create new, original data or content similar to what it was trained on) became an overnight sensation: the new capabilities have indeed surprised many people, but the underlying technology has been in the making for some time.

But understanding the technology’s true capacity can be as murky as some of the content these models generate. To that end, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) met to discuss the capabilities and limitations of generative AI and its potential impacts on society and industry with regard to language, images, and code.

There are different models of generative AI, each with its own approach and techniques. These include generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models, all of which have shown exceptional power in a variety of industries and fields, from art to music to medicine. Along with that have come many ethical and social conundrums, such as the potential to generate fake news, deepfakes, and misinformation. Weighing these considerations is critical, the researchers say, to continuing to study the capabilities and limitations of generative AI and to ensuring ethical use and accountability.

In her opening remarks, to illustrate the visual prowess of these models, Daniela Rus, MIT professor of electrical engineering and computer science (EECS) and director of CSAIL, pulled out a special gift her students had recently given her: a collage of AI-generated portraits of Rus, running a spectrum of mirror-like reflections. Yet there was no commissioned artist in sight.

The machine was to thank.

A generative model learns how to create images by downloading many photos from the internet and trying to make its output images look like the sample training data. There are many ways to train a neural network generator, and diffusion models are just one popular method. These models, as described by Phillip Isola, MIT associate professor of EECS and CSAIL principal investigator, map from random noise to images. Using a process called diffusion, the model converts structured objects such as images into random noise. This process is then reversed by training a neural network to remove the noise step by step until a noise-free image is obtained. If you’ve ever used DALL-E 2, where a sentence and random noise are input and the noise congeals into an image, you’ve used a diffusion model.
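The two directions of diffusion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by DALL-E 2 or any CSAIL system: it assumes the common formulation in which a clean signal is blended with Gaussian noise according to a linear noise schedule, and the names `make_alpha_bars`, `add_noise`, and `estimate_clean` are hypothetical. In a real diffusion model a trained neural network predicts the noise; here the true noise stands in for that prediction so the example stays self-contained.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return the fraction of signal variance left after each noising step.

    Assumes a linear noise schedule (an illustrative choice, not from the article).
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward process: blend the clean signal x0 with Gaussian noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(x0, eps)
    ]
    return xt, eps


def estimate_clean(xt, predicted_eps, alpha_bar):
    """Reverse direction: undo the blend, given an estimate of the noise.

    A trained network would supply predicted_eps; this sketch does not train one.
    """
    return [
        (x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for x, e in zip(xt, predicted_eps)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = make_alpha_bars(1000)
    pixels = [1.0, -0.5, 0.25]  # stand-in for image pixel values
    noisy, true_noise = add_noise(pixels, alpha_bars[-1], rng)
    # With a perfect noise estimate, the clean signal is recovered.
    print(estimate_clean(noisy, true_noise, alpha_bars[-1]))
```

By the last step almost none of the original signal remains, which is why generation can start from pure noise; everything the model "knows" about images lives in the learned noise predictor that this sketch replaces with the true noise.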

“For me, the most thrilling aspect of generative data isn’t its ability to create photorealistic images, but the unprecedented level of control it affords us. It opens up exciting possibilities. Language has emerged as a particularly powerful interface for image generation, allowing you to enter a description such as ‘Van Gogh style’ and have the model create an image that matches that description,” says Isola. “But language is not all-encompassing. Some things are difficult to convey with words alone. For example, it can be difficult to communicate the exact location of a mountain in the background of a portrait. In such cases, alternative techniques such as sketching can be used to provide more specific inputs to the model to achieve the desired output.”

Isola then used images of birds to show that the factors controlling the various aspects of a computer-generated image are like “throwing the dice.” By changing these factors, such as the bird’s color or shape, the computer can generate many variations of the image.

And if you’ve never used an image generator, chances are you’ve used a similar model for text. Jacob Andreas, MIT assistant professor of EECS and CSAIL principal investigator, brought the audience from images into the world of generated words, acknowledging the impressive nature of models that can write poetry, hold conversations, and perform targeted generation of specific documents, all at the same time.

How do these models seem to express things that look like desires and beliefs? They rely on word embeddings: words are assigned numerical values (vectors) and placed in a space with many dimensions. Plotting these values shows that words with similar meanings end up close to each other in this space. The closeness of these values indicates how closely related the meanings of the words are. (For example, perhaps “Romeo” is usually close to “Juliet.”) Transformer models, in particular, use what is called an “attention mechanism” that selectively focuses on specific parts of the input sequence, allowing multiple rounds of dynamic interaction between different elements. This iterative process can be likened to a series of “wiggles” or fluctuations between various points, leading to the predicted next word in the sequence.
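Both ideas, closeness of word vectors and attention over an input sequence, can be made concrete with a small Python sketch. The three-dimensional vectors below are hand-picked for illustration and do not come from any trained model, and the helper names `cosine_similarity`, `softmax`, and `attend` are hypothetical. The attention step follows the scaled dot-product form from the Transformer literature, but leaves out the learned query, key, and value projections that a real model would apply.

```python
import math


def cosine_similarity(a, b):
    """Closeness of two word vectors: near 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query (no learned projections)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, mixed


# Hypothetical embeddings, invented purely for this example.
EMBEDDINGS = {
    "romeo": [0.9, 0.8, 0.1],
    "juliet": [0.8, 0.9, 0.2],
    "tractor": [0.1, 0.0, 0.9],
}

if __name__ == "__main__":
    romeo = EMBEDDINGS["romeo"]
    print(cosine_similarity(romeo, EMBEDDINGS["juliet"]))
    print(cosine_similarity(romeo, EMBEDDINGS["tractor"]))
    context = [EMBEDDINGS["juliet"], EMBEDDINGS["tractor"]]
    weights, mixed = attend(romeo, context, context)
    print(weights)
```

With these made-up vectors, “romeo” sits much closer to “juliet” than to “tractor,” and the attention weights favor “juliet” for the same reason: the dot product that measures closeness also decides where the model focuses.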

“Imagine you were using a text editor and there was a magic button in the top right corner that you could press to convert your sentences into beautiful and accurate English. We have had grammar and spell checking for a while, but now we can explore many other ways to incorporate these magical features into our apps,” says Andreas. “For example, you can shorten a long passage and make the words appear the way you want them to, similar to shrinking an image in an image editor. We can even push the boundaries further, but we have to remember that even today’s best models are far from doing this in a reliable or trustworthy way. However, the possibilities that can be explored and created with this technology are endless.”

Another feat of large language models, one that can feel very “meta,” was also explored: models that write code, like little magic wands that conjure lines of code instead of spells and bring (some) software developers’ dreams to life. Armando Solar-Lezama, MIT professor of EECS and CSAIL principal investigator, recalled some history from 2014, when significant advances were made using long short-term memory (LSTM), a language translation technique that could be used to correct programming assignments for predictable text with a well-defined task. Two years later, everyone’s favorite basic human need came on the scene: attention, the mechanism introduced in the 2017 Google paper “Attention Is All You Need.” Shortly thereafter, former CSAILer Rishabh Singh was part of a team that used attention to construct whole programs for relatively simple tasks in an automated way. Transformers came along soon after, and there was a surge of research into generating code using text-to-text mappings.

“Code is very powerful because you can run it, test it, and analyze it for vulnerabilities. However, code is also very brittle, and small errors can have a big impact on its functionality or security,” says Solar-Lezama. “Another challenge is the size and complexity of commercial software, which can be difficult for even the largest models to handle. Additionally, the diversity of coding styles and libraries used by different companies means that the bar for accuracy when working with code can be very high.”

In the question-and-answer-based discussion that followed, Rus opened with a question on content: How can the output of generative AI be made more powerful by incorporating domain-specific knowledge and constraints into the models? “Models for processing complex visual data still rely heavily on domain knowledge to function efficiently,” says Isola. “These models incorporate projection and optics equations into their objective functions and optimization routines. We cannot predict the future, but as we move forward, we may need less structured data. Even so, for now, domain knowledge remains an important aspect of working with structured data.”

The panel also discussed how crucial it is to assess the validity of generated content. Many benchmarks have been built to show that models can achieve human-level accuracy on certain tests and tasks that require advanced linguistic proficiency. However, upon closer inspection, simply rephrasing the examples can cause the models to fail completely. Identifying failure modes has become as important as, if not more important than, training the models themselves.
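The kind of check the panel describes, scoring a model on the original benchmark wording and again on paraphrases, can be sketched as a tiny Python harness. Everything here is invented for illustration: `brittle_model` is a toy stand-in that only recognizes exact benchmark wording, not a real language model, and the two small example sets are made up.

```python
def accuracy(model, examples):
    """Fraction of (question, expected_answer) pairs the model gets right."""
    correct = sum(1 for question, answer in examples if model(question) == answer)
    return correct / len(examples)


def brittle_model(question):
    """Toy stand-in: answers correctly only for wording it has memorized."""
    memorized = {
        "What is the capital of France?": "Paris",
        "What is two plus three?": "5",
    }
    return memorized.get(question, "unknown")


# Hypothetical benchmark items and hand-written paraphrases of the same items.
ORIGINAL = [
    ("What is the capital of France?", "Paris"),
    ("What is two plus three?", "5"),
]
PARAPHRASED = [
    ("Which city serves as France's capital?", "Paris"),
    ("Add two and three.", "5"),
]

if __name__ == "__main__":
    print("original wording:", accuracy(brittle_model, ORIGINAL))
    print("paraphrased wording:", accuracy(brittle_model, PARAPHRASED))
```

A large gap between the two scores is the signal of interest: it suggests the benchmark result reflects familiarity with the wording rather than the underlying ability.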

Acknowledging that the setting for the conversation was academia, Solar-Lezama spoke of progress in developing large language models against the deep and mighty pockets of industry. Models in academia, he says, “need really big computers” to create desirable technologies that don’t rely too heavily on industry support.

Beyond technical capabilities, limitations, and how it all evolves, Rus also raised the moral stakes of living in an AI-generated world, in relation to deepfakes, misinformation, and bias. Isola mentioned newer technical solutions focused on digital watermarking, which could help users subtly determine whether an image or a piece of text is machine-generated. “One thing to note here is that this is a problem that cannot be solved by technical solutions alone. It’s very important that the general public is aware of what these models can actually do,” says Solar-Lezama. “At the end of the day, this has to be a broader conversation.”

Another tendency was discussed, one surrounding chatbots and robots and a favorite trope of many dystopian pop culture settings: the temptation of anthropomorphization. Why is it that many people have a natural tendency to project human-like qualities onto nonhuman entities? Andreas explained the opposing schools of thought.

“Some think that models like ChatGPT have already achieved human-level intelligence and may even have consciousness, but the reality is that these models still lack the true human-like ability to comprehend nuance, and they sometimes behave in extremely odd, nonhuman-like ways,” says Andreas. “On the other hand, some argue that these models are merely shallow pattern recognition tools that cannot learn the true meaning of language. However, this view underestimates the level of understanding they can acquire from text. While we must be careful about overestimating their capabilities, neither can we overlook the potential harm of underestimating their impact. Ultimately, we need to approach these models with humility and recognize that we still have a lot to learn about what they can and cannot do.”
